From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-ej1-x62c.google.com (mail-ej1-x62c.google.com [IPv6:2a00:1450:4864:20::62c]) by sourceware.org (Postfix) with ESMTPS id B0D3E38582B3 for ; Mon, 5 Sep 2022 08:54:41 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org B0D3E38582B3 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=linaro.org Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=linaro.org Received: by mail-ej1-x62c.google.com with SMTP id nc14so15650206ejc.4 for ; Mon, 05 Sep 2022 01:54:41 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date; bh=4wUb639RXiBKzRKFrjQvvvAV8r9Ta4s9NAuIeJNzsgk=; b=GHp70zi3k+rrN1eJuvRpvQ+T5iokuMoF5HFVDBStcraLM1dMrcLHvpzky6NjdBrFSs GX95UFuHftnwa+D8c+9rnr6IseSbvPpsOOFXu04UlU4FyiweDUUtKFA+8zDTQhZits/p Y8qRYRmmzYPsM3/9wusHpucFz/NqtimdUTfV1l1WVTDe6msRETg5MzkDuQ8AozIRw26p V3VJyRrsdQczRneScTO6DSmcKaYWbKnq78Qfm/HlhW8Fzbt21E53YOulJh6pdLn4vfs4 SQue/Xr3I/4S6/c8vRGPIIiFggHZjY0qrvEC63HVz114qjhkBeuKbD2YS7nG+/UCw2qm wzlA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date; bh=4wUb639RXiBKzRKFrjQvvvAV8r9Ta4s9NAuIeJNzsgk=; b=T8s+Aj8Tr4ahJM60r5lDWb3CNsNkeHvnK070sVyz+LcLBnCfBYZql2uNYQzcbevU12 zkz9VtQ2DF/EAQugbm7P0ou7qsenWWbffQJk3iA/PO/WOdrXgr9UCQg0CQvl3LosLrdV KcEFXIfNxYrvF2i1OmKUEl5/EIJG6JtwEp9/4S9xQnhNgTRUACekW4N9x27iWazaYeiE A812yrzJhfK6OQPXoAIGJNk72HqEP1MtYmKwLDRE0ZAfyB4cvnWm+hRjTijwbvE+3kHJ TAfUnFo7m6lkHjGtNxN0EoOvGFbohPfdk6DPi6Y/D5lKl3yZoA7KhA0O9adSsiIFHxVB 7fVQ== X-Gm-Message-State: ACgBeo0w4DTkAhfMNc/oq/aA0xbRI4h49VSxzGpM28EGQwvDTjrTCQoJ 2RoriwFKX9RkNaCuS+VzkW7+pLw+cYEgxq/zUrDqmg== X-Google-Smtp-Source: AA6agR42vF+i5AU8F6dW0CYBLJB5EX7Ga0EDHl+wgG9eYMQyGWpGywUbXbpwKhnxUdjhE+QCr4UyGTd2yI3IOPV3SIY= X-Received: by 2002:a17:907:60c7:b0:731:148b:c515 with SMTP id hv7-20020a17090760c700b00731148bc515mr35615424ejc.724.1662368080327; Mon, 05 Sep 2022 01:54:40 -0700 (PDT) MIME-Version: 1.0 References: In-Reply-To: From: Prathamesh Kulkarni Date: Mon, 5 Sep 2022 14:24:04 +0530 Message-ID: Subject: Re: ICE after folding svld1rq to vec_perm_expr duing forwprop To: Richard Biener Cc: gcc Patches , Richard Sandiford Content-Type: text/plain; charset="UTF-8" X-Spam-Status: No, score=-9.5 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,GIT_PATCH_0,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,TXREP,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: On Mon, 29 Aug 2022 at 11:53, Prathamesh Kulkarni wrote: > > On Thu, 18 Aug 2022 at 18:20, Prathamesh Kulkarni > wrote: > > > > On Thu, 18 Aug 2022 at 18:14, Prathamesh Kulkarni > > wrote: > > > > > > On Wed, 17 Aug 2022 at 17:01, Richard Biener wrote: > > > > > > > > On Tue, Aug 16, 2022 at 6:30 PM Richard Sandiford > > > > wrote: > > > > > > > > > > Prathamesh Kulkarni writes: > > > > > > On Tue, 9 Aug 2022 at 18:42, Richard Biener wrote: > > > > > >> > > > > > >> On Tue, Aug 9, 2022 at 12:10 PM Prathamesh Kulkarni > > > > > >> wrote: > > > > > >> > > > > > > >> > On Mon, 8 Aug 2022 at 14:27, Richard Biener w>> > > > > > > > >> > > > > > > > >> > > /* If result vector has greater length than input vector, > > > > > >> > > + then allow permuting two vectors as long as: > > > > > >> > > + a) sel.nelts_per_pattern == 1 > > > > > >> > > + b) sel.npatterns == len of input vector. > > > > > >> > > + The intent is to permute input vectors, and > > > > > >> > > + dup the elements in resulting vector to target vector length. */ > > > > > >> > > + > > > > > >> > > + if (maybe_gt (TYPE_VECTOR_SUBPARTS (type), > > > > > >> > > + TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)))) > > > > > >> > > + { > > > > > >> > > + nelts = sel.encoding ().npatterns (); > > > > > >> > > + if (sel.encoding ().nelts_per_pattern () != 1 > > > > > >> > > + || (!known_eq (nelts, TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0))))) > > > > > >> > > + return NULL_TREE; > > > > > >> > > + } > > > > > >> > > > > > > > >> > > so the only case you add is non-VLA to VLA and there > > > > > >> > > explicitely only the case of a period that's same as the > > > > > >> > > element count in the input vectors. > > > > > >> > > > > > > > >> > > > > > > > >> > > @@ -2602,6 +2602,9 @@ dump_generic_node (pretty_printer *pp, tree > > > > > >> > > node, int spc, dump_flags_t flags, > > > > > >> > > pp_space (pp); > > > > > >> > > } > > > > > >> > > } > > > > > >> > > + if (VECTOR_TYPE_P (TREE_TYPE (node)) > > > > > >> > > + && !TYPE_VECTOR_SUBPARTS (TREE_TYPE (node)).is_constant ()) > > > > > >> > > + pp_string (pp, ", ... "); > > > > > >> > > pp_right_brace (pp); > > > > > >> > > > > > > > >> > > btw, I do wonder if VLA CONSTRUCTORs are a "thing"? Are they? > > > > > >> > Well, it got created for the following case after folding: > > > > > >> > svint32_t f2(int a, int b, int c, int d) > > > > > >> > { > > > > > >> > int32x4_t v = {a, b, c, d}; > > > > > >> > return svld1rq_s32 (svptrue_b8 (), &v[0]); > > > > > >> > } > > > > > >> > > > > > > >> > The svld1rq_s32 call gets folded to: > > > > > >> > v = {a, b, c, d} > > > > > >> > lhs = VEC_PERM_EXPR > > > > > >> > > > > > > >> > fold_vec_perm then folds the above VEC_PERM_EXPR to > > > > > >> > VLA constructor, since elements in v (in_elts) are not constant, and > > > > > >> > need_ctor is thus true: > > > > > >> > lhs = {a, b, c, d, ...} > > > > > >> > I added "..." to make it more explicit that it's a VLA constructor. > > > > > >> > > > > > >> But I doubt we do anything reasonable with such a beast? Do we? > > > > > >> I suppose it's like a vec_duplicate if you view it as V1TImode > > > > > >> but do we actually make sure to do this duplication? > > > > > > I am not sure. As mentioned above, the current code-gen for VLA > > > > > > constructor looks pretty bad. > > > > > > Should we avoid folding VLA constructors for now ? > > > > > > > > > > VLA constructors aren't really a thing. At least, the only VLA vector > > > > > you could represent with current CONSTRUCTOR nodes is a fixed-length > > > > > sequence at the start of an otherwise zero vector. I'm not sure > > > > > we even use that though (perhaps we do and I've forgotten). > > > > > > > > > > > I guess these are 2 different issues: > > > > > > (a) Resolving ICE with VEC_PERM_EXPR for above aarch64 tests. > > > > > > (b) Extending fold_vec_perm to handle vectors with differing lengths. > > > > > > > > > > > > For (a), I think the issue with using: > > > > > > res_type = gimple_assign_lhs (stmt) > > > > > > in previous patch, was that op2's type will change to match tgt_units, > > > > > > if we go thru > > > > > > (code == VIEW_CONVERT_EXPR || code2 == VIEW_CONVERT_EXPR) branch, > > > > > > and may thus not be same as len(lhs_type) anymore, and hit the assert > > > > > > in fold_vec_perm. > > > > > > > > > > > > IIUC, for lhs = VEC_PERM_EXPR, we now have the > > > > > > following semantics: > > > > > > (1) Element types for lhs, rhs1 and rhs2 should be the same. > > > > > > (2) len(lhs) == len(mask) and len(rhs1) == len(rhs2). > > > > > > > > > > Yeah. > > > > > > > > > > > The attached patch changes res_type from TREE_TYPE (arg0) to following: > > > > > > res_type = build_vector_type (TREE_TYPE (TREE_TYPE (arg0)), > > > > > > TYPE_VECTOR_SUBPARTS (op2)) > > > > > > so it has same element type as arg0 (and arg1) and len of op2. > > > > > > Does that look reasonable ? > > > > > > > > > > > > If we need a cast from res_type to lhs_type, then both would be fixed > > > > > > width vectors > > > > > > with len(lhs_type) being a multiple of len(res_type). > > > > > > IIUC, we don't support casting from VLA vector to/from fixed width vector, > > > > > > > > > > Yes, that's not supported as a cast. If the compiler knows the > > > > > length of the "VLA" vector then it's not VLA. If it doesn't > > > > > know the length of the VLA vector then the sizes could be different > > > > > (preventing VIEW_CONVERT_EXPR) and the number of elements could be > > > > > different (preventing pointwise CONVERT_EXPRs). > > > > > > > > > > > or from VLA vector of one type to VLA vector of other type ? > > > > > > > > > > That's supported though. They work just like VLS vectors: if the sizes > > > > > are the same then we can use VIEW_CONVERT_EXPR, if the number of elements > > > > > are the same then we can do pointwise conversions (e.g. element-by-element > > > > > extensions, truncations, conversions to float, conversions to integer, etc). > > > > > > > > > > > Currently, if op2 is VLA, and we enter the branch: > > > > > > (code == VIEW_CONVERT_EXPR || code2 == VIEW_CONVERT_EXPR) > > > > > > then I think it will bail out because op2_units will not be a compile > > > > > > time constant, > > > > > > and constant_multiple_p (op2_units, tgt_units, &factor) would return false. > > > > > > > > > > > > The patch resolves ICE's for aarch64 cases, without regressing the > > > > > > ones in PR106360. > > > > > > Does it look OK ? > > > > > > > > > > Richi should have the final say, but it looks OK in principle to me. > > > > > I'm not sure it's worth applying independently of the follow-on patch > > > > > though, if that patch is in the offing anyway. > > > > > > > > > > I guess my only question is whether tree-ssa-forwprop.cc really needs > > > > > to build a new type. Couldn't it just use the TREE_TYPE of the lhs > > > > > instead? > > > > > > > > I think the point was they are not necssarily the same when we > > > > looked through a VIEW_CONVERT? A comment might be in order > > > > here. > > > Yes, the issue is when rhs1 and rhs2 of VEC_PERM_EXPR are a result of > > > VIEW_CONVERT_EXPR. For instance in following case: > > > > > > _78 = (void *) ivtmp.21_73; > > > _92 = MEM [(uint8_t *)_78]; > > > _91 = {_92, 0}; > > > vect__1.6_90 = VIEW_CONVERT_EXPR(_91); > > > _88 = MEM [(uint8_t *)_78 + 16B]; > > > _87 = {_88, 0}; > > > vect__1.7_86 = VIEW_CONVERT_EXPR(_87); > > > vect__1.8_85 = VEC_PERM_EXPR > > 3, 4, 5, 6, 7, 16, 17, 18, 19, 20, 21, 22, 23 }>; > > > > > > simplify_permutation looks thru V_C_E, and tries to fold: > > > res = VEC_PERM_EXPR (_87, _91, {0, 2}) > > > In this case, lhs type (V16QI) differs from res type (V2DI), and we > > > hit assert in fold_vec_perm. > > Oops sorry, just to clarify -- this hit assert in fold_vec_perm only > > when we used res_type = lhs_type. > > It works as-is, with res_type = TREE_TYPE (arg0) since arg0 type is V2DI. > > The issue with using res_type = TREE_TYPE (arg0), comes for case when > > arg0 is fixed length, and op2 is VLA. > Hi, > Shall it be OK to commit the above patch, that changes res_type to > have same elem type as arg0, and length as op2 ? > > Also, shall it be OK to gate these transforms in vec_perm match.pd pattern: > sel.series_p (0, 1, 0, 1) -> op0 and, > sel.series_p (0, 1, nelts, 1) -> op1 > if len(type) == len(arg0) ? ping Thanks, Prathamesh > > Thanks, > Prathamesh > > > > Thanks, > > Prathamesh > > > > > > > > Btw, please don't add new asserts that trivially hold in critical paths. > > > Well, it didn't hold true for following case when op2 is VLA: > > > lhs = VEC_PERM_EXPR ({1, 2, 3, 4}, {1, 2, 3, 4}, {0, 1, 2, 3, ... }), > > > because we're passing res_type == V4SI, and op2_type == VNx4SI from > > > simplify_permutation. > > > > > > The vec_perm pattern hits: > > > (if (sel.series_p (0, 1, 0, 1)) > > > { op0; } > > > > > > and folds it to: > > > lhs = {1, 2, 3, 4} > > > > > > which results in error during verify_gimple since lhs and rhs types > > > differ (VNx4SI, V4SI). > > > > > > I suppose we want to do the transforms: > > > sel.series_p (0, 1, 0, 1) -> op0 and, > > > sel.series_p (0, 1, nelts, 1) -> op1 > > > only if input vectors and sel have same length ? > > > > > > Thanks, > > > Prathamesh > > > > > > > > > > > diff --git a/gcc/match.pd b/gcc/match.pd > > > > index 330c1db0c8e..aa20cc713c5 100644 > > > > --- a/gcc/match.pd > > > > +++ b/gcc/match.pd > > > > @@ -7845,6 +7845,12 @@ and, > > > > (with > > > > { > > > > tree op0 = @0, op1 = @1, op2 = @2; > > > > + > > > > + gcc_assert (known_eq (TYPE_VECTOR_SUBPARTS (type), > > > > + TYPE_VECTOR_SUBPARTS (TREE_TYPE (op2)))); > > > > + gcc_assert (known_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (op0)), > > > > + TYPE_VECTOR_SUBPARTS (TREE_TYPE (op1)))); > > > > + > > > > machine_mode result_mode = TYPE_MODE (type); > > > > machine_mode op_mode = TYPE_MODE (TREE_TYPE (op0)); > > > > > > > > > > > > > > > > > > > > > > Thanks, > > > > > Richard