From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-ej1-x62b.google.com (mail-ej1-x62b.google.com [IPv6:2a00:1450:4864:20::62b]) by sourceware.org (Postfix) with ESMTPS id 573923858D32 for ; Mon, 5 Sep 2022 11:03:15 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 573923858D32 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=gmail.com Received: by mail-ej1-x62b.google.com with SMTP id qh18so16316621ejb.7 for ; Mon, 05 Sep 2022 04:03:15 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date; bh=fsx4BUEcm1Ekj/Z85hno/waSHjea1BhESiOzxtzW0BM=; b=e5IxsMGf5YlXHI2ISE8ZthqdFW1yvoAQHpG/FG1GoNpwaR9efsDRYI4L0tZYFTynKg B3dSWVNA8tRK5lISzICGe0zVmpHk36IGSOkh+S43UfVYoDbVlLaV9txjlKETVeuUMEVd JDTlL619p2KzKi4XFsHQLCPYwifGh0u2Rc2IIIepZPKse4jxvU5dw6oVaPZRrtwPC2yY WH7IHtG2QyJslFGsALhCBMVP70FEYXiOq+Lo9fWnjreNm5KX9OF4WBWomdRtBuKrc6+T QBDtaKDb1Hz+CS5D6DN3UI6oexrI460BPIS6z08Cp1R8LEhiM87XI5nmdLTnj5M//LQs 9vpw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date; bh=fsx4BUEcm1Ekj/Z85hno/waSHjea1BhESiOzxtzW0BM=; b=bwVBnvB6UF7oPjrvY3pR8Wvwp9weMeZccNLjEJ+wRfe9UJOAsunMwCvzJSxeAJ9r1e nPWv90LcXOM2EKfCs4FmcQoUTswK9QdTlk3QHPc63qrwHyygxiEDtnURmId26eo6uFZO ALy04lsO0HJB6BQHgVVGlI6MLoJNy/OYOPNtCalRVIChqF/rDZ3424yRzfI/H2zbxj/3 dzGPExFTqnrWjls9Oi1BGH0IO6hXBvDW9EOXBo9KwI25OoXWz/37l7JYaGvSL75IO1NT Xp8pMi90ttMkfR9MJJLj8PKInWpF3HHsp2+Z1xZcEi5JshcNV9TQfZo3sPpiBOgG9ZNC wjNQ== X-Gm-Message-State: ACgBeo1MLjK9b7hPqDQsQB1Nousv24L2fwoyOxTURoXVtsugTY+l+l5c WdJrEI3SA1m5NHQHOpZ58AEyAqqlBTsvcdHHCuVFGX+L X-Google-Smtp-Source: AA6agR589RdnGqdkrVVAdcpoAwks2vseWDT3mcK0FIQ8I1MOMOeq+JSlEj6xbGqLAmxzTYA2o75dhQe7Yons2qqaHUY= X-Received: by 2002:a17:907:3e12:b0:738:fd2f:df80 with SMTP id hp18-20020a1709073e1200b00738fd2fdf80mr36956882ejc.29.1662375793942; Mon, 05 Sep 2022 04:03:13 -0700 (PDT) MIME-Version: 1.0 References: In-Reply-To: From: Richard Biener Date: Mon, 5 Sep 2022 13:03:01 +0200 Message-ID: Subject: Re: ICE after folding svld1rq to vec_perm_expr duing forwprop To: Prathamesh Kulkarni Cc: gcc Patches , Richard Sandiford Content-Type: text/plain; charset="UTF-8" X-Spam-Status: No, score=-8.2 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,FREEMAIL_FROM,GIT_PATCH_0,KAM_SHORT,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,TXREP,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: On Mon, Sep 5, 2022 at 11:27 AM Prathamesh Kulkarni wrote: > > On Mon, 5 Sept 2022 at 14:39, Richard Biener wrote: > > > > On Mon, Sep 5, 2022 at 10:54 AM Prathamesh Kulkarni > > wrote: > > > > > > On Mon, 29 Aug 2022 at 11:53, Prathamesh Kulkarni > > > wrote: > > > > > > > > On Thu, 18 Aug 2022 at 18:20, Prathamesh Kulkarni > > > > wrote: > > > > > > > > > > On Thu, 18 Aug 2022 at 18:14, Prathamesh Kulkarni > > > > > wrote: > > > > > > > > > > > > On Wed, 17 Aug 2022 at 17:01, Richard Biener wrote: > > > > > > > > > > > > > > On Tue, Aug 16, 2022 at 6:30 PM Richard Sandiford > > > > > > > wrote: > > > > > > > > > > > > > > > > Prathamesh Kulkarni writes: > > > > > > > > > On Tue, 9 Aug 2022 at 18:42, Richard Biener wrote: > > > > > > > > >> > > > > > > > > >> On Tue, Aug 9, 2022 at 12:10 PM Prathamesh Kulkarni > > > > > > > > >> wrote: > > > > > > > > >> > > > > > > > > > >> > On Mon, 8 Aug 2022 at 14:27, Richard Biener w>> > > > > > > > > > > >> > > > > > > > > > > >> > > /* If result vector has greater length than input vector, > > > > > > > > >> > > + then allow permuting two vectors as long as: > > > > > > > > >> > > + a) sel.nelts_per_pattern == 1 > > > > > > > > >> > > + b) sel.npatterns == len of input vector. > > > > > > > > >> > > + The intent is to permute input vectors, and > > > > > > > > >> > > + dup the elements in resulting vector to target vector length. */ > > > > > > > > >> > > + > > > > > > > > >> > > + if (maybe_gt (TYPE_VECTOR_SUBPARTS (type), > > > > > > > > >> > > + TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)))) > > > > > > > > >> > > + { > > > > > > > > >> > > + nelts = sel.encoding ().npatterns (); > > > > > > > > >> > > + if (sel.encoding ().nelts_per_pattern () != 1 > > > > > > > > >> > > + || (!known_eq (nelts, TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0))))) > > > > > > > > >> > > + return NULL_TREE; > > > > > > > > >> > > + } > > > > > > > > >> > > > > > > > > > > >> > > so the only case you add is non-VLA to VLA and there > > > > > > > > >> > > explicitely only the case of a period that's same as the > > > > > > > > >> > > element count in the input vectors. > > > > > > > > >> > > > > > > > > > > >> > > > > > > > > > > >> > > @@ -2602,6 +2602,9 @@ dump_generic_node (pretty_printer *pp, tree > > > > > > > > >> > > node, int spc, dump_flags_t flags, > > > > > > > > >> > > pp_space (pp); > > > > > > > > >> > > } > > > > > > > > >> > > } > > > > > > > > >> > > + if (VECTOR_TYPE_P (TREE_TYPE (node)) > > > > > > > > >> > > + && !TYPE_VECTOR_SUBPARTS (TREE_TYPE (node)).is_constant ()) > > > > > > > > >> > > + pp_string (pp, ", ... "); > > > > > > > > >> > > pp_right_brace (pp); > > > > > > > > >> > > > > > > > > > > >> > > btw, I do wonder if VLA CONSTRUCTORs are a "thing"? Are they? > > > > > > > > >> > Well, it got created for the following case after folding: > > > > > > > > >> > svint32_t f2(int a, int b, int c, int d) > > > > > > > > >> > { > > > > > > > > >> > int32x4_t v = {a, b, c, d}; > > > > > > > > >> > return svld1rq_s32 (svptrue_b8 (), &v[0]); > > > > > > > > >> > } > > > > > > > > >> > > > > > > > > > >> > The svld1rq_s32 call gets folded to: > > > > > > > > >> > v = {a, b, c, d} > > > > > > > > >> > lhs = VEC_PERM_EXPR > > > > > > > > >> > > > > > > > > > >> > fold_vec_perm then folds the above VEC_PERM_EXPR to > > > > > > > > >> > VLA constructor, since elements in v (in_elts) are not constant, and > > > > > > > > >> > need_ctor is thus true: > > > > > > > > >> > lhs = {a, b, c, d, ...} > > > > > > > > >> > I added "..." to make it more explicit that it's a VLA constructor. > > > > > > > > >> > > > > > > > > >> But I doubt we do anything reasonable with such a beast? Do we? > > > > > > > > >> I suppose it's like a vec_duplicate if you view it as V1TImode > > > > > > > > >> but do we actually make sure to do this duplication? > > > > > > > > > I am not sure. As mentioned above, the current code-gen for VLA > > > > > > > > > constructor looks pretty bad. > > > > > > > > > Should we avoid folding VLA constructors for now ? > > > > > > > > > > > > > > > > VLA constructors aren't really a thing. At least, the only VLA vector > > > > > > > > you could represent with current CONSTRUCTOR nodes is a fixed-length > > > > > > > > sequence at the start of an otherwise zero vector. I'm not sure > > > > > > > > we even use that though (perhaps we do and I've forgotten). > > > > > > > > > > > > > > > > > I guess these are 2 different issues: > > > > > > > > > (a) Resolving ICE with VEC_PERM_EXPR for above aarch64 tests. > > > > > > > > > (b) Extending fold_vec_perm to handle vectors with differing lengths. > > > > > > > > > > > > > > > > > > For (a), I think the issue with using: > > > > > > > > > res_type = gimple_assign_lhs (stmt) > > > > > > > > > in previous patch, was that op2's type will change to match tgt_units, > > > > > > > > > if we go thru > > > > > > > > > (code == VIEW_CONVERT_EXPR || code2 == VIEW_CONVERT_EXPR) branch, > > > > > > > > > and may thus not be same as len(lhs_type) anymore, and hit the assert > > > > > > > > > in fold_vec_perm. > > > > > > > > > > > > > > > > > > IIUC, for lhs = VEC_PERM_EXPR, we now have the > > > > > > > > > following semantics: > > > > > > > > > (1) Element types for lhs, rhs1 and rhs2 should be the same. > > > > > > > > > (2) len(lhs) == len(mask) and len(rhs1) == len(rhs2). > > > > > > > > > > > > > > > > Yeah. > > > > > > > > > > > > > > > > > The attached patch changes res_type from TREE_TYPE (arg0) to following: > > > > > > > > > res_type = build_vector_type (TREE_TYPE (TREE_TYPE (arg0)), > > > > > > > > > TYPE_VECTOR_SUBPARTS (op2)) > > > > > > > > > so it has same element type as arg0 (and arg1) and len of op2. > > > > > > > > > Does that look reasonable ? > > > > > > > > > > > > > > > > > > If we need a cast from res_type to lhs_type, then both would be fixed > > > > > > > > > width vectors > > > > > > > > > with len(lhs_type) being a multiple of len(res_type). > > > > > > > > > IIUC, we don't support casting from VLA vector to/from fixed width vector, > > > > > > > > > > > > > > > > Yes, that's not supported as a cast. If the compiler knows the > > > > > > > > length of the "VLA" vector then it's not VLA. If it doesn't > > > > > > > > know the length of the VLA vector then the sizes could be different > > > > > > > > (preventing VIEW_CONVERT_EXPR) and the number of elements could be > > > > > > > > different (preventing pointwise CONVERT_EXPRs). > > > > > > > > > > > > > > > > > or from VLA vector of one type to VLA vector of other type ? > > > > > > > > > > > > > > > > That's supported though. They work just like VLS vectors: if the sizes > > > > > > > > are the same then we can use VIEW_CONVERT_EXPR, if the number of elements > > > > > > > > are the same then we can do pointwise conversions (e.g. element-by-element > > > > > > > > extensions, truncations, conversions to float, conversions to integer, etc). > > > > > > > > > > > > > > > > > Currently, if op2 is VLA, and we enter the branch: > > > > > > > > > (code == VIEW_CONVERT_EXPR || code2 == VIEW_CONVERT_EXPR) > > > > > > > > > then I think it will bail out because op2_units will not be a compile > > > > > > > > > time constant, > > > > > > > > > and constant_multiple_p (op2_units, tgt_units, &factor) would return false. > > > > > > > > > > > > > > > > > > The patch resolves ICE's for aarch64 cases, without regressing the > > > > > > > > > ones in PR106360. > > > > > > > > > Does it look OK ? > > > > > > > > > > > > > > > > Richi should have the final say, but it looks OK in principle to me. > > > > > > > > I'm not sure it's worth applying independently of the follow-on patch > > > > > > > > though, if that patch is in the offing anyway. > > > > > > > > > > > > > > > > I guess my only question is whether tree-ssa-forwprop.cc really needs > > > > > > > > to build a new type. Couldn't it just use the TREE_TYPE of the lhs > > > > > > > > instead? > > > > > > > > > > > > > > I think the point was they are not necssarily the same when we > > > > > > > looked through a VIEW_CONVERT? A comment might be in order > > > > > > > here. > > > > > > Yes, the issue is when rhs1 and rhs2 of VEC_PERM_EXPR are a result of > > > > > > VIEW_CONVERT_EXPR. For instance in following case: > > > > > > > > > > > > _78 = (void *) ivtmp.21_73; > > > > > > _92 = MEM [(uint8_t *)_78]; > > > > > > _91 = {_92, 0}; > > > > > > vect__1.6_90 = VIEW_CONVERT_EXPR(_91); > > > > > > _88 = MEM [(uint8_t *)_78 + 16B]; > > > > > > _87 = {_88, 0}; > > > > > > vect__1.7_86 = VIEW_CONVERT_EXPR(_87); > > > > > > vect__1.8_85 = VEC_PERM_EXPR > > > > > 3, 4, 5, 6, 7, 16, 17, 18, 19, 20, 21, 22, 23 }>; > > > > > > > > > > > > simplify_permutation looks thru V_C_E, and tries to fold: > > > > > > res = VEC_PERM_EXPR (_87, _91, {0, 2}) > > > > > > In this case, lhs type (V16QI) differs from res type (V2DI), and we > > > > > > hit assert in fold_vec_perm. > > > > > Oops sorry, just to clarify -- this hit assert in fold_vec_perm only > > > > > when we used res_type = lhs_type. > > > > > It works as-is, with res_type = TREE_TYPE (arg0) since arg0 type is V2DI. > > > > > The issue with using res_type = TREE_TYPE (arg0), comes for case when > > > > > arg0 is fixed length, and op2 is VLA. > > > > Hi, > > > > Shall it be OK to commit the above patch, that changes res_type to > > > > have same elem type as arg0, and length as op2 ? > > > > > > > > Also, shall it be OK to gate these transforms in vec_perm match.pd pattern: > > > > sel.series_p (0, 1, 0, 1) -> op0 and, > > > > sel.series_p (0, 1, nelts, 1) -> op1 > > > > if len(type) == len(arg0) ? > > > ping > > > > I've lost track - what is the patch you are refering to? > This one: https://gcc.gnu.org/pipermail/gcc-patches/2022-August/599586.html > Only the tree-ssa-forwprop.cc bits (the other patch extends > fold_vec_perm and we don't want asserts in vec_perm pattern). Yes, the tree-ssa-forwprop.cc hunk of that patch is OK. Thanks, Richard. > Thanks, > Prathamesh > > > > Richard. > > > > > Thanks, > > > Prathamesh > > > > > > > > Thanks, > > > > Prathamesh > > > > > > > > > > Thanks, > > > > > Prathamesh > > > > > > > > > > > > > > Btw, please don't add new asserts that trivially hold in critical paths. > > > > > > Well, it didn't hold true for following case when op2 is VLA: > > > > > > lhs = VEC_PERM_EXPR ({1, 2, 3, 4}, {1, 2, 3, 4}, {0, 1, 2, 3, ... }), > > > > > > because we're passing res_type == V4SI, and op2_type == VNx4SI from > > > > > > simplify_permutation. > > > > > > > > > > > > The vec_perm pattern hits: > > > > > > (if (sel.series_p (0, 1, 0, 1)) > > > > > > { op0; } > > > > > > > > > > > > and folds it to: > > > > > > lhs = {1, 2, 3, 4} > > > > > > > > > > > > which results in error during verify_gimple since lhs and rhs types > > > > > > differ (VNx4SI, V4SI). > > > > > > > > > > > > I suppose we want to do the transforms: > > > > > > sel.series_p (0, 1, 0, 1) -> op0 and, > > > > > > sel.series_p (0, 1, nelts, 1) -> op1 > > > > > > only if input vectors and sel have same length ? > > > > > > > > > > > > Thanks, > > > > > > Prathamesh > > > > > > > > > > > > > > > > > > > > diff --git a/gcc/match.pd b/gcc/match.pd > > > > > > > index 330c1db0c8e..aa20cc713c5 100644 > > > > > > > --- a/gcc/match.pd > > > > > > > +++ b/gcc/match.pd > > > > > > > @@ -7845,6 +7845,12 @@ and, > > > > > > > (with > > > > > > > { > > > > > > > tree op0 = @0, op1 = @1, op2 = @2; > > > > > > > + > > > > > > > + gcc_assert (known_eq (TYPE_VECTOR_SUBPARTS (type), > > > > > > > + TYPE_VECTOR_SUBPARTS (TREE_TYPE (op2)))); > > > > > > > + gcc_assert (known_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (op0)), > > > > > > > + TYPE_VECTOR_SUBPARTS (TREE_TYPE (op1)))); > > > > > > > + > > > > > > > machine_mode result_mode = TYPE_MODE (type); > > > > > > > machine_mode op_mode = TYPE_MODE (TREE_TYPE (op0)); > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Thanks, > > > > > > > > Richard