From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 98703 invoked by alias); 27 Aug 2019 09:59:01 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Received: (qmail 98692 invoked by uid 89); 27 Aug 2019 09:59:01 -0000 Authentication-Results: sourceware.org; auth=none X-Spam-SWARE-Status: No, score=-3.7 required=5.0 tests=AWL,BAYES_00,KAM_NUMSUBJECT,SPF_PASS autolearn=no version=3.3.1 spammy= X-HELO: foss.arm.com Received: from foss.arm.com (HELO foss.arm.com) (217.140.110.172) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Tue, 27 Aug 2019 09:58:59 +0000 Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id C727D337; Tue, 27 Aug 2019 02:58:57 -0700 (PDT) Received: from localhost (e121540-lin.manchester.arm.com [10.32.99.62]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 34FF23F718; Tue, 27 Aug 2019 02:58:57 -0700 (PDT) From: Richard Sandiford To: Prathamesh Kulkarni Mail-Followup-To: Prathamesh Kulkarni ,Richard Biener , gcc Patches , richard.sandiford@arm.com Cc: Richard Biener , gcc Patches Subject: Re: [SVE] PR86753 References: Date: Tue, 27 Aug 2019 10:41:00 -0000 In-Reply-To: (Prathamesh Kulkarni's message of "Mon, 26 Aug 2019 18:29:59 +0530") Message-ID: User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.1 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain X-IsSubscribed: yes X-SW-Source: 2019-08/txt/msg01807.txt.bz2 Prathamesh Kulkarni writes: > On Mon, 26 Aug 2019 at 14:48, Richard Biener wrote: >> >> On Sun, Aug 25, 2019 at 11:13 PM Prathamesh Kulkarni >> wrote: >> > >> > On Fri, 23 Aug 2019 at 19:43, Richard Sandiford >> > wrote: >> > > >> > > Prathamesh Kulkarni writes: >> > > > On Fri, 23 Aug 2019 at 18:15, Richard Sandiford >> > > > wrote: >> > > >> >> > > >> Prathamesh Kulkarni writes: >> > > >> > On Thu, 22 Aug 2019 at 16:44, Richard Biener wrote: >> > > >> >> It looks a bit odd to me. I'd have expected it to work by generating >> > > >> >> the stmts as before in the vectorizer and then on the stmts we care >> > > >> >> invoke vn_visit_stmt that does both value-numbering and elimination. >> > > >> >> Alternatively you could ask the VN state to generate the stmt for >> > > >> >> you via vn_nary_build_or_lookup () (certainly that needs a bit more >> > > >> >> work). One complication might be availability if you don't value-number >> > > >> >> all stmts in the block, but well. I'm not sure constraining to a single >> > > >> >> block is necessary - I've thought of having a "CSE"ing gimple_build >> > > >> >> for some time (add & CSE new stmts onto a sequence), so one >> > > >> >> should keep this mode in mind when designing the one working on >> > > >> >> an existing BB. Note as you write it it depends on visiting the >> > > >> >> stmts in proper order - is that guaranteed when for example >> > > >> >> vectorizing SLP? >> > > >> > Hi, >> > > >> > Indeed, I wrote the function with assumption that, stmts would be >> > > >> > visited in proper order. >> > > >> > This doesn't affect SLP currently, because call to vn_visit_stmt in >> > > >> > vect_transform_stmt is >> > > >> > conditional on cond_to_vec_mask, which is only allocated inside >> > > >> > vect_transform_loop. >> > > >> > But I agree we could make it more general. >> > > >> > AFAIU, the idea of constraining VN to single block was to avoid using defs from >> > > >> > non-dominating scalar stmts during outer-loop vectorization. >> > > >> >> > > >> Maybe we could do the numbering in a separate walk immediately before >> > > >> the transform phase instead. >> > > > Um sorry, I didn't understand. Do you mean we should do dom based VN >> > > > just before transform phase >> > > > or run full VN ? >> > > >> > > No, I just meant that we could do a separate walk of the contents >> > > of the basic block: >> > > >> > > > @@ -8608,6 +8609,8 @@ vect_transform_loop (loop_vec_info loop_vinfo) >> > > > { >> > > > basic_block bb = bbs[i]; >> > > > stmt_vec_info stmt_info; >> > > > + vn_bb_init (bb); >> > > > + loop_vinfo->cond_to_vec_mask = new cond_vmask_map_type (8); >> > > > >> > > >> > > ...here, rather than doing it on the fly during vect_transform_stmt >> > > itself. The walk should be gated on LOOP_VINFO_FULLY_MASKED_P so that >> > > others don't have to pay the compile-time penalty. (Same for >> > > cond_to_vec_mask itself really.) >> > Hi, >> > Does the attached patch look OK ? >> > In patch, I put call to vn_visit stmt in bb loop in >> > vect_transform_loop to avoid replicating logic for processing phi and >> > stmts. >> > AFAIU, vect_transform_loop_stmt is only called from bb loop, so >> > compile time penalty for checking cond_to_vec_mask >> > should be pretty small ? >> > If this is not OK, I will walk bb immediately before the bb loop. >> >> So if I understand correctly you never have vectorizable COND_EXPRs >> in SLP mode? Because we vectorize all SLP chains before entering >> the loop in vect_transform_loop where you VN existing scalar(!) stmts. On the "!": the idea behind the patch is to find cases in which a scalar condition is used in both a statement that needs to be masked for correctness reasons and a statement that we can choose to mask if we want to. It also tries (opportunisticly) to match the ?: order with other conditions. That's why it's operating on scalar values rather than vector values. In principle it could be done as a subpass before vectorisation rather than on the fly, when there aren't any vector stmts around. >> Then all this hew hash-table stuff should not be needed since this >> is what VN should provide you with. You of course need to visit >> generated condition stmts. And condition support is weak >> in VN due to it possibly having two operations in a single stmt. >> Bad GIMPLE IL. So I'm not sure VN is up to the task here or >> why you even need it given you are doing your own hashing? > Well, we thought of using VN for comparing operands for cases > operand_equal_p would not > work. Actually, VN seems not to be required for test-cases in PR > because both conditions > are _4 != 0 (_35 = _4 != 0 and in cond_expr), which works to match > with operand_equal_p. Right, that's why I was suggesting in the earlier thread that we treat value numbering as a follow-on. But... > Input to vectorizer is: > > [local count: 1063004407]: > # i_20 = PHI > # ivtmp_19 = PHI > _1 = (long unsigned int) i_20; > _2 = _1 * 4; > _3 = y_11(D) + _2; > _4 = *_3; > _5 = z_12(D) + _2; > _35 = _4 != 0; > iftmp.0_13 = .MASK_LOAD (_5, 32B, _35); > iftmp.0_8 = _4 != 0 ? iftmp.0_13 : 10; > > In prepare_load_store_mask, we record (ne_expr, _4, 0) -> vec_mask in > cond_to_vec_mask, > and in vectorizable_condition, we look up (ne_expr, _4, 0) which does > not require VN > since operands are same. > > Initially, I was trying to change the generated vectorized code: > > mask__35.8_43 = vect__4.7_41 != vect_cst__42; > vect_iftmp.12_50 = VEC_COND_EXPR vect_iftmp.11_47, vect_cst__49>; > > where both conditions are equivalent because vect_cst__42 and > vect_cst__48 are zero vectors but operand_equal_p failed to catch > those. > > Sorry, I mixed up later between scalar and vector stmts. > I wonder if we then need VN ? Perhaps there might be other cases where > operands of scalar > conditions may be equivalent but not match with operand_equal_p ? > In the attached patch, changing operator==, to compare using > operand_equal_p works for the tests. ...those are only very simple cases :-) The problem is that ifcvt doesn't leave the code free of redundancies, and pattern stmt generation can create redundancies too. Of course, we could fix those instead. E.g. for: void f (short *restrict x, short *restrict a, short *restrict b, short *restrict c) { for (int i = 0; i < 100; ++i) if (a[i] >= 1) x[i] = b[i] >= 1 ? a[i] : c[i]; } ifcvt produces: [local count: 1063004407]: # i_34 = PHI # ivtmp_5 = PHI _1 = (long unsigned int) i_34; _2 = _1 * 2; _3 = a_23(D) + _2; _4 = *_3; _7 = b_24(D) + _2; _49 = _4 > 0; _8 = .MASK_LOAD (_7, 16B, _49); _12 = _4 > 0; _13 = _8 > 0; _9 = _12 & _13; _10 = _4 > 0; _11 = _8 > 0; _27 = ~_11; _15 = _10 & _27; _14 = c_25(D) + _2; iftmp.0_26 = .MASK_LOAD (_14, 16B, _15); iftmp.0_19 = _9 ? _4 : iftmp.0_26; _17 = x_28(D) + _2; _50 = _4 > 0; .MASK_STORE (_17, 16B, _50, iftmp.0_19); i_30 = i_34 + 1; ivtmp_6 = ivtmp_5 - 1; if (ivtmp_6 != 0) goto ; [98.99%] else goto ; [1.01%] [local count: 1052266994]: goto ; [100.00%] which has 4 copies of _4 > 0 (a[i] > 0) and 2 copies of _8 > 0 (b[i] > 0). Looking through the definition of an SSA name means that we can cope with these redundancies for single comparisons, but not for comparisons combined through & and |. This hurts most when trying to decide whether to invert comparisons. But maybe value numbering won't cope with that either :-) Thanks, Richard