On Thu, 15 Aug 2019 at 01:50, Richard Sandiford wrote: > > Richard Biener writes: > > On Wed, Aug 14, 2019 at 6:49 PM Richard Biener > > wrote: > >> > >> On Wed, Aug 14, 2019 at 5:06 PM Prathamesh Kulkarni > >> wrote: > >> > > >> > Hi, > >> > The attached patch tries to fix PR86753. > >> > > >> > For following test: > >> > void > >> > f1 (int *restrict x, int *restrict y, int *restrict z) > >> > { > >> > for (int i = 0; i < 100; ++i) > >> > x[i] = y[i] ? z[i] : 10; > >> > } > >> > > >> > vect dump shows: > >> > vect_cst__42 = { 0, ... }; > >> > vect_cst__48 = { 0, ... }; > >> > > >> > vect__4.7_41 = .MASK_LOAD (vectp_y.5_38, 4B, loop_mask_40); > >> > _4 = *_3; > >> > _5 = z_12(D) + _2; > >> > mask__35.8_43 = vect__4.7_41 != vect_cst__42; > >> > _35 = _4 != 0; > >> > vec_mask_and_46 = mask__35.8_43 & loop_mask_40; > >> > vect_iftmp.11_47 = .MASK_LOAD (vectp_z.9_44, 4B, vec_mask_and_46); > >> > iftmp.0_13 = 0; > >> > vect_iftmp.12_50 = VEC_COND_EXPR >> > vect_iftmp.11_47, vect_cst__49>; > >> > > >> > and following code-gen: > >> > L2: > >> > ld1w z0.s, p2/z, [x1, x3, lsl 2] > >> > cmpne p1.s, p3/z, z0.s, #0 > >> > cmpne p0.s, p2/z, z0.s, #0 > >> > ld1w z0.s, p0/z, [x2, x3, lsl 2] > >> > sel z0.s, p1, z0.s, z1.s > >> > > >> > We could reuse vec_mask_and_46 in vec_cond_expr since the conditions > >> > vect__4.7_41 != vect_cst__48 and vect__4.7_41 != vect_cst__42 > >> > are equivalent, and vect_iftmp.11_47 depends on vect__4.7_41 != vect_cst__48. > >> > > >> > I suppose in general for vec_cond_expr if T comes from masked load, > >> > which is conditional on C, then we could reuse the mask used in load, > >> > in vec_cond_expr ? > >> > > >> > The patch maintains a hash_map cond_to_vec_mask > >> > from vec_mask (with loop predicate applied). > >> > In prepare_load_store_mask, we record -> vec_mask & loop_mask, > >> > and in vectorizable_condition, we check if exists in > >> > cond_to_vec_mask > >> > and if found, the corresponding vec_mask is used as 1st operand of > >> > vec_cond_expr. > >> > > >> > is represented with cond_vmask_key, and the patch > >> > adds tree_cond_ops to represent condition operator and operands coming > >> > either from cond_expr > >> > or a gimple comparison stmt. If the stmt is not comparison, it returns > >> > and inserts that into cond_to_vec_mask. > >> > > >> > With patch, the redundant p1 is eliminated and sel uses p0 for above test. > >> > > >> > For following test: > >> > void > >> > f2 (int *restrict x, int *restrict y, int *restrict z, int fallback) > >> > { > >> > for (int i = 0; i < 100; ++i) > >> > x[i] = y[i] ? z[i] : fallback; > >> > } > >> > > >> > input to vectorizer has operands swapped in cond_expr: > >> > _36 = _4 != 0; > >> > iftmp.0_14 = .MASK_LOAD (_5, 32B, _36); > >> > iftmp.0_8 = _4 == 0 ? fallback_12(D) : iftmp.0_14; > >> > > >> > So we need to check for inverted condition in cond_to_vec_mask, > >> > and swap the operands. > >> > Does the patch look OK so far ? > >> > > >> > One major issue remaining with the patch is value numbering. > >> > Currently, it does value numbering for entire function using sccvn > >> > during start of vect pass, which is too expensive since we only need > >> > block based VN. I am looking into that. > >> > >> Why do you need it at all? We run VN on the if-converted loop bodies btw. > > This was my suggestion, but with the idea being to do the numbering > per-statement as we vectorise. We'll then see pattern statements too. > > That's important because we use pattern statements to set the right > vector boolean type (e.g. vect_recog_mask_conversion_pattern). > So some of the masks we care about don't exist after if converison. > > > Also I can't trivially see the equality of the masks and probably so > > can't VN. Is it that we just don't bother to apply loop_mask to > > VEC_COND but there's no harm if we do? > > Yeah. The idea of the optimisation is to decide when it's more profitable > to apply the loop mask, even though doing so isn't necessary. It would > be hard to do after vectorisation because the masks aren't equivalent. > We're relying on knowledge of how the vectoriser uses the result. Hi, Sorry for late response. This is an updated patch, that integrates block-based VN into vect pass. The patch (a) Exports visit_stmt (renamed to vn_visit_stmt), vn_bb_init to initialize VN state, and vn_bb_free to free it. (b) Calls vn_visit_stmt in vect_transform_stmt for value numbering stmts. We're only interested in obtaining value numbers, not eliminating redundancies. Does it look in the right direction ? I am not sure if the initialization in vn_bb_init is entirely correct. PS: The patch seems to regress fmla_2.c. I am looking into it. Thanks, Prathamesh > > Thanks, > Richard