On Mon, 9 Sep 2019 at 16:45, Richard Sandiford wrote:
>
> Prathamesh Kulkarni writes:
> > With patch, only the following FAIL remains for aarch64-sve.exp:
> > FAIL: gcc.target/aarch64/sve/cond_unary_2.c -march=armv8.2-a+sve
> >   scan-assembler-times \\tmovprfx\\t 6
> > which now contains 14.
> > Should I adjust the test, assuming the change isn't a regression?
>
> Well, it is kind-of a regression, but it really just means that the
> integer code is now consistent with the floating-point code in having
> an unnecessary MOVPRFX.  So I think adjusting the count is fine.
> Presumably any future fix for the existing redundant MOVPRFXs will
> apply to the new ones as well.
>
> The patch looks good to me, just some very minor nits:
>
> > @@ -8309,11 +8309,12 @@ vect_double_mask_nunits (tree type)
> >
> >  /* Record that a fully-masked version of LOOP_VINFO would need MASKS to
> >     contain a sequence of NVECTORS masks that each control a vector of type
> > -   VECTYPE. */
> > +   VECTYPE. SCALAR_MASK if non-null, represents the mask used for corresponding
> > +   load/store stmt. */
>
> Should be two spaces between sentences.  Maybe:
>
>      VECTYPE.  If SCALAR_MASK is nonnull, the fully-masked loop would AND
>      these vector masks with the vector version of SCALAR_MASK.  */
>
> since the mask isn't necessarily for a load or store statement.
>
> > [...]
> > @@ -1879,7 +1879,8 @@ static tree permute_vec_elements (tree, tree, tree, stmt_vec_info,
> >     says how the load or store is going to be implemented and GROUP_SIZE
> >     is the number of load or store statements in the containing group.
> >     If the access is a gather load or scatter store, GS_INFO describes
> > -   its arguments.
> > +   its arguments. SCALAR_MASK is the scalar mask used for corresponding
> > +   load or store stmt.
>
> Maybe:
>
>      its arguments.  If the load or store is conditional, SCALAR_MASK is the
>      condition under which it occurs.
>
> since SCALAR_MASK can be null here too.
>
> > [...]
> > @@ -9975,6 +9978,31 @@ vectorizable_condition (stmt_vec_info stmt_info, gimple_stmt_iterator *gsi,
> >    /* Handle cond expr.  */
> >    for (j = 0; j < ncopies; j++)
> >      {
> > +      tree loop_mask = NULL_TREE;
> > +      bool swap_cond_operands = false;
> > +
> > +      if (loop_vinfo && LOOP_VINFO_FULLY_MASKED_P (loop_vinfo))
> > +        {
> > +          scalar_cond_masked_key cond (cond_expr, ncopies);
> > +          if (loop_vinfo->scalar_cond_masked_set.contains (cond))
> > +            {
> > +              vec_loop_masks *masks = &LOOP_VINFO_MASKS (loop_vinfo);
> > +              loop_mask = vect_get_loop_mask (gsi, masks, ncopies, vectype, j);
> > +            }
> > +          else
> > +            {
> > +              cond.code = invert_tree_comparison (cond.code,
> > +                                                  HONOR_NANS (TREE_TYPE (cond.op0)));
>
> Long line.  Maybe just split it out into a separate assignment:
>
>      bool honor_nans = HONOR_NANS (TREE_TYPE (cond.op0));
>      cond.code = invert_tree_comparison (cond.code, honor_nans);
>
> > +              if (loop_vinfo->scalar_cond_masked_set.contains (cond))
> > +                {
> > +                  vec_loop_masks *masks = &LOOP_VINFO_MASKS (loop_vinfo);
> > +                  loop_mask = vect_get_loop_mask (gsi, masks, ncopies, vectype, j);
>
> Long line here too.
>
> > [...]
> > @@ -10090,6 +10121,26 @@ vectorizable_condition (stmt_vec_info stmt_info, gimple_stmt_iterator *gsi,
> >              }
> >          }
> >      }
> > +
> > +      if (loop_mask)
> > +        {
> > +          if (COMPARISON_CLASS_P (vec_compare))
> > +            {
> > +              tree tmp = make_ssa_name (vec_cmp_type);
> > +              gassign *g = gimple_build_assign (tmp,
> > +                                                TREE_CODE (vec_compare),
> > +                                                TREE_OPERAND (vec_compare, 0),
> > +                                                TREE_OPERAND (vec_compare, 1));
>
> Two long lines.
>
> > +              vect_finish_stmt_generation (stmt_info, g, gsi);
> > +              vec_compare = tmp;
> > +            }
> > +
> > +          tree tmp2 = make_ssa_name (vec_cmp_type);
> > +          gassign *g = gimple_build_assign (tmp2, BIT_AND_EXPR, vec_compare, loop_mask);
>
> Long line here too.
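
Done.  For reference, after splitting those long lines the hunk in the
updated patch reads roughly as below.  I've re-typed it here, so treat
it as a sketch: the op0/op1 temporaries are my local names, and the
tail after the BIT_AND (finishing the stmt and replacing vec_compare)
is how I read the continuation that the quote above cuts off.

      if (loop_mask)
        {
          if (COMPARISON_CLASS_P (vec_compare))
            {
              /* Materialize the comparison into an SSA name so it can
                 be ANDed with the loop mask.  */
              tree tmp = make_ssa_name (vec_cmp_type);
              tree op0 = TREE_OPERAND (vec_compare, 0);
              tree op1 = TREE_OPERAND (vec_compare, 1);
              gassign *g
                = gimple_build_assign (tmp, TREE_CODE (vec_compare), op0, op1);
              vect_finish_stmt_generation (stmt_info, g, gsi);
              vec_compare = tmp;
            }

          /* Predicate the condition on the loop mask, so the
             VEC_COND_EXPR shares a predicate with the masked
             load/store.  */
          tree tmp2 = make_ssa_name (vec_cmp_type);
          gassign *g
            = gimple_build_assign (tmp2, BIT_AND_EXPR, vec_compare, loop_mask);
          vect_finish_stmt_generation (stmt_info, g, gsi);
          vec_compare = tmp2;
        }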

> > [...]
> > diff --git a/gcc/tree-vectorizer.c b/gcc/tree-vectorizer.c
> > index dc181524744..c4b2d8e8647 100644
> > --- a/gcc/tree-vectorizer.c
> > +++ b/gcc/tree-vectorizer.c
> > @@ -1513,3 +1513,39 @@ make_pass_ipa_increase_alignment (gcc::context *ctxt)
> >  {
> >    return new pass_ipa_increase_alignment (ctxt);
> >  }
> > +
> > +/* If code(T) is comparison op or def of comparison stmt,
> > +   extract its operands.
> > +   Else return <NE_EXPR, T, 0>.  */
> > +
> > +void
> > +scalar_cond_masked_key::get_cond_ops_from_tree (tree t)
> > +{
> > +  if (TREE_CODE_CLASS (TREE_CODE (t)) == tcc_comparison)
> > +    {
> > +      this->code = TREE_CODE (t);
> > +      this->op0 = TREE_OPERAND (t, 0);
> > +      this->op1 = TREE_OPERAND (t, 1);
> > +      return;
> > +    }
> > +
> > +  if (TREE_CODE (t) == SSA_NAME)
> > +    {
> > +      gassign *stmt = dyn_cast <gassign *> (SSA_NAME_DEF_STMT (t));
> > +      if (stmt)
> > +        {
>
> Might as well do this as:
>
>      if (TREE_CODE (t) == SSA_NAME)
>        if (gassign *stmt = dyn_cast <gassign *> (SSA_NAME_DEF_STMT (t)))
>          {
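
To make the intent of that lookup easier to see, here is a standalone
toy model of the scalar_cond_masked_set logic (all names below are
invented for illustration; this is plain C++, not GCC internals):

    #include <cassert>
    #include <set>
    #include <tuple>

    enum cmp_code { LT, LE, GT, GE, EQ, NE };

    /* Invert a comparison code, as invert_tree_comparison does when the
       inversion is valid (for floats the real code passes HONOR_NANS,
       since !(a < b) is not (a >= b) once NaNs are possible).  */
    static cmp_code
    invert_cmp (cmp_code code)
    {
      switch (code)
        {
        case LT: return GE;
        case LE: return GT;
        case GT: return LE;
        case GE: return LT;
        case EQ: return NE;
        case NE: return EQ;
        }
      return NE;
    }

    /* Stand-in for scalar_cond_masked_key: <code, op0-id, op1-id>.  */
    typedef std::tuple<cmp_code, int, int> cond_key;

    /* Return true if COND, or its inverse, already guards a masked
       load/store in the loop.  *SWAP_ARMS tells the caller to swap the
       then/else values of the VEC_COND_EXPR when only the inverse
       matched.  */
    static bool
    cond_reuses_loop_mask (const std::set<cond_key> &masked_set,
                           cond_key cond, bool *swap_arms)
    {
      *swap_arms = false;
      if (masked_set.count (cond))
        return true;
      std::get<0> (cond) = invert_cmp (std::get<0> (cond));
      if (masked_set.count (cond))
        {
          *swap_arms = true;
          return true;
        }
      return false;
    }

    int
    main ()
    {
      std::set<cond_key> masked_set;
      masked_set.insert (cond_key (LT, 1, 2));  /* "a < b" masks a load.  */

      bool swap;
      /* "a >= b ? x : y" can reuse that mask with x and y swapped.  */
      assert (cond_reuses_loop_mask (masked_set, cond_key (GE, 1, 2), &swap));
      assert (swap);
      return 0;
    }

When either probe matches, vectorizable_condition picks up the loop
mask via vect_get_loop_mask and ANDs it into the comparison, which is
what lets a single predicate serve both the VEC_COND_EXPR and the
masked load/store.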

> The patch (as hoped) introduces some XPASSes:
>
> XPASS: gcc.target/aarch64/sve/cond_cnot_2.c scan-assembler-not \\tsel\\t
> XPASS: gcc.target/aarch64/sve/vcond_4.c scan-assembler-times \\tfcmge\\tp[0-9]+\\.d, p[0-7]/z, z[0-9]+\\.d, #0\\.0\\n 21
> XPASS: gcc.target/aarch64/sve/vcond_4.c scan-assembler-times \\tfcmge\\tp[0-9]+\\.d, p[0-7]/z, z[0-9]+\\.d, z[0-9]+\\.d\\n 42
> XPASS: gcc.target/aarch64/sve/vcond_4.c scan-assembler-times \\tfcmge\\tp[0-9]+\\.s, p[0-7]/z, z[0-9]+\\.s, #0\\.0\\n 15
> XPASS: gcc.target/aarch64/sve/vcond_4.c scan-assembler-times \\tfcmge\\tp[0-9]+\\.s, p[0-7]/z, z[0-9]+\\.s, z[0-9]+\\.s\\n 30
> XPASS: gcc.target/aarch64/sve/vcond_4.c scan-assembler-times \\tfcmgt\\tp[0-9]+\\.d, p[0-7]/z, z[0-9]+\\.d, #0\\.0\\n 21
> XPASS: gcc.target/aarch64/sve/vcond_4.c scan-assembler-times \\tfcmgt\\tp[0-9]+\\.d, p[0-7]/z, z[0-9]+\\.d, z[0-9]+\\.d\\n 42
> XPASS: gcc.target/aarch64/sve/vcond_4.c scan-assembler-times \\tfcmgt\\tp[0-9]+\\.s, p[0-7]/z, z[0-9]+\\.s, #0\\.0\\n 15
> XPASS: gcc.target/aarch64/sve/vcond_4.c scan-assembler-times \\tfcmgt\\tp[0-9]+\\.s, p[0-7]/z, z[0-9]+\\.s, z[0-9]+\\.s\\n 30
> XPASS: gcc.target/aarch64/sve/vcond_4.c scan-assembler-times \\tfcmle\\tp[0-9]+\\.d, p[0-7]/z, z[0-9]+\\.d, #0\\.0\\n 21
> XPASS: gcc.target/aarch64/sve/vcond_4.c scan-assembler-times \\tfcmle\\tp[0-9]+\\.d, p[0-7]/z, z[0-9]+\\.d, z[0-9]+\\.d\\n 42
> XPASS: gcc.target/aarch64/sve/vcond_4.c scan-assembler-times \\tfcmle\\tp[0-9]+\\.s, p[0-7]/z, z[0-9]+\\.s, #0\\.0\\n 15
> XPASS: gcc.target/aarch64/sve/vcond_4.c scan-assembler-times \\tfcmle\\tp[0-9]+\\.s, p[0-7]/z, z[0-9]+\\.s, z[0-9]+\\.s\\n 30
> XPASS: gcc.target/aarch64/sve/vcond_4.c scan-assembler-times \\tfcmlt\\tp[0-9]+\\.d, p[0-7]/z, z[0-9]+\\.d, #0\\.0\\n 21
> XPASS: gcc.target/aarch64/sve/vcond_4.c scan-assembler-times \\tfcmlt\\tp[0-9]+\\.d, p[0-7]/z, z[0-9]+\\.d, z[0-9]+\\.d\\n 42
> XPASS: gcc.target/aarch64/sve/vcond_4.c scan-assembler-times \\tfcmlt\\tp[0-9]+\\.s, p[0-7]/z, z[0-9]+\\.s, #0\\.0\\n 15
> XPASS: gcc.target/aarch64/sve/vcond_4.c scan-assembler-times \\tfcmlt\\tp[0-9]+\\.s, p[0-7]/z, z[0-9]+\\.s, z[0-9]+\\.s\\n 30
> XPASS: gcc.target/aarch64/sve/vcond_4.c scan-assembler-times \\tfcmuo\\tp[0-9]+\\.d, p[0-7]/z, z[0-9]+\\.d, z[0-9]+\\.d\\n 252
> XPASS: gcc.target/aarch64/sve/vcond_4.c scan-assembler-times \\tfcmuo\\tp[0-9]+\\.s, p[0-7]/z, z[0-9]+\\.s, z[0-9]+\\.s\\n 180
> XPASS: gcc.target/aarch64/sve/vcond_5.c scan-assembler-times \\tfcmge\\tp[0-9]+\\.d, p[0-7]/z, z[0-9]+\\.d, #0\\.0 21
> XPASS: gcc.target/aarch64/sve/vcond_5.c scan-assembler-times \\tfcmge\\tp[0-9]+\\.d, p[0-7]/z, z[0-9]+\\.d, z[0-9]+\\.d 42
> XPASS: gcc.target/aarch64/sve/vcond_5.c scan-assembler-times \\tfcmge\\tp[0-9]+\\.s, p[0-7]/z, z[0-9]+\\.s, #0\\.0 15
> XPASS: gcc.target/aarch64/sve/vcond_5.c scan-assembler-times \\tfcmge\\tp[0-9]+\\.s, p[0-7]/z, z[0-9]+\\.s, z[0-9]+\\.s 30
> XPASS: gcc.target/aarch64/sve/vcond_5.c scan-assembler-times \\tfcmle\\tp[0-9]+\\.d, p[0-7]/z, z[0-9]+\\.d, #0\\.0 21
> XPASS: gcc.target/aarch64/sve/vcond_5.c scan-assembler-times \\tfcmle\\tp[0-9]+\\.d, p[0-7]/z, z[0-9]+\\.d, z[0-9]+\\.d 42
> XPASS: gcc.target/aarch64/sve/vcond_5.c scan-assembler-times \\tfcmle\\tp[0-9]+\\.s, p[0-7]/z, z[0-9]+\\.s, #0\\.0 15
> XPASS: gcc.target/aarch64/sve/vcond_5.c scan-assembler-times \\tfcmle\\tp[0-9]+\\.s, p[0-7]/z, z[0-9]+\\.s, z[0-9]+\\.s 30
>
> Could you remove the associated xfails (and comments above them where
> appropriate)?
>
> OK with those changes from my POV, but please give Richi a day or so
> to object.
>
> Thanks for doing this.
>
> Richard

Thanks for the suggestions, I have updated the patch accordingly.
Bootstrap+test in progress on x86_64-unknown-linux-gnu and
aarch64-linux-gnu.

Richi, does the patch look OK to you?

Thanks,
Prathamesh
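
p.s. For the xfail removal, the change in each test is just dropping the
xfail clause (and any stale comment above it) from the dg-final
directive.  From memory the cond_cnot_2.c change looks along these lines
(the exact in-tree wording may differ):

    -/* { dg-final { scan-assembler-not {\tsel\t} { xfail *-*-* } } } */
    +/* { dg-final { scan-assembler-not {\tsel\t} } } */

I'm re-running the SVE tests with
make check-gcc RUNTESTFLAGS="aarch64-sve.exp" to confirm the XPASSes
become plain PASSes.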