From: Prathamesh Kulkarni
Date: Mon, 09 Sep 2019 20:56:00 -0000
Subject: Re: [SVE] PR86753
To: Richard Sandiford
Cc: Richard Biener, gcc Patches

On Mon, 9 Sep 2019 at 22:06, Prathamesh Kulkarni wrote:
>
> On Mon, 9 Sep 2019 at 16:45, Richard Sandiford
> wrote:
> >
> > Prathamesh Kulkarni writes:
> > > With patch, the only following FAIL remains for aarch64-sve.exp:
> > > FAIL: gcc.target/aarch64/sve/cond_unary_2.c -march=armv8.2-a+sve
> > > scan-assembler-times \\tmovprfx\\t 6
> > > which now contains 14.
> > > Should I adjust the test, assuming the change isn't a regression ?
> >
> > Well, it is kind-of a regression, but it really just means that the
> > integer code is now consistent with the floating-point code in having
> > an unnecessary MOVPRFX.  So I think adjusting the count is fine.
> > Presumably any future fix for the existing redundant MOVPRFXs will
> > apply to the new ones as well.
> >
> > The patch looks good to me, just some very minor nits:
> >
> > > @@ -8309,11 +8309,12 @@ vect_double_mask_nunits (tree type)
> > >
> > >  /* Record that a fully-masked version of LOOP_VINFO would need MASKS to
> > >     contain a sequence of NVECTORS masks that each control a vector of type
> > > -   VECTYPE. */
> > > +   VECTYPE. SCALAR_MASK if non-null, represents the mask used for corresponding
> > > +   load/store stmt. */
> >
> > Should be two spaces between sentences.  Maybe:
> >
> >    VECTYPE.  If SCALAR_MASK is nonnull, the fully-masked loop would AND
> >    these vector masks with the vector version of SCALAR_MASK.  */
> >
> > since the mask isn't necessarily for a load or store statement.
> >
> > > [...]
> > > @@ -1879,7 +1879,8 @@ static tree permute_vec_elements (tree, tree, tree, stmt_vec_info,
> > >     says how the load or store is going to be implemented and GROUP_SIZE
> > >     is the number of load or store statements in the containing group.
> > >     If the access is a gather load or scatter store, GS_INFO describes
> > > -   its arguments.
> > > +   its arguments. SCALAR_MASK is the scalar mask used for corresponding
> > > +   load or store stmt.
> >
> > Maybe:
> >
> >    its arguments.  If the load or store is conditional, SCALAR_MASK is the
> >    condition under which it occurs.
> >
> > since SCALAR_MASK can be null here too.
> >
> > > [...]
> > > @@ -9975,6 +9978,31 @@ vectorizable_condition (stmt_vec_info stmt_info, gimple_stmt_iterator *gsi,
> > >    /* Handle cond expr.  */
> > >    for (j = 0; j < ncopies; j++)
> > >      {
> > > +      tree loop_mask = NULL_TREE;
> > > +      bool swap_cond_operands = false;
> > > +
> > > +      if (loop_vinfo && LOOP_VINFO_FULLY_MASKED_P (loop_vinfo))
> > > +        {
> > > +          scalar_cond_masked_key cond (cond_expr, ncopies);
> > > +          if (loop_vinfo->scalar_cond_masked_set.contains (cond))
> > > +            {
> > > +              vec_loop_masks *masks = &LOOP_VINFO_MASKS (loop_vinfo);
> > > +              loop_mask = vect_get_loop_mask (gsi, masks, ncopies, vectype, j);
> > > +            }
> > > +          else
> > > +            {
> > > +              cond.code = invert_tree_comparison (cond.code,
> > > +                                                  HONOR_NANS (TREE_TYPE (cond.op0)));
> >
> > Long line.  Maybe just split it out into a separate assignment:
> >
> >   bool honor_nans = HONOR_NANS (TREE_TYPE (cond.op0));
> >   cond.code = invert_tree_comparison (cond.code, honor_nans);
> >
> > > +              if (loop_vinfo->scalar_cond_masked_set.contains (cond))
> > > +                {
> > > +                  vec_loop_masks *masks = &LOOP_VINFO_MASKS (loop_vinfo);
> > > +                  loop_mask = vect_get_loop_mask (gsi, masks, ncopies, vectype, j);
> >
> > Long line here too.
> >
> > > [...]
> > > @@ -10090,6 +10121,26 @@ vectorizable_condition (stmt_vec_info stmt_info, gimple_stmt_iterator *gsi,
> > >              }
> > >          }
> > >      }
> > > +
> > > +      if (loop_mask)
> > > +        {
> > > +          if (COMPARISON_CLASS_P (vec_compare))
> > > +            {
> > > +              tree tmp = make_ssa_name (vec_cmp_type);
> > > +              gassign *g = gimple_build_assign (tmp,
> > > +                                                TREE_CODE (vec_compare),
> > > +                                                TREE_OPERAND (vec_compare, 0),
> > > +                                                TREE_OPERAND (vec_compare, 1));
> >
> > Two long lines.
> >
> > > +              vect_finish_stmt_generation (stmt_info, g, gsi);
> > > +              vec_compare = tmp;
> > > +            }
> > > +
> > > +          tree tmp2 = make_ssa_name (vec_cmp_type);
> > > +          gassign *g = gimple_build_assign (tmp2, BIT_AND_EXPR, vec_compare, loop_mask);
> >
> > Long line here too.
> >
> > > [...]
> > > diff --git a/gcc/tree-vectorizer.c b/gcc/tree-vectorizer.c
> > > index dc181524744..c4b2d8e8647 100644
> > > --- a/gcc/tree-vectorizer.c
> > > +++ b/gcc/tree-vectorizer.c
> > > @@ -1513,3 +1513,39 @@ make_pass_ipa_increase_alignment (gcc::context *ctxt)
> > >  {
> > >    return new pass_ipa_increase_alignment (ctxt);
> > >  }
> > > +
> > > +/* If code(T) is comparison op or def of comparison stmt,
> > > +   extract it's operands.
> > > +   Else return .  */
> > > +
> > > +void
> > > +scalar_cond_masked_key::get_cond_ops_from_tree (tree t)
> > > +{
> > > +  if (TREE_CODE_CLASS (TREE_CODE (t)) == tcc_comparison)
> > > +    {
> > > +      this->code = TREE_CODE (t);
> > > +      this->op0 = TREE_OPERAND (t, 0);
> > > +      this->op1 = TREE_OPERAND (t, 1);
> > > +      return;
> > > +    }
> > > +
> > > +  if (TREE_CODE (t) == SSA_NAME)
> > > +    {
> > > +      gassign *stmt = dyn_cast <gassign *> (SSA_NAME_DEF_STMT (t));
> > > +      if (stmt)
> > > +        {
> >
> > Might as well do this as:
> >
> >   if (TREE_CODE (t) == SSA_NAME)
> >     if (gassign *stmt = dyn_cast <gassign *> (SSA_NAME_DEF_STMT (t)))
> >       {
> >
> > The patch (as hoped) introduces some XPASSes:
> >
> > XPASS: gcc.target/aarch64/sve/cond_cnot_2.c scan-assembler-not \\tsel\\t
> > XPASS: gcc.target/aarch64/sve/vcond_4.c scan-assembler-times \\tfcmge\\tp[0-9]+\\.d, p[0-7]/z, z[0-9]+\\.d, #0\\.0\\n 21
> > XPASS: gcc.target/aarch64/sve/vcond_4.c scan-assembler-times \\tfcmge\\tp[0-9]+\\.d, p[0-7]/z, z[0-9]+\\.d, z[0-9]+\\.d\\n 42
> > XPASS: gcc.target/aarch64/sve/vcond_4.c scan-assembler-times \\tfcmge\\tp[0-9]+\\.s, p[0-7]/z, z[0-9]+\\.s, #0\\.0\\n 15
> > XPASS: gcc.target/aarch64/sve/vcond_4.c scan-assembler-times \\tfcmge\\tp[0-9]+\\.s, p[0-7]/z, z[0-9]+\\.s, z[0-9]+\\.s\\n 30
> > XPASS: gcc.target/aarch64/sve/vcond_4.c scan-assembler-times \\tfcmgt\\tp[0-9]+\\.d, p[0-7]/z, z[0-9]+\\.d, #0\\.0\\n 21
> > XPASS: gcc.target/aarch64/sve/vcond_4.c scan-assembler-times \\tfcmgt\\tp[0-9]+\\.d, p[0-7]/z, z[0-9]+\\.d, z[0-9]+\\.d\\n 42
> > XPASS: gcc.target/aarch64/sve/vcond_4.c scan-assembler-times \\tfcmgt\\tp[0-9]+\\.s, p[0-7]/z, z[0-9]+\\.s, #0\\.0\\n 15
> > XPASS: gcc.target/aarch64/sve/vcond_4.c scan-assembler-times \\tfcmgt\\tp[0-9]+\\.s, p[0-7]/z, z[0-9]+\\.s, z[0-9]+\\.s\\n 30
> > XPASS: gcc.target/aarch64/sve/vcond_4.c scan-assembler-times \\tfcmle\\tp[0-9]+\\.d, p[0-7]/z, z[0-9]+\\.d, #0\\.0\\n 21
> > XPASS: gcc.target/aarch64/sve/vcond_4.c scan-assembler-times \\tfcmle\\tp[0-9]+\\.d, p[0-7]/z, z[0-9]+\\.d, z[0-9]+\\.d\\n 42
> > XPASS: gcc.target/aarch64/sve/vcond_4.c scan-assembler-times \\tfcmle\\tp[0-9]+\\.s, p[0-7]/z, z[0-9]+\\.s, #0\\.0\\n 15
> > XPASS: gcc.target/aarch64/sve/vcond_4.c scan-assembler-times \\tfcmle\\tp[0-9]+\\.s, p[0-7]/z, z[0-9]+\\.s, z[0-9]+\\.s\\n 30
> > XPASS: gcc.target/aarch64/sve/vcond_4.c scan-assembler-times \\tfcmlt\\tp[0-9]+\\.d, p[0-7]/z, z[0-9]+\\.d, #0\\.0\\n 21
> > XPASS: gcc.target/aarch64/sve/vcond_4.c scan-assembler-times \\tfcmlt\\tp[0-9]+\\.d, p[0-7]/z, z[0-9]+\\.d, z[0-9]+\\.d\\n 42
> > XPASS: gcc.target/aarch64/sve/vcond_4.c scan-assembler-times \\tfcmlt\\tp[0-9]+\\.s, p[0-7]/z, z[0-9]+\\.s, #0\\.0\\n 15
> > XPASS: gcc.target/aarch64/sve/vcond_4.c scan-assembler-times \\tfcmlt\\tp[0-9]+\\.s, p[0-7]/z, z[0-9]+\\.s, z[0-9]+\\.s\\n 30
> > XPASS: gcc.target/aarch64/sve/vcond_4.c scan-assembler-times \\tfcmuo\\tp[0-9]+\\.d, p[0-7]/z, z[0-9]+\\.d, z[0-9]+\\.d\\n 252
> > XPASS: gcc.target/aarch64/sve/vcond_4.c scan-assembler-times \\tfcmuo\\tp[0-9]+\\.s, p[0-7]/z, z[0-9]+\\.s, z[0-9]+\\.s\\n 180
> > XPASS: gcc.target/aarch64/sve/vcond_5.c scan-assembler-times \\tfcmge\\tp[0-9]+\\.d, p[0-7]/z, z[0-9]+\\.d, #0\\.0 21
> > XPASS: gcc.target/aarch64/sve/vcond_5.c scan-assembler-times \\tfcmge\\tp[0-9]+\\.d, p[0-7]/z, z[0-9]+\\.d, z[0-9]+\\.d 42
> > XPASS: gcc.target/aarch64/sve/vcond_5.c scan-assembler-times \\tfcmge\\tp[0-9]+\\.s, p[0-7]/z, z[0-9]+\\.s, #0\\.0 15
> > XPASS: gcc.target/aarch64/sve/vcond_5.c scan-assembler-times \\tfcmge\\tp[0-9]+\\.s, p[0-7]/z, z[0-9]+\\.s, z[0-9]+\\.s 30
> > XPASS: gcc.target/aarch64/sve/vcond_5.c scan-assembler-times \\tfcmle\\tp[0-9]+\\.d, p[0-7]/z, z[0-9]+\\.d, #0\\.0 21
> > XPASS: gcc.target/aarch64/sve/vcond_5.c scan-assembler-times \\tfcmle\\tp[0-9]+\\.d, p[0-7]/z, z[0-9]+\\.d, z[0-9]+\\.d 42
> > XPASS: gcc.target/aarch64/sve/vcond_5.c scan-assembler-times \\tfcmle\\tp[0-9]+\\.s, p[0-7]/z, z[0-9]+\\.s, #0\\.0 15
> > XPASS: gcc.target/aarch64/sve/vcond_5.c scan-assembler-times \\tfcmle\\tp[0-9]+\\.s, p[0-7]/z, z[0-9]+\\.s, z[0-9]+\\.s 30
> >
> > Could you remove the associated xfails (and comments above them where
> > appropriate)?
> >
> > OK with those changes from my POV, but please give Richi a day or so
> > to object.
> >
> > Thanks for doing this.
> Thanks for the suggestions, I have updated the patch accordingly.
> Bootstrap+test in progress on x86_64-unknown-linux-gnu and aarch64-linux-gnu.
> Richi, does the patch look OK to you ?
Hi,
Bootstrap+test passes for x86_64-unknown-linux-gnu and aarch64-linux-gnu.
On x86_64, there's a "strange" failure of c-c++-common/builtins.c, log shows:

/home/prathamesh.kulkarni/gnu-toolchain/gcc/pr86753-v2-3/gcc/gcc/test
FAIL: c-c++-common/builtins.c -Wc++-compat (test for excess errors)
Excess errors:
/home/prathamesh.kulkarni/gnu-toolchain/gcc/pr86753-v2-3/gcc/gcc/test

Which shouldn't really happen since the test doesn't seem relevant to the
patch, and only passes -O2 which shouldn't enable the vectorizer ?
Manually testing it results in PASS with:
make check-gcc RUNTESTFLAGS="dg.exp=builtins.c"
Would it be OK to ignore the FAIL during reg-test ?

Thanks,
Prathamesh
>
> Thanks,
> Prathamesh
> >
> > Richard
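
As a rough, self-contained sketch of the caching idea discussed in the
thread above (illustrative only, not GCC code: CondKey, invert and the
std::set stand in for scalar_cond_masked_key, invert_tree_comparison and
the hash set the patch uses), the vectorizer records every scalar
condition that already masks a load or store in the fully-masked loop
and, when it later vectorizes a COND_EXPR, looks up that condition or
its inverse so the vector comparison can be ANDed with the loop mask
instead of being recomputed unpredicated:

#include <iostream>
#include <set>
#include <string>
#include <tuple>

/* Comparison codes, a stand-in for GCC's tree comparison codes.  */
enum class CmpCode { LT, LE, GT, GE, EQ, NE };

/* Invert a comparison code; a stand-in for invert_tree_comparison.
   (The real function also refuses to invert ordered comparisons when
   NaNs have to be honoured.)  */
static CmpCode
invert (CmpCode code)
{
  switch (code)
    {
    case CmpCode::LT: return CmpCode::GE;
    case CmpCode::LE: return CmpCode::GT;
    case CmpCode::GT: return CmpCode::LE;
    case CmpCode::GE: return CmpCode::LT;
    case CmpCode::EQ: return CmpCode::NE;
    case CmpCode::NE: return CmpCode::EQ;
    }
  return code;
}

/* Key identifying a scalar condition used to mask an access:
   comparison code, both operand names and the number of vector copies.  */
using CondKey = std::tuple<CmpCode, std::string, std::string, int>;

int
main ()
{
  std::set<CondKey> masked_conds;

  /* Analysis phase: a masked access was guarded by "a > b" and needs
     one vector copy, so remember that condition.  */
  masked_conds.insert (CondKey (CmpCode::GT, "a", "b", 1));

  /* Transform phase: a COND_EXPR uses "a <= b" as its condition.
     Look it up directly first, then try the inverted condition.  */
  CondKey cond (CmpCode::LE, "a", "b", 1);
  bool reuse_mask = masked_conds.count (cond) != 0;
  if (!reuse_mask)
    {
      std::get<0> (cond) = invert (std::get<0> (cond));
      reuse_mask = masked_conds.count (cond) != 0;
    }

  std::cout << (reuse_mask
                ? "AND the vector comparison with the loop mask\n"
                : "emit an unpredicated comparison\n");
  return 0;
}

In the patch itself the key is scalar_cond_masked_key and the set is
loop_vinfo->scalar_cond_masked_set, as in the hunks quoted above; when
only the inverted condition matches, the COND_EXPR's arms presumably
also have to be swapped, which is what the swap_cond_operands flag in
the first quoted hunk is for.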