From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-ed1-x52c.google.com (mail-ed1-x52c.google.com [IPv6:2a00:1450:4864:20::52c]) by sourceware.org (Postfix) with ESMTPS id 6DB7F38515FD for ; Wed, 26 May 2021 12:41:47 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.3.2 sourceware.org 6DB7F38515FD Received: by mail-ed1-x52c.google.com with SMTP id i13so1259125edb.9 for ; Wed, 26 May 2021 05:41:47 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=Q6dN6jkAqVyE76EROpM3PdAAl2ai9G7uDno3kPwcd6A=; b=JBFAuGjjpJAgBTZoAf7TZY+oBzjAyFFtAEdAI8eYEv0lvnQEUdvIgP3so+f8v9e6nq fvLwdgf+jrS99QfudA79rUdVS1AZ4hK2B7KghqSD9f4jgSLmknNBZbb/+h0bZdQxiLJq iFmLHHWdIFauRaJgR6ils0v7YA5mNnkufwsgKPvKvwrxxBHRc+xqBKW8L5jsKUXSmXgw mEv0h+OQ3MkvY9+ouQdSdq/pBOSvuYqikmiY7lpx2BXluNsP/cSO/zZhbOJVArQx+h9K 1SBY5ou0Tc+Jmj7DC/JO2h+BD0JY6AAHTun2kww/cSB72xf4IEcRu7ZH/udlMIHvRB2C amOA== X-Gm-Message-State: AOAM5316XJVnal1afuOU/hYyY8916cBahOzEOcJ5KXMEh5rf/DvWZJxN POJc17LYAouJsQAm5UMkWeHMjVVF92luw99e3cg= X-Google-Smtp-Source: ABdhPJzyfXIRggyfmrks4xCEyNud3xW+GcIk3K2pX3M83KIqdAD8XiEQ2VW9jlUIYrvNkxsxHQ4KMWrY3COq3aapU9E= X-Received: by 2002:a05:6402:5174:: with SMTP id d20mr38189237ede.248.1622032906446; Wed, 26 May 2021 05:41:46 -0700 (PDT) MIME-Version: 1.0 References: In-Reply-To: From: Richard Biener Date: Wed, 26 May 2021 14:41:35 +0200 Message-ID: Subject: Re: [PATCH] Extend is_cond_scalar_reduction to handle nop_expr after/before scalar reduction.[PR98365] To: Hongtao Liu Cc: GCC Patches Content-Type: text/plain; charset="UTF-8" X-Spam-Status: No, score=-2.9 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, FREEMAIL_FROM, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.2 X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 26 May 2021 12:41:49 -0000 On Wed, May 26, 2021 at 7:06 AM Hongtao Liu wrote: > > On Tue, May 25, 2021 at 6:24 PM Richard Biener > wrote: > > > > On Mon, May 24, 2021 at 11:52 AM Hongtao Liu wrote: > > > > > > Hi: > > > Details described in PR. > > > Bootstrapped and regtest on > > > x86_64-linux-gnu{-m32,}/x86_64-linux-gnu{-m32\ > > > -march=cascadelake,-march=cascadelake} > > > Ok for trunk? > > > > +static tree > > +strip_nop_cond_scalar_reduction (bool has_nop, tree op) > > +{ > > + if (!has_nop) > > + return op; > > + > > + if (TREE_CODE (op) != SSA_NAME) > > + return NULL_TREE; > > + > > + gimple* stmt = SSA_NAME_DEF_STMT (op); > > + if (!stmt > > + || gimple_code (stmt) != GIMPLE_ASSIGN > > + || gimple_has_volatile_ops (stmt) > > + || gimple_assign_rhs_code (stmt) != NOP_EXPR) > > + return NULL_TREE; > > + > > + return gimple_assign_rhs1 (stmt); > > > > this allows arbitrary conversions where the comment suggests you > > only want to allow conversions to the same precision but different sign. > > Sth like > > > > gassign *stmt = safe_dyn_cast (SSA_NAME_DEF_STMT (op)); > > if (!stmt > > || !CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (stmt)) > > || !tree_nop_conversion_p (TREE_TYPE (op), TREE_TYPE > > (gimple_assign_rhs1 (stmt)))) > > return NULL_TREE; > > > > + if (gimple_bb (stmt) != gimple_bb (*nop_reduc) > > + || gimple_code (stmt) != GIMPLE_ASSIGN > > + || gimple_has_volatile_ops (stmt)) > > + return false; > > > > !is_gimple_assign (stmt) instead of gimple_code (stmt) != GIMPLE_ASSIGN > > > > the gimple_has_volatile_ops check is superfluous given you restrict > > the assign code. > > > > + /* Check that R_NOP1 is used in nop_stmt or in PHI only. */ > > + FOR_EACH_IMM_USE_FAST (use_p, imm_iter, r_nop1) > > + { > > + gimple *use_stmt = USE_STMT (use_p); > > + if (is_gimple_debug (use_stmt)) > > + continue; > > + if (use_stmt == SSA_NAME_DEF_STMT (r_op1)) > > + continue; > > + if (gimple_code (use_stmt) != GIMPLE_PHI) > > + return false; > > > > can the last check be use_stmt == phi since we should have the > > PHI readily available? > > > > @@ -1735,6 +1822,23 @@ convert_scalar_cond_reduction (gimple *reduc, > > gimple_stmt_iterator *gsi, > > rhs = fold_build2 (gimple_assign_rhs_code (reduc), > > TREE_TYPE (rhs1), op0, tmp); > > > > + if (has_nop) > > + { > > + /* Create assignment nop_rhs = op0 +/- _ifc_. */ > > + tree nop_rhs = make_temp_ssa_name (TREE_TYPE (rhs1), NULL, "_nop_"); > > + gimple* new_assign2 = gimple_build_assign (nop_rhs, rhs); > > + gsi_insert_before (gsi, new_assign2, GSI_SAME_STMT); > > + /* Rebuild rhs for nop_expr. */ > > + rhs = fold_build1 (NOP_EXPR, > > + TREE_TYPE (gimple_assign_lhs (nop_reduc)), > > + nop_rhs); > > + > > + /* Delete nop_reduc. */ > > + stmt_it = gsi_for_stmt (nop_reduc); > > + gsi_remove (&stmt_it, true); > > + release_defs (nop_reduc); > > + } > > + > > > > hmm, the whole function could be cleaned up with sth like > > > > /* Build rhs for unconditional increment/decrement. */ > > gimple_seq stmts = NULL; > > rhs = gimple_build (&stmts, gimple_assing_rhs_code (reduc), > > TREE_TYPE (rhs1), op0, tmp); > > if (has_nop) > > rhs = gimple_convert (&stmts, TREE_TYPE (gimple_assign_lhs > > (nop_reduc)), rhs); > > gsi_insert_seq_before (gsi, stmts, GSI_SAME_STMT); > > > > plus in the caller moving the > > > > new_stmt = gimple_build_assign (res, rhs); > > gsi_insert_before (gsi, new_stmt, GSI_SAME_STMT); > > > > to the else branch as well as the folding done on new_stmt (maybe return > > new_stmt instead of rhs from convert_scalar_cond_reduction. > Eventually, we needed to assign rhs to res, and with an extra mov stmt > from rhs to res, the vectorizer failed. > the only difference in 166t.ifcvt between successfully vectorization > and failed vectorization is below > char * _24; > char _25; > unsigned char _ifc__29; > + unsigned char _30; > > [local count: 118111600]: > if (n_10(D) != 0) > @@ -70,7 +71,8 @@ char foo2 (char * a, char * c, int n) > _5 = c_14(D) + _1; > _6 = *_5; > _ifc__29 = _3 == _6 ? 1 : 0; > - cnt_7 = cnt_18 + _ifc__29; > + _30 = cnt_18 + _ifc__29; > + cnt_7 = _30; > i_16 = i_20 + 1; > if (n_10(D) != i_16) > goto ; [89.00%] > @@ -110,7 +112,7 @@ char foo2 (char * a, char * c, int n) > goto ; [100.00%] > > [local count: 105119324]: > - # cnt_19 = PHI > + # cnt_19 = PHI <_30(3), cnt_27(15)> > _21 = (char) cnt_19; > > if we want to eliminate the extra move, gimple_build and > gimple_convert is not suitable since they create a new lhs, is there > any interface like gimple_build_assign but accept stmts? Hmm, OK. So what you could do is replacing all uses of 'res' with 'rhs', alternatively replacing the last generated stmt LHS by 'res'. Both is a bit hackish of course. Usually the vectorizer just ignores copies like this but the reduction matching is unnecessarily strict here (but it's also a bit awkward to fix there). There's redundant_ssa_names which seems to be designed to handle propagating those out and there's the do_rpo_vn run, so I wonder why the stmt remains. Now, for redundant_ssa_names you'd need to push a std::pair (res, rhs) to it in case rhs is an SSA name - does that help? Richard. > > > > Richard. > > > > > gcc/ChangeLog: > > > > > > PR tree-optimization/pr98365 > > > * tree-if-conv.c (strip_nop_cond_scalar_reduction): New function. > > > (is_cond_scalar_reduction): Handle nop_expr in cond scalar reduction. > > > (convert_scalar_cond_reduction): Ditto. > > > (predicate_scalar_phi): Ditto. > > > > > > gcc/testsuite/ChangeLog: > > > > > > PR tree-optimization/pr98365 > > > * gcc.target/i386/pr98365.c: New test. > > > > > > -- > > > BR, > > > Hongtao > > > > -- > BR, > Hongtao