From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from smtp-out1.suse.de (smtp-out1.suse.de [195.135.220.28]) by sourceware.org (Postfix) with ESMTPS id 2FADC385843E for ; Thu, 31 Mar 2022 13:26:07 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 2FADC385843E Received: from relay2.suse.de (relay2.suse.de [149.44.160.134]) by smtp-out1.suse.de (Postfix) with ESMTP id 07F1821A96; Thu, 31 Mar 2022 13:26:06 +0000 (UTC) Received: from murzim.suse.de (murzim.suse.de [10.160.4.192]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by relay2.suse.de (Postfix) with ESMTPS id F3FC2A3B87; Thu, 31 Mar 2022 13:26:05 +0000 (UTC) Date: Thu, 31 Mar 2022 15:26:05 +0200 (CEST) From: Richard Biener To: Richard Sandiford cc: gcc-patches@gcc.gnu.org, Jakub Jelinek , Jan Hubicka Subject: Re: [PATCH] tree-optimization/104912 - ensure cost model is checked first In-Reply-To: Message-ID: <73465r1r-22p-p279-901n-n3pr603139o9@fhfr.qr> References: <20220321151031.9B0CB133B6@imap2.suse-dmz.suse.de> MIME-Version: 1.0 X-Spam-Status: No, score=-10.9 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, SPF_HELO_NONE, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 8BIT X-Content-Filtered-By: Mailman/MimeDel 2.1.29 X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 31 Mar 2022 13:26:11 -0000 On Thu, 31 Mar 2022, Richard Sandiford wrote: > Richard Biener writes: > > The following makes sure that when we build the versioning condition > > for vectorization including the cost model check, we check for the > > cost model and branch over other versioning checks. That is what > > the cost modeling assumes, since the cost model check is the only > > one accounted for in the scalar outside cost. Currently we emit > > all checks as straight-line code combined with bitwise ops which > > can result in surprising ordering of checks in the final assembly. > > Yeah, this had bugged me too, and meant that we made some bad > decisions in some of the local benchmarks we use. Was just afraid > to poke at it, since it seemed like a deliberate decision. :-) I guess it was rather giving up because of the way our loop-versioning infrastructure works ... > > Since loop_version accepts only a single versioning condition > > the splitting is done after the fact. > > > > The result is a 1.5% speedup of 416.gamess on x86_64 when compiling > > with -Ofast and tuning for generic or skylake. That's not enough > > to recover from the slowdown when vectorizing but it now cuts off > > the expensive alias versioning test. > > > > Bootstrapped and tested on x86_64-unknown-linux-gnu. > > > > OK for trunk? > > > > For the rest of the regression my plan is to somehow factor in > > the evolution of the number of iterations in the outer loop > > (which is {1, +, 1}) to somehow bump the static profitability > > estimate and together with the "cheap" cost model check never > > execute the vectorized version (well, it is actually never executed, > > but only because the alias check fails). > > > > Thanks, > > Richard. > > > > 2022-03-21 Richard Biener > > > > PR tree-optimization/104912 > > * tree-vect-loop-manip.cc (vect_loop_versioning): Split > > the cost model check to a separate BB to make sure it is > > checked first and not combined with other version checks. > > --- > > gcc/tree-vect-loop-manip.cc | 53 ++++++++++++++++++++++++++++++++++--- > > 1 file changed, 50 insertions(+), 3 deletions(-) > > > > diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc > > index a7bbc916bbc..8ef333eb31b 100644 > > --- a/gcc/tree-vect-loop-manip.cc > > +++ b/gcc/tree-vect-loop-manip.cc > > @@ -3445,13 +3445,28 @@ vect_loop_versioning (loop_vec_info loop_vinfo, > > cond_expr = expr; > > } > > > > + tree cost_name = NULL_TREE; > > + if (cond_expr > > + && !integer_truep (cond_expr) > > + && (version_niter > > + || version_align > > + || version_alias > > + || version_simd_if_cond)) > > + cost_name = cond_expr = force_gimple_operand_1 (unshare_expr (cond_expr), > > + &cond_expr_stmt_list, > > + is_gimple_val, NULL_TREE); > > + > > if (version_niter) > > vect_create_cond_for_niters_checks (loop_vinfo, &cond_expr); > > > > if (cond_expr) > > - cond_expr = force_gimple_operand_1 (unshare_expr (cond_expr), > > - &cond_expr_stmt_list, > > - is_gimple_condexpr, NULL_TREE); > > + { > > + gimple_seq tem = NULL; > > + cond_expr = force_gimple_operand_1 (unshare_expr (cond_expr), > > + &tem, > > + is_gimple_condexpr, NULL_TREE); > > + gimple_seq_add_seq (&cond_expr_stmt_list, tem); > > + } > > > > if (version_align) > > vect_create_cond_for_align_checks (loop_vinfo, &cond_expr, > > @@ -3654,6 +3669,38 @@ vect_loop_versioning (loop_vec_info loop_vinfo, > > update_ssa (TODO_update_ssa); > > } > > > > + /* Split the cost model check off to a separate BB. Costing assumes > > + this is the only thing we perform when we enter the scalar loop. */ > > Maybe “…from a failed cost decision” or something? Might sounds out > of context like it applied more generally. Fixed. > > + if (cost_name) > > + { > > + gimple *def = SSA_NAME_DEF_STMT (cost_name); > > I realise it should only happen very rarely if at all, but is it > absolutely guaranteed that the cost condition doesn't fold to a > constant? OK, better be safe than sorry - I added a SSA_NAME check. > > + /* All uses of the cost check are 'true' after the check we > > + are going to insert. */ > > + replace_uses_by (cost_name, boolean_true_node); > > + /* And we're going to build the new single use of it. */ > > + gcond *cond = gimple_build_cond (NE_EXPR, cost_name, boolean_false_node, > > + NULL_TREE, NULL_TREE); > > + edge e = split_block (gimple_bb (def), def); > > + gimple_stmt_iterator gsi = gsi_for_stmt (def); > > + gsi_insert_after (&gsi, cond, GSI_NEW_STMT); > > + edge true_e, false_e; > > + extract_true_false_edges_from_block (e->dest, &true_e, &false_e); > > + e->flags &= ~EDGE_FALLTHRU; > > + e->flags |= EDGE_TRUE_VALUE; > > + edge e2 = make_edge (e->src, false_e->dest, EDGE_FALSE_VALUE); > > + e->probability = prob; > > + e2->probability = prob.invert (); > > Is reusing prob the right thing to do? Wouldn't the path to the vector > loop end up with a probability of PROB^2? We're statically using profile_probability::likely () for the vector path. But you are right, and we scale loop frequencies with prop. So when we replace if () { with-prob A } with if () if () { with-prob A } then we probably want both true edges to have a probability of sqrt (prob) and the false edges sqrt (~prob)? It _looks_ like profile_probability::split (prob) would do the trick. Here we replace if (X) with if (X && Y) with overall likely() probability. So something like (incremental): @@ -3446,15 +3446,21 @@ vect_loop_versioning (loop_vec_info loop_vinfo, } tree cost_name = NULL_TREE; + profile_probability prob2 = profile_probability::uninitialized (); if (cond_expr && !integer_truep (cond_expr) && (version_niter || version_align || version_alias || version_simd_if_cond)) - cost_name = cond_expr = force_gimple_operand_1 (unshare_expr (cond_expr), - &cond_expr_stmt_list, - is_gimple_val, NULL_TREE); + { + cost_name = cond_expr = force_gimple_operand_1 (unshare_expr (cond_expr), + &cond_expr_stmt_list, + is_gimple_val, NULL_TREE); + /* Split prob () into two so that the overall probability of passing + both the cost-model and versioning checks is the orig prob. */ + prob2 = prob.split (prob); + } if (version_niter) vect_create_cond_for_niters_checks (loop_vinfo, &cond_expr); @@ -3688,8 +3695,8 @@ vect_loop_versioning (loop_vec_info loop_vinfo, e->flags &= ~EDGE_FALLTHRU; e->flags |= EDGE_TRUE_VALUE; edge e2 = make_edge (e->src, false_e->dest, EDGE_FALSE_VALUE); - e->probability = prob; - e2->probability = prob.invert (); + e->probability = prob2; + e2->probability = prob2.invert (); set_immediate_dominator (CDI_DOMINATORS, false_e->dest, e->src); auto_vec adj; for (basic_block son = first_dom_son (CDI_DOMINATORS, e->dest); might do the trick? Honza? > > + set_immediate_dominator (CDI_DOMINATORS, false_e->dest, e->src); > > + auto_vec adj; > > + for (basic_block son = first_dom_son (CDI_DOMINATORS, e->dest); > > + son; > > + son = next_dom_son (CDI_DOMINATORS, son)) > > + if (EDGE_COUNT (son->preds) > 1) > > + adj.safe_push (son); > > + for (auto son : adj) > > + set_immediate_dominator (CDI_DOMINATORS, son, e->src); > > Seems unfortunate that so much bespoke code is needed to do an operation > like this, but that's not a very constructive comment, sorry. :-) Yeah :/ > Anyway, LGTM, but I don't think I'd really be able to spot problems. Let's see what Honza thinks about the probability issue. Thanks, Richard. > Thanks, > Richard > > > + } > > + > > if (version_niter) > > { > > /* The versioned loop could be infinite, we need to clear existing > -- Richard Biener SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg, Germany; GF: Ivo Totev; HRB 36809 (AG Nuernberg)