From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <rguenther@suse.de>
Received: from smtp-out1.suse.de (smtp-out1.suse.de [195.135.220.28])
 by sourceware.org (Postfix) with ESMTPS id 2FADC385843E
 for <gcc-patches@gcc.gnu.org>; Thu, 31 Mar 2022 13:26:07 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 2FADC385843E
Received: from relay2.suse.de (relay2.suse.de [149.44.160.134])
 by smtp-out1.suse.de (Postfix) with ESMTP id 07F1821A96;
 Thu, 31 Mar 2022 13:26:06 +0000 (UTC)
Received: from murzim.suse.de (murzim.suse.de [10.160.4.192])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (No client certificate requested)
 by relay2.suse.de (Postfix) with ESMTPS id F3FC2A3B87;
 Thu, 31 Mar 2022 13:26:05 +0000 (UTC)
Date: Thu, 31 Mar 2022 15:26:05 +0200 (CEST)
From: Richard Biener <rguenther@suse.de>
To: Richard Sandiford <richard.sandiford@arm.com>
cc: gcc-patches@gcc.gnu.org, Jakub Jelinek <jakub@redhat.com>, 
 Jan Hubicka <hubicka@ucw.cz>
Subject: Re: [PATCH] tree-optimization/104912 - ensure cost model is checked
 first
In-Reply-To: <mptpmm2wa8h.fsf@arm.com>
Message-ID: <73465r1r-22p-p279-901n-n3pr603139o9@fhfr.qr>
References: <20220321151031.9B0CB133B6@imap2.suse-dmz.suse.de>
 <mptpmm2wa8h.fsf@arm.com>
MIME-Version: 1.0
X-Spam-Status: No, score=-10.9 required=5.0 tests=BAYES_00, DKIM_SIGNED,
 DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, SPF_HELO_NONE, SPF_PASS,
 TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.4
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on
 server2.sourceware.org
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8BIT
X-Content-Filtered-By: Mailman/MimeDel 2.1.29
X-BeenThere: gcc-patches@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe>
X-List-Received-Date: Thu, 31 Mar 2022 13:26:11 -0000

On Thu, 31 Mar 2022, Richard Sandiford wrote:

> Richard Biener <rguenther@suse.de> writes:
> > The following makes sure that when we build the versioning condition
> > for vectorization including the cost model check, we check for the
> > cost model and branch over other versioning checks.  That is what
> > the cost modeling assumes, since the cost model check is the only
> > one accounted for in the scalar outside cost.  Currently we emit
> > all checks as straight-line code combined with bitwise ops which
> > can result in surprising ordering of checks in the final assembly.
> 
> Yeah, this had bugged me too, and meant that we made some bad
> decisions in some of the local benchmarks we use.  Was just afraid
> to poke at it, since it seemed like a deliberate decision. :-)

I guess it was rather giving up because of the way our loop-versioning
infrastructure works ...

> > Since loop_version accepts only a single versioning condition
> > the splitting is done after the fact.
> >
> > The result is a 1.5% speedup of 416.gamess on x86_64 when compiling
> > with -Ofast and tuning for generic or skylake.  That's not enough
> > to recover from the slowdown when vectorizing but it now cuts off
> > the expensive alias versioning test.
> >
> > Bootstrapped and tested on x86_64-unknown-linux-gnu.
> >
> > OK for trunk?
> >
> > For the rest of the regression my plan is to somehow factor in
> > the evolution of the number of iterations in the outer loop
> > (which is {1, +, 1}) to somehow bump the static profitability
> > estimate and together with the "cheap" cost model check never
> > execute the vectorized version (well, it is actually never executed,
> > but only because the alias check fails).
> >
> > Thanks,
> > Richard.
> >
> > 2022-03-21  Richard Biener  <rguenther@suse.de>
> >
> > 	PR tree-optimization/104912
> > 	* tree-vect-loop-manip.cc (vect_loop_versioning): Split
> > 	the cost model check to a separate BB to make sure it is
> > 	checked first and not combined with other version checks.
> > ---
> >  gcc/tree-vect-loop-manip.cc | 53 ++++++++++++++++++++++++++++++++++---
> >  1 file changed, 50 insertions(+), 3 deletions(-)
> >
> > diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
> > index a7bbc916bbc..8ef333eb31b 100644
> > --- a/gcc/tree-vect-loop-manip.cc
> > +++ b/gcc/tree-vect-loop-manip.cc
> > @@ -3445,13 +3445,28 @@ vect_loop_versioning (loop_vec_info loop_vinfo,
> >  	cond_expr = expr;
> >      }
> >  
> > +  tree cost_name = NULL_TREE;
> > +  if (cond_expr
> > +      && !integer_truep (cond_expr)
> > +      && (version_niter
> > +	  || version_align
> > +	  || version_alias
> > +	  || version_simd_if_cond))
> > +    cost_name = cond_expr = force_gimple_operand_1 (unshare_expr (cond_expr),
> > +						    &cond_expr_stmt_list,
> > +						    is_gimple_val, NULL_TREE);
> > +
> >    if (version_niter)
> >      vect_create_cond_for_niters_checks (loop_vinfo, &cond_expr);
> >  
> >    if (cond_expr)
> > -    cond_expr = force_gimple_operand_1 (unshare_expr (cond_expr),
> > -					&cond_expr_stmt_list,
> > -					is_gimple_condexpr, NULL_TREE);
> > +    {
> > +      gimple_seq tem = NULL;
> > +      cond_expr = force_gimple_operand_1 (unshare_expr (cond_expr),
> > +					  &tem,
> > +					  is_gimple_condexpr, NULL_TREE);
> > +      gimple_seq_add_seq (&cond_expr_stmt_list, tem);
> > +    }
> >  
> >    if (version_align)
> >      vect_create_cond_for_align_checks (loop_vinfo, &cond_expr,
> > @@ -3654,6 +3669,38 @@ vect_loop_versioning (loop_vec_info loop_vinfo,
> >        update_ssa (TODO_update_ssa);
> >      }
> >  
> > +  /* Split the cost model check off to a separate BB.  Costing assumes
> > +     this is the only thing we perform when we enter the scalar loop.  */
> 
> Maybe “…from a failed cost decision” or something?  Might sounds out
> of context like it applied more generally.

Fixed.

> > +  if (cost_name)
> > +    {
> > +      gimple *def = SSA_NAME_DEF_STMT (cost_name);
> 
> I realise it should only happen very rarely if at all, but is it
> absolutely guaranteed that the cost condition doesn't fold to a
> constant?

OK, better be safe than sorry - I added a SSA_NAME check.

> > +      /* All uses of the cost check are 'true' after the check we
> > +	 are going to insert.  */
> > +      replace_uses_by (cost_name, boolean_true_node);
> > +      /* And we're going to build the new single use of it.  */
> > +      gcond *cond = gimple_build_cond (NE_EXPR, cost_name, boolean_false_node,
> > +				       NULL_TREE, NULL_TREE);
> > +      edge e = split_block (gimple_bb (def), def);
> > +      gimple_stmt_iterator gsi = gsi_for_stmt (def);
> > +      gsi_insert_after (&gsi, cond, GSI_NEW_STMT);
> > +      edge true_e, false_e;
> > +      extract_true_false_edges_from_block (e->dest, &true_e, &false_e);
> > +      e->flags &= ~EDGE_FALLTHRU;
> > +      e->flags |= EDGE_TRUE_VALUE;
> > +      edge e2 = make_edge (e->src, false_e->dest, EDGE_FALSE_VALUE);
> > +      e->probability = prob;
> > +      e2->probability = prob.invert ();
> 
> Is reusing prob the right thing to do?  Wouldn't the path to the vector
> loop end up with a probability of PROB^2?

We're statically using profile_probability::likely () for the
vector path.  But you are right, and we scale loop frequencies with
prop.  So when we replace if () { with-prob A } with
if () if () { with-prob A } then we probably want both true edges
to have a probability of sqrt (prob) and the false edges sqrt (~prob)?

It _looks_ like profile_probability::split (prob) would do the
trick.  Here we replace if (X) with if (X && Y) with overall
likely() probability.

So something like (incremental):

@@ -3446,15 +3446,21 @@ vect_loop_versioning (loop_vec_info loop_vinfo,
     }
 
   tree cost_name = NULL_TREE;
+  profile_probability prob2 = profile_probability::uninitialized ();
   if (cond_expr
       && !integer_truep (cond_expr)
       && (version_niter
          || version_align
          || version_alias
          || version_simd_if_cond))
-    cost_name = cond_expr = force_gimple_operand_1 (unshare_expr 
(cond_expr),
-                                                   &cond_expr_stmt_list,
-                                                   is_gimple_val, 
NULL_TREE);
+    {
+      cost_name = cond_expr = force_gimple_operand_1 (unshare_expr 
(cond_expr),
+                                                     
&cond_expr_stmt_list,
+                                                     is_gimple_val, 
NULL_TREE);
+      /* Split prob () into two so that the overall probability of 
passing
+         both the cost-model and versioning checks is the orig prob.  */
+      prob2 = prob.split (prob);
+    }
 
   if (version_niter)
     vect_create_cond_for_niters_checks (loop_vinfo, &cond_expr);
@@ -3688,8 +3695,8 @@ vect_loop_versioning (loop_vec_info loop_vinfo,
       e->flags &= ~EDGE_FALLTHRU;
       e->flags |= EDGE_TRUE_VALUE;
       edge e2 = make_edge (e->src, false_e->dest, EDGE_FALSE_VALUE);
-      e->probability = prob;
-      e2->probability = prob.invert ();
+      e->probability = prob2;
+      e2->probability = prob2.invert ();
       set_immediate_dominator (CDI_DOMINATORS, false_e->dest, e->src);
       auto_vec<basic_block, 3> adj;
       for (basic_block son = first_dom_son (CDI_DOMINATORS, e->dest);

might do the trick?  Honza?

> > +      set_immediate_dominator (CDI_DOMINATORS, false_e->dest, e->src);
> > +      auto_vec<basic_block, 3> adj;
> > +      for (basic_block son = first_dom_son (CDI_DOMINATORS, e->dest);
> > +	   son;
> > +	   son = next_dom_son (CDI_DOMINATORS, son))
> > +	if (EDGE_COUNT (son->preds) > 1)
> > +	  adj.safe_push (son);
> > +      for (auto son : adj)
> > +	set_immediate_dominator (CDI_DOMINATORS, son, e->src);
> 
> Seems unfortunate that so much bespoke code is needed to do an operation
> like this, but that's not a very constructive comment, sorry. :-)

Yeah :/

> Anyway, LGTM, but I don't think I'd really be able to spot problems.

Let's see what Honza thinks about the probability issue.

Thanks,
Richard.

> Thanks,
> Richard
> 
> > +    }
> > +
> >    if (version_niter)
> >      {
> >        /* The versioned loop could be infinite, we need to clear existing
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Ivo Totev; HRB 36809 (AG Nuernberg)