From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <rguenther@suse.de>
Received: from smtp-out1.suse.de (smtp-out1.suse.de [IPv6:2001:67c:2178:6::1c])
	by sourceware.org (Postfix) with ESMTPS id 76D4D3858C50
	for <gcc-patches@gcc.gnu.org>; Mon, 17 Oct 2022 06:55:41 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 76D4D3858C50
Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=suse.de
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=suse.de
Received: from relay2.suse.de (relay2.suse.de [149.44.160.134])
	by smtp-out1.suse.de (Postfix) with ESMTP id 667D12275B;
	Mon, 17 Oct 2022 06:55:40 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_rsa;
	t=1665989740; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc:
	 mime-version:mime-version:content-type:content-type:
	 in-reply-to:in-reply-to:references:references;
	bh=BsLRDAvOXiITm3j98bC8gTp33azkvn1bs/4gID7UF3U=;
	b=yqARqt9KdK/VjAFjBMJrJpJ/vlvXkIa37pVLLdRdX5ENvAo1qnncfxqlo7/CpIQBQF8JH8
	+NH3cHijLDEoujBz8obZgNcBJc2VnGkQT26M90EQS7GVU+1DEIqHq/ro6QAGjhE9U3oA8x
	pvD3KxtCyBAJBwHn+5xilZ2VO/QTmYw=
DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de;
	s=susede2_ed25519; t=1665989740;
	h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc:
	 mime-version:mime-version:content-type:content-type:
	 in-reply-to:in-reply-to:references:references;
	bh=BsLRDAvOXiITm3j98bC8gTp33azkvn1bs/4gID7UF3U=;
	b=jsSRkOavmCPw1b4Q6o+3yTIMx7fjQB+QqXftW027vR5N5wfvB4tGhzjYwtQjYhqxNVJkge
	kVb0KZU4JtiQqgDg==
Received: from wotan.suse.de (wotan.suse.de [10.160.0.1])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by relay2.suse.de (Postfix) with ESMTPS id 5E7B62C141;
	Mon, 17 Oct 2022 06:55:40 +0000 (UTC)
Date: Mon, 17 Oct 2022 06:55:40 +0000 (UTC)
From: Richard Biener <rguenther@suse.de>
To: Jakub Jelinek <jakub@redhat.com>
cc: Jason Merrill <jason@redhat.com>, Jan Hubicka <jh@suse.cz>, 
    gcc-patches@gcc.gnu.org
Subject: Re: [PATCH] middle-end, v3: IFN_ASSUME support [PR106654]
In-Reply-To: <Y0mraujtvgSDjNu4@tucnak>
Message-ID: <nycvar.YFH.7.77.849.2210170639510.18337@jbgna.fhfr.qr>
References: <Y0Pd0i4FCMyx6ukZ@tucnak> <d7676364-0574-c430-c422-804b82f473a3@redhat.com> <Y0SMXH8qOTErysVJ@tucnak> <Y0VxcOxwjGbN6rKl@tucnak> <479fa663-fb3e-a90d-ae7a-0fdd5acbfa00@redhat.com> <Y0e1QH++UvHO7MtJ@tucnak> <nycvar.YFH.7.77.849.2210141103070.18337@jbgna.fhfr.qr>
 <Y0mraujtvgSDjNu4@tucnak>
User-Agent: Alpine 2.22 (LSU 394 2020-01-19)
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
X-Spam-Status: No, score=-5.4 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_LOW,SPF_HELO_NONE,SPF_PASS,TXREP autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org
List-Id: <gcc-patches.gcc.gnu.org>

On Fri, 14 Oct 2022, Jakub Jelinek wrote:

> On Fri, Oct 14, 2022 at 11:27:07AM +0000, Richard Biener wrote:
> > > --- gcc/function.h.jj	2022-10-10 11:57:40.163722972 +0200
> > > +++ gcc/function.h	2022-10-12 19:48:28.887554771 +0200
> > > @@ -438,6 +438,10 @@ struct GTY(()) function {
> > >  
> > >    /* Set if there are any OMP_TARGET regions in the function.  */
> > >    unsigned int has_omp_target : 1;
> > > +
> > > +  /* Set for artificial function created for [[assume (cond)]].
> > > +     These should be GIMPLE optimized, but not expanded to RTL.  */
> > > +  unsigned int assume_function : 1;
> > 
> > I wonder if we should have this along force_output in the symtab
> > node and let the symtab code decide whether to expand?
> 
> I actually first had a flag on the symtab node but as the patch shows,
> when it needs to be tested, more frequently I have access to struct function
> than to cgraph node.

I see.

> > > --- gcc/gimplify.cc.jj	2022-10-10 11:57:40.165722944 +0200
> > > +++ gcc/gimplify.cc	2022-10-12 19:48:28.890554730 +0200
> > > @@ -3569,7 +3569,52 @@ gimplify_call_expr (tree *expr_p, gimple
> > >  						     fndecl, 0));
> > >  	      return GS_OK;
> > >  	    }
> > > -	  /* FIXME: Otherwise expand it specially.  */
> > > +	  /* If not optimizing, ignore the assumptions.  */
> > > +	  if (!optimize)
> > > +	    {
> > > +	      *expr_p = NULL_TREE;
> > > +	      return GS_ALL_DONE;
> > > +	    }
> > > +	  /* Temporarily, until gimple lowering, transform
> > > +	     .ASSUME (cond);
> > > +	     into:
> > > +	     guard = .ASSUME ();
> > > +	     if (guard) goto label_true; else label_false;
> > > +	     label_true:;
> > > +	     {
> > > +	       guard = cond;
> > > +	     }
> > > +	     label_false:;
> > > +	     .ASSUME (guard);
> > > +	     such that gimple lowering can outline the condition into
> > > +	     a separate function easily.  */
> > 
> > So the idea to use lambdas and/or nested functions (for OMP)
> > didn't work out or is more complicated?
> 
> Yes, that didn't work out.  Both lambda creation and nested function
> handling produce big structures with everything while for the assumptions
> it is better to have separate scalars if possible, lambda creation has
> various language imposed restrictions, diagnostics etc. and isn't
> available in C and I think the outlining in the patch is pretty simple and
> short.
> 
> > I wonder if, instead of using the above intermediate form we
> > can have a new structued GIMPLE code with sub-statements
> > 
> >  .ASSUME
> >    {
> >      condition;
> >    }
> 
> That is what I wrote in the patch description as alternative:
> "with the condition wrapped into a GIMPLE_BIND (I admit the above isn't                                                                                                                
> extra clean but it is just something to hold it from gimplifier until                                                                                                                 
> gimple low pass; it reassembles if (condition_never_true) { cond; };                                                                                                                  
> an alternative would be introduce GOMP_ASSUME statement that would have                                                                                                               
> the guard var as operand and the GIMPLE_BIND as body, but for the                                                                                                                     
> few passes (tree-nested and omp lowering) in between that looked like                                                                                                                 
> an overkill to me)"
> I can certainly implement that easily.

I'd prefer that, it looks possibly less messy.

> > ?  There's gimple_statement_omp conveniently available as base and
> > IIRC you had the requirement to implement some OMP assume as well?
> 
> For OpenMP assumptions we right now implement just the holds clause
> of assume and implement it the same way as assume/gnu::assume attributes.
> 
> > Of ocurse a different stmt class with body would work as well here,
> > maybe we can even use a gbind with a special flag.
> > 
> > The outlining code can then be ajusted to outline a single BIND?
> 
> It already is adjusting a single bind (of course with everything nested in
> it).
> 
> > It probably won't simplify much that way.
> 
> > > +static tree
> > > +create_assumption_fn (location_t loc)
> > > +{
> > > +  tree name = clone_function_name_numbered (current_function_decl, "_assume");
> > > +  /* For now, will be changed later.  */
> > 
> > ?
> 
> I need to create the FUNCTION_DECL early and only later on discover
> the used automatic vars (for which I need the destination function)
> and only once those are discovered I can create the right
> function type for the function.
> 
> > > +  tree type = TREE_TYPE (current_function_decl);
> 
> > > +  DECL_FUNCTION_SPECIFIC_OPTIMIZATION (decl)
> > > +    = DECL_FUNCTION_SPECIFIC_OPTIMIZATION (current_function_decl);
> > > +  DECL_FUNCTION_SPECIFIC_TARGET (decl)
> > > +    = DECL_FUNCTION_SPECIFIC_TARGET (current_function_decl);
> > > +  DECL_FUNCTION_VERSIONED (decl)
> > > +    = DECL_FUNCTION_VERSIONED (current_function_decl);
> > 
> > what does it mean to copy DECL_FUNCTION_VERSIONED here?
> 
> This was a copy and paste from elsewhere (I think OpenMP code).
> I guess I can nuke it and even better add some testcase coverage
> for various nasty things like assume in multi-versioned functions.
>
> > > +  DECL_ARGUMENTS (lad.id.dst_fn) = parms;
> > > +  TREE_TYPE (lad.id.dst_fn) = build_function_type (boolean_type_node, parmt);
> > 
> > ah, here's the type.  Maybe use error_mark_node as transitional type?
> 
> Will see if that works.
> > 
> > > +  cgraph_node::add_new_function (lad.id.dst_fn, false);
> > > +
> > > +  for (unsigned i = 0; i < sz; ++i)
> > > +    {
> > > +      tree v = lad.decls[i];
> > > +      if (TREE_CODE (v) == SSA_NAME)
> > > +	release_ssa_name (v);
> > > +    }
> > > +
> > > +  stmt = gsi_stmt (*gsi);
> > > +  lab = as_a <glabel *> (stmt);
> > > +  gcc_assert (gimple_label_label (lab) == l1
> > > +	      || gimple_label_label (lab) == l2);
> > > +  gsi_remove (gsi, true);
> > > +  stmt = gsi_stmt (*gsi);
> > > +  gcc_assert (gimple_call_internal_p (stmt, IFN_ASSUME)
> > > +	      && gimple_call_num_args (stmt) == 1
> > > +	      && gimple_call_arg (stmt, 0) == guard);
> > > +  data->cannot_fallthru = false;
> > > +  gcall *call = gimple_build_call_internal_vec (IFN_ASSUME, vargs);
> > > +  gimple_set_location (call, loc);
> > > +  gsi_replace (gsi, call, true);
> > > +}
> > >  
> > >  /* Lower statement GSI.  DATA is passed through the recursion.  We try to
> > >     track the fallthruness of statements and get rid of unreachable return
> > > @@ -354,6 +738,13 @@ lower_stmt (gimple_stmt_iterator *gsi, s
> > >  	tree decl = gimple_call_fndecl (stmt);
> > >  	unsigned i;
> > >  
> > > +	if (gimple_call_internal_p (stmt, IFN_ASSUME)
> > > +	    && gimple_call_num_args (stmt) == 0)
> > > +	  {
> > > +	    lower_assumption (gsi, data);
> > > +	    return;
> > > +	  }
> > > +
> > >  	for (i = 0; i < gimple_call_num_args (stmt); i++)
> > >  	  {
> > >  	    tree arg = gimple_call_arg (stmt, i);
> > > --- gcc/tree-ssa-ccp.cc.jj	2022-10-10 11:57:40.203722414 +0200
> > > +++ gcc/tree-ssa-ccp.cc	2022-10-12 19:48:28.891554716 +0200
> > > @@ -4253,6 +4253,12 @@ pass_fold_builtins::execute (function *f
> > >  	    }
> > >  
> > >  	  callee = gimple_call_fndecl (stmt);
> > > +	  if (!callee
> > > +	      && gimple_call_internal_p (stmt, IFN_ASSUME))
> > > +	    {
> > > +	      gsi_remove (&i, true);
> > > +	      continue;
> > > +	    }
> > >  	  if (!callee || !fndecl_built_in_p (callee, BUILT_IN_NORMAL))
> > >  	    {
> > >  	      gsi_next (&i);
> > > --- gcc/lto-streamer-out.cc.jj	2022-10-10 11:57:40.202722428 +0200
> > > +++ gcc/lto-streamer-out.cc	2022-10-12 19:48:28.891554716 +0200
> > > @@ -2278,6 +2278,7 @@ output_struct_function_base (struct outp
> > >    bp_pack_value (&bp, fn->calls_eh_return, 1);
> > >    bp_pack_value (&bp, fn->has_force_vectorize_loops, 1);
> > >    bp_pack_value (&bp, fn->has_simduid_loops, 1);
> > > +  bp_pack_value (&bp, fn->assume_function, 1);
> > >    bp_pack_value (&bp, fn->va_list_fpr_size, 8);
> > >    bp_pack_value (&bp, fn->va_list_gpr_size, 8);
> > >    bp_pack_value (&bp, fn->last_clique, sizeof (short) * 8);
> > > --- gcc/lto-streamer-in.cc.jj	2022-10-10 11:57:40.201722442 +0200
> > > +++ gcc/lto-streamer-in.cc	2022-10-12 19:48:28.891554716 +0200
> > > @@ -1318,6 +1318,7 @@ input_struct_function_base (struct funct
> > >    fn->calls_eh_return = bp_unpack_value (&bp, 1);
> > >    fn->has_force_vectorize_loops = bp_unpack_value (&bp, 1);
> > >    fn->has_simduid_loops = bp_unpack_value (&bp, 1);
> > > +  fn->assume_function = bp_unpack_value (&bp, 1);
> > >    fn->va_list_fpr_size = bp_unpack_value (&bp, 8);
> > >    fn->va_list_gpr_size = bp_unpack_value (&bp, 8);
> > >    fn->last_clique = bp_unpack_value (&bp, sizeof (short) * 8);
> > > --- gcc/cgraphunit.cc.jj	2022-10-10 11:57:40.152723125 +0200
> > > +++ gcc/cgraphunit.cc	2022-10-12 19:48:28.892554703 +0200
> > > @@ -1882,6 +1882,16 @@ cgraph_node::expand (void)
> > >    ggc_collect ();
> > >    timevar_pop (TV_REST_OF_COMPILATION);
> > >  
> > > +  if (DECL_STRUCT_FUNCTION (decl)
> > > +      && DECL_STRUCT_FUNCTION (decl)->assume_function)
> > > +    {
> > > +      /* Assume functions aren't expanded into RTL, on the other side
> > > +	 we don't want to release their body.  */
> > > +      if (cfun)
> > > +	pop_cfun ();
> > > +      return;
> > > +    }
> > > +
> > >    /* Make sure that BE didn't give up on compiling.  */
> > >    gcc_assert (TREE_ASM_WRITTEN (decl));
> > >    if (cfun)
> > > --- gcc/cfgexpand.cc.jj	2022-10-10 11:57:40.152723125 +0200
> > > +++ gcc/cfgexpand.cc	2022-10-12 19:48:28.893554689 +0200
> > > @@ -6597,6 +6597,14 @@ pass_expand::execute (function *fun)
> > >    rtx_insn *var_seq, *var_ret_seq;
> > >    unsigned i;
> > >  
> > > +  if (cfun->assume_function)
> > > +    {
> > > +      /* Assume functions should not be expanded to RTL.  */
> > 
> > can we avoid getting here in the first place?  I think we don't need
> > any of the post-pass_all_optimizations[_g] passes?
> 
> I'm afraid not without revamping passes.def, because
> to easily cat the pass queue from certain point onwards, we
> need all the remaining passes to be wrapped with
> PUSH_INSERT_PASSES_WITHIN.
> So, if we e.g. wanted to cut out everything from pass_tm_init
> onwards, we'd need to wrap:
>   NEXT_PASS (pass_tm_init);
>   PUSH_INSERT_PASSES_WITHIN (pass_tm_init)
>       NEXT_PASS (pass_tm_mark);
>       NEXT_PASS (pass_tm_memopt);
>       NEXT_PASS (pass_tm_edges);
>   POP_INSERT_PASSES ()
>   NEXT_PASS (pass_simduid_cleanup);
>   NEXT_PASS (pass_vtable_verify);
>   NEXT_PASS (pass_lower_vaarg);
>   NEXT_PASS (pass_lower_vector);
>   NEXT_PASS (pass_lower_complex_O0);
>   NEXT_PASS (pass_sancov_O0);
>   NEXT_PASS (pass_lower_switch_O0);
>   NEXT_PASS (pass_asan_O0);
>   NEXT_PASS (pass_tsan_O0);
>   NEXT_PASS (pass_sanopt);
>   NEXT_PASS (pass_cleanup_eh);
>   NEXT_PASS (pass_lower_resx);
>   NEXT_PASS (pass_nrv);
>   NEXT_PASS (pass_gimple_isel);
>   NEXT_PASS (pass_harden_conditional_branches);
>   NEXT_PASS (pass_harden_compares);
>   NEXT_PASS (pass_warn_access, /*early=*/false);
>   NEXT_PASS (pass_cleanup_cfg_post_optimizing);
>   NEXT_PASS (pass_warn_function_noreturn);
> 
>   NEXT_PASS (pass_expand);
> in some wrapper pass with a gate (either also including
> pass_rest_of_compilation but that would mean undesirable
> reindentation of everything there, or just
> the above ones and have assume_function punt in the 2
> or 1 gates).

Ah, they are all in all_passes :/  Maybe we can add something
like TODO_discard_function (or a property) that will not discard
the function but stop compiling it?  I wonder if all cleanup
is properly done for the function -  I suppose we want to keep the
body around for callers indefinitely.

> What I had in the patch was just skip pass_expand
> and pass_rest_of_compilation, perhaps another possibility
> to do the former would be to define a gate on pass_expand.

Or some gate in the pass manager, like in override_gate_status
check fun->properties & PROP_suspended and have some
pass_suspend_assume add that property for all assume function
bodies.

In case you like any of the above give it a shot, otherwise what
you have isn't too bad, I just wondered if there's a nicer way.


> > > --- gcc/tree-vectorizer.cc.jj	2022-10-10 11:57:40.204722400 +0200
> > > +++ gcc/tree-vectorizer.cc	2022-10-12 19:48:28.894554675 +0200
> > > @@ -1213,6 +1213,10 @@ public:
> > >    /* opt_pass methods: */
> > >    bool gate (function *fun) final override
> > >      {
> > > +      /* Vectorization makes range analysis of assume functions
> > > +	 harder, not easier.  */
> > > +      if (fun->assume_function)
> > > +	return false;
> > >        return flag_tree_loop_vectorize || fun->has_force_vectorize_loops;
> > >      }
> > >  
> > > @@ -1490,7 +1494,14 @@ public:
> > >  
> > >    /* opt_pass methods: */
> > >    opt_pass * clone () final override { return new pass_slp_vectorize (m_ctxt); }
> > > -  bool gate (function *) final override { return flag_tree_slp_vectorize != 0; }
> > > +  bool gate (function *fun) final override
> > > +  {
> > > +    /* Vectorization makes range analysis of assume functions harder,
> > > +       not easier.  */
> > 
> > Can we split out these kind of considerations from the initial patch?
> 
> Sure.
> > 
> > > +    if (fun->assume_function)
> > > +      return false;
> > > +    return flag_tree_slp_vectorize != 0;
> > > +  }
> > >    unsigned int execute (function *) final override;
> > >  
> > >  }; // class pass_slp_vectorize
> > > --- gcc/ipa-icf.cc.jj	2022-10-10 11:57:40.201722442 +0200
> > > +++ gcc/ipa-icf.cc	2022-10-12 19:48:28.894554675 +0200
> > > @@ -1517,6 +1517,9 @@ sem_function::parse (cgraph_node *node,
> > >    if (!func || (!node->has_gimple_body_p () && !node->thunk))
> > >      return NULL;
> > >  
> > > +  if (func->assume_function)
> > > +    return NULL;
> > > +
> > 
> > Do we want implicit noipa attribute on the assume functions?  Or do
> > we need to IPA-CP into them?  I suppose the ranger code can use
> > contextual code from the .ASSUME call for things like
> > assume (side-effect(), i == 1);
> 
> Most of normal IPA optimizations are disabled for them because
> they aren't normally called, all we do is take their address
> and pass it to .ASSUME.  IPA-ICF was an exception and I had to
> disable it because when it triggered it decided to create a thunk
> which failed to assemble.
> But implicit noipa surely is an option; though of course we want
> inlining etc. to happen into those functions.
> And eventually some kind of IPA SRA of their arguments but with
> different behavior from normal IPA-SRA.

I suppose for now adding noipa is easiest, we'd still inline into
the body of course.

> > Reading some of the patch I guessed you wanted to handle nested
> > assumes.  So - is
> > 
> > [[assume (a == 4 && ([[assume(b == 3)]], b != 2))]]
> > 
> > a thing?
> 
> This is not valid, assume can be just on an empty statement.
> But with GNU statement expressions it is a thing and I should add it to
> testsuite coverage.
> 
> void
> foo (int a, int b)
> {
>   [[assume (a == 4 && ({ [[assume (b == 3)]]; b != 2 }))]];
> }
> is valid.
> I think the code should handle it fine (outline the outer and the new
> outlined function enters pass queue with all_lowering_passes
> and so will see pass_lower_cf again and hopefully work.  Will see
> how it goes when I tweak the patch.

Thanks,
Richard.