From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <jakub@redhat.com>
Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124])
	by sourceware.org (Postfix) with ESMTPS id AF68C3858D38
	for <gcc-patches@gcc.gnu.org>; Fri, 14 Oct 2022 18:34:11 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org AF68C3858D38
Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=redhat.com
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=redhat.com
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com;
	s=mimecast20190719; t=1665772451;
	h=from:from:reply-to:reply-to:subject:subject:date:date:
	 message-id:message-id:to:to:cc:cc:mime-version:mime-version:
	 content-type:content-type:in-reply-to:in-reply-to:  references:references;
	bh=7VcFqZrJ5iG3Rj7psH6XUsYRQM04OtloEJ1WOUdUmYw=;
	b=Kk/cOlKQX2EhL+pMcg/uE1w8r5lsL86BR5V0KA5WQUJGxWBBmlTwswagO5nLhxR0M1gmkr
	wl6pSiqL1iFm0sSwhOXXKKLXtngeeLPJm1NW/vVfbui1FXOOB2E/W7gKuiozma+2BwvIuB
	s7siqj6BM+sgb3p1tAyBKvlCycdGpLs=
Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com
 [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS
 (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id
 us-mta-163-5ujEYWuGMDOP0bc5lNkZ4g-1; Fri, 14 Oct 2022 14:34:10 -0400
X-MC-Unique: 5ujEYWuGMDOP0bc5lNkZ4g-1
Received: from smtp.corp.redhat.com (int-mx07.intmail.prod.int.rdu2.redhat.com [10.11.54.7])
	(using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
	(No client certificate requested)
	by mimecast-mx02.redhat.com (Postfix) with ESMTPS id B2A4E833B05;
	Fri, 14 Oct 2022 18:34:09 +0000 (UTC)
Received: from tucnak.zalov.cz (unknown [10.39.192.55])
	by smtp.corp.redhat.com (Postfix) with ESMTPS id 4E2591401699;
	Fri, 14 Oct 2022 18:34:09 +0000 (UTC)
Received: from tucnak.zalov.cz (localhost [127.0.0.1])
	by tucnak.zalov.cz (8.17.1/8.17.1) with ESMTPS id 29EIXe8B2636752
	(version=TLSv1.3 cipher=TLS_AES_256_GCM_SHA384 bits=256 verify=NOT);
	Fri, 14 Oct 2022 20:33:46 +0200
Received: (from jakub@localhost)
	by tucnak.zalov.cz (8.17.1/8.17.1/Submit) id 29EIXJhZ2636750;
	Fri, 14 Oct 2022 20:33:19 +0200
Date: Fri, 14 Oct 2022 20:33:14 +0200
From: Jakub Jelinek <jakub@redhat.com>
To: Richard Biener <rguenther@suse.de>
Cc: Jason Merrill <jason@redhat.com>, Jan Hubicka <jh@suse.cz>,
        gcc-patches@gcc.gnu.org
Subject: Re: [PATCH] middle-end, v3: IFN_ASSUME support [PR106654]
Message-ID: <Y0mraujtvgSDjNu4@tucnak>
Reply-To: Jakub Jelinek <jakub@redhat.com>
References: <Y0Pd0i4FCMyx6ukZ@tucnak>
 <d7676364-0574-c430-c422-804b82f473a3@redhat.com>
 <Y0SMXH8qOTErysVJ@tucnak>
 <Y0VxcOxwjGbN6rKl@tucnak>
 <479fa663-fb3e-a90d-ae7a-0fdd5acbfa00@redhat.com>
 <Y0e1QH++UvHO7MtJ@tucnak>
 <nycvar.YFH.7.77.849.2210141103070.18337@jbgna.fhfr.qr>
MIME-Version: 1.0
In-Reply-To: <nycvar.YFH.7.77.849.2210141103070.18337@jbgna.fhfr.qr>
X-Scanned-By: MIMEDefang 3.1 on 10.11.54.7
X-Mimecast-Spam-Score: 0
X-Mimecast-Originator: redhat.com
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
X-Spam-Status: No, score=-3.6 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_NONE,TXREP autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org
List-Id: <gcc-patches.gcc.gnu.org>

On Fri, Oct 14, 2022 at 11:27:07AM +0000, Richard Biener wrote:
> > --- gcc/function.h.jj	2022-10-10 11:57:40.163722972 +0200
> > +++ gcc/function.h	2022-10-12 19:48:28.887554771 +0200
> > @@ -438,6 +438,10 @@ struct GTY(()) function {
> >  
> >    /* Set if there are any OMP_TARGET regions in the function.  */
> >    unsigned int has_omp_target : 1;
> > +
> > +  /* Set for artificial function created for [[assume (cond)]].
> > +     These should be GIMPLE optimized, but not expanded to RTL.  */
> > +  unsigned int assume_function : 1;
> 
> I wonder if we should have this along force_output in the symtab
> node and let the symtab code decide whether to expand?

I actually first had a flag on the symtab node but as the patch shows,
when it needs to be tested, more frequently I have access to struct function
than to cgraph node.

> > --- gcc/gimplify.cc.jj	2022-10-10 11:57:40.165722944 +0200
> > +++ gcc/gimplify.cc	2022-10-12 19:48:28.890554730 +0200
> > @@ -3569,7 +3569,52 @@ gimplify_call_expr (tree *expr_p, gimple
> >  						     fndecl, 0));
> >  	      return GS_OK;
> >  	    }
> > -	  /* FIXME: Otherwise expand it specially.  */
> > +	  /* If not optimizing, ignore the assumptions.  */
> > +	  if (!optimize)
> > +	    {
> > +	      *expr_p = NULL_TREE;
> > +	      return GS_ALL_DONE;
> > +	    }
> > +	  /* Temporarily, until gimple lowering, transform
> > +	     .ASSUME (cond);
> > +	     into:
> > +	     guard = .ASSUME ();
> > +	     if (guard) goto label_true; else label_false;
> > +	     label_true:;
> > +	     {
> > +	       guard = cond;
> > +	     }
> > +	     label_false:;
> > +	     .ASSUME (guard);
> > +	     such that gimple lowering can outline the condition into
> > +	     a separate function easily.  */
> 
> So the idea to use lambdas and/or nested functions (for OMP)
> didn't work out or is more complicated?

Yes, that didn't work out.  Both lambda creation and nested function
handling produce big structures with everything while for the assumptions
it is better to have separate scalars if possible, lambda creation has
various language imposed restrictions, diagnostics etc. and isn't
available in C and I think the outlining in the patch is pretty simple and
short.

> I wonder if, instead of using the above intermediate form we
> can have a new structued GIMPLE code with sub-statements
> 
>  .ASSUME
>    {
>      condition;
>    }

That is what I wrote in the patch description as alternative:
"with the condition wrapped into a GIMPLE_BIND (I admit the above isn't                                                                                                                
extra clean but it is just something to hold it from gimplifier until                                                                                                                 
gimple low pass; it reassembles if (condition_never_true) { cond; };                                                                                                                  
an alternative would be introduce GOMP_ASSUME statement that would have                                                                                                               
the guard var as operand and the GIMPLE_BIND as body, but for the                                                                                                                     
few passes (tree-nested and omp lowering) in between that looked like                                                                                                                 
an overkill to me)"
I can certainly implement that easily.

> ?  There's gimple_statement_omp conveniently available as base and
> IIRC you had the requirement to implement some OMP assume as well?

For OpenMP assumptions we right now implement just the holds clause
of assume and implement it the same way as assume/gnu::assume attributes.

> Of ocurse a different stmt class with body would work as well here,
> maybe we can even use a gbind with a special flag.
> 
> The outlining code can then be ajusted to outline a single BIND?

It already is adjusting a single bind (of course with everything nested in
it).

> It probably won't simplify much that way.

> > +static tree
> > +create_assumption_fn (location_t loc)
> > +{
> > +  tree name = clone_function_name_numbered (current_function_decl, "_assume");
> > +  /* For now, will be changed later.  */
> 
> ?

I need to create the FUNCTION_DECL early and only later on discover
the used automatic vars (for which I need the destination function)
and only once those are discovered I can create the right
function type for the function.

> > +  tree type = TREE_TYPE (current_function_decl);

> > +  DECL_FUNCTION_SPECIFIC_OPTIMIZATION (decl)
> > +    = DECL_FUNCTION_SPECIFIC_OPTIMIZATION (current_function_decl);
> > +  DECL_FUNCTION_SPECIFIC_TARGET (decl)
> > +    = DECL_FUNCTION_SPECIFIC_TARGET (current_function_decl);
> > +  DECL_FUNCTION_VERSIONED (decl)
> > +    = DECL_FUNCTION_VERSIONED (current_function_decl);
> 
> what does it mean to copy DECL_FUNCTION_VERSIONED here?

This was a copy and paste from elsewhere (I think OpenMP code).
I guess I can nuke it and even better add some testcase coverage
for various nasty things like assume in multi-versioned functions.
> > +  DECL_ARGUMENTS (lad.id.dst_fn) = parms;
> > +  TREE_TYPE (lad.id.dst_fn) = build_function_type (boolean_type_node, parmt);
> 
> ah, here's the type.  Maybe use error_mark_node as transitional type?

Will see if that works.
> 
> > +  cgraph_node::add_new_function (lad.id.dst_fn, false);
> > +
> > +  for (unsigned i = 0; i < sz; ++i)
> > +    {
> > +      tree v = lad.decls[i];
> > +      if (TREE_CODE (v) == SSA_NAME)
> > +	release_ssa_name (v);
> > +    }
> > +
> > +  stmt = gsi_stmt (*gsi);
> > +  lab = as_a <glabel *> (stmt);
> > +  gcc_assert (gimple_label_label (lab) == l1
> > +	      || gimple_label_label (lab) == l2);
> > +  gsi_remove (gsi, true);
> > +  stmt = gsi_stmt (*gsi);
> > +  gcc_assert (gimple_call_internal_p (stmt, IFN_ASSUME)
> > +	      && gimple_call_num_args (stmt) == 1
> > +	      && gimple_call_arg (stmt, 0) == guard);
> > +  data->cannot_fallthru = false;
> > +  gcall *call = gimple_build_call_internal_vec (IFN_ASSUME, vargs);
> > +  gimple_set_location (call, loc);
> > +  gsi_replace (gsi, call, true);
> > +}
> >  
> >  /* Lower statement GSI.  DATA is passed through the recursion.  We try to
> >     track the fallthruness of statements and get rid of unreachable return
> > @@ -354,6 +738,13 @@ lower_stmt (gimple_stmt_iterator *gsi, s
> >  	tree decl = gimple_call_fndecl (stmt);
> >  	unsigned i;
> >  
> > +	if (gimple_call_internal_p (stmt, IFN_ASSUME)
> > +	    && gimple_call_num_args (stmt) == 0)
> > +	  {
> > +	    lower_assumption (gsi, data);
> > +	    return;
> > +	  }
> > +
> >  	for (i = 0; i < gimple_call_num_args (stmt); i++)
> >  	  {
> >  	    tree arg = gimple_call_arg (stmt, i);
> > --- gcc/tree-ssa-ccp.cc.jj	2022-10-10 11:57:40.203722414 +0200
> > +++ gcc/tree-ssa-ccp.cc	2022-10-12 19:48:28.891554716 +0200
> > @@ -4253,6 +4253,12 @@ pass_fold_builtins::execute (function *f
> >  	    }
> >  
> >  	  callee = gimple_call_fndecl (stmt);
> > +	  if (!callee
> > +	      && gimple_call_internal_p (stmt, IFN_ASSUME))
> > +	    {
> > +	      gsi_remove (&i, true);
> > +	      continue;
> > +	    }
> >  	  if (!callee || !fndecl_built_in_p (callee, BUILT_IN_NORMAL))
> >  	    {
> >  	      gsi_next (&i);
> > --- gcc/lto-streamer-out.cc.jj	2022-10-10 11:57:40.202722428 +0200
> > +++ gcc/lto-streamer-out.cc	2022-10-12 19:48:28.891554716 +0200
> > @@ -2278,6 +2278,7 @@ output_struct_function_base (struct outp
> >    bp_pack_value (&bp, fn->calls_eh_return, 1);
> >    bp_pack_value (&bp, fn->has_force_vectorize_loops, 1);
> >    bp_pack_value (&bp, fn->has_simduid_loops, 1);
> > +  bp_pack_value (&bp, fn->assume_function, 1);
> >    bp_pack_value (&bp, fn->va_list_fpr_size, 8);
> >    bp_pack_value (&bp, fn->va_list_gpr_size, 8);
> >    bp_pack_value (&bp, fn->last_clique, sizeof (short) * 8);
> > --- gcc/lto-streamer-in.cc.jj	2022-10-10 11:57:40.201722442 +0200
> > +++ gcc/lto-streamer-in.cc	2022-10-12 19:48:28.891554716 +0200
> > @@ -1318,6 +1318,7 @@ input_struct_function_base (struct funct
> >    fn->calls_eh_return = bp_unpack_value (&bp, 1);
> >    fn->has_force_vectorize_loops = bp_unpack_value (&bp, 1);
> >    fn->has_simduid_loops = bp_unpack_value (&bp, 1);
> > +  fn->assume_function = bp_unpack_value (&bp, 1);
> >    fn->va_list_fpr_size = bp_unpack_value (&bp, 8);
> >    fn->va_list_gpr_size = bp_unpack_value (&bp, 8);
> >    fn->last_clique = bp_unpack_value (&bp, sizeof (short) * 8);
> > --- gcc/cgraphunit.cc.jj	2022-10-10 11:57:40.152723125 +0200
> > +++ gcc/cgraphunit.cc	2022-10-12 19:48:28.892554703 +0200
> > @@ -1882,6 +1882,16 @@ cgraph_node::expand (void)
> >    ggc_collect ();
> >    timevar_pop (TV_REST_OF_COMPILATION);
> >  
> > +  if (DECL_STRUCT_FUNCTION (decl)
> > +      && DECL_STRUCT_FUNCTION (decl)->assume_function)
> > +    {
> > +      /* Assume functions aren't expanded into RTL, on the other side
> > +	 we don't want to release their body.  */
> > +      if (cfun)
> > +	pop_cfun ();
> > +      return;
> > +    }
> > +
> >    /* Make sure that BE didn't give up on compiling.  */
> >    gcc_assert (TREE_ASM_WRITTEN (decl));
> >    if (cfun)
> > --- gcc/cfgexpand.cc.jj	2022-10-10 11:57:40.152723125 +0200
> > +++ gcc/cfgexpand.cc	2022-10-12 19:48:28.893554689 +0200
> > @@ -6597,6 +6597,14 @@ pass_expand::execute (function *fun)
> >    rtx_insn *var_seq, *var_ret_seq;
> >    unsigned i;
> >  
> > +  if (cfun->assume_function)
> > +    {
> > +      /* Assume functions should not be expanded to RTL.  */
> 
> can we avoid getting here in the first place?  I think we don't need
> any of the post-pass_all_optimizations[_g] passes?

I'm afraid not without revamping passes.def, because
to easily cat the pass queue from certain point onwards, we
need all the remaining passes to be wrapped with
PUSH_INSERT_PASSES_WITHIN.
So, if we e.g. wanted to cut out everything from pass_tm_init
onwards, we'd need to wrap:
  NEXT_PASS (pass_tm_init);
  PUSH_INSERT_PASSES_WITHIN (pass_tm_init)
      NEXT_PASS (pass_tm_mark);
      NEXT_PASS (pass_tm_memopt);
      NEXT_PASS (pass_tm_edges);
  POP_INSERT_PASSES ()
  NEXT_PASS (pass_simduid_cleanup);
  NEXT_PASS (pass_vtable_verify);
  NEXT_PASS (pass_lower_vaarg);
  NEXT_PASS (pass_lower_vector);
  NEXT_PASS (pass_lower_complex_O0);
  NEXT_PASS (pass_sancov_O0);
  NEXT_PASS (pass_lower_switch_O0);
  NEXT_PASS (pass_asan_O0);
  NEXT_PASS (pass_tsan_O0);
  NEXT_PASS (pass_sanopt);
  NEXT_PASS (pass_cleanup_eh);
  NEXT_PASS (pass_lower_resx);
  NEXT_PASS (pass_nrv);
  NEXT_PASS (pass_gimple_isel);
  NEXT_PASS (pass_harden_conditional_branches);
  NEXT_PASS (pass_harden_compares);
  NEXT_PASS (pass_warn_access, /*early=*/false);
  NEXT_PASS (pass_cleanup_cfg_post_optimizing);
  NEXT_PASS (pass_warn_function_noreturn);

  NEXT_PASS (pass_expand);
in some wrapper pass with a gate (either also including
pass_rest_of_compilation but that would mean undesirable
reindentation of everything there, or just
the above ones and have assume_function punt in the 2
or 1 gates).

What I had in the patch was just skip pass_expand
and pass_rest_of_compilation, perhaps another possibility
to do the former would be to define a gate on pass_expand.
> > --- gcc/tree-vectorizer.cc.jj	2022-10-10 11:57:40.204722400 +0200
> > +++ gcc/tree-vectorizer.cc	2022-10-12 19:48:28.894554675 +0200
> > @@ -1213,6 +1213,10 @@ public:
> >    /* opt_pass methods: */
> >    bool gate (function *fun) final override
> >      {
> > +      /* Vectorization makes range analysis of assume functions
> > +	 harder, not easier.  */
> > +      if (fun->assume_function)
> > +	return false;
> >        return flag_tree_loop_vectorize || fun->has_force_vectorize_loops;
> >      }
> >  
> > @@ -1490,7 +1494,14 @@ public:
> >  
> >    /* opt_pass methods: */
> >    opt_pass * clone () final override { return new pass_slp_vectorize (m_ctxt); }
> > -  bool gate (function *) final override { return flag_tree_slp_vectorize != 0; }
> > +  bool gate (function *fun) final override
> > +  {
> > +    /* Vectorization makes range analysis of assume functions harder,
> > +       not easier.  */
> 
> Can we split out these kind of considerations from the initial patch?

Sure.
> 
> > +    if (fun->assume_function)
> > +      return false;
> > +    return flag_tree_slp_vectorize != 0;
> > +  }
> >    unsigned int execute (function *) final override;
> >  
> >  }; // class pass_slp_vectorize
> > --- gcc/ipa-icf.cc.jj	2022-10-10 11:57:40.201722442 +0200
> > +++ gcc/ipa-icf.cc	2022-10-12 19:48:28.894554675 +0200
> > @@ -1517,6 +1517,9 @@ sem_function::parse (cgraph_node *node,
> >    if (!func || (!node->has_gimple_body_p () && !node->thunk))
> >      return NULL;
> >  
> > +  if (func->assume_function)
> > +    return NULL;
> > +
> 
> Do we want implicit noipa attribute on the assume functions?  Or do
> we need to IPA-CP into them?  I suppose the ranger code can use
> contextual code from the .ASSUME call for things like
> assume (side-effect(), i == 1);

Most of normal IPA optimizations are disabled for them because
they aren't normally called, all we do is take their address
and pass it to .ASSUME.  IPA-ICF was an exception and I had to
disable it because when it triggered it decided to create a thunk
which failed to assemble.
But implicit noipa surely is an option; though of course we want
inlining etc. to happen into those functions.
And eventually some kind of IPA SRA of their arguments but with
different behavior from normal IPA-SRA.

> Reading some of the patch I guessed you wanted to handle nested
> assumes.  So - is
> 
> [[assume (a == 4 && ([[assume(b == 3)]], b != 2))]]
> 
> a thing?

This is not valid, assume can be just on an empty statement.
But with GNU statement expressions it is a thing and I should add it to
testsuite coverage.

void
foo (int a, int b)
{
  [[assume (a == 4 && ({ [[assume (b == 3)]]; b != 2 }))]];
}
is valid.
I think the code should handle it fine (outline the outer and the new
outlined function enters pass queue with all_lowering_passes
and so will see pass_lower_cf again and hopefully work.  Will see
how it goes when I tweak the patch.

	Jakub