From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from smtp-out1.suse.de (smtp-out1.suse.de [IPv6:2001:67c:2178:6::1c]) by sourceware.org (Postfix) with ESMTPS id 76D4D3858C50 for ; Mon, 17 Oct 2022 06:55:41 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 76D4D3858C50 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=suse.de Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=suse.de Received: from relay2.suse.de (relay2.suse.de [149.44.160.134]) by smtp-out1.suse.de (Postfix) with ESMTP id 667D12275B; Mon, 17 Oct 2022 06:55:40 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_rsa; t=1665989740; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=BsLRDAvOXiITm3j98bC8gTp33azkvn1bs/4gID7UF3U=; b=yqARqt9KdK/VjAFjBMJrJpJ/vlvXkIa37pVLLdRdX5ENvAo1qnncfxqlo7/CpIQBQF8JH8 +NH3cHijLDEoujBz8obZgNcBJc2VnGkQT26M90EQS7GVU+1DEIqHq/ro6QAGjhE9U3oA8x pvD3KxtCyBAJBwHn+5xilZ2VO/QTmYw= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_ed25519; t=1665989740; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=BsLRDAvOXiITm3j98bC8gTp33azkvn1bs/4gID7UF3U=; b=jsSRkOavmCPw1b4Q6o+3yTIMx7fjQB+QqXftW027vR5N5wfvB4tGhzjYwtQjYhqxNVJkge kVb0KZU4JtiQqgDg== Received: from wotan.suse.de (wotan.suse.de [10.160.0.1]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by relay2.suse.de (Postfix) with ESMTPS id 5E7B62C141; Mon, 17 Oct 2022 06:55:40 +0000 (UTC) Date: Mon, 17 Oct 2022 06:55:40 +0000 (UTC) From: Richard Biener To: Jakub Jelinek cc: Jason Merrill , Jan Hubicka , gcc-patches@gcc.gnu.org Subject: Re: [PATCH] middle-end, v3: IFN_ASSUME support [PR106654] In-Reply-To: Message-ID: References: <479fa663-fb3e-a90d-ae7a-0fdd5acbfa00@redhat.com> User-Agent: Alpine 2.22 (LSU 394 2020-01-19) MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII X-Spam-Status: No, score=-5.4 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_LOW,SPF_HELO_NONE,SPF_PASS,TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: On Fri, 14 Oct 2022, Jakub Jelinek wrote: > On Fri, Oct 14, 2022 at 11:27:07AM +0000, Richard Biener wrote: > > > --- gcc/function.h.jj 2022-10-10 11:57:40.163722972 +0200 > > > +++ gcc/function.h 2022-10-12 19:48:28.887554771 +0200 > > > @@ -438,6 +438,10 @@ struct GTY(()) function { > > > > > > /* Set if there are any OMP_TARGET regions in the function. */ > > > unsigned int has_omp_target : 1; > > > + > > > + /* Set for artificial function created for [[assume (cond)]]. > > > + These should be GIMPLE optimized, but not expanded to RTL. */ > > > + unsigned int assume_function : 1; > > > > I wonder if we should have this along force_output in the symtab > > node and let the symtab code decide whether to expand? > > I actually first had a flag on the symtab node but as the patch shows, > when it needs to be tested, more frequently I have access to struct function > than to cgraph node. I see. > > > --- gcc/gimplify.cc.jj 2022-10-10 11:57:40.165722944 +0200 > > > +++ gcc/gimplify.cc 2022-10-12 19:48:28.890554730 +0200 > > > @@ -3569,7 +3569,52 @@ gimplify_call_expr (tree *expr_p, gimple > > > fndecl, 0)); > > > return GS_OK; > > > } > > > - /* FIXME: Otherwise expand it specially. */ > > > + /* If not optimizing, ignore the assumptions. */ > > > + if (!optimize) > > > + { > > > + *expr_p = NULL_TREE; > > > + return GS_ALL_DONE; > > > + } > > > + /* Temporarily, until gimple lowering, transform > > > + .ASSUME (cond); > > > + into: > > > + guard = .ASSUME (); > > > + if (guard) goto label_true; else label_false; > > > + label_true:; > > > + { > > > + guard = cond; > > > + } > > > + label_false:; > > > + .ASSUME (guard); > > > + such that gimple lowering can outline the condition into > > > + a separate function easily. */ > > > > So the idea to use lambdas and/or nested functions (for OMP) > > didn't work out or is more complicated? > > Yes, that didn't work out. Both lambda creation and nested function > handling produce big structures with everything while for the assumptions > it is better to have separate scalars if possible, lambda creation has > various language imposed restrictions, diagnostics etc. and isn't > available in C and I think the outlining in the patch is pretty simple and > short. > > > I wonder if, instead of using the above intermediate form we > > can have a new structued GIMPLE code with sub-statements > > > > .ASSUME > > { > > condition; > > } > > That is what I wrote in the patch description as alternative: > "with the condition wrapped into a GIMPLE_BIND (I admit the above isn't > extra clean but it is just something to hold it from gimplifier until > gimple low pass; it reassembles if (condition_never_true) { cond; }; > an alternative would be introduce GOMP_ASSUME statement that would have > the guard var as operand and the GIMPLE_BIND as body, but for the > few passes (tree-nested and omp lowering) in between that looked like > an overkill to me)" > I can certainly implement that easily. I'd prefer that, it looks possibly less messy. > > ? There's gimple_statement_omp conveniently available as base and > > IIRC you had the requirement to implement some OMP assume as well? > > For OpenMP assumptions we right now implement just the holds clause > of assume and implement it the same way as assume/gnu::assume attributes. > > > Of ocurse a different stmt class with body would work as well here, > > maybe we can even use a gbind with a special flag. > > > > The outlining code can then be ajusted to outline a single BIND? > > It already is adjusting a single bind (of course with everything nested in > it). > > > It probably won't simplify much that way. > > > > +static tree > > > +create_assumption_fn (location_t loc) > > > +{ > > > + tree name = clone_function_name_numbered (current_function_decl, "_assume"); > > > + /* For now, will be changed later. */ > > > > ? > > I need to create the FUNCTION_DECL early and only later on discover > the used automatic vars (for which I need the destination function) > and only once those are discovered I can create the right > function type for the function. > > > > + tree type = TREE_TYPE (current_function_decl); > > > > + DECL_FUNCTION_SPECIFIC_OPTIMIZATION (decl) > > > + = DECL_FUNCTION_SPECIFIC_OPTIMIZATION (current_function_decl); > > > + DECL_FUNCTION_SPECIFIC_TARGET (decl) > > > + = DECL_FUNCTION_SPECIFIC_TARGET (current_function_decl); > > > + DECL_FUNCTION_VERSIONED (decl) > > > + = DECL_FUNCTION_VERSIONED (current_function_decl); > > > > what does it mean to copy DECL_FUNCTION_VERSIONED here? > > This was a copy and paste from elsewhere (I think OpenMP code). > I guess I can nuke it and even better add some testcase coverage > for various nasty things like assume in multi-versioned functions. > > > > + DECL_ARGUMENTS (lad.id.dst_fn) = parms; > > > + TREE_TYPE (lad.id.dst_fn) = build_function_type (boolean_type_node, parmt); > > > > ah, here's the type. Maybe use error_mark_node as transitional type? > > Will see if that works. > > > > > + cgraph_node::add_new_function (lad.id.dst_fn, false); > > > + > > > + for (unsigned i = 0; i < sz; ++i) > > > + { > > > + tree v = lad.decls[i]; > > > + if (TREE_CODE (v) == SSA_NAME) > > > + release_ssa_name (v); > > > + } > > > + > > > + stmt = gsi_stmt (*gsi); > > > + lab = as_a (stmt); > > > + gcc_assert (gimple_label_label (lab) == l1 > > > + || gimple_label_label (lab) == l2); > > > + gsi_remove (gsi, true); > > > + stmt = gsi_stmt (*gsi); > > > + gcc_assert (gimple_call_internal_p (stmt, IFN_ASSUME) > > > + && gimple_call_num_args (stmt) == 1 > > > + && gimple_call_arg (stmt, 0) == guard); > > > + data->cannot_fallthru = false; > > > + gcall *call = gimple_build_call_internal_vec (IFN_ASSUME, vargs); > > > + gimple_set_location (call, loc); > > > + gsi_replace (gsi, call, true); > > > +} > > > > > > /* Lower statement GSI. DATA is passed through the recursion. We try to > > > track the fallthruness of statements and get rid of unreachable return > > > @@ -354,6 +738,13 @@ lower_stmt (gimple_stmt_iterator *gsi, s > > > tree decl = gimple_call_fndecl (stmt); > > > unsigned i; > > > > > > + if (gimple_call_internal_p (stmt, IFN_ASSUME) > > > + && gimple_call_num_args (stmt) == 0) > > > + { > > > + lower_assumption (gsi, data); > > > + return; > > > + } > > > + > > > for (i = 0; i < gimple_call_num_args (stmt); i++) > > > { > > > tree arg = gimple_call_arg (stmt, i); > > > --- gcc/tree-ssa-ccp.cc.jj 2022-10-10 11:57:40.203722414 +0200 > > > +++ gcc/tree-ssa-ccp.cc 2022-10-12 19:48:28.891554716 +0200 > > > @@ -4253,6 +4253,12 @@ pass_fold_builtins::execute (function *f > > > } > > > > > > callee = gimple_call_fndecl (stmt); > > > + if (!callee > > > + && gimple_call_internal_p (stmt, IFN_ASSUME)) > > > + { > > > + gsi_remove (&i, true); > > > + continue; > > > + } > > > if (!callee || !fndecl_built_in_p (callee, BUILT_IN_NORMAL)) > > > { > > > gsi_next (&i); > > > --- gcc/lto-streamer-out.cc.jj 2022-10-10 11:57:40.202722428 +0200 > > > +++ gcc/lto-streamer-out.cc 2022-10-12 19:48:28.891554716 +0200 > > > @@ -2278,6 +2278,7 @@ output_struct_function_base (struct outp > > > bp_pack_value (&bp, fn->calls_eh_return, 1); > > > bp_pack_value (&bp, fn->has_force_vectorize_loops, 1); > > > bp_pack_value (&bp, fn->has_simduid_loops, 1); > > > + bp_pack_value (&bp, fn->assume_function, 1); > > > bp_pack_value (&bp, fn->va_list_fpr_size, 8); > > > bp_pack_value (&bp, fn->va_list_gpr_size, 8); > > > bp_pack_value (&bp, fn->last_clique, sizeof (short) * 8); > > > --- gcc/lto-streamer-in.cc.jj 2022-10-10 11:57:40.201722442 +0200 > > > +++ gcc/lto-streamer-in.cc 2022-10-12 19:48:28.891554716 +0200 > > > @@ -1318,6 +1318,7 @@ input_struct_function_base (struct funct > > > fn->calls_eh_return = bp_unpack_value (&bp, 1); > > > fn->has_force_vectorize_loops = bp_unpack_value (&bp, 1); > > > fn->has_simduid_loops = bp_unpack_value (&bp, 1); > > > + fn->assume_function = bp_unpack_value (&bp, 1); > > > fn->va_list_fpr_size = bp_unpack_value (&bp, 8); > > > fn->va_list_gpr_size = bp_unpack_value (&bp, 8); > > > fn->last_clique = bp_unpack_value (&bp, sizeof (short) * 8); > > > --- gcc/cgraphunit.cc.jj 2022-10-10 11:57:40.152723125 +0200 > > > +++ gcc/cgraphunit.cc 2022-10-12 19:48:28.892554703 +0200 > > > @@ -1882,6 +1882,16 @@ cgraph_node::expand (void) > > > ggc_collect (); > > > timevar_pop (TV_REST_OF_COMPILATION); > > > > > > + if (DECL_STRUCT_FUNCTION (decl) > > > + && DECL_STRUCT_FUNCTION (decl)->assume_function) > > > + { > > > + /* Assume functions aren't expanded into RTL, on the other side > > > + we don't want to release their body. */ > > > + if (cfun) > > > + pop_cfun (); > > > + return; > > > + } > > > + > > > /* Make sure that BE didn't give up on compiling. */ > > > gcc_assert (TREE_ASM_WRITTEN (decl)); > > > if (cfun) > > > --- gcc/cfgexpand.cc.jj 2022-10-10 11:57:40.152723125 +0200 > > > +++ gcc/cfgexpand.cc 2022-10-12 19:48:28.893554689 +0200 > > > @@ -6597,6 +6597,14 @@ pass_expand::execute (function *fun) > > > rtx_insn *var_seq, *var_ret_seq; > > > unsigned i; > > > > > > + if (cfun->assume_function) > > > + { > > > + /* Assume functions should not be expanded to RTL. */ > > > > can we avoid getting here in the first place? I think we don't need > > any of the post-pass_all_optimizations[_g] passes? > > I'm afraid not without revamping passes.def, because > to easily cat the pass queue from certain point onwards, we > need all the remaining passes to be wrapped with > PUSH_INSERT_PASSES_WITHIN. > So, if we e.g. wanted to cut out everything from pass_tm_init > onwards, we'd need to wrap: > NEXT_PASS (pass_tm_init); > PUSH_INSERT_PASSES_WITHIN (pass_tm_init) > NEXT_PASS (pass_tm_mark); > NEXT_PASS (pass_tm_memopt); > NEXT_PASS (pass_tm_edges); > POP_INSERT_PASSES () > NEXT_PASS (pass_simduid_cleanup); > NEXT_PASS (pass_vtable_verify); > NEXT_PASS (pass_lower_vaarg); > NEXT_PASS (pass_lower_vector); > NEXT_PASS (pass_lower_complex_O0); > NEXT_PASS (pass_sancov_O0); > NEXT_PASS (pass_lower_switch_O0); > NEXT_PASS (pass_asan_O0); > NEXT_PASS (pass_tsan_O0); > NEXT_PASS (pass_sanopt); > NEXT_PASS (pass_cleanup_eh); > NEXT_PASS (pass_lower_resx); > NEXT_PASS (pass_nrv); > NEXT_PASS (pass_gimple_isel); > NEXT_PASS (pass_harden_conditional_branches); > NEXT_PASS (pass_harden_compares); > NEXT_PASS (pass_warn_access, /*early=*/false); > NEXT_PASS (pass_cleanup_cfg_post_optimizing); > NEXT_PASS (pass_warn_function_noreturn); > > NEXT_PASS (pass_expand); > in some wrapper pass with a gate (either also including > pass_rest_of_compilation but that would mean undesirable > reindentation of everything there, or just > the above ones and have assume_function punt in the 2 > or 1 gates). Ah, they are all in all_passes :/ Maybe we can add something like TODO_discard_function (or a property) that will not discard the function but stop compiling it? I wonder if all cleanup is properly done for the function - I suppose we want to keep the body around for callers indefinitely. > What I had in the patch was just skip pass_expand > and pass_rest_of_compilation, perhaps another possibility > to do the former would be to define a gate on pass_expand. Or some gate in the pass manager, like in override_gate_status check fun->properties & PROP_suspended and have some pass_suspend_assume add that property for all assume function bodies. In case you like any of the above give it a shot, otherwise what you have isn't too bad, I just wondered if there's a nicer way. > > > --- gcc/tree-vectorizer.cc.jj 2022-10-10 11:57:40.204722400 +0200 > > > +++ gcc/tree-vectorizer.cc 2022-10-12 19:48:28.894554675 +0200 > > > @@ -1213,6 +1213,10 @@ public: > > > /* opt_pass methods: */ > > > bool gate (function *fun) final override > > > { > > > + /* Vectorization makes range analysis of assume functions > > > + harder, not easier. */ > > > + if (fun->assume_function) > > > + return false; > > > return flag_tree_loop_vectorize || fun->has_force_vectorize_loops; > > > } > > > > > > @@ -1490,7 +1494,14 @@ public: > > > > > > /* opt_pass methods: */ > > > opt_pass * clone () final override { return new pass_slp_vectorize (m_ctxt); } > > > - bool gate (function *) final override { return flag_tree_slp_vectorize != 0; } > > > + bool gate (function *fun) final override > > > + { > > > + /* Vectorization makes range analysis of assume functions harder, > > > + not easier. */ > > > > Can we split out these kind of considerations from the initial patch? > > Sure. > > > > > + if (fun->assume_function) > > > + return false; > > > + return flag_tree_slp_vectorize != 0; > > > + } > > > unsigned int execute (function *) final override; > > > > > > }; // class pass_slp_vectorize > > > --- gcc/ipa-icf.cc.jj 2022-10-10 11:57:40.201722442 +0200 > > > +++ gcc/ipa-icf.cc 2022-10-12 19:48:28.894554675 +0200 > > > @@ -1517,6 +1517,9 @@ sem_function::parse (cgraph_node *node, > > > if (!func || (!node->has_gimple_body_p () && !node->thunk)) > > > return NULL; > > > > > > + if (func->assume_function) > > > + return NULL; > > > + > > > > Do we want implicit noipa attribute on the assume functions? Or do > > we need to IPA-CP into them? I suppose the ranger code can use > > contextual code from the .ASSUME call for things like > > assume (side-effect(), i == 1); > > Most of normal IPA optimizations are disabled for them because > they aren't normally called, all we do is take their address > and pass it to .ASSUME. IPA-ICF was an exception and I had to > disable it because when it triggered it decided to create a thunk > which failed to assemble. > But implicit noipa surely is an option; though of course we want > inlining etc. to happen into those functions. > And eventually some kind of IPA SRA of their arguments but with > different behavior from normal IPA-SRA. I suppose for now adding noipa is easiest, we'd still inline into the body of course. > > Reading some of the patch I guessed you wanted to handle nested > > assumes. So - is > > > > [[assume (a == 4 && ([[assume(b == 3)]], b != 2))]] > > > > a thing? > > This is not valid, assume can be just on an empty statement. > But with GNU statement expressions it is a thing and I should add it to > testsuite coverage. > > void > foo (int a, int b) > { > [[assume (a == 4 && ({ [[assume (b == 3)]]; b != 2 }))]]; > } > is valid. > I think the code should handle it fine (outline the outer and the new > outlined function enters pass queue with all_lowering_passes > and so will see pass_lower_cf again and hopefully work. Will see > how it goes when I tweak the patch. Thanks, Richard.