From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by sourceware.org (Postfix) with ESMTPS id AF68C3858D38 for ; Fri, 14 Oct 2022 18:34:11 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org AF68C3858D38 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=redhat.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=redhat.com DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1665772451; h=from:from:reply-to:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type:in-reply-to:in-reply-to: references:references; bh=7VcFqZrJ5iG3Rj7psH6XUsYRQM04OtloEJ1WOUdUmYw=; b=Kk/cOlKQX2EhL+pMcg/uE1w8r5lsL86BR5V0KA5WQUJGxWBBmlTwswagO5nLhxR0M1gmkr wl6pSiqL1iFm0sSwhOXXKKLXtngeeLPJm1NW/vVfbui1FXOOB2E/W7gKuiozma+2BwvIuB s7siqj6BM+sgb3p1tAyBKvlCycdGpLs= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-163-5ujEYWuGMDOP0bc5lNkZ4g-1; Fri, 14 Oct 2022 14:34:10 -0400 X-MC-Unique: 5ujEYWuGMDOP0bc5lNkZ4g-1 Received: from smtp.corp.redhat.com (int-mx07.intmail.prod.int.rdu2.redhat.com [10.11.54.7]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id B2A4E833B05; Fri, 14 Oct 2022 18:34:09 +0000 (UTC) Received: from tucnak.zalov.cz (unknown [10.39.192.55]) by smtp.corp.redhat.com (Postfix) with ESMTPS id 4E2591401699; Fri, 14 Oct 2022 18:34:09 +0000 (UTC) Received: from tucnak.zalov.cz (localhost [127.0.0.1]) by tucnak.zalov.cz (8.17.1/8.17.1) with ESMTPS id 29EIXe8B2636752 (version=TLSv1.3 cipher=TLS_AES_256_GCM_SHA384 bits=256 verify=NOT); Fri, 14 Oct 2022 20:33:46 +0200 Received: (from jakub@localhost) by tucnak.zalov.cz (8.17.1/8.17.1/Submit) id 29EIXJhZ2636750; Fri, 14 Oct 2022 20:33:19 +0200 Date: Fri, 14 Oct 2022 20:33:14 +0200 From: Jakub Jelinek To: Richard Biener Cc: Jason Merrill , Jan Hubicka , gcc-patches@gcc.gnu.org Subject: Re: [PATCH] middle-end, v3: IFN_ASSUME support [PR106654] Message-ID: Reply-To: Jakub Jelinek References: <479fa663-fb3e-a90d-ae7a-0fdd5acbfa00@redhat.com> MIME-Version: 1.0 In-Reply-To: X-Scanned-By: MIMEDefang 3.1 on 10.11.54.7 X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Type: text/plain; charset=us-ascii Content-Disposition: inline X-Spam-Status: No, score=-3.6 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_NONE,TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: On Fri, Oct 14, 2022 at 11:27:07AM +0000, Richard Biener wrote: > > --- gcc/function.h.jj 2022-10-10 11:57:40.163722972 +0200 > > +++ gcc/function.h 2022-10-12 19:48:28.887554771 +0200 > > @@ -438,6 +438,10 @@ struct GTY(()) function { > > > > /* Set if there are any OMP_TARGET regions in the function. */ > > unsigned int has_omp_target : 1; > > + > > + /* Set for artificial function created for [[assume (cond)]]. > > + These should be GIMPLE optimized, but not expanded to RTL. */ > > + unsigned int assume_function : 1; > > I wonder if we should have this along force_output in the symtab > node and let the symtab code decide whether to expand? I actually first had a flag on the symtab node but as the patch shows, when it needs to be tested, more frequently I have access to struct function than to cgraph node. > > --- gcc/gimplify.cc.jj 2022-10-10 11:57:40.165722944 +0200 > > +++ gcc/gimplify.cc 2022-10-12 19:48:28.890554730 +0200 > > @@ -3569,7 +3569,52 @@ gimplify_call_expr (tree *expr_p, gimple > > fndecl, 0)); > > return GS_OK; > > } > > - /* FIXME: Otherwise expand it specially. */ > > + /* If not optimizing, ignore the assumptions. */ > > + if (!optimize) > > + { > > + *expr_p = NULL_TREE; > > + return GS_ALL_DONE; > > + } > > + /* Temporarily, until gimple lowering, transform > > + .ASSUME (cond); > > + into: > > + guard = .ASSUME (); > > + if (guard) goto label_true; else label_false; > > + label_true:; > > + { > > + guard = cond; > > + } > > + label_false:; > > + .ASSUME (guard); > > + such that gimple lowering can outline the condition into > > + a separate function easily. */ > > So the idea to use lambdas and/or nested functions (for OMP) > didn't work out or is more complicated? Yes, that didn't work out. Both lambda creation and nested function handling produce big structures with everything while for the assumptions it is better to have separate scalars if possible, lambda creation has various language imposed restrictions, diagnostics etc. and isn't available in C and I think the outlining in the patch is pretty simple and short. > I wonder if, instead of using the above intermediate form we > can have a new structued GIMPLE code with sub-statements > > .ASSUME > { > condition; > } That is what I wrote in the patch description as alternative: "with the condition wrapped into a GIMPLE_BIND (I admit the above isn't extra clean but it is just something to hold it from gimplifier until gimple low pass; it reassembles if (condition_never_true) { cond; }; an alternative would be introduce GOMP_ASSUME statement that would have the guard var as operand and the GIMPLE_BIND as body, but for the few passes (tree-nested and omp lowering) in between that looked like an overkill to me)" I can certainly implement that easily. > ? There's gimple_statement_omp conveniently available as base and > IIRC you had the requirement to implement some OMP assume as well? For OpenMP assumptions we right now implement just the holds clause of assume and implement it the same way as assume/gnu::assume attributes. > Of ocurse a different stmt class with body would work as well here, > maybe we can even use a gbind with a special flag. > > The outlining code can then be ajusted to outline a single BIND? It already is adjusting a single bind (of course with everything nested in it). > It probably won't simplify much that way. > > +static tree > > +create_assumption_fn (location_t loc) > > +{ > > + tree name = clone_function_name_numbered (current_function_decl, "_assume"); > > + /* For now, will be changed later. */ > > ? I need to create the FUNCTION_DECL early and only later on discover the used automatic vars (for which I need the destination function) and only once those are discovered I can create the right function type for the function. > > + tree type = TREE_TYPE (current_function_decl); > > + DECL_FUNCTION_SPECIFIC_OPTIMIZATION (decl) > > + = DECL_FUNCTION_SPECIFIC_OPTIMIZATION (current_function_decl); > > + DECL_FUNCTION_SPECIFIC_TARGET (decl) > > + = DECL_FUNCTION_SPECIFIC_TARGET (current_function_decl); > > + DECL_FUNCTION_VERSIONED (decl) > > + = DECL_FUNCTION_VERSIONED (current_function_decl); > > what does it mean to copy DECL_FUNCTION_VERSIONED here? This was a copy and paste from elsewhere (I think OpenMP code). I guess I can nuke it and even better add some testcase coverage for various nasty things like assume in multi-versioned functions. > > + DECL_ARGUMENTS (lad.id.dst_fn) = parms; > > + TREE_TYPE (lad.id.dst_fn) = build_function_type (boolean_type_node, parmt); > > ah, here's the type. Maybe use error_mark_node as transitional type? Will see if that works. > > > + cgraph_node::add_new_function (lad.id.dst_fn, false); > > + > > + for (unsigned i = 0; i < sz; ++i) > > + { > > + tree v = lad.decls[i]; > > + if (TREE_CODE (v) == SSA_NAME) > > + release_ssa_name (v); > > + } > > + > > + stmt = gsi_stmt (*gsi); > > + lab = as_a (stmt); > > + gcc_assert (gimple_label_label (lab) == l1 > > + || gimple_label_label (lab) == l2); > > + gsi_remove (gsi, true); > > + stmt = gsi_stmt (*gsi); > > + gcc_assert (gimple_call_internal_p (stmt, IFN_ASSUME) > > + && gimple_call_num_args (stmt) == 1 > > + && gimple_call_arg (stmt, 0) == guard); > > + data->cannot_fallthru = false; > > + gcall *call = gimple_build_call_internal_vec (IFN_ASSUME, vargs); > > + gimple_set_location (call, loc); > > + gsi_replace (gsi, call, true); > > +} > > > > /* Lower statement GSI. DATA is passed through the recursion. We try to > > track the fallthruness of statements and get rid of unreachable return > > @@ -354,6 +738,13 @@ lower_stmt (gimple_stmt_iterator *gsi, s > > tree decl = gimple_call_fndecl (stmt); > > unsigned i; > > > > + if (gimple_call_internal_p (stmt, IFN_ASSUME) > > + && gimple_call_num_args (stmt) == 0) > > + { > > + lower_assumption (gsi, data); > > + return; > > + } > > + > > for (i = 0; i < gimple_call_num_args (stmt); i++) > > { > > tree arg = gimple_call_arg (stmt, i); > > --- gcc/tree-ssa-ccp.cc.jj 2022-10-10 11:57:40.203722414 +0200 > > +++ gcc/tree-ssa-ccp.cc 2022-10-12 19:48:28.891554716 +0200 > > @@ -4253,6 +4253,12 @@ pass_fold_builtins::execute (function *f > > } > > > > callee = gimple_call_fndecl (stmt); > > + if (!callee > > + && gimple_call_internal_p (stmt, IFN_ASSUME)) > > + { > > + gsi_remove (&i, true); > > + continue; > > + } > > if (!callee || !fndecl_built_in_p (callee, BUILT_IN_NORMAL)) > > { > > gsi_next (&i); > > --- gcc/lto-streamer-out.cc.jj 2022-10-10 11:57:40.202722428 +0200 > > +++ gcc/lto-streamer-out.cc 2022-10-12 19:48:28.891554716 +0200 > > @@ -2278,6 +2278,7 @@ output_struct_function_base (struct outp > > bp_pack_value (&bp, fn->calls_eh_return, 1); > > bp_pack_value (&bp, fn->has_force_vectorize_loops, 1); > > bp_pack_value (&bp, fn->has_simduid_loops, 1); > > + bp_pack_value (&bp, fn->assume_function, 1); > > bp_pack_value (&bp, fn->va_list_fpr_size, 8); > > bp_pack_value (&bp, fn->va_list_gpr_size, 8); > > bp_pack_value (&bp, fn->last_clique, sizeof (short) * 8); > > --- gcc/lto-streamer-in.cc.jj 2022-10-10 11:57:40.201722442 +0200 > > +++ gcc/lto-streamer-in.cc 2022-10-12 19:48:28.891554716 +0200 > > @@ -1318,6 +1318,7 @@ input_struct_function_base (struct funct > > fn->calls_eh_return = bp_unpack_value (&bp, 1); > > fn->has_force_vectorize_loops = bp_unpack_value (&bp, 1); > > fn->has_simduid_loops = bp_unpack_value (&bp, 1); > > + fn->assume_function = bp_unpack_value (&bp, 1); > > fn->va_list_fpr_size = bp_unpack_value (&bp, 8); > > fn->va_list_gpr_size = bp_unpack_value (&bp, 8); > > fn->last_clique = bp_unpack_value (&bp, sizeof (short) * 8); > > --- gcc/cgraphunit.cc.jj 2022-10-10 11:57:40.152723125 +0200 > > +++ gcc/cgraphunit.cc 2022-10-12 19:48:28.892554703 +0200 > > @@ -1882,6 +1882,16 @@ cgraph_node::expand (void) > > ggc_collect (); > > timevar_pop (TV_REST_OF_COMPILATION); > > > > + if (DECL_STRUCT_FUNCTION (decl) > > + && DECL_STRUCT_FUNCTION (decl)->assume_function) > > + { > > + /* Assume functions aren't expanded into RTL, on the other side > > + we don't want to release their body. */ > > + if (cfun) > > + pop_cfun (); > > + return; > > + } > > + > > /* Make sure that BE didn't give up on compiling. */ > > gcc_assert (TREE_ASM_WRITTEN (decl)); > > if (cfun) > > --- gcc/cfgexpand.cc.jj 2022-10-10 11:57:40.152723125 +0200 > > +++ gcc/cfgexpand.cc 2022-10-12 19:48:28.893554689 +0200 > > @@ -6597,6 +6597,14 @@ pass_expand::execute (function *fun) > > rtx_insn *var_seq, *var_ret_seq; > > unsigned i; > > > > + if (cfun->assume_function) > > + { > > + /* Assume functions should not be expanded to RTL. */ > > can we avoid getting here in the first place? I think we don't need > any of the post-pass_all_optimizations[_g] passes? I'm afraid not without revamping passes.def, because to easily cat the pass queue from certain point onwards, we need all the remaining passes to be wrapped with PUSH_INSERT_PASSES_WITHIN. So, if we e.g. wanted to cut out everything from pass_tm_init onwards, we'd need to wrap: NEXT_PASS (pass_tm_init); PUSH_INSERT_PASSES_WITHIN (pass_tm_init) NEXT_PASS (pass_tm_mark); NEXT_PASS (pass_tm_memopt); NEXT_PASS (pass_tm_edges); POP_INSERT_PASSES () NEXT_PASS (pass_simduid_cleanup); NEXT_PASS (pass_vtable_verify); NEXT_PASS (pass_lower_vaarg); NEXT_PASS (pass_lower_vector); NEXT_PASS (pass_lower_complex_O0); NEXT_PASS (pass_sancov_O0); NEXT_PASS (pass_lower_switch_O0); NEXT_PASS (pass_asan_O0); NEXT_PASS (pass_tsan_O0); NEXT_PASS (pass_sanopt); NEXT_PASS (pass_cleanup_eh); NEXT_PASS (pass_lower_resx); NEXT_PASS (pass_nrv); NEXT_PASS (pass_gimple_isel); NEXT_PASS (pass_harden_conditional_branches); NEXT_PASS (pass_harden_compares); NEXT_PASS (pass_warn_access, /*early=*/false); NEXT_PASS (pass_cleanup_cfg_post_optimizing); NEXT_PASS (pass_warn_function_noreturn); NEXT_PASS (pass_expand); in some wrapper pass with a gate (either also including pass_rest_of_compilation but that would mean undesirable reindentation of everything there, or just the above ones and have assume_function punt in the 2 or 1 gates). What I had in the patch was just skip pass_expand and pass_rest_of_compilation, perhaps another possibility to do the former would be to define a gate on pass_expand. > > --- gcc/tree-vectorizer.cc.jj 2022-10-10 11:57:40.204722400 +0200 > > +++ gcc/tree-vectorizer.cc 2022-10-12 19:48:28.894554675 +0200 > > @@ -1213,6 +1213,10 @@ public: > > /* opt_pass methods: */ > > bool gate (function *fun) final override > > { > > + /* Vectorization makes range analysis of assume functions > > + harder, not easier. */ > > + if (fun->assume_function) > > + return false; > > return flag_tree_loop_vectorize || fun->has_force_vectorize_loops; > > } > > > > @@ -1490,7 +1494,14 @@ public: > > > > /* opt_pass methods: */ > > opt_pass * clone () final override { return new pass_slp_vectorize (m_ctxt); } > > - bool gate (function *) final override { return flag_tree_slp_vectorize != 0; } > > + bool gate (function *fun) final override > > + { > > + /* Vectorization makes range analysis of assume functions harder, > > + not easier. */ > > Can we split out these kind of considerations from the initial patch? Sure. > > > + if (fun->assume_function) > > + return false; > > + return flag_tree_slp_vectorize != 0; > > + } > > unsigned int execute (function *) final override; > > > > }; // class pass_slp_vectorize > > --- gcc/ipa-icf.cc.jj 2022-10-10 11:57:40.201722442 +0200 > > +++ gcc/ipa-icf.cc 2022-10-12 19:48:28.894554675 +0200 > > @@ -1517,6 +1517,9 @@ sem_function::parse (cgraph_node *node, > > if (!func || (!node->has_gimple_body_p () && !node->thunk)) > > return NULL; > > > > + if (func->assume_function) > > + return NULL; > > + > > Do we want implicit noipa attribute on the assume functions? Or do > we need to IPA-CP into them? I suppose the ranger code can use > contextual code from the .ASSUME call for things like > assume (side-effect(), i == 1); Most of normal IPA optimizations are disabled for them because they aren't normally called, all we do is take their address and pass it to .ASSUME. IPA-ICF was an exception and I had to disable it because when it triggered it decided to create a thunk which failed to assemble. But implicit noipa surely is an option; though of course we want inlining etc. to happen into those functions. And eventually some kind of IPA SRA of their arguments but with different behavior from normal IPA-SRA. > Reading some of the patch I guessed you wanted to handle nested > assumes. So - is > > [[assume (a == 4 && ([[assume(b == 3)]], b != 2))]] > > a thing? This is not valid, assume can be just on an empty statement. But with GNU statement expressions it is a thing and I should add it to testsuite coverage. void foo (int a, int b) { [[assume (a == 4 && ({ [[assume (b == 3)]]; b != 2 }))]]; } is valid. I think the code should handle it fine (outline the outer and the new outlined function enters pass queue with all_lowering_passes and so will see pass_lower_cf again and hopefully work. Will see how it goes when I tweak the patch. Jakub