From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from smtp-out1.suse.de (smtp-out1.suse.de [IPv6:2001:67c:2178:6::1c]) by sourceware.org (Postfix) with ESMTPS id C68A13850205 for ; Wed, 14 Sep 2022 08:09:09 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org C68A13850205 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=suse.de Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=suse.de Received: from relay2.suse.de (relay2.suse.de [149.44.160.134]) by smtp-out1.suse.de (Postfix) with ESMTP id 8F8EE221B3; Wed, 14 Sep 2022 08:09:08 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_rsa; t=1663142948; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=WmIcjxImflmIF4dFCvjeZ1CvJsBihW+cmf9oy/wv/F8=; b=Wa6ZhMM7EIB80RnZwfZPSOuhiLjYNrUWAgEBuR4ZMG5c9Rsi9B4wuI+20mceSgTQa8ika2 3vh/Gel/W7Z7lhjRMjjiec5X+vLGP1/sf4JQJphLKQeOX4jJreEVvVomsRqCn/DTIXjBvQ MAyFe9tadlMJJiqt4d7PXc/pAF4Z/js= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_ed25519; t=1663142948; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=WmIcjxImflmIF4dFCvjeZ1CvJsBihW+cmf9oy/wv/F8=; b=K0NEztXoKqY8TjNoStfse4RaQPob4w+TX9RH0rl/FcvrfSaoe7a1Aty0vwcLD5oh8MRkU6 vAO6OVYlcL4mNrDA== Received: from wotan.suse.de (wotan.suse.de [10.160.0.1]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by relay2.suse.de (Postfix) with ESMTPS id 324422C141; Wed, 14 Sep 2022 08:09:08 +0000 (UTC) Date: Wed, 14 Sep 2022 08:09:08 +0000 (UTC) From: Richard Biener To: Jakub Jelinek cc: Andrew Stubbs , gcc-patches@gcc.gnu.org Subject: Re: [PATCH 3/3] vect: inbranch SIMD clones In-Reply-To: Message-ID: References: User-Agent: Alpine 2.22 (LSU 394 2020-01-19) MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII X-Spam-Status: No, score=-5.1 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,SPF_HELO_NONE,SPF_PASS,TXREP,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: On Fri, 9 Sep 2022, Jakub Jelinek wrote: > On Tue, Aug 09, 2022 at 02:23:50PM +0100, Andrew Stubbs wrote: > > > > There has been support for generating "inbranch" SIMD clones for a long time, > > but nothing actually uses them (as far as I can see). > > Thanks for working on this. > > Note, there is another case where the inbranch SIMD clones could be used > and I even thought it is implemented, but apparently it isn't or it doesn't > work: > #ifndef TYPE > #define TYPE int > #endif > > /* A simple function that will be cloned. */ > #pragma omp declare simd inbranch > TYPE __attribute__((noinline)) > foo (TYPE a) > { > return a + 1; > } > > /* Check that "inbranch" clones are called correctly. */ > > void __attribute__((noinline)) > masked (TYPE * __restrict a, TYPE * __restrict b, int size) > { > #pragma omp simd > for (int i = 0; i < size; i++) > b[i] = foo(a[i]); > } > > Here, IMHO we should use the inbranch clone for vectorization (better > than not vectorizing it, worse than when we'd have a notinbranch clone) > and just use mask of all ones. > But sure, it can be done incrementally, just mentioning it for completeness. > > > This patch add supports for a sub-set of possible cases (those using > > mask_mode == VOIDmode). The other cases fail to vectorize, just as before, > > so there should be no regressions. > > > > The sub-set of support should cover all cases needed by amdgcn, at present. > > > > gcc/ChangeLog: > > > > * omp-simd-clone.cc (simd_clone_adjust_argument_types): Set vector_type > > for mask arguments also. > > * tree-if-conv.cc: Include cgraph.h. > > (if_convertible_stmt_p): Do if conversions for calls to SIMD calls. > > (predicate_statements): Pass the predicate to SIMD functions. > > * tree-vect-stmts.cc (vectorizable_simd_clone_call): Permit calls > > to clones with mask arguments, in some cases. > > Generate the mask vector arguments. > > > > gcc/testsuite/ChangeLog: > > > > * gcc.dg/vect/vect-simd-clone-16.c: New test. > > * gcc.dg/vect/vect-simd-clone-16b.c: New test. > > * gcc.dg/vect/vect-simd-clone-16c.c: New test. > > * gcc.dg/vect/vect-simd-clone-16d.c: New test. > > * gcc.dg/vect/vect-simd-clone-16e.c: New test. > > * gcc.dg/vect/vect-simd-clone-16f.c: New test. > > * gcc.dg/vect/vect-simd-clone-17.c: New test. > > * gcc.dg/vect/vect-simd-clone-17b.c: New test. > > * gcc.dg/vect/vect-simd-clone-17c.c: New test. > > * gcc.dg/vect/vect-simd-clone-17d.c: New test. > > * gcc.dg/vect/vect-simd-clone-17e.c: New test. > > * gcc.dg/vect/vect-simd-clone-17f.c: New test. > > * gcc.dg/vect/vect-simd-clone-18.c: New test. > > * gcc.dg/vect/vect-simd-clone-18b.c: New test. > > * gcc.dg/vect/vect-simd-clone-18c.c: New test. > > * gcc.dg/vect/vect-simd-clone-18d.c: New test. > > * gcc.dg/vect/vect-simd-clone-18e.c: New test. > > * gcc.dg/vect/vect-simd-clone-18f.c: New test. > > > --- a/gcc/tree-if-conv.cc > > +++ b/gcc/tree-if-conv.cc > > @@ -1074,13 +1076,19 @@ if_convertible_stmt_p (gimple *stmt, vec refs) > > tree fndecl = gimple_call_fndecl (stmt); > > if (fndecl) > > { > > + /* We can vectorize some builtins and functions with SIMD > > + clones. */ > > int flags = gimple_call_flags (stmt); > > + struct cgraph_node *node = cgraph_node::get (fndecl); > > if ((flags & ECF_CONST) > > && !(flags & ECF_LOOPING_CONST_OR_PURE) > > - /* We can only vectorize some builtins at the moment, > > - so restrict if-conversion to those. */ > > && fndecl_built_in_p (fndecl)) > > return true; > > + else if (node && node->simd_clones != NULL) > > + { > > + need_to_predicate = true; > > I think it would be worth it to check that at least one of the > node->simd_clones clones has ->inbranch set, because if all calls > are declare simd notinbranch, then predicating the loop will be just a > wasted effort. > > > + return true; > > + } > > } > > return false; > > } > > @@ -2614,6 +2622,31 @@ predicate_statements (loop_p loop) > > gimple_assign_set_rhs1 (stmt, ifc_temp_var (type, rhs, &gsi)); > > update_stmt (stmt); > > } > > + > > + /* Add a predicate parameter to functions that have a SIMD clone. > > + This will cause the vectorizer to match the "in branch" clone > > + variants because they also have the extra parameter, and serves > > + to build the mask vector in a natural way. */ > > + gcall *call = dyn_cast (gsi_stmt (gsi)); > > + if (call && !gimple_call_internal_p (call)) > > + { > > + tree orig_fndecl = gimple_call_fndecl (call); > > + int orig_nargs = gimple_call_num_args (call); > > + auto_vec args; > > + for (int i=0; i < orig_nargs; i++) > > + args.safe_push (gimple_call_arg (call, i)); > > + args.safe_push (cond); > > + > > + /* Replace the call with a new one that has the extra > > + parameter. The FUNCTION_DECL remains unchanged so that > > + the vectorizer can find the SIMD clones. This call will > > + either be deleted or replaced at that time, so the > > + mismatch is short-lived and we can live with it. */ > > + gcall *new_call = gimple_build_call_vec (orig_fndecl, args); > > + gimple_call_set_lhs (new_call, gimple_call_lhs (call)); > > + gsi_replace (&gsi, new_call, true); > > I think this is way too dangerous to represent conditional calls that way, > there is nothing to distinguish those from non-conditional calls. > I think I'd prefer (but please see what Richi thinks too) to represent > the conditional calls as a call to a new internal function, say > IFN_COND_CALL or IFN_MASK_CALL, which would have the arguments the original > call had, plus 2 extra ones first (or 3?), one that would be saved copy of > original gimple_call_fn (i.e. usually &fndecl), another one that would be the > condition (and dunno about whether we need also something to represent > gimple_call_fntype, or whether we simply should punt during ifcvt > on conditional calls where gimple_call_fntype is incompatible with > the function type of fndecl. Another question is about > gimple_call_chain. Punt or copy it over to the ifn and back. Are nested functions a thing for OpenMP? But yes, punt on them for now. I agree that a conditional call should be explicit, but the above is only transitional between if-conversion and vectorization, right? Do we support indirect calls here? As Jakub says one possibility is to do .IFN_COND/MASK_CALL (fn-addr, condition/mask, ...) another would be fnptr = cond ? fn : &nop_call; (*fnptr) (...); thus replace the called function with conditional "nop". How to exactly represent that NOP probably isn't too important when it's transitional until vectorization only, even NULL might work there. Of course having the function address in a computation might confuse parts of the vectorizer. OTOH it allows to keep the original call (and the chain and any other info we'd have to preserve otherwise). Richard.