From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by sourceware.org (Postfix) with ESMTPS id A407738582BF for ; Fri, 9 Sep 2022 14:31:37 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org A407738582BF Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=redhat.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=redhat.com DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1662733897; h=from:from:reply-to:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type:in-reply-to:in-reply-to: references:references; bh=rq0wqmvOO2PQt6q3mPOQRDDt99T+ZP63JPXcLbkKTfY=; b=YxaqBTcSeO+UttZ0QjVfxV+mGKSp5DtXvdGmRLOoO9Zo/O4XobW8eIqLDM/anEJ5yBow7B 0VhH17g97WIJafUiVZHUnjI5L4/2cC8l/x1Lm2ls8XJAbuogmzyX2e7BsMhWmOz19bV67i Z2iWIrty7Ifxmb1hP6TZUWa4X4gZGYU= Received: from mimecast-mx02.redhat.com (mx3-rdu2.redhat.com [66.187.233.73]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-327-RTM0bLTJMVK0XLrNLhv1Zg-1; Fri, 09 Sep 2022 10:31:35 -0400 X-MC-Unique: RTM0bLTJMVK0XLrNLhv1Zg-1 Received: from smtp.corp.redhat.com (int-mx10.intmail.prod.int.rdu2.redhat.com [10.11.54.10]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 7032438164D9; Fri, 9 Sep 2022 14:31:35 +0000 (UTC) Received: from tucnak.zalov.cz (unknown [10.39.192.41]) by smtp.corp.redhat.com (Postfix) with ESMTPS id 1B019492C3B; Fri, 9 Sep 2022 14:31:34 +0000 (UTC) Received: from tucnak.zalov.cz (localhost [127.0.0.1]) by tucnak.zalov.cz (8.17.1/8.17.1) with ESMTPS id 289EVPUx1597145 (version=TLSv1.3 cipher=TLS_AES_256_GCM_SHA384 bits=256 verify=NOT); Fri, 9 Sep 2022 16:31:25 +0200 Received: (from jakub@localhost) by tucnak.zalov.cz (8.17.1/8.17.1/Submit) id 289EVNnc1597144; Fri, 9 Sep 2022 16:31:23 +0200 Date: Fri, 9 Sep 2022 16:31:23 +0200 From: Jakub Jelinek To: Andrew Stubbs , Richard Biener Cc: gcc-patches@gcc.gnu.org Subject: Re: [PATCH 3/3] vect: inbranch SIMD clones Message-ID: Reply-To: Jakub Jelinek References: MIME-Version: 1.0 In-Reply-To: X-Scanned-By: MIMEDefang 2.85 on 10.11.54.10 X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Type: text/plain; charset=us-ascii Content-Disposition: inline X-Spam-Status: No, score=-4.5 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_LOW,SPF_HELO_NONE,SPF_NONE,TXREP,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: On Tue, Aug 09, 2022 at 02:23:50PM +0100, Andrew Stubbs wrote: > > There has been support for generating "inbranch" SIMD clones for a long time, > but nothing actually uses them (as far as I can see). Thanks for working on this. Note, there is another case where the inbranch SIMD clones could be used and I even thought it is implemented, but apparently it isn't or it doesn't work: #ifndef TYPE #define TYPE int #endif /* A simple function that will be cloned. */ #pragma omp declare simd inbranch TYPE __attribute__((noinline)) foo (TYPE a) { return a + 1; } /* Check that "inbranch" clones are called correctly. */ void __attribute__((noinline)) masked (TYPE * __restrict a, TYPE * __restrict b, int size) { #pragma omp simd for (int i = 0; i < size; i++) b[i] = foo(a[i]); } Here, IMHO we should use the inbranch clone for vectorization (better than not vectorizing it, worse than when we'd have a notinbranch clone) and just use mask of all ones. But sure, it can be done incrementally, just mentioning it for completeness. > This patch add supports for a sub-set of possible cases (those using > mask_mode == VOIDmode). The other cases fail to vectorize, just as before, > so there should be no regressions. > > The sub-set of support should cover all cases needed by amdgcn, at present. > > gcc/ChangeLog: > > * omp-simd-clone.cc (simd_clone_adjust_argument_types): Set vector_type > for mask arguments also. > * tree-if-conv.cc: Include cgraph.h. > (if_convertible_stmt_p): Do if conversions for calls to SIMD calls. > (predicate_statements): Pass the predicate to SIMD functions. > * tree-vect-stmts.cc (vectorizable_simd_clone_call): Permit calls > to clones with mask arguments, in some cases. > Generate the mask vector arguments. > > gcc/testsuite/ChangeLog: > > * gcc.dg/vect/vect-simd-clone-16.c: New test. > * gcc.dg/vect/vect-simd-clone-16b.c: New test. > * gcc.dg/vect/vect-simd-clone-16c.c: New test. > * gcc.dg/vect/vect-simd-clone-16d.c: New test. > * gcc.dg/vect/vect-simd-clone-16e.c: New test. > * gcc.dg/vect/vect-simd-clone-16f.c: New test. > * gcc.dg/vect/vect-simd-clone-17.c: New test. > * gcc.dg/vect/vect-simd-clone-17b.c: New test. > * gcc.dg/vect/vect-simd-clone-17c.c: New test. > * gcc.dg/vect/vect-simd-clone-17d.c: New test. > * gcc.dg/vect/vect-simd-clone-17e.c: New test. > * gcc.dg/vect/vect-simd-clone-17f.c: New test. > * gcc.dg/vect/vect-simd-clone-18.c: New test. > * gcc.dg/vect/vect-simd-clone-18b.c: New test. > * gcc.dg/vect/vect-simd-clone-18c.c: New test. > * gcc.dg/vect/vect-simd-clone-18d.c: New test. > * gcc.dg/vect/vect-simd-clone-18e.c: New test. > * gcc.dg/vect/vect-simd-clone-18f.c: New test. > --- a/gcc/tree-if-conv.cc > +++ b/gcc/tree-if-conv.cc > @@ -1074,13 +1076,19 @@ if_convertible_stmt_p (gimple *stmt, vec refs) > tree fndecl = gimple_call_fndecl (stmt); > if (fndecl) > { > + /* We can vectorize some builtins and functions with SIMD > + clones. */ > int flags = gimple_call_flags (stmt); > + struct cgraph_node *node = cgraph_node::get (fndecl); > if ((flags & ECF_CONST) > && !(flags & ECF_LOOPING_CONST_OR_PURE) > - /* We can only vectorize some builtins at the moment, > - so restrict if-conversion to those. */ > && fndecl_built_in_p (fndecl)) > return true; > + else if (node && node->simd_clones != NULL) > + { > + need_to_predicate = true; I think it would be worth it to check that at least one of the node->simd_clones clones has ->inbranch set, because if all calls are declare simd notinbranch, then predicating the loop will be just a wasted effort. > + return true; > + } > } > return false; > } > @@ -2614,6 +2622,31 @@ predicate_statements (loop_p loop) > gimple_assign_set_rhs1 (stmt, ifc_temp_var (type, rhs, &gsi)); > update_stmt (stmt); > } > + > + /* Add a predicate parameter to functions that have a SIMD clone. > + This will cause the vectorizer to match the "in branch" clone > + variants because they also have the extra parameter, and serves > + to build the mask vector in a natural way. */ > + gcall *call = dyn_cast (gsi_stmt (gsi)); > + if (call && !gimple_call_internal_p (call)) > + { > + tree orig_fndecl = gimple_call_fndecl (call); > + int orig_nargs = gimple_call_num_args (call); > + auto_vec args; > + for (int i=0; i < orig_nargs; i++) > + args.safe_push (gimple_call_arg (call, i)); > + args.safe_push (cond); > + > + /* Replace the call with a new one that has the extra > + parameter. The FUNCTION_DECL remains unchanged so that > + the vectorizer can find the SIMD clones. This call will > + either be deleted or replaced at that time, so the > + mismatch is short-lived and we can live with it. */ > + gcall *new_call = gimple_build_call_vec (orig_fndecl, args); > + gimple_call_set_lhs (new_call, gimple_call_lhs (call)); > + gsi_replace (&gsi, new_call, true); I think this is way too dangerous to represent conditional calls that way, there is nothing to distinguish those from non-conditional calls. I think I'd prefer (but please see what Richi thinks too) to represent the conditional calls as a call to a new internal function, say IFN_COND_CALL or IFN_MASK_CALL, which would have the arguments the original call had, plus 2 extra ones first (or 3?), one that would be saved copy of original gimple_call_fn (i.e. usually &fndecl), another one that would be the condition (and dunno about whether we need also something to represent gimple_call_fntype, or whether we simply should punt during ifcvt on conditional calls where gimple_call_fntype is incompatible with the function type of fndecl. Another question is about gimple_call_chain. Punt or copy it over to the ifn and back. Jakub