From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 92105 invoked by alias); 30 Jul 2015 10:01:22 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Received: (qmail 92094 invoked by uid 89); 30 Jul 2015 10:01:21 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-2.1 required=5.0 tests=AWL,BAYES_00,KAM_LAZY_DOMAIN_SECURITY,RP_MATCHES_RCVD,SPF_HELO_PASS autolearn=ham version=3.3.2 X-HELO: mx1.redhat.com Received: from mx1.redhat.com (HELO mx1.redhat.com) (209.132.183.28) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with (AES256-GCM-SHA384 encrypted) ESMTPS; Thu, 30 Jul 2015 10:01:20 +0000 Received: from int-mx10.intmail.prod.int.phx2.redhat.com (int-mx10.intmail.prod.int.phx2.redhat.com [10.5.11.23]) by mx1.redhat.com (Postfix) with ESMTPS id F0133135D5 for ; Thu, 30 Jul 2015 10:01:18 +0000 (UTC) Received: from tucnak.zalov.cz (ovpn-116-30.ams2.redhat.com [10.36.116.30]) by int-mx10.intmail.prod.int.phx2.redhat.com (8.14.4/8.14.4) with ESMTP id t6UA1HGj020133 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO); Thu, 30 Jul 2015 06:01:18 -0400 Received: from tucnak.zalov.cz (localhost [127.0.0.1]) by tucnak.zalov.cz (8.14.9/8.14.9) with ESMTP id t6UA1F1Z023655; Thu, 30 Jul 2015 12:01:16 +0200 Received: (from jakub@localhost) by tucnak.zalov.cz (8.14.9/8.14.9/Submit) id t6UA1FBk023654; Thu, 30 Jul 2015 12:01:15 +0200 Date: Thu, 30 Jul 2015 10:30:00 -0000 From: Jakub Jelinek To: Aldy Hernandez Cc: gcc-patches Subject: Re: [gomp4.1] fold ordered depend(sink) clauses Message-ID: <20150730100114.GO1780@tucnak.redhat.com> Reply-To: Jakub Jelinek References: <55B96647.20805@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <55B96647.20805@redhat.com> User-Agent: Mutt/1.5.23 (2014-03-12) X-IsSubscribed: yes X-SW-Source: 2015-07/txt/msg02550.txt.bz2 On Wed, Jul 29, 2015 at 04:48:23PM -0700, Aldy Hernandez wrote: > @@ -7490,8 +7503,12 @@ gimplify_omp_for (tree *expr_p, gimple_seq *pre_p) > == TREE_VEC_LENGTH (OMP_FOR_COND (for_stmt))); > gcc_assert (TREE_VEC_LENGTH (OMP_FOR_INIT (for_stmt)) > == TREE_VEC_LENGTH (OMP_FOR_INCR (for_stmt))); > - gimplify_omp_ctxp->iter_vars.create (TREE_VEC_LENGTH > - (OMP_FOR_INIT (for_stmt))); > + gimplify_omp_ctxp->loop_iter_var.create (TREE_VEC_LENGTH > + (OMP_FOR_INIT (for_stmt))); > + gimplify_omp_ctxp->loop_dir.create (TREE_VEC_LENGTH > + (OMP_FOR_INIT (for_stmt))); > + gimplify_omp_ctxp->loop_const_step.create (TREE_VEC_LENGTH > + (OMP_FOR_INIT (for_stmt))); > for (i = 0; i < TREE_VEC_LENGTH (OMP_FOR_INIT (for_stmt)); i++) > { > t = TREE_VEC_ELT (OMP_FOR_INIT (for_stmt), i); I think the above should be guarded with tree c = find_omp_clause (OMP_FOR_CLAUSES (for_stmt), OMP_CLAUSE_ORDERED); if (c && OMP_CLAUSE_ORDERED_EXPR (c)) The most common case is that ordered(n) is not present, so we should optimize for that. > @@ -7501,10 +7518,10 @@ gimplify_omp_for (tree *expr_p, gimple_seq *pre_p) > gcc_assert (INTEGRAL_TYPE_P (TREE_TYPE (decl)) > || POINTER_TYPE_P (TREE_TYPE (decl))); > if (TREE_CODE (for_stmt) == OMP_FOR && OMP_FOR_ORIG_DECLS (for_stmt)) > - gimplify_omp_ctxp->iter_vars.quick_push > + gimplify_omp_ctxp->loop_iter_var.quick_push > (TREE_VEC_ELT (OMP_FOR_ORIG_DECLS (for_stmt), i)); > else > - gimplify_omp_ctxp->iter_vars.quick_push (decl); > + gimplify_omp_ctxp->loop_iter_var.quick_push (decl); > > /* Make sure the iteration variable is private. */ > tree c = NULL_TREE; And all these etc. pushes too, simply remember is_doacross in some bool variable. > @@ -8387,6 +8435,228 @@ gimplify_transaction (tree *expr_p, gimple_seq *pre_p) > return GS_ALL_DONE; > } > Function comment missing. > +static gimple > +gimplify_omp_ordered (tree expr, gimple_seq body) > +{ > + The basic algorithm is to create a sink vector whose first > + element is the GCD of all the first elements, and whose remaining > + elements are the minimum of the subsequent columns. > + > + We ignore dependence vectors whose first element is zero because > + such dependencies are known to be executed by the same thread. > + > + ?? ^^^^ Is this only applicable for collapse(1) loops? If so, how > + ?? to handle collapse(N) loops where N > 1? For collapse(N) N > 1, you can't ignore first iter var with 0 offset, you can only ignore if N first iter vars have 0 offset. Pretty much for the purpose of the algorithm, you compute "first element" for collapse(N) N > 1 and ordered(5-N) for (iv1 = ..; iv1 < ..; iv1 += step1) for (iv2 = ..; iv2 < ..; iv2 += step2) for (iv3 = ..; iv3 < ..; iv3 += step3) for (iv4 = ..; iv4 < ..; iv4 += step4) depend(iv1 + off1, iv2 + off2, iv3 + off3, iv4 + off4) as: off(N) + off(N-1)*step(N) + iv(N-2)*step(N)*step(N-1)... (to be checked the case if the directions differ). So basically, you want to precompute if you'd add some loop counter in between loop(N) and loop(N+1) that would be initially zero and incremented by step(N). The GCD is then performed on this, it is compared to zero, and then finally split again into the several offsets. The "is this invalid iteration" check (is offN divisible by stepN) needs to be done before the merging of course. Except, now that I think, it is not that easy. Because we have another test for "is this invalid iteration", in particularly one performed at runtime on each of the depends. Say if offN is negative and loop(N) is forward loop (thus stepN is positive (note, for backward loop stepN still might be huge positive value for unsigned iterators)), the check would be if (ivN + offN >= initialN), for forward loop with positive offN if (ivN + offN < endvalueN) etc. Now, not sure if by computing the GCD and merging several depend clauses into one we preserve those tests or not. All in all, I think it might be really better to do the depend clause merging during omp lowering in omp-low.c, where you can call extract_omp_for_data and inspect the iteration variables, have the steps computed for you etc. That is the spot where we probably want to emit some GOMP_depend_sink call (and GOMP_depend_source) and prepare arguments for it. > +void > +funk () > +{ > +#pragma omp parallel for ordered(2) > + for (i=0; i < N; i++) > + for (j=0; j < N; ++j) > + { > +/* We should keep the (sink:i,j-2) by virtue of it the i+0. The > + remaining clauses get folded with a GCD of -2 for `i' and a minimum > + of -1 for 'j'. */ I think we shouldn't keep the useless one (sink: i, j-2) (or keep invalid ones). Of course, if we remove all depend clauses, we need to remove the GIMPLE_OMP_ORDERED stmt altogether. Jakub