From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-patches-return-404289-listarch-gcc-patches=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 92105 invoked by alias); 30 Jul 2015 10:01:22 -0000
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-patches.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-help@gcc.gnu.org>
Sender: gcc-patches-owner@gcc.gnu.org
Received: (qmail 92094 invoked by uid 89); 30 Jul 2015 10:01:21 -0000
Authentication-Results: sourceware.org; auth=none
X-Virus-Found: No
X-Spam-SWARE-Status: No, score=-2.1 required=5.0 tests=AWL,BAYES_00,KAM_LAZY_DOMAIN_SECURITY,RP_MATCHES_RCVD,SPF_HELO_PASS autolearn=ham version=3.3.2
X-HELO: mx1.redhat.com
Received: from mx1.redhat.com (HELO mx1.redhat.com) (209.132.183.28) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with (AES256-GCM-SHA384 encrypted) ESMTPS; Thu, 30 Jul 2015 10:01:20 +0000
Received: from int-mx10.intmail.prod.int.phx2.redhat.com (int-mx10.intmail.prod.int.phx2.redhat.com [10.5.11.23])	by mx1.redhat.com (Postfix) with ESMTPS id F0133135D5	for <gcc-patches@gcc.gnu.org>; Thu, 30 Jul 2015 10:01:18 +0000 (UTC)
Received: from tucnak.zalov.cz (ovpn-116-30.ams2.redhat.com [10.36.116.30])	by int-mx10.intmail.prod.int.phx2.redhat.com (8.14.4/8.14.4) with ESMTP id t6UA1HGj020133	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO);	Thu, 30 Jul 2015 06:01:18 -0400
Received: from tucnak.zalov.cz (localhost [127.0.0.1])	by tucnak.zalov.cz (8.14.9/8.14.9) with ESMTP id t6UA1F1Z023655;	Thu, 30 Jul 2015 12:01:16 +0200
Received: (from jakub@localhost)	by tucnak.zalov.cz (8.14.9/8.14.9/Submit) id t6UA1FBk023654;	Thu, 30 Jul 2015 12:01:15 +0200
Date: Thu, 30 Jul 2015 10:30:00 -0000
From: Jakub Jelinek <jakub@redhat.com>
To: Aldy Hernandez <aldyh@redhat.com>
Cc: gcc-patches <gcc-patches@gcc.gnu.org>
Subject: Re: [gomp4.1] fold ordered depend(sink) clauses
Message-ID: <20150730100114.GO1780@tucnak.redhat.com>
Reply-To: Jakub Jelinek <jakub@redhat.com>
References: <55B96647.20805@redhat.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <55B96647.20805@redhat.com>
User-Agent: Mutt/1.5.23 (2014-03-12)
X-IsSubscribed: yes
X-SW-Source: 2015-07/txt/msg02550.txt.bz2

On Wed, Jul 29, 2015 at 04:48:23PM -0700, Aldy Hernandez wrote:
> @@ -7490,8 +7503,12 @@ gimplify_omp_for (tree *expr_p, gimple_seq *pre_p)
>  	      == TREE_VEC_LENGTH (OMP_FOR_COND (for_stmt)));
>    gcc_assert (TREE_VEC_LENGTH (OMP_FOR_INIT (for_stmt))
>  	      == TREE_VEC_LENGTH (OMP_FOR_INCR (for_stmt)));
> -  gimplify_omp_ctxp->iter_vars.create (TREE_VEC_LENGTH
> -				       (OMP_FOR_INIT (for_stmt)));
> +  gimplify_omp_ctxp->loop_iter_var.create (TREE_VEC_LENGTH
> +					   (OMP_FOR_INIT (for_stmt)));
> +  gimplify_omp_ctxp->loop_dir.create (TREE_VEC_LENGTH
> +				      (OMP_FOR_INIT (for_stmt)));
> +  gimplify_omp_ctxp->loop_const_step.create (TREE_VEC_LENGTH
> +					     (OMP_FOR_INIT (for_stmt)));
>    for (i = 0; i < TREE_VEC_LENGTH (OMP_FOR_INIT (for_stmt)); i++)
>      {
>        t = TREE_VEC_ELT (OMP_FOR_INIT (for_stmt), i);

I think the above should be guarded with
  tree c = find_omp_clause (OMP_FOR_CLAUSES (for_stmt), OMP_CLAUSE_ORDERED);
  if (c && OMP_CLAUSE_ORDERED_EXPR (c))
The most common case is that ordered(n) is not present, so we should
optimize for that.

> @@ -7501,10 +7518,10 @@ gimplify_omp_for (tree *expr_p, gimple_seq *pre_p)
>        gcc_assert (INTEGRAL_TYPE_P (TREE_TYPE (decl))
>  		  || POINTER_TYPE_P (TREE_TYPE (decl)));
>        if (TREE_CODE (for_stmt) == OMP_FOR && OMP_FOR_ORIG_DECLS (for_stmt))
> -	gimplify_omp_ctxp->iter_vars.quick_push
> +	gimplify_omp_ctxp->loop_iter_var.quick_push
>  	  (TREE_VEC_ELT (OMP_FOR_ORIG_DECLS (for_stmt), i));
>        else
> -	gimplify_omp_ctxp->iter_vars.quick_push (decl);
> +	gimplify_omp_ctxp->loop_iter_var.quick_push (decl);
>  
>        /* Make sure the iteration variable is private.  */
>        tree c = NULL_TREE;

And all these etc. pushes too, simply remember is_doacross in some bool
variable.

> @@ -8387,6 +8435,228 @@ gimplify_transaction (tree *expr_p, gimple_seq *pre_p)
>    return GS_ALL_DONE;
>  }
>  

Function comment missing.

> +static gimple
> +gimplify_omp_ordered (tree expr, gimple_seq body)
> +{

> +     The basic algorithm is to create a sink vector whose first
> +     element is the GCD of all the first elements, and whose remaining
> +     elements are the minimum of the subsequent columns.
> +
> +     We ignore dependence vectors whose first element is zero because
> +     such dependencies are known to be executed by the same thread.
> +
> +     ?? ^^^^ Is this only applicable for collapse(1) loops?  If so, how
> +     ?? to handle collapse(N) loops where N > 1?

For collapse(N) N > 1, you can't ignore first iter var with 0 offset, you can only
ignore if N first iter vars have 0 offset.

Pretty much for the purpose of the algorithm, you compute "first element"
for collapse(N) N > 1 and ordered(5-N)
	for (iv1 = ..; iv1 < ..; iv1 += step1)
	  for (iv2 = ..; iv2 < ..; iv2 += step2)
	    for (iv3 = ..; iv3 < ..; iv3 += step3)
	      for (iv4 = ..; iv4 < ..; iv4 += step4)
		depend(iv1 + off1, iv2 + off2, iv3 + off3, iv4 + off4)
as: off(N) + off(N-1)*step(N) + iv(N-2)*step(N)*step(N-1)...
(to be checked the case if the directions differ).
So basically, you want to precompute if you'd add some loop counter
in between loop(N) and loop(N+1) that would be initially zero and incremented
by step(N).  The GCD is then performed on this, it is compared to zero,
and then finally split again into the several offsets.
The "is this invalid iteration" check (is offN divisible by stepN)
needs to be done before the merging of course.

Except, now that I think, it is not that easy.  Because we have another
test for "is this invalid iteration", in particularly one performed at
runtime on each of the depends.  Say if offN is negative and loop(N)
is forward loop (thus stepN is positive (note, for backward loop stepN
still might be huge positive value for unsigned iterators)), the check
would be if (ivN + offN >= initialN), for forward loop with positive offN
if (ivN + offN < endvalueN) etc.  Now, not sure if by computing the GCD and
merging several depend clauses into one we preserve those tests or not.

All in all, I think it might be really better to do the depend clause
merging during omp lowering in omp-low.c, where you can call
extract_omp_for_data and inspect the iteration variables, have the steps
computed for you etc.  That is the spot where we probably want to emit some
GOMP_depend_sink call (and GOMP_depend_source) and prepare arguments for it.

> +void
> +funk ()
> +{
> +#pragma omp parallel for ordered(2)
> +  for (i=0; i < N; i++)
> +    for (j=0; j < N; ++j)
> +    {
> +/* We should keep the (sink:i,j-2) by virtue of it the i+0.  The
> +   remaining clauses get folded with a GCD of -2 for `i' and a minimum
> +   of -1 for 'j'.  */

I think we shouldn't keep the useless one (sink: i, j-2) (or keep invalid
ones).  Of course, if we remove all depend clauses, we need to remove the
GIMPLE_OMP_ORDERED stmt altogether.

	Jakub