public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
From: Richard Biener <rguenther@suse.de>
To: Tom de Vries <Tom_deVries@mentor.com>
Cc: Richard Biener <richard.guenther@gmail.com>,
	    "gcc-patches@gnu.org" <gcc-patches@gnu.org>,
	    Jakub Jelinek <jakub@redhat.com>
Subject: Re: [PATCH, 10/16] Add pass_oacc_kernels pass group in passes.def
Date: Tue, 17 Nov 2015 15:18:00 -0000	[thread overview]
Message-ID: <alpine.LSU.2.11.1511171618210.4884@t29.fhfr.qr> (raw)
In-Reply-To: <564B3F69.50600@mentor.com>

On Tue, 17 Nov 2015, Tom de Vries wrote:

> On 17/11/15 11:05, Richard Biener wrote:
> > On Tue, Nov 17, 2015 at 12:20 AM, Tom de Vries <Tom_deVries@mentor.com>
> > wrote:
> > > On 16/11/15 13:45, Richard Biener wrote:
> > > > > > 
> > > > > > +             NEXT_PASS (pass_scev_cprop);
> > > > > > > > 
> > > > > > > > What's that for?  It's supposed to help removing loops - I don't
> > > > > > > > expect kernels to vanish.
> > > > > 
> > > > > > 
> > > > > > I'm using pass_scev_cprop for the "final value replacement"
> > > > > > functionality.
> > > > > > Added comment.
> > > 
> > > 
> > > > That functionality is intented to enable loop removal.
> > > 
> > > 
> > > Let me try to explain in a bit more detail.
> > > 
> > > 
> > > I.
> > > 
> > > Consider a parloops testcase test.c, with a use of the final value of the
> > > iteration variable (return i):
> > > ...
> > > unsigned int
> > > foo (int n, int *a)
> > > {
> > >    int i;
> > >    for (i = 0; i < n; ++i)
> > >      a[i] = 1;
> > > 
> > >    return i;
> > > }
> > > ...
> > > 
> > > Say we compile with:
> > > ...
> > > $ gcc -S -O2 test.c -ftree-parallelize-loops=2 -fdump-tree-all-details
> > > ...
> > > 
> > > We can see here in the parloops dump-file that the loop was parallelized:
> > > ...
> > >    SUCCESS: may be parallelized
> > > ...
> > > 
> > > Now say that we run with -fno-tree-scev-cprop in addition. Instead we find
> > > in the parloops dump-file:
> > > ...
> > > phi is i_1 = PHI <i_10(4)>
> > > arg of phi to exit:   value i_10 used outside loop
> > >    checking if it a part of reduction pattern:
> > >    FAILED: it is not a part of reduction.
> > > ...
> > > 
> > > Auto-parallelization fails in this case because there is a loop exit phi
> > > (the one in bb 6 defining i_1) which is not part of a reduction:
> > > ...
> > >    <bb 4>:
> > >    # i_13 = PHI <0(3), i_10(5)>
> > >    _5 = (long unsigned int) i_13;
> > >    _6 = _5 * 4;
> > >    _8 = a_7(D) + _6;
> > >    *_8 = 1;
> > >    i_10 = i_13 + 1;
> > >    if (n_4(D) > i_10)
> > >      goto <bb 5>;
> > >    else
> > >      goto <bb 6>;
> > > 
> > >    <bb 5>:
> > >    goto <bb 4>;
> > > 
> > >    <bb 6>:
> > >    # i_1 = PHI <i_10(4)>
> > >    _20 = (unsigned int) i_1;
> > > ...
> > > 
> > > With -ftree-scev-cprop, we find in the pass_scev_cprop dump-file:
> > > ...
> > > final value replacement:
> > >    i_1 = PHI <i_10(4)>
> > >    with
> > >    i_1 = n_4(D);
> > > ...
> > > 
> > > And the resulting loop no longer has any loop exit phis, so
> > > auto-parallelization succeeds:
> > > ...
> > >    <bb 4>:
> > >    # i_13 = PHI <0(3), i_10(5)>
> > >    _5 = (long unsigned int) i_13;
> > >    _6 = _5 * 4;
> > >    _8 = a_7(D) + _6;
> > >    *_8 = 1;
> > >    i_10 = i_13 + 1;
> > >    if (n_4(D) > i_10)
> > >      goto <bb 5>;
> > >    else
> > >      goto <bb 6>;
> > > 
> > >    <bb 5>:
> > >    goto <bb 4>;
> > > 
> > >    <bb 6>:
> > >    _20 = (unsigned int) n_4(D);
> > > ...
> > > 
> > > [ I've filed PR68373 - "autopar fails on loop exit phi with argument
> > > defined
> > > outside loop", for a slightly different testcase where despite the final
> > > value replacement autopar still fails. ]
> > > 
> > > 
> > > II.
> > > 
> > > Now, back to oacc kernels.
> > > 
> > > Consider test-case kernels-loop-n.f95 (will add this one to the
> > > test-cases):
> > > ...
> > > module test
> > > contains
> > >    subroutine foo(n)
> > >      implicit none
> > >      integer :: n
> > >      integer, dimension (0:n-1) :: a, b, c
> > >      integer                    :: i, ii
> > >      do i = 0, n - 1
> > >         a(i) = i * 2
> > >      end do
> > > 
> > >      do i = 0, n -1
> > >         b(i) = i * 4
> > >      end do
> > > 
> > >      !$acc kernels copyin (a(0:n-1), b(0:n-1)) copyout (c(0:n-1))
> > >      do ii = 0, n - 1
> > >         c(ii) = a(ii) + b(ii)
> > >      end do
> > >      !$acc end kernels
> > > 
> > >      do i = 0, n - 1
> > >         if (c(i) .ne. a(i) + b(i)) call abort
> > >      end do
> > > 
> > >    end subroutine foo
> > > end module test
> > > ...
> > > 
> > > The loop at the start of the kernels pass group contains an in-memory
> > > iteration variable, with a store to '*_9 = _38'.
> > > ...
> > >    <bb 4>:
> > >    _13 = *.omp_data_i_4(D).c;
> > >    c.21_14 = *_13;
> > >    _16 = *_9;
> > >    _17 = (integer(kind=8)) _16;
> > >    _18 = *.omp_data_i_4(D).a;
> > >    a.22_19 = *_18;
> > >    _23 = MEM[(integer(kind=4)[0:D.3488] *)a.22_19][_17];
> > >    _24 = *.omp_data_i_4(D).b;
> > >    b.23_25 = *_24;
> > >    _29 = MEM[(integer(kind=4)[0:D.3484] *)b.23_25][_17];
> > >    _30 = _23 + _29;
> > >    MEM[(integer(kind=4)[0:D.3480] *)c.21_14][_17] = _30;
> > >    _38 = _16 + 1;
> > >    *_9 = _38;
> > >    if (_8 == _16)
> > >      goto <bb 3>;
> > >    else
> > >      goto <bb 4>;
> > > ...
> > > 
> > > After pass_lim/pass_copy_prop, we've rewritten that into using a local
> > > iteration variable, but we've generated a read of the final value of the
> > > iteration variable outside the loop, which means auto-parallelization will
> > > fail:
> > > ...
> > >    <bb 5>:
> > >    # D__lsm.29_12 = PHI <D__lsm.29_15(4), _38(7)>
> > >    _17 = (integer(kind=8)) D__lsm.29_12;
> > >    _23 = MEM[(integer(kind=4)[0:D.3488] *)a.22_19][_17];
> > >    _29 = MEM[(integer(kind=4)[0:D.3484] *)b.23_25][_17];
> > >    _30 = _23 + _29;
> > >    MEM[(integer(kind=4)[0:D.3480] *)c.21_14][_17] = _30;
> > >    _38 = D__lsm.29_12 + 1;
> > >    if (_8 == D__lsm.29_12)
> > >      goto <bb 6>;
> > >    else
> > >      goto <bb 7>;
> > > 
> > >    <bb 6>:
> > >    # D__lsm.29_27 = PHI <_38(5)>
> > >    *_9 = D__lsm.29_27;
> > >    goto <bb 3>;
> > 
> > So this store is not actually necessary?
> 
> a.
> In the case of this example, the store is dead.
> 
> There is a corresponding load at the point that we split off the region:
> ...
>   <bb 9>:
>   #pragma omp return
> 
>   <bb 10>:
>   D.3635 = .omp_data_arr.25.ii;
>   ii = *D.3635;
> ...
> 
> This load is later removed, given that ii is unused after the region. But once
> the region is split off,  there's nothing in the context of the store to
> suggest that it's dead.
> 
> And to get rid of the load of ii before the region is split off, we would have
> to implement some sort of liveness analysis on pre-ssa code.
> 
> b.
> There's the case where there is an explicit use of ii after the region, in
> which case the store is not dead.
> 
> c.
> And there's the case were we use a data clause on the region, f.i. 'create
> (ii)' to indicate that the variable is neither copied in nor copied out of the
> region (the default for a scalar in a kernels region is 'copy', meaning
> copy-in-and-out).
> 
> [ This means the value of ii after the region is uninitialized. So even if
> there's a read from ii after the region, we cannot consider it connected to
> the store, given that the value written by the store on the accelerator will
> not be copied back to the host. ]
> 
> In this case, we already don't have any load of ii after the region:
> ...
>   <bb 9>:
>   #pragma omp return
> 
>   <bb 10>:
>   .omp_data_sizes.28 = {CLOBBER};
>   .omp_data_arr.27 = {CLOBBER};
> ...
> 
> We could insert clobbers for the bits of .omp_data_arr at the end of the
> region to indicate that those are not used. That might enable dse to get rid
> of the dead store.
> 
> 
> But, I think we want a generic solution that handles cases a, b and c, which
> means we have to solve the most difficult case, which is b, where the store is
> not dead.
> 
> >  Or just in an inconvenient place?
> 
> I don't think the place of the store is inconvenient, it would be worse to
> have the store in the loop.
> 
> What is inconvenient about the store is the fact that it reads the final value
> of the iteration variable (which inhibits parloops).
> 
> > >    <bb 7>:
> > >    goto <bb 5>;
> > > ...
> > > 
> > > This makes it similar to the parloops example above, and that's why I've
> > > added pass_scev_cprop in the kernels pass group.
> > > 
> > > [ And for some kernels test-cases with constant loop bound, it's not the
> > > final value replacement bit that does the substitution, but the first bit
> > > in
> > > scev_const_prop using resolve_mixers. So that's a related reason to use
> > > pass_scev_cprop. ]
> > 
> > IMHO autopar needs to handle induction itself.
> 
> I'm not sure what you mean. Could you elaborate?  Autopar handles induction
> variables, but it doesn't handle exit phis reading the final value of the
> induction variable. Is that what you want fixed? How?

Yes.  Perform final value replacement.

> > And the above LIM example
> > is none for why you need two LIM passes...
> 
> Indeed. I'm planning a separate reply to explain in more detail the need for
> the two pass_lims.

Thanks.

  reply	other threads:[~2015-11-17 15:18 UTC|newest]

Thread overview: 133+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-11-09 15:35 [PATCH series, 16] Use parloops to parallelize oacc kernels regions Tom de Vries
2015-11-09 15:44 ` [PATCH, 1/16] Insert new exit block only when needed in transform_to_exit_first_loop_alt Tom de Vries
2015-11-11 10:50   ` Richard Biener
2015-11-09 15:45 ` [PATCH, 2/16] Make create_parallel_loop return void Tom de Vries
2015-11-11 10:50   ` Richard Biener
2015-11-09 15:51 ` [PATCH, 3/16] Ignore reduction clause on kernels directive Tom de Vries
2015-11-24 12:25   ` [PING][PATCH, " Tom de Vries
2016-01-18 14:24     ` [PING^2][PATCH, " Tom de Vries
2016-01-18 14:26       ` Jakub Jelinek
2015-11-09 16:10 ` [PATCH, 4/16] Implement -foffload-alias Tom de Vries
2015-11-11 10:53   ` Richard Biener
2015-11-11 11:01     ` Jakub Jelinek
2015-11-12 16:04       ` Tom de Vries
2015-11-13  8:46         ` Richard Biener
2015-11-13 11:03           ` Tom de Vries
2015-11-13 11:30             ` Richard Biener
2015-11-13 11:39               ` Jakub Jelinek
2015-11-21 12:24                 ` Tom de Vries
2015-11-23 11:46                   ` Richard Biener
2015-11-27 11:44                     ` Tom de Vries
2015-11-27 12:14                       ` Tom de Vries
2015-12-02  9:59                         ` Jakub Jelinek
2016-03-14 13:16                           ` Tom de Vries
2016-03-14 23:18                             ` Tom de Vries
2015-12-02  9:46                       ` Jakub Jelinek
2015-12-02 13:11                         ` Tom de Vries
2015-12-11 12:45                 ` Tom de Vries
2015-12-11 13:00                   ` Richard Biener
2015-12-13 16:38                     ` Tom de Vries
2015-12-14 13:26                       ` Richard Biener
2015-12-14 15:44                         ` Tom de Vries
2015-12-16 13:16                           ` Richard Biener
2015-12-16 14:43                             ` Tom de Vries
2015-12-17 12:03                               ` [gomp4] " Thomas Schwinge
2015-12-03 11:53       ` Tom de Vries
2015-11-09 16:31 ` [PATCH, 5/16] Add in_oacc_kernels_region in struct loop Tom de Vries
2015-11-11 10:57   ` Richard Biener
2015-11-16 11:39     ` Tom de Vries
2015-11-16 11:39     ` Tom de Vries
2015-11-16 12:41       ` Richard Biener
2015-11-09 17:39 ` [PATCH, 6/16] Add pass_oacc_kernels Tom de Vries
2015-11-11 10:59   ` Richard Biener
2015-11-19 13:51     ` Tom de Vries
2015-11-24 12:17       ` Tom de Vries
2015-11-25 10:42         ` Richard Biener
2016-02-05 12:06   ` Use plain -fopenacc to enable OpenACC kernels processing (was: [PATCH, 6/16] Add pass_oacc_kernels) Thomas Schwinge
2016-02-10 14:40     ` Use plain -fopenacc to enable OpenACC kernels processing Thomas Schwinge
2016-02-15 16:54       ` Tom de Vries
2016-02-23 15:19         ` Thomas Schwinge
2015-11-09 18:14 ` [PATCH, 7/16] Add pass_dominator_oacc_kernels Tom de Vries
2015-11-11 11:05   ` Richard Biener
2015-11-16 12:04     ` Tom de Vries
2015-11-09 18:34 ` [PATCH, 8/16] Add pass_ch_oacc_kernels Tom de Vries
2015-11-11 20:29   ` Tom de Vries
2015-11-30 12:12     ` [gomp4] Use pass_ch instead of pass_ch_oacc_kernels (was: [PATCH, 8/16] Add pass_ch_oacc_kernels) Thomas Schwinge
2015-11-09 19:53 ` [PATCH, 9/16] Add pass_parallelize_loops_oacc_kernels Tom de Vries
2015-11-16 11:59   ` Tom de Vries
2015-11-24 12:27     ` Tom de Vries
2015-12-13 16:58       ` [PIING][PATCH, " Tom de Vries
2015-12-14 15:23         ` Richard Biener
2016-01-16 22:41           ` [Committed] Move pass_expand_omp_ssa out of pass_parallelize_loops Tom de Vries
2016-01-18 12:59           ` [Committed] Allow pass_parallelize_loops to be run outside the loop pipeline Tom de Vries
2016-01-18 13:07           ` [committed] Add oacc_kernels_p argument to pass_parallelize_loops Tom de Vries
2016-01-18 13:30             ` [committed] Add pass_parallelize_loops to pass_oacc_kernels Tom de Vries
2016-01-20  8:54             ` [committed] Add oacc_kernels_p argument to pass_parallelize_loops Thomas Schwinge
2016-01-20 10:31               ` Tom de Vries
2015-11-09 19:59 ` [PATCH, 10/16] Add pass_oacc_kernels pass group in passes.def Tom de Vries
2015-11-11 11:03   ` Richard Biener
2015-11-16 11:55     ` Tom de Vries
2015-11-16 12:45       ` Richard Biener
2015-11-16 23:21         ` Tom de Vries
2015-11-17 10:05           ` Richard Biener
2015-11-17 14:54             ` Tom de Vries
2015-11-17 15:18               ` Richard Biener [this message]
2015-11-17 15:39                 ` Tom de Vries
2015-11-17 22:21                   ` [PATCH, PR68373 ] Call scev_const_prop in pass_parallelize_loops::execute Tom de Vries
2015-11-19  9:36                     ` Tom de Vries
2015-11-20 10:15                       ` Richard Biener
2015-11-18  8:30                   ` [PATCH, 10/16] Add pass_oacc_kernels pass group in passes.def Richard Biener
2015-11-18 16:22                     ` Bernhard Reutner-Fischer
2015-11-20 12:53                       ` [committed, trivial] Fix typo and trailing whitespace in dump-file strings in parloops Tom de Vries
2015-11-19  0:35               ` [PATCH, 10/16] Add pass_oacc_kernels pass group in passes.def Tom de Vries
2015-11-20 10:28                 ` Richard Biener
2015-11-21  8:42                   ` Tom de Vries
2015-11-23 11:31                     ` Richard Biener
2015-11-23 15:53                       ` Tom de Vries
2015-11-23 16:38                         ` Richard Biener
2015-11-19 10:31         ` Tom de Vries
2015-11-20 10:37           ` Richard Biener
2015-11-20 13:27             ` Tom de Vries
2015-11-20 13:29               ` Richard Biener
2015-11-20 16:34                 ` Tom de Vries
2015-11-23 10:11                   ` Richard Biener
2015-11-24 12:22                     ` Tom de Vries
2015-11-24 13:19                       ` Richard Biener
2015-11-24 14:33                         ` Tom de Vries
2015-11-24 14:36                           ` Richard Biener
2015-11-24 15:05                             ` Tom de Vries
2015-11-25 10:43                               ` Richard Biener
2015-11-25 10:44                       ` Richard Biener
2015-11-30 17:48                         ` [gomp4] " Thomas Schwinge
2015-11-22 23:37             ` [PATCH] Don't reapply loops flags if unnecessary in loop_optimizer_init Tom de Vries
2015-11-23 10:33               ` Richard Biener
2015-11-23 11:27                 ` Tom de Vries
2015-11-09 20:02 ` [PATCH, 11/16] Update testcases after adding kernels pass group Tom de Vries
2015-11-11 11:03   ` Richard Biener
2015-11-12 14:32     ` Tom de Vries
2015-11-12 14:43       ` Richard Biener
2015-11-12 15:42         ` David Malcolm
2015-11-13  9:44           ` Richard Biener
2015-11-09 20:06 ` [PATCH, 12/16] Handle acc loop directive Tom de Vries
2015-11-24 12:30   ` [PING][PATCH, " Tom de Vries
2016-01-18 14:27     ` [PING^2][PATCH, " Tom de Vries
2016-01-26 12:38       ` [PING^3][PATCH, " Tom de Vries
2016-01-26 12:50         ` Jakub Jelinek
2016-02-12 11:11           ` Tom de Vries
2016-02-22 10:55             ` Tom de Vries
2016-02-22 10:58               ` Jakub Jelinek
2016-02-29  3:27                 ` Tom de Vries
2016-03-07  8:22                   ` [PING][PATCH, " Tom de Vries
2016-03-14  6:21                     ` [PING^2][PATCH, " Tom de Vries
2015-11-09 20:08 ` [PATCH, 13/16] Add c-c++-common/goacc/kernels-*.c Tom de Vries
2016-01-18 13:33   ` [committed] Add oacc kernels tests in goacc Tom de Vries
2015-11-09 20:09 ` [PATCH, 14/16] Add gfortran.dg/goacc/kernels-*.f95 Tom de Vries
2015-11-09 20:11 ` [PATCH, 15/16] Add libgomp.oacc-c-c++-common/kernels-*.c Tom de Vries
2016-01-18 13:39   ` [comitted] Add oacc kernels test in libgomp Tom de Vries
2016-03-09  9:18   ` [PATCH, 15/16] Add libgomp.oacc-c-c++-common/kernels-*.c Tom de Vries
2016-03-18 12:46     ` Scan for parallelization of the oacc kernels test-cases in gfortran.dg/goacc (was: [PATCH, 15/16] Add libgomp.oacc-c-c++-common/kernels-*.c) Thomas Schwinge
2016-04-05  9:13       ` Scan for parallelization of the oacc kernels test-cases in gfortran.dg/goacc Tom de Vries
2016-04-07 15:26         ` Thomas Schwinge
2015-11-09 20:12 ` [PATCH, 16/16] Add libgomp.oacc-fortran/kernels-*.f95 Tom de Vries
2016-03-09  9:19   ` Tom de Vries
2016-03-16 13:12     ` Thomas Schwinge

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=alpine.LSU.2.11.1511171618210.4884@t29.fhfr.qr \
    --to=rguenther@suse.de \
    --cc=Tom_deVries@mentor.com \
    --cc=gcc-patches@gnu.org \
    --cc=jakub@redhat.com \
    --cc=richard.guenther@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).