Re: [PATCH, 8/8] Do simple omp lowering for no address taken var

From: Richard Biener <rguenther@suse.de>
To: Tom de Vries <Tom_deVries@mentor.com>
Cc: GCC Patches <gcc-patches@gcc.gnu.org>,
	Jakub Jelinek <jakub@redhat.com>,
	    Thomas Schwinge <Thomas_Schwinge@mentor.com>,
	ebotcazou@adacore.com
Subject: Re: [PATCH, 8/8] Do simple omp lowering for no address taken var
Date: Mon, 17 Nov 2014 10:29:00 -0000
Message-ID: <alpine.LSU.2.11.1411171104160.374@zhemvz.fhfr.qr>
In-Reply-To: <54678C29.40006@mentor.com>

On Sat, 15 Nov 2014, Tom de Vries wrote:

> On 15-11-14 13:14, Tom de Vries wrote:
> > Hi,
> > 
> > I'm submitting a patch series with initial support for the oacc kernels
> > directive.
> > 
> > The patch series uses pass_parallelize_loops to implement parallelization of
> > loops in the oacc kernels region.
> > 
> > The patch series consists of these 8 patches:
> > ...
> >      1  Expand oacc kernels after pass_build_ealias
> >      2  Add pass_oacc_kernels
> >      3  Add pass_ch_oacc_kernels to pass_oacc_kernels
> >      4  Add pass_tree_loop_{init,done} to pass_oacc_kernels
> >      5  Add pass_loop_im to pass_oacc_kernels
> >      6  Add pass_ccp to pass_oacc_kernels
> >      7  Add pass_parloops_oacc_kernels to pass_oacc_kernels
> >      8  Do simple omp lowering for no address taken var
> > ...
> 
> This patch lowers integer variables that do not have their address taken as
> local variables.  The value is copied in at region entry and copied back out
> at region exit.
> 
> In the context of reduction handling in a kernels region, this allows the
> parloops reduction analysis to recognize the reduction, even after oacc
> lowering has been done in pass_lower_omp.
> 
> In more detail, without this patch, the omp_data_i loads and stores are
> generated in place (in this case, in the loop):
> ...
>                 {
>                   .omp_data_iD.2201 = &.omp_data_arr.15D.2220;
>                   {
>                     unsigned intD.9 iD.2146;
> 
>                     iD.2146 = 0;
>                     goto <D.2207>;
>                     <D.2208>:
>                     D.2216 = .omp_data_iD.2201->cD.2203;
>                     c.9D.2176 = *D.2216;
>                     D.2177 = (long unsigned intD.10) iD.2146;
>                     D.2178 = D.2177 * 4;
>                     D.2179 = c.9D.2176 + D.2178;
>                     D.2180 = *D.2179;
>                     D.2217 = .omp_data_iD.2201->sumD.2205;
>                     D.2218 = *D.2217;
>                     D.2217 = .omp_data_iD.2201->sumD.2205;
>                     D.2219 = D.2180 + D.2218;
>                     *D.2217 = D.2219;
>                     iD.2146 = iD.2146 + 1;
>                     <D.2207>:
>                     if (iD.2146 <= 524287) goto <D.2208>; else goto <D.2209>;
>                     <D.2209>:
>                   }
> ...
> 
> With this patch, the omp_data_i load and store for sum are generated at entry
> and exit:
> ...
>                 {
>                   .omp_data_iD.2201 = &.omp_data_arr.15D.2218;
>                   D.2216 = .omp_data_iD.2201->sumD.2205;
>                   sumD.2206 = *D.2216;
>                   {
>                     unsigned intD.9 iD.2146;
> 
>                     iD.2146 = 0;
>                     goto <D.2207>;
>                     <D.2208>:
>                     D.2217 = .omp_data_iD.2201->cD.2203;
>                     c.9D.2176 = *D.2217;
>                     D.2177 = (long unsigned intD.10) iD.2146;
>                     D.2178 = D.2177 * 4;
>                     D.2179 = c.9D.2176 + D.2178;
>                     D.2180 = *D.2179;
>                     sumD.2206 = D.2180 + sumD.2206;
>                     iD.2146 = iD.2146 + 1;
>                     <D.2207>:
>                     if (iD.2146 <= 524287) goto <D.2208>; else goto <D.2209>;
>                     <D.2209>:
>                   }
>                   *D.2216 = sumD.2206;
>                   #pragma omp return
>                 }
> ...
> 
> 
> So, without the patch the reduction operation looks like this:
> ...
>     *(.omp_data_iD.2201->sumD.2205) = *(.omp_data_iD.2201->sumD.2205) + x
> ...
> 
> And with this patch the reduction operation is simply:
> ...
>     sumD.2206 = sumD.2206 + x
> ...
> 
> OK for trunk?

I presume the reason you are trying to do that here is that otherwise
it happens too late?  What you do is what loop store motion would
do.
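
As a minimal sketch of that equivalence, written at the C level rather
than in GIMPLE: the first function keeps the accumulator in memory, the
second does by hand what store motion would do.  The function names are
made up for illustration, and the two are only equivalent under the
assumption that sum does not alias the array c.

```c
#include <stddef.h>

/* Accumulator lives in memory: every iteration loads and stores
   through the pointer, like the *(.omp_data_i->sum) form.  */
void
sum_in_memory (const unsigned *c, size_t n, unsigned *sum)
{
  for (size_t i = 0; i < n; i++)
    *sum = *sum + c[i];
}

/* Store motion done by hand: one load before the loop, a local
   accumulator inside it, one store after it.  Assumes sum does
   not point into c.  */
void
sum_in_local (const unsigned *c, size_t n, unsigned *sum)
{
  unsigned tmp = *sum;
  for (size_t i = 0; i < n; i++)
    tmp = tmp + c[i];
  *sum = tmp;
}
```

The loop body of the second form is the plain "tmp = tmp + x" shape
that a reduction analysis can recognize.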

Now - I can see how that is easily confused by the static chain
being address-taken.  But I also remember that Eric did some
preparatory work to fix that, for nested functions, that is,
possibly setting DECL_NONADDRESSABLE_P?  Don't remember exactly.

That said - the gimple_seq_ior_addresses_taken_op callback looks
completely broken.  Consider &a.x which you'd fail to mark as
address-taken.  It looks like the body is not yet in CFG form
when you apply all this?
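
To illustrate the &a.x case with a small C sketch (hypothetical names,
not taken from the patch): the operand of & is the component reference
a.x, but the variable whose address escapes is a itself.  A walker that
only records bare &var operands would not mark a here.

```c
struct pair
{
  int x;
  int y;
};

/* The address of a field is taken, so the whole variable `a'
   has to be treated as address-taken.  */
int
write_through_field_address (void)
{
  struct pair a = { 1, 2 };
  int *p = &a.x;	/* base of the address expression is `a' */
  *p = 42;		/* modifies `a' without naming it */
  return a.x;
}
```

So the analysis has to look through the component reference to the
base declaration before deciding what is address-taken.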

In any case, the functions do not belong in gimple.[ch], since
they are not going to work in general.  I also question
why they are necessary - you do

+           if (gimple_code (stmt) == GIMPLE_OACC_KERNELS
+               && !bitmap_bit_p (addresses_taken, DECL_UID (var))
+               && INTEGRAL_TYPE_P (TREE_TYPE (var)))

but why don't you simply check TREE_ADDRESSABLE (var)?  TREE_ADDRESSABLE
is conservatively correct here.

And the above won't help for float reductions.  So if you do this
at all, you should probably test is_gimple_reg_type (TREE_TYPE (var))
instead of INTEGRAL_TYPE_P, and you definitely should limit the number
of vars treated this way.
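
A float reduction has exactly the same shape as the integer one, which
is why a check restricted to integral types misses it.  A minimal C
sketch, with made-up function names and again assuming sum does not
alias c:

```c
#include <stddef.h>

/* Float accumulator kept in memory across the loop.  */
void
fsum_in_memory (const float *c, size_t n, float *sum)
{
  for (size_t i = 0; i < n; i++)
    *sum = *sum + c[i];
}

/* Same reduction with copy-in before the loop and copy-out after.
   The additions happen in the same order as above.  */
void
fsum_in_local (const float *c, size_t n, float *sum)
{
  float tmp = *sum;
  for (size_t i = 0; i < n; i++)
    tmp = tmp + c[i];
  *sum = tmp;
}
```

Nothing in the copy-in/copy-out step depends on the variable being an
integer, only on it being a scalar whose address is not taken.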

Oh - and the optimization should be somewhere more general - after
all it applies to all nested functions (thus move it to tree-nested.c?)
and to autopar loops as well.  Not sure how much code the omp
lowering shares with unnesting - but hopefully enough.

Richard.

-- 
Richard Biener <rguenther@suse.de>
SUSE LINUX GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendoerffer, HRB 21284
(AG Nuernberg)
Maxfeldstrasse 5, 90409 Nuernberg, Germany
