public inbox for gcc-bugs@sourceware.org
* [Bug tree-optimization/90591] Avoid unnecessary data transfer out of OMP construct
       [not found] <bug-90591-4@http.gcc.gnu.org/bugzilla/>
@ 2020-04-20 19:57 ` tschwinge at gcc dot gnu.org
  2020-04-21  6:42 ` rguenth at gcc dot gnu.org
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 5+ messages in thread
From: tschwinge at gcc dot gnu.org @ 2020-04-20 19:57 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90591

Thomas Schwinge <tschwinge at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rguenth at gcc dot gnu.org
     Ever confirmed|0                           |1
   Last reconfirmed|                            |2020-04-20
             Status|UNCONFIRMED                 |ASSIGNED
           Assignee|unassigned at gcc dot gnu.org      |sandra at gcc dot gnu.org

--- Comment #2 from Thomas Schwinge <tschwinge at gcc dot gnu.org> ---
I'm not very familiar with IPA optimizations, and I'm not yet clear on what can
be done (optimized) with reasonable effort in GCC, and where/how, so I will very
much appreciate your input, Jakub, Richard, others.

Sandra, who is not too familiar with OMP handling, is going to look into this
topic.

As a first step, we shall not consider any offloading compilation specifics,
but just the usual host-side compilation.

Looking at the following simple example:

    int main(void)
    {
      int var;
    #pragma acc parallel copyout(var)
    #pragma omp target map(from:var)
      {
        var = 1;
      }
      // 'var' never read.

      return 0;
    }

For reference, with '-fno-openacc -fno-openmp -O1' (so, OMP deactivated,
pragmas ignored), the 'var = 1' assignment disappears in the '030t.ccp1' dump
file, and 'var' itself disappears in the '047t.release_ssa' dump file.

With OMP enabled ('-fopenacc' shown in the following, but '-fopenmp' is very
similar), it's more difficult to optimize 'var': the OMP region (here: just the
'var = 1' assignment, plus any set-up and tear-down code) is moved (outlined)
into a separate function 'main._omp_fn.0', and the address of 'var' is taken,
stored in an internal data structure '.omp_data_arr.1', and is dereferenced in
'main._omp_fn.0' to access the original 'var'.  The outlined function
'main._omp_fn.0' is called via 'GOACC_parallel_keyed'.
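
To make the outlining concrete, here is a hand-written C approximation of what
the host-side code roughly looks like afterwards.  This is only a sketch:
'struct omp_data_s', 'main_omp_fn_0', 'launch_region' and 'run_example' are
hypothetical stand-ins for the compiler-generated '.omp_data_arr.1',
'main._omp_fn.0', the 'GOACC_parallel_keyed' runtime call and 'main'; the real
lowering differs in detail.

```c
/* Stand-in for the compiler-generated '.omp_data_arr.1': one slot per
   mapped variable, holding its host address.  */
struct omp_data_s
{
  int *var;
};

/* Stand-in for the outlined region 'main._omp_fn.0'.  */
static void
main_omp_fn_0 (void *data)
{
  struct omp_data_s *omp_data_i = data;
  *omp_data_i->var = 1;
}

/* Hypothetical stand-in for the runtime entry point; it simply runs the
   outlined function on the host.  */
static void
launch_region (void (*fn) (void *), void *data)
{
  fn (data);
}

/* Roughly what 'main' from the example becomes.  It returns the final
   value of 'var' only so that the effect of the region is observable.  */
static int
run_example (void)
{
  int var = 0;
  struct omp_data_s omp_data_arr = { &var };
  launch_region (main_omp_fn_0, &omp_data_arr);
  return var;
}
```

Because '&var' escapes into the data structure and the store happens in a
different function, the local passes that removed 'var' in the non-OMP case no
longer see that the store is dead.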

One step is to add logic so that in this example, we can optimize 'copyout' to
'create'.  The 'copyout' clause is encoded in '.omp_data_kinds.3[0]' as value
'514', where with the upper data alignment byte stripped off, '514 & 255 = 2',
which is 'GOMP_MAP_FROM'.  This shall -- at a suitable point in the pass
pipeline -- be optimized to 'GOMP_MAP_ALLOC'.
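
The arithmetic above can be sketched in C as follows.  The 'SKETCH_*' names and
the helper functions are local to this example, not GCC or libgomp API; the
value 2 for 'GOMP_MAP_FROM' is taken from the text above, and 0 for
'GOMP_MAP_ALLOC' is an assumption based on my reading of libgomp's constants.

```c
/* Low byte: map kind.  Upper byte: data alignment information.  */
enum
{
  SKETCH_MAP_ALLOC = 0, /* Assumed value of GOMP_MAP_ALLOC.  */
  SKETCH_MAP_FROM = 2   /* GOMP_MAP_FROM, as stated above.  */
};

/* Strip the upper (alignment) byte and return the map kind.  */
static unsigned
map_kind (unsigned short encoded)
{
  return encoded & 0xffu;
}

/* Replace the map kind while keeping the alignment byte.  */
static unsigned short
replace_map_kind (unsigned short encoded, unsigned kind)
{
  return (unsigned short) ((encoded & ~0xffu) | (kind & 0xffu));
}

/* Rewrite 'from' to 'alloc'; leave every other kind untouched.  */
static unsigned short
from_to_alloc (unsigned short encoded)
{
  if (map_kind (encoded) == SKETCH_MAP_FROM)
    return replace_map_kind (encoded, SKETCH_MAP_ALLOC);
  return encoded;
}
```

With these helpers, 'map_kind (514)' yields 2 and 'from_to_alloc (514)' yields
512, that is, the same alignment byte with the map kind cleared.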

Another step is to add logic so that the "dead"ness of 'var' after the outlined
function 'main._omp_fn.0' called via 'GOACC_parallel_keyed' gets propagated
into 'main._omp_fn.0', so that the 'var = 1' assignment can be eliminated.

By the way, there already is some special IPA information handling for
'GOACC_parallel_keyed' ('BUILT_IN_GOACC_PARALLEL') in
'gcc/tree-ssa-structalias.c'.  This is, if I remember correctly, to evaluate
aliasing "transparently through" the outlined OMP function; see the PR46032
commit r231076 "Handle BUILT_IN_GOMP_PARALLEL in ipa-pta" and later commits,
for reference.  (And, there is a 'pass_oacc_ipa' to enable '-fipa-pta' for
OpenACC OMP functions.)

Working on eliminating 'var' completely from the internal '.omp_data_arr.1'
etc. data structures shall be a separate step, for later.  If that elimination
happens when we're already in offloading compilation pipeline, this is expected
to require more infrastructure to communicate that information back from the
offloading compiler(s) to the host compiler/runtime.


But first one step back -- I noticed that compiling the following:

    #pragma acc routine
    #pragma omp declare target
    static void __attribute__((noinline)) f(int *var)
    {
      *var = 1;
    }
    #pragma omp end declare target

    int main(void)
    {
      int var;
    #pragma acc parallel copyout(var)
    #pragma omp target map(from:var)
      {
        f(&var);
      }
      // 'var' never read.

      return 0;
    }

... with '-fno-openacc -fno-openmp -O1' (so, OMP deactivated, pragmas ignored)
also does *not* see any optimization of 'var'.  Is that something that needs to
be addressed first, before attempting the OMP case?

* [Bug tree-optimization/90591] Avoid unnecessary data transfer out of OMP construct
From: rguenth at gcc dot gnu.org @ 2020-04-21  6:42 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90591

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |hubicka at gcc dot gnu.org,
                   |                            |jamborm at gcc dot gnu.org

--- Comment #3 from Richard Biener <rguenth at gcc dot gnu.org> ---
I think while relying on a robust IPA analysis and optimization framework
sounds appealing the problem is _much_ easier solved before OMP/OACC lowering
and I would strongly suggest to tackle the problem from that side if you want a
workable solution in a timely manner.

I realize that has a plethora of its own issues, first of all it seems
the respective lowering is done _very_ early - aka the optimization would
need to be part of omplower? (I see .omp_data_i constructed there)

So what you need is liveness and def/use analysis on high GIMPLE which I
think is straight-forward enough.  You have no SSA form at your hands
(actually SSA names can appear and there'll be use->def links but
no immediate uses).
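
As a rough illustration of the kind of check meant here, the following C sketch
decides whether a variable written by a region is read afterwards.  It is a
deliberately simplified stand-in, not GCC code: 'struct stmt' and
'live_after' are hypothetical, statements are modelled as bitmasks of used and
defined variables, and only straight-line code is handled (no control flow, no
address-taken variables, nothing live at function exit).

```c
#include <stddef.h>

/* Toy statement: bit N set in 'uses'/'defs' means variable N is read or
   written by the statement.  */
struct stmt
{
  unsigned uses;
  unsigned defs;
};

/* Return nonzero if the variable 'var_bit' is read by one of
   stmts[start .. n-1] before it is overwritten.  A forward scan is enough
   for straight-line code; real liveness needs a dataflow solver.  */
static int
live_after (const struct stmt *stmts, size_t n, size_t start,
            unsigned var_bit)
{
  for (size_t i = start; i < n; i++)
    {
      if (stmts[i].uses & var_bit)
        return 1;
      if (stmts[i].defs & var_bit)
        return 0;
    }
  return 0;
}
```

If statement 0 is the OMP region that defines 'var' and 'live_after' returns
zero for the statements following it, the 'copyout' of 'var' would be a
candidate for demotion under the assumptions above.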

OK, back to the IPA route and your example in comment#2 - no, we currently
do not have IPA dead store elimination or, if you'd view it in a very special
sense, IPA SRA does not consider instantiating 'var' in 'f' instead of passing
it down by reference (not sure if that's an optimization that would be
generally useful - though I remember myself passing down dummy and unused
out-parameters to functions in GCC).

Note that once you go the IPA route it becomes critical to do IPA pointer
analysis which frankly GCC does not really have in a form I'd be comfortable
enabling by default.  IPA points-to does compute

__attribute__((noinline))
f (int * var)
{
  # PT = { D.1935 } (nonlocal)
  # ALIGN = 4, MISALIGN = 0
  int * var_2(D) = var;
  <bb 2> [local count: 1073741824]:
  *var_2(D) = 1;
  return;
}

main ()
{
  int var;

  <bb 2> [local count: 1073741824]:
  # CLB = { D.1935 }
  f (&var);
  var ={v} {CLOBBER};
  return 0;

}

so we know that f clobbers var, but in f we do not know that nothing in the
callers uses it (so it is considered "nonlocal", aka global memory, for
following local optimizations).  Maybe the pending IPA mod/ref analysis can
solve this, though I heard it's a TBAA mod/ref analysis and not a classical one.

* [Bug tree-optimization/90591] Avoid unnecessary data transfer out of OMP construct
From: burnus at gcc dot gnu.org @ 2020-04-28 16:53 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90591

Tobias Burnus <burnus at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |burnus at gcc dot gnu.org

--- Comment #4 from Tobias Burnus <burnus at gcc dot gnu.org> ---
(In reply to Jakub Jelinek from comment #1)
> We want to add some attribute(s) on the structure types used to pass
> information in and out (or in the fields), and have some specialized IPA
> optimization that tries to optimize such cases.

I concur - and just today I had a hello-world example for the copy-in case:
 double x = 0.5;
 #pragma omp target
   printf("%f\n", sin(x));
Here the sin() call could be folded to a constant at compile time, but it isn't.

* [Bug tree-optimization/90591] Avoid unnecessary data transfer out of OMP construct
From: burnus at gcc dot gnu.org @ 2020-05-05 14:16 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90591

--- Comment #5 from Tobias Burnus <burnus at gcc dot gnu.org> ---
Somewhat related: In terms of OpenMP (to be refined in the spec), the following
applies (in order to work both with shared + nonshared memory):

  int x = 5;
  #pragma omp target map(from:x)
    x = 7;
  printf("%d\n", x);

This code is invalid – which means that the compiler could use this for
optimization and it probably should warn that "x" is used uninitialized.

* [Bug tree-optimization/90591] Avoid unnecessary data transfer out of OMP construct
From: tschwinge at gcc dot gnu.org @ 2021-03-22 11:37 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90591

Thomas Schwinge <tschwinge at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Assignee|sandra at gcc dot gnu.org          |tschwinge at gcc dot gnu.org

--- Comment #6 from Thomas Schwinge <tschwinge at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #3)
> I think while relying on a robust IPA analysis and optimization framework
> sounds appealing the problem is _much_ easier solved before OMP/OACC
> lowering and I would strongly suggest to tackle the problem from that side
> if you want a workable solution in a timely manner.

ACK.  WIP:
<http://mid.mail-archive.com/87eege1y2g.fsf@dem-tschwing-1.ger.mentorg.com>.

> I realize that has a plethora of its own issues, first of all it seems
> the respective lowering is done _very_ early - aka the optimization would
> need to be part of omplower? (I see .omp_data_i constructed there)
> 
> So what you need is liveness and def/use analysis on high GIMPLE which I
> think is straight-forward enough.  You have no SSA form at your hands
> (actually SSA names can appear and there'll be use->def links but
> no immediate uses).
