public inbox for gcc-bugs@sourceware.org
* [Bug tree-optimization/90591] Avoid unnecessary data transfer out of OMP construct
From: tschwinge at gcc dot gnu.org @ 2020-04-20 19:57 UTC
To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90591

Thomas Schwinge <tschwinge at gcc dot gnu.org> changed:

           What    |Removed                       |Added
----------------------------------------------------------------------------
                 CC|                              |rguenth at gcc dot gnu.org
     Ever confirmed|0                             |1
   Last reconfirmed|                              |2020-04-20
             Status|UNCONFIRMED                   |ASSIGNED
           Assignee|unassigned at gcc dot gnu.org |sandra at gcc dot gnu.org

--- Comment #2 from Thomas Schwinge <tschwinge at gcc dot gnu.org> ---
I'm not very familiar with IPA optimizations, and I'm not yet clear on what can be done (optimized) with reasonable effort in GCC, and where/how, so I will very much appreciate your input, Jakub, Richard, others.  Sandra (who is not too familiar with OMP handling) is going to look into this topic.

As a first step, we shall not consider any offloading compilation specifics, but just the usual host-side compilation.

Looking at the following simple example:

    int main(void)
    {
      int var;

    #pragma acc parallel copyout(var)
    #pragma omp target map(from:var)
      {
        var = 1;
      }

      // 'var' never read.

      return 0;
    }

For reference, with '-fno-openacc -fno-openmp -O1' (so, OMP deactivated, pragmas ignored), the 'var = 1' assignment disappears in the '030t.ccp1' dump file, and 'var' itself disappears in the '047t.release_ssa' dump file.

With OMP enabled ('-fopenacc' shown in the following, but '-fopenmp' is very similar), it's more difficult to optimize 'var': the OMP region (here: just the 'var = 1' assignment, plus any set-up and tear-down code) is moved (outlined) into a separate function 'main._omp_fn.0', and the address of 'var' is taken, stored in an internal data structure '.omp_data_arr.1', and dereferenced in 'main._omp_fn.0' to access the original 'var'.  The outlined function 'main._omp_fn.0' is called via 'GOACC_parallel_keyed'.

One step is to add logic so that in this example, we can optimize 'copyout' to 'create'.  The 'copyout' clause is encoded in '.omp_data_kinds.3[0]' as value '514'; with the upper data alignment byte stripped off, '514 & 255 = 2', which is 'GOMP_MAP_FROM'.  This shall -- at a suitable point in the pass pipeline -- be optimized to 'GOMP_MAP_ALLOC'.

Another step is to add logic so that the "dead"ness of 'var' after the outlined function 'main._omp_fn.0' called via 'GOACC_parallel_keyed' gets propagated into 'main._omp_fn.0', so that the 'var = 1' assignment can be eliminated.

By the way, there already is some special IPA information handling for 'GOACC_parallel_keyed' ('BUILT_IN_GOACC_PARALLEL') in 'gcc/tree-ssa-structalias.c'.  This is, if I remember correctly, to evaluate aliasing "transparently through" the outlined OMP function; see the PR46032 commit r231076 "Handle BUILT_IN_GOMP_PARALLEL in ipa-pta" and later commits, for reference.  (And, there is a 'pass_oacc_ipa' to enable '-fipa-pta' for OpenACC OMP functions.)

Working on eliminating 'var' completely from the internal '.omp_data_arr.1' etc. data structures shall be a separate step, for later.  If that elimination happens when we're already in the offloading compilation pipeline, this is expected to require more infrastructure to communicate that information back from the offloading compiler(s) to the host compiler/runtime.
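
To make the outlining and the map-kind encoding described above more concrete, here is a minimal hand-written model in plain C.  It is only a sketch, not what GCC generates: all 'ex_' names are hypothetical, the outlined function is simply called on the host instead of going through 'GOACC_parallel_keyed', and the value 0 for the "alloc" kind is an assumption (the thread itself only states that 'GOMP_MAP_FROM' is 2).

```c
#include <assert.h>

/* Hypothetical stand-ins for the map kinds discussed above.  Only
   "FROM == 2" is stated in the thread; "ALLOC == 0" is an assumption.  */
enum { EX_MAP_ALLOC = 0, EX_MAP_FROM = 2 };

/* The low byte holds the map kind; the alignment lives above it.  */
unsigned ex_map_kind(unsigned encoded)
{
  return encoded & 255u;
}

/* Replace the map kind and keep everything above the low byte.  */
unsigned ex_with_kind(unsigned encoded, unsigned kind)
{
  assert(kind <= 255u);
  return (encoded & ~255u) | kind;
}

/* Model of '.omp_data_arr.1': one slot holding the address of 'var'.  */
struct ex_omp_data { int *var; };

/* Model of the outlined region 'main._omp_fn.0': it reaches the original
   'var' only by dereferencing the pointer stored in the data block.  */
void ex_outlined_region(void *data)
{
  struct ex_omp_data *d = data;
  *d->var = 1;
}

/* Model of the launch site; the real code calls 'GOACC_parallel_keyed'.  */
int ex_run_region(void)
{
  int var = 0;
  struct ex_omp_data arr = { &var };
  ex_outlined_region(&arr);
  return var;
}
```

In this model, 'ex_map_kind(514)' yields the "from" kind, and 'ex_with_kind(514, EX_MAP_ALLOC)' yields 512, that is, the same alignment byte with the kind switched to "alloc".  The store in 'ex_outlined_region' can only be removed once it is known that the caller never reads 'var' after the launch.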

But first one step back -- I noticed that compiling the following:

    #pragma acc routine
    #pragma omp declare target
    static void __attribute__((noinline)) f(int *var)
    {
      *var = 1;
    }
    #pragma omp end declare target

    int main(void)
    {
      int var;

    #pragma acc parallel copyout(var)
    #pragma omp target map(from:var)
      {
        f(&var);
      }

      // 'var' never read.

      return 0;
    }

... with '-fno-openacc -fno-openmp -O1' (so, OMP deactivated, pragmas ignored) also does *not* see any optimization of 'var'.  Is that something that needs to be addressed first, before attempting the OMP case?

* [Bug tree-optimization/90591] Avoid unnecessary data transfer out of OMP construct
From: rguenth at gcc dot gnu.org @ 2020-04-21 6:42 UTC
To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90591

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |hubicka at gcc dot gnu.org,
                   |                            |jamborm at gcc dot gnu.org

--- Comment #3 from Richard Biener <rguenth at gcc dot gnu.org> ---
I think while relying on a robust IPA analysis and optimization framework sounds appealing the problem is _much_ easier solved before OMP/OACC lowering and I would strongly suggest to tackle the problem from that side if you want a workable solution in a timely manner.

I realize that has a plethora of its own issues, first of all it seems the respective lowering is done _very_ early - aka the optimization would need to be part of omplower?  (I see .omp_data_i constructed there)

So what you need is liveness and def/use analysis on high GIMPLE which I think is straight-forward enough.  You have no SSA form at your hands (actually SSA names can appear and there'll be use->def links but no immediate uses).

OK, back to the IPA route and your example in comment #2 - no, we currently do not have IPA dead store elimination or, if you'd view it in a very special sense, IPA SRA does not consider instantiating 'var' in 'f' instead of passing it down by reference (not sure if that's an optimization that would be generally useful - though I remember myself passing down dummy and unused out-parameters to functions in GCC).

Note that once you go the IPA route it becomes critical to do IPA pointer analysis, which frankly GCC does not really have in a form I'd be comfortable enabling by default.  IPA points-to does compute

    __attribute__((noinline))
    f (int * var)
    {
      # PT = { D.1935 } (nonlocal)
      # ALIGN = 4, MISALIGN = 0
      int * var_2(D) = var;

      <bb 2> [local count: 1073741824]:
      *var_2(D) = 1;
      return;

    }

    main ()
    {
      int var;

      <bb 2> [local count: 1073741824]:
      # CLB = { D.1935 }
      f (&var);
      var ={v} {CLOBBER};
      return 0;

    }

so we know that f clobbers var, but in f we do not know that nothing in the callers uses it (so it is considered "nonlocal" aka global memory for the following local optimizations).

Maybe the pending IPA mod/ref analysis can solve this, though I heard it's a TBAA mod/ref analysis and not a classical one.
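
The "nonlocal" point above can be illustrated with a small sketch.  The names 'ex_f', 'ex_caller_ignores' and 'ex_caller_reads' are hypothetical and not part of the test case in this bug; unlike 'f' in comment #2, 'ex_f' is not 'static' here.  The same store through the out-parameter is dead for one caller and live for the other, so it cannot be removed by looking at the callee alone:

```c
#include <assert.h>

/* Same shape as 'f' in the thread: a store through an out-parameter.  */
__attribute__((noinline)) void ex_f(int *var)
{
  *var = 1;
}

/* Mirrors 'main' from the thread: the stored value is never read, so
   for this caller the store in 'ex_f' is dead.  */
int ex_caller_ignores(void)
{
  int var;
  ex_f(&var);
  return 0;
}

/* Hypothetical second caller: it reads the value, so the store in
   'ex_f' must be kept as long as this caller exists.  */
int ex_caller_reads(void)
{
  int var = 0;
  ex_f(&var);
  assert(var == 1);
  return var;
}
```

Removing the store is only correct if every caller behaves like 'ex_caller_ignores', which is the caller-side information that a local analysis of the callee does not have.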

* [Bug tree-optimization/90591] Avoid unnecessary data transfer out of OMP construct
From: burnus at gcc dot gnu.org @ 2020-04-28 16:53 UTC
To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90591

Tobias Burnus <burnus at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |burnus at gcc dot gnu.org

--- Comment #4 from Tobias Burnus <burnus at gcc dot gnu.org> ---
(In reply to Jakub Jelinek from comment #1)
> We want to add some attribute(s) on the structure types used to pass
> information in and out (or in the fields), and have some specialized IPA
> optimization that tries to optimize such cases.

I concur.  Just today I had a hello-world example for the copy-in case:

    double x = 0.5;
    #pragma omp target
      printf("%f\n", sin(x));

Here the sin() call could be optimized to a constant at compile time, but currently isn't.

* [Bug tree-optimization/90591] Avoid unnecessary data transfer out of OMP construct
From: burnus at gcc dot gnu.org @ 2020-05-05 14:16 UTC
To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90591

--- Comment #5 from Tobias Burnus <burnus at gcc dot gnu.org> ---
Somewhat related: In terms of OpenMP (to be refined in the spec), the following applies (in order to work both with shared + nonshared memory):

    int x = 5;
    #pragma omp target map(from:x)
      x = 7;
    printf("%d\n", x);

This code is invalid, which means that the compiler could use this for optimization, and it probably should warn that "x" is used uninitialized.

* [Bug tree-optimization/90591] Avoid unnecessary data transfer out of OMP construct
From: tschwinge at gcc dot gnu.org @ 2021-03-22 11:37 UTC
To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90591

Thomas Schwinge <tschwinge at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Assignee|sandra at gcc dot gnu.org   |tschwinge at gcc dot gnu.org

--- Comment #6 from Thomas Schwinge <tschwinge at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #3)
> I think while relying on a robust IPA analysis and optimization framework
> sounds appealing the problem is _much_ easier solved before OMP/OACC
> lowering and I would strongly suggest to tackle the problem from that side
> if you want a workable solution in a timely manner.

ACK.  WIP: <http://mid.mail-archive.com/87eege1y2g.fsf@dem-tschwing-1.ger.mentorg.com>.

> I realize that has a plethora of its own issues, first of all it seems
> the respective lowering is done _very_ early - aka the optimization would
> need to be part of omplower?  (I see .omp_data_i constructed there)
>
> So what you need is liveness and def/use analysis on high GIMPLE which I
> think is straight-forward enough.  You have no SSA form at your hands
> (actually SSA names can appear and there'll be use->def links but
> no immediate uses).
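
As a rough illustration of the liveness and def/use analysis suggested in the quoted text, the following sketch decides whether a variable is still live after an OMP region.  It is an assumption-laden toy, not GCC code: it handles only straight-line statement lists (no control flow, no address-taken variables), represents each statement by hypothetical 'defs'/'uses' bit sets, and all 'ex_' names are invented for this example.

```c
#include <assert.h>
#include <stddef.h>

/* One straight-line statement: bit i of 'defs' means variable i is
   written, bit i of 'uses' means variable i is read.  */
struct ex_stmt { unsigned defs; unsigned uses; };

/* Backward scan using live_in = uses | (live_out & ~defs).  Returns the
   set of variables live just before stmts[first]; nothing is assumed
   to be live after the last statement.  */
unsigned ex_live_before(const struct ex_stmt *stmts, size_t n, size_t first)
{
  unsigned live = 0;
  assert(first <= n);
  for (size_t i = n; i > first; i--)
    live = stmts[i - 1].uses | (live & ~stmts[i - 1].defs);
  return live;
}

/* A copy-out of variable VAR is only needed if VAR is live at the first
   statement after the region, whose index is REGION_END.  */
int ex_copyout_needed(const struct ex_stmt *stmts, size_t n,
                      size_t region_end, unsigned var)
{
  assert(var < 32);
  return (int)((ex_live_before(stmts, n, region_end) >> var) & 1u);
}
```

For the example from comment #2, the region writes 'var' and the only following statement is 'return 0', so 'var' is not live after the region and, in this model, the copy-out would not be needed.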