public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug gcov-profile/110082] New: Coverage analysis vs. offloading compilation
@ 2023-06-01 19:24 tschwinge at gcc dot gnu.org
  2023-06-02  8:00 ` [Bug gcov-profile/110082] " rguenth at gcc dot gnu.org
                   ` (5 more replies)
  0 siblings, 6 replies; 7+ messages in thread
From: tschwinge at gcc dot gnu.org @ 2023-06-01 19:24 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110082

            Bug ID: 110082
           Summary: Coverage analysis vs. offloading compilation
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Keywords: openacc, openmp, wrong-code
          Severity: normal
          Priority: P3
         Component: gcov-profile
          Assignee: unassigned at gcc dot gnu.org
          Reporter: tschwinge at gcc dot gnu.org
                CC: hubicka at gcc dot gnu.org, jakub at gcc dot gnu.org,
                    marxin at gcc dot gnu.org, rguenth at gcc dot gnu.org
  Target Milestone: ---
            Target: amdgcn-amdhsa, nvptx-none

(Via a customer report) we've determined that offloading compilation fails in
combination with '-fprofile-arcs' (as implied by '--coverage', coverage
analysis):

    <built-in>: error: variable ‘__gcov0.main._omp_fn.0’ has been referenced in
offloaded code but hasn’t been marked to be included in the offloaded code
    lto1: fatal error: errors during merging of translation units
    compilation terminated.
    nvptx mkoffload: fatal error:
build-gcc/gcc/x86_64-pc-linux-gnu-accel-nvptx-none-gcc returned 1 exit status
    [...]

Per my quick look, during early host compilation, via
'gcc/tree-profile.cc:gimple_gen_edge_profiler', via 'pass_ipa_tree_profile', as
visible in '*.069i.profile', '__atomic_fetch_add_8 (&__gcov0.main._omp_fn.0[0],
1, 0);' etc. statements are added.  (Or, different statements in case that the
target cannot "utilize atomic update operations", or 'PROFILE_UPDATE_SINGLE' is
used.)

That's part of the IPA passes, so before offloading compilation split-off. 
(... as per the error, evidently).

As we've got no mechanism implemented currently to move any device-side
coverage data from the device execution back to the host, and integrate it with
the host-side coverage data, we propose to not do coverage analysis for
offloading compilation.

My idea is to abstract the "increment the edge execution count" operations into
some new GIMPLE/IFN code (?), and then later, once the offloading code has been
split off, lower it to the current form (host-side), or no-op (device-side). 
I'd appreciate a quick review if that approach makes sense?

---

We've seen and dealt with a few already, over the past decade, but still more
such similar issues certainly exist for other scenarios where GCC "early"
(before the offloading code split-off) does code transformations that device
compilation may choke on, and thus has to explicitly handle.  A full review of
GCC "early" transformations doesn't seem feasible, so we shall continue
addressing these incrementally, as encountered.

For another example (that I shall soon be working on), see
<https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101544#c9>.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Bug gcov-profile/110082] Coverage analysis vs. offloading compilation
  2023-06-01 19:24 [Bug gcov-profile/110082] New: Coverage analysis vs. offloading compilation tschwinge at gcc dot gnu.org
@ 2023-06-02  8:00 ` rguenth at gcc dot gnu.org
  2023-06-02 10:35 ` tschwinge at gcc dot gnu.org
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: rguenth at gcc dot gnu.org @ 2023-06-02  8:00 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110082

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |sebastian.huber@embedded-br
                   |                            |ains.de

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
Sebastian was also working in this area.  Note that when you do it as proposed
the code will appear as having no coverage (the counters will be allocated at
the host side but nothing will increment them).

I suppose the very same issue exists for -fprofile-generate/use then where
this will then cause the offload code to be optimized for size because
it's cold (unless you use -fprofile-partial-training)?

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Bug gcov-profile/110082] Coverage analysis vs. offloading compilation
  2023-06-01 19:24 [Bug gcov-profile/110082] New: Coverage analysis vs. offloading compilation tschwinge at gcc dot gnu.org
  2023-06-02  8:00 ` [Bug gcov-profile/110082] " rguenth at gcc dot gnu.org
@ 2023-06-02 10:35 ` tschwinge at gcc dot gnu.org
  2023-06-02 10:41 ` rguenther at suse dot de
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: tschwinge at gcc dot gnu.org @ 2023-06-02 10:35 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110082

Thomas Schwinge <tschwinge at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
     Ever confirmed|0                           |1
   Last reconfirmed|                            |2023-06-02
             Status|UNCONFIRMED                 |NEW

--- Comment #2 from Thomas Schwinge <tschwinge at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #1)
> Note that when you do it as
> proposed the code will appear as having no coverage (the counters will be
> allocated at the host side but nothing will increment them).

ACK, our customer does understand this.

I infer correctly that the "do it as proposed" does seem fine to you:

(In reply to me from comment #0)
> My idea is to abstract the "increment the edge execution count" operations
> into some new GIMPLE/IFN code (?), and then later, once the offloading code
> has been split off, lower it to the current form (host-side), or no-op
> (device-side).  I'd appreciate a quick review if that approach makes sense?

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Bug gcov-profile/110082] Coverage analysis vs. offloading compilation
  2023-06-01 19:24 [Bug gcov-profile/110082] New: Coverage analysis vs. offloading compilation tschwinge at gcc dot gnu.org
  2023-06-02  8:00 ` [Bug gcov-profile/110082] " rguenth at gcc dot gnu.org
  2023-06-02 10:35 ` tschwinge at gcc dot gnu.org
@ 2023-06-02 10:41 ` rguenther at suse dot de
  2023-06-02 10:45 ` jakub at gcc dot gnu.org
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: rguenther at suse dot de @ 2023-06-02 10:41 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110082

--- Comment #3 from rguenther at suse dot de <rguenther at suse dot de> ---
On Fri, 2 Jun 2023, tschwinge at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110082
> 
> Thomas Schwinge <tschwinge at gcc dot gnu.org> changed:
> 
>            What    |Removed                     |Added
> ----------------------------------------------------------------------------
>      Ever confirmed|0                           |1
>    Last reconfirmed|                            |2023-06-02
>              Status|UNCONFIRMED                 |NEW
> 
> --- Comment #2 from Thomas Schwinge <tschwinge at gcc dot gnu.org> ---
> (In reply to Richard Biener from comment #1)
> > Note that when you do it as
> > proposed the code will appear as having no coverage (the counters will be
> > allocated at the host side but nothing will increment them).
> 
> ACK, our customer does understand this.
> 
> I infer correctly that the "do it as proposed" does seem fine to you:
> 
> (In reply to me from comment #0)
> > My idea is to abstract the "increment the edge execution count" operations
> > into some new GIMPLE/IFN code (?), and then later, once the offloading code
> > has been split off, lower it to the current form (host-side), or no-op
> > (device-side).  I'd appreciate a quick review if that approach makes sense?

Yes, I think this is a reasonable way to do this - I'll note there's
IPA pass analysis that might need adjustments to correctly capture
the semantics of the internal functions.

I suppose you want to apply this generally, not only to offloaded
functions and when offloading is enabled?

I briefly considered whether it's possible/useful to move profile
instrumentation to the main IPA _transform_ stage but I guess
this will unnecessarily complicate the intricate web of things
there.  Profile read for -fprofile-use would then still need to
happen at IPA analysis phase so keeping meta-data between
compile and LTRANS phase in-sync to make that working out nicely
would be another challenge.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Bug gcov-profile/110082] Coverage analysis vs. offloading compilation
  2023-06-01 19:24 [Bug gcov-profile/110082] New: Coverage analysis vs. offloading compilation tschwinge at gcc dot gnu.org
                   ` (2 preceding siblings ...)
  2023-06-02 10:41 ` rguenther at suse dot de
@ 2023-06-02 10:45 ` jakub at gcc dot gnu.org
  2023-06-02 12:28 ` tschwinge at gcc dot gnu.org
  2023-06-02 14:04 ` rguenther at suse dot de
  5 siblings, 0 replies; 7+ messages in thread
From: jakub at gcc dot gnu.org @ 2023-06-02 10:45 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110082

--- Comment #4 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
(In reply to rguenther@suse.de from comment #3)
> I suppose you want to apply this generally, not only to offloaded
> functions and when offloading is enabled?

It could be done just for the functions that aren't host only, i.e.
the offloading kernels or declare target functions, what the offloading LTO
streams out.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Bug gcov-profile/110082] Coverage analysis vs. offloading compilation
  2023-06-01 19:24 [Bug gcov-profile/110082] New: Coverage analysis vs. offloading compilation tschwinge at gcc dot gnu.org
                   ` (3 preceding siblings ...)
  2023-06-02 10:45 ` jakub at gcc dot gnu.org
@ 2023-06-02 12:28 ` tschwinge at gcc dot gnu.org
  2023-06-02 14:04 ` rguenther at suse dot de
  5 siblings, 0 replies; 7+ messages in thread
From: tschwinge at gcc dot gnu.org @ 2023-06-02 12:28 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110082

--- Comment #5 from Thomas Schwinge <tschwinge at gcc dot gnu.org> ---
(In reply to Jakub Jelinek from comment #4)
> (In reply to rguenther@suse.de from comment #3)
> > I suppose you want to apply this generally, not only to offloaded
> > functions and when offloading is enabled?
> 
> It could be done just for the functions that aren't host only, i.e.
> the offloading kernels or declare target functions, what the offloading LTO
> streams out.

Indeed my idea has been to apply this abstraction generally, without any
conditionals on offloading constructs etc.  That's for reasons of
maintainability: to not add any more diverging code paths, requiring special
testing (now, and for future changes), and to lessen possibility of surprising
behavior re the diverging code paths doing different things.  OK?


Thanks for your input!

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Bug gcov-profile/110082] Coverage analysis vs. offloading compilation
  2023-06-01 19:24 [Bug gcov-profile/110082] New: Coverage analysis vs. offloading compilation tschwinge at gcc dot gnu.org
                   ` (4 preceding siblings ...)
  2023-06-02 12:28 ` tschwinge at gcc dot gnu.org
@ 2023-06-02 14:04 ` rguenther at suse dot de
  5 siblings, 0 replies; 7+ messages in thread
From: rguenther at suse dot de @ 2023-06-02 14:04 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110082

--- Comment #6 from rguenther at suse dot de <rguenther at suse dot de> ---
On Fri, 2 Jun 2023, tschwinge at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110082
> 
> --- Comment #5 from Thomas Schwinge <tschwinge at gcc dot gnu.org> ---
> (In reply to Jakub Jelinek from comment #4)
> > (In reply to rguenther@suse.de from comment #3)
> > > I suppose you want to apply this generally, not only to offloaded
> > > functions and when offloading is enabled?
> > 
> > It could be done just for the functions that aren't host only, i.e.
> > the offloading kernels or declare target functions, what the offloading LTO
> > streams out.
> 
> Indeed my idea has been to apply this abstraction generally, without any
> conditionals on offloading constructs etc.  That's for reasons of
> maintainability: to not add any more diverging code paths, requiring special
> testing (now, and for future changes), and to lessen possibility of surprising
> behavior re the diverging code paths doing different things.  OK?

Yes, I think that's good.

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2023-06-02 14:04 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-06-01 19:24 [Bug gcov-profile/110082] New: Coverage analysis vs. offloading compilation tschwinge at gcc dot gnu.org
2023-06-02  8:00 ` [Bug gcov-profile/110082] " rguenth at gcc dot gnu.org
2023-06-02 10:35 ` tschwinge at gcc dot gnu.org
2023-06-02 10:41 ` rguenther at suse dot de
2023-06-02 10:45 ` jakub at gcc dot gnu.org
2023-06-02 12:28 ` tschwinge at gcc dot gnu.org
2023-06-02 14:04 ` rguenther at suse dot de

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).