public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [Patch] lto-wrapper.c: Use -flto-partition=none with offloading (PR97179)
@ 2020-09-23 12:53 Tobias Burnus
  2020-09-23 13:02 ` Jakub Jelinek
  2020-09-23 13:09 ` Richard Biener
  0 siblings, 2 replies; 14+ messages in thread
From: Tobias Burnus @ 2020-09-23 12:53 UTC (permalink / raw)
  To: gcc-patches, Jakub Jelinek, Richard Biener

[-- Attachment #1: Type: text/plain, Size: 1195 bytes --]

(Pre-remark: the following applies to the host; the offloading
is only involved as otherwise the (.gnu.)offload_{vars,funcs}
global variable/table wouldn't exist.)

Due to partitioning, it can happen that the offloading table
and the functions and variables (which it contains as pointer),
end up in different ltrans.  As the functions and vars end
up as local symbols – the linker will not associate the entries
in the table of one ltrans with the symbol from the other ltrans,
failing with "undefined reference" errors.


This could be fixed properly by either placing all vars/funcs
referenced in the (.gnu.)offload_{vars,funcs} table in the same
ltrans parition — or by ensuring that those symbols are available.
For funcs, the latter works by setting TREE_PUBLIC (cf. PR) – but
I didn't manage to do this for variables. — Hence:

This patch disables lto partitioning if the code has offloading.

OK for mainline? — Or better suggestions how to handle this?

Tobias

-----------------
Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / Germany
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Alexander Walter

[-- Attachment #2: lto-wrapper.diff --]
[-- Type: text/x-patch, Size: 629 bytes --]

lto-wrapper.c: Use -flto-partition=none with offloading (PR97179)

gcc/ChangeLog:

	PR lto/97179
	* lto-wrapper.c (run_gcc):

diff --git a/gcc/lto-wrapper.c b/gcc/lto-wrapper.c
index 82cfa6bd67e..027db8c5200 100644
--- a/gcc/lto-wrapper.c
+++ b/gcc/lto-wrapper.c
@@ -1490,6 +1490,12 @@ run_gcc (unsigned argc, char *argv[])
 	case OPT_flto_partition_:
 	  if (strcmp (option->arg, "none") == 0)
 	    no_partition = true;
+	  else if (have_offload)
+	    {
+	      /* PR lto/97179.  */
+	      no_partition = true;
+	      warning (0, "Ignoring %<-flto_partition%> as offloading is used");
+	    }
 	  break;
 
 	case OPT_flto_:

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Patch] lto-wrapper.c: Use -flto-partition=none with offloading (PR97179)
  2020-09-23 12:53 [Patch] lto-wrapper.c: Use -flto-partition=none with offloading (PR97179) Tobias Burnus
@ 2020-09-23 13:02 ` Jakub Jelinek
  2020-09-23 13:09 ` Richard Biener
  1 sibling, 0 replies; 14+ messages in thread
From: Jakub Jelinek @ 2020-09-23 13:02 UTC (permalink / raw)
  To: Tobias Burnus; +Cc: gcc-patches, Richard Biener

On Wed, Sep 23, 2020 at 02:53:54PM +0200, Tobias Burnus wrote:
> (Pre-remark: the following applies to the host; the offloading
> is only involved as otherwise the (.gnu.)offload_{vars,funcs}
> global variable/table wouldn't exist.)
> 
> Due to partitioning, it can happen that the offloading table
> and the functions and variables (which it contains as pointer),
> end up in different ltrans.  As the functions and vars end
> up as local symbols – the linker will not associate the entries
> in the table of one ltrans with the symbol from the other ltrans,
> failing with "undefined reference" errors.
> 
> 
> This could be fixed properly by either placing all vars/funcs
> referenced in the (.gnu.)offload_{vars,funcs} table in the same
> ltrans parition — or by ensuring that those symbols are available.
> For funcs, the latter works by setting TREE_PUBLIC (cf. PR) – but
> I didn't manage to do this for variables. — Hence:
> 
> This patch disables lto partitioning if the code has offloading.
> 
> OK for mainline? — Or better suggestions how to handle this?

I think we should treat the tables and their content as if the user has
added their own (__attribute__((used))) arrays referencing the same
functions/variables.  Like:
void foo1 (void) {}
static void foo2 (void) {}
int var1;
static int var2;
static void *table[] __attribute__((used)) = { (void *)foo1, (void *)foo2, (void *)&var1, (void *)&var2 };
The partitioning code has to handle this and the tables should follow that.
Whether that means teaching the reference discovery code about these, or
whatever else, dunno.

	Jakub


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Patch] lto-wrapper.c: Use -flto-partition=none with offloading (PR97179)
  2020-09-23 12:53 [Patch] lto-wrapper.c: Use -flto-partition=none with offloading (PR97179) Tobias Burnus
  2020-09-23 13:02 ` Jakub Jelinek
@ 2020-09-23 13:09 ` Richard Biener
  2020-09-23 13:10   ` Richard Biener
  1 sibling, 1 reply; 14+ messages in thread
From: Richard Biener @ 2020-09-23 13:09 UTC (permalink / raw)
  To: Tobias Burnus; +Cc: gcc-patches, Jakub Jelinek

On Wed, 23 Sep 2020, Tobias Burnus wrote:

> (Pre-remark: the following applies to the host; the offloading
> is only involved as otherwise the (.gnu.)offload_{vars,funcs}
> global variable/table wouldn't exist.)
> 
> Due to partitioning, it can happen that the offloading table
> and the functions and variables (which it contains as pointer),
> end up in different ltrans.  As the functions and vars end
> up as local symbols ? the linker will not associate the entries
> in the table of one ltrans with the symbol from the other ltrans,
> failing with "undefined reference" errors.
> 
> 
> This could be fixed properly by either placing all vars/funcs
> referenced in the (.gnu.)offload_{vars,funcs} table in the same
> ltrans parition ? or by ensuring that those symbols are available.
> For funcs, the latter works by setting TREE_PUBLIC (cf. PR) ? but
> I didn't manage to do this for variables. ? Hence:
> 
> This patch disables lto partitioning if the code has offloading.
> 
> OK for mainline? ? Or better suggestions how to handle this?

LTRANS usually makes the symbols hidden, not local.  So are you
sure this isn't a target bug (hidden symbols not implemented
but the host compiler obviously having checked that but assuming
the target behaves the same as the host) or a linker bug?

Richard.

> Tobias
> 
> -----------------
> Mentor Graphics (Deutschland) GmbH, Arnulfstra?e 201, 80634 M?nchen / Germany
> Registergericht M?nchen HRB 106955, Gesch?ftsf?hrer: Thomas Heurung, Alexander
> Walter
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imend

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Patch] lto-wrapper.c: Use -flto-partition=none with offloading (PR97179)
  2020-09-23 13:09 ` Richard Biener
@ 2020-09-23 13:10   ` Richard Biener
  2020-09-23 14:23     ` [Patch] LTO: Force externally_visible for offload_vars/funcs (PR97179) (was: lto-wrapper.c: Use -flto-partition=none with offloading (PR97179)) Tobias Burnus
  0 siblings, 1 reply; 14+ messages in thread
From: Richard Biener @ 2020-09-23 13:10 UTC (permalink / raw)
  To: Tobias Burnus; +Cc: gcc-patches, Jakub Jelinek

On Wed, 23 Sep 2020, Richard Biener wrote:

> On Wed, 23 Sep 2020, Tobias Burnus wrote:
> 
> > (Pre-remark: the following applies to the host; the offloading
> > is only involved as otherwise the (.gnu.)offload_{vars,funcs}
> > global variable/table wouldn't exist.)
> > 
> > Due to partitioning, it can happen that the offloading table
> > and the functions and variables (which it contains as pointer),
> > end up in different ltrans.  As the functions and vars end
> > up as local symbols ? the linker will not associate the entries
> > in the table of one ltrans with the symbol from the other ltrans,
> > failing with "undefined reference" errors.
> > 
> > 
> > This could be fixed properly by either placing all vars/funcs
> > referenced in the (.gnu.)offload_{vars,funcs} table in the same
> > ltrans parition ? or by ensuring that those symbols are available.
> > For funcs, the latter works by setting TREE_PUBLIC (cf. PR) ? but
> > I didn't manage to do this for variables. ? Hence:
> > 
> > This patch disables lto partitioning if the code has offloading.
> > 
> > OK for mainline? ? Or better suggestions how to handle this?
> 
> LTRANS usually makes the symbols hidden, not local.  So are you
> sure this isn't a target bug (hidden symbols not implemented
> but the host compiler obviously having checked that but assuming
> the target behaves the same as the host) or a linker bug?

See lto/lto-partition.c:promote_symbol btw

Richard.

> Richard.
> 
> > Tobias
> > 
> > -----------------
> > Mentor Graphics (Deutschland) GmbH, Arnulfstra?e 201, 80634 M?nchen / Germany
> > Registergericht M?nchen HRB 106955, Gesch?ftsf?hrer: Thomas Heurung, Alexander
> > Walter
> > 
> 
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imend

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Patch] LTO: Force externally_visible for offload_vars/funcs (PR97179) (was: lto-wrapper.c: Use -flto-partition=none with offloading (PR97179))
  2020-09-23 13:10   ` Richard Biener
@ 2020-09-23 14:23     ` Tobias Burnus
  2020-09-23 15:47       ` [Patch] LTO: Force externally_visible for offload_vars/funcs (PR97179) Tobias Burnus
  0 siblings, 1 reply; 14+ messages in thread
From: Tobias Burnus @ 2020-09-23 14:23 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches, Jakub Jelinek

[-- Attachment #1: Type: text/plain, Size: 968 bytes --]

On 9/23/20 3:10 PM, Richard Biener wrote:

> On Wed, 23 Sep 2020, Richard Biener wrote:
>> LTRANS usually makes the symbols hidden, not local.
Could also be – whatever the 'nm' output means.
>> So are you
>> sure this isn't a target bug (hidden symbols not implemented
>> but the host compiler obviously having checked that but assuming
>> the target behaves the same as the host) or a linker bug?

Unlikely, I assume the Linux x86-64 linker is rather well tested.
As written this is the host – just the offloading symbol table is
device specific.

> See lto/lto-partition.c:promote_symbol btw.

Thanks for the pointer; it pointed me to node->externally_visible,
which does the trick :-)

Thus, next try – which a patch I like much better!

Tobias

-----------------
Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / Germany
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Alexander Walter

[-- Attachment #2: lto-offload.diff --]
[-- Type: text/x-patch, Size: 1140 bytes --]

LTO: Force externally_visible for offload_vars/funcs (PR97179)

gcc/ChangeLog:

	PR lto/97179
	* lto-cgraph.c (input_offload_tables):

diff --git a/gcc/lto-cgraph.c b/gcc/lto-cgraph.c
index 93a99f3465b..1fb608705eb 100644
--- a/gcc/lto-cgraph.c
+++ b/gcc/lto-cgraph.c
@@ -1793,7 +1793,11 @@ input_offload_tables (bool do_force_output)
 		 may be no refs from the parent function to child_fn in offload
 		 LTO mode.  */
 	      if (do_force_output)
-		cgraph_node::get (fn_decl)->mark_force_output ();
+		{
+		  cgraph_node *node = cgraph_node::get (fn_decl);
+		  node->force_output = 1;
+		  node->externally_visible = 1;
+		}
 	    }
 	  else if (tag == LTO_symtab_variable)
 	    {
@@ -1804,7 +1808,11 @@ input_offload_tables (bool do_force_output)
 	      /* Prevent IPA from removing var_decl as unused, since there
 		 may be no refs to var_decl in offload LTO mode.  */
 	      if (do_force_output)
-		varpool_node::get (var_decl)->force_output = 1;
+		{
+		  varpool_node *node = varpool_node::get (var_decl);
+		  node->force_output = 1;
+		  node->externally_visible = 1;
+		}
 	    }
 	  else
 	    fatal_error (input_location,

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Patch] LTO: Force externally_visible for offload_vars/funcs (PR97179)
  2020-09-23 14:23     ` [Patch] LTO: Force externally_visible for offload_vars/funcs (PR97179) (was: lto-wrapper.c: Use -flto-partition=none with offloading (PR97179)) Tobias Burnus
@ 2020-09-23 15:47       ` Tobias Burnus
  2020-09-23 21:29         ` Tobias Burnus
  0 siblings, 1 reply; 14+ messages in thread
From: Tobias Burnus @ 2020-09-23 15:47 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches, Jakub Jelinek

I think I screwed up with testing – it seem now as if does not work like
this, but I need to recheck.

Tobias

On 9/23/20 4:23 PM, Tobias Burnus wrote:
> On 9/23/20 3:10 PM, Richard Biener wrote:
>
>> On Wed, 23 Sep 2020, Richard Biener wrote:
>>> LTRANS usually makes the symbols hidden, not local.
> Could also be – whatever the 'nm' output means.
>>> So are you
>>> sure this isn't a target bug (hidden symbols not implemented
>>> but the host compiler obviously having checked that but assuming
>>> the target behaves the same as the host) or a linker bug?
>
> Unlikely, I assume the Linux x86-64 linker is rather well tested.
> As written this is the host – just the offloading symbol table is
> device specific.
>
>> See lto/lto-partition.c:promote_symbol btw.
>
> Thanks for the pointer; it pointed me to node->externally_visible,
> which does the trick :-)
>
> Thus, next try – which a patch I like much better!
>
> Tobias
>
-----------------
Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / Germany
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Alexander Walter

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Patch] LTO: Force externally_visible for offload_vars/funcs (PR97179)
  2020-09-23 15:47       ` [Patch] LTO: Force externally_visible for offload_vars/funcs (PR97179) Tobias Burnus
@ 2020-09-23 21:29         ` Tobias Burnus
  2020-09-24  7:03           ` Richard Biener
  0 siblings, 1 reply; 14+ messages in thread
From: Tobias Burnus @ 2020-09-23 21:29 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches, Jakub Jelinek

[-- Attachment #1: Type: text/plain, Size: 1315 bytes --]

Actually working patch attached.

As mentioned, just using TREE_PUBLIC in input_offload_tables
works for functions but for variables this gets overridden.

The externally_visible is set to avoid running into the
promote_symbol code (→ visibility hidden) later in the
function.

On 9/23/20 5:47 PM, Tobias Burnus wrote:
> ...
> On 9/23/20 4:23 PM, Tobias Burnus wrote:
>> On 9/23/20 3:10 PM, Richard Biener wrote:
>>
>>> On Wed, 23 Sep 2020, Richard Biener wrote:
>>>> LTRANS usually makes the symbols hidden, not local.
>> Could also be – whatever the 'nm' output means.
>>>> So are you
>>>> sure this isn't a target bug (hidden symbols not implemented
>>>> but the host compiler obviously having checked that but assuming
>>>> the target behaves the same as the host) or a linker bug?
>>
>> Unlikely, I assume the Linux x86-64 linker is rather well tested.
>> As written this is the host – just the offloading symbol table is
>> device specific.
>>
>>> See lto/lto-partition.c:promote_symbol btw.
>>
>> Thanks for the pointer; it pointed me to node->externally_visible,

...

Tobias

-----------------
Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / Germany
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Alexander Walter

[-- Attachment #2: lto-offload-v2.diff --]
[-- Type: text/x-patch, Size: 1361 bytes --]

LTO: Force externally_visible+tree_public for offload_vars/funcs (PR97179)

gcc/ChangeLog:

	PR lto/97179
	* lto-cgraph.c (input_offload_tables):

diff --git a/gcc/lto/lto-partition.c b/gcc/lto/lto-partition.c
index 8e0488a..52f4811 100644
--- a/gcc/lto/lto-partition.c
+++ b/gcc/lto/lto-partition.c
@@ -35,6 +35,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "ipa-fnsummary.h"
 #include "lto-partition.h"
 #include "sreal.h"
+#include "omp-offload.h"  /* For offload_funcs and offload_vars.  */
 
 vec<ltrans_partition> ltrans_partitions;
 
@@ -1120,6 +1121,7 @@ void
 lto_promote_cross_file_statics (void)
 {
   unsigned i, n_sets;
+  tree decl;
 
   gcc_assert (flag_wpa);
 
@@ -1135,6 +1137,20 @@ lto_promote_cross_file_statics (void)
       part->encoder = compute_ltrans_boundary (part->encoder);
     }
 
+  /* Ensure that all offload table referenced vars/funcs are available. */
+  if (offload_funcs)
+    FOR_EACH_VEC_ELT (*offload_funcs, i, decl)
+      {
+	cgraph_node::get (decl)->externally_visible = 1;
+	TREE_PUBLIC (decl) = 1;
+      }
+  if (offload_vars)
+    FOR_EACH_VEC_ELT (*offload_vars, i, decl)
+      {
+	varpool_node::get (decl)->externally_visible = 1;
+	TREE_PUBLIC (decl) = 1;
+      }
+
   lto_clone_numbers = new hash_map<const char *, unsigned>;
 
   /* Look at boundaries and promote symbols as needed.  */

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Patch] LTO: Force externally_visible for offload_vars/funcs (PR97179)
  2020-09-23 21:29         ` Tobias Burnus
@ 2020-09-24  7:03           ` Richard Biener
  2020-09-24  7:47             ` Tobias Burnus
  0 siblings, 1 reply; 14+ messages in thread
From: Richard Biener @ 2020-09-24  7:03 UTC (permalink / raw)
  To: Tobias Burnus; +Cc: gcc-patches, Jakub Jelinek, Jan Hubicka

On Wed, 23 Sep 2020, Tobias Burnus wrote:

> Actually working patch attached.
> 
> As mentioned, just using TREE_PUBLIC in input_offload_tables
> works for functions but for variables this gets overridden.
> 
> The externally_visible is set to avoid running into the
> promote_symbol code (? visibility hidden) later in the
> function.

Hmm, but offload_vars and offload_funcs do not need to be exported
since they get stored into tables with addresses pointing to them
(and that table is exported).  So I think we don't yet understand
what goes wrong.

Note that ultimatively the desired visibility is determined by
the linker and communicated via the resolution file to the WPA
stage.  I'm not sure whether both host and offload code participate
in the same link and thus if the offload tables are properly
seen as being referenced (for a non-DSO symbols are usually _not_
force-exported) - so, how is the offload table constructed?
I'm not sure we can properly tell the linker of the host object
that a certain symbol will be referenced from a dynamically loaded
DSO - visibility("default") doesn't work.

#include <dlfcn.h>
int global_sym __attribute__((visibility("default")));
int main()
{
  dlopen("test.so", RTLD_NOW);
  return 0;
}

with -flto we elide global_sym, if I compile with
-Wl,-export-dynamic it works fine (even w/o the visibility).
But the question is how to selectively mark a symbol at
the compile-stage so the linker will force-export it ...

The 'used' attribute seems to work but that feels like a band-aid ...

Richard.

> On 9/23/20 5:47 PM, Tobias Burnus wrote:
> > ...
> > On 9/23/20 4:23 PM, Tobias Burnus wrote:
> >> On 9/23/20 3:10 PM, Richard Biener wrote:
> >>
> >>> On Wed, 23 Sep 2020, Richard Biener wrote:
> >>>> LTRANS usually makes the symbols hidden, not local.
> >> Could also be ? whatever the 'nm' output means.
> >>>> So are you
> >>>> sure this isn't a target bug (hidden symbols not implemented
> >>>> but the host compiler obviously having checked that but assuming
> >>>> the target behaves the same as the host) or a linker bug?
> >>
> >> Unlikely, I assume the Linux x86-64 linker is rather well tested.
> >> As written this is the host ? just the offloading symbol table is
> >> device specific.
> >>
> >>> See lto/lto-partition.c:promote_symbol btw.
> >>
> >> Thanks for the pointer; it pointed me to node->externally_visible,
> 
> ...
> 
> Tobias
> 
> -----------------
> Mentor Graphics (Deutschland) GmbH, Arnulfstra?e 201, 80634 M?nchen / Germany
> Registergericht M?nchen HRB 106955, Gesch?ftsf?hrer: Thomas Heurung, Alexander
> Walter
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imend

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Patch] LTO: Force externally_visible for offload_vars/funcs (PR97179)
  2020-09-24  7:03           ` Richard Biener
@ 2020-09-24  7:47             ` Tobias Burnus
  2020-09-24  8:03               ` Richard Biener
  0 siblings, 1 reply; 14+ messages in thread
From: Tobias Burnus @ 2020-09-24  7:47 UTC (permalink / raw)
  To: Richard Biener, Tobias Burnus; +Cc: gcc-patches, Jakub Jelinek, Jan Hubicka

On 9/24/20 9:03 AM, Richard Biener wrote:

> Hmm, but offload_vars and offload_funcs do not need to be exported
> since they get stored into tables with addresses pointing to them
> (and that table is exported).

Granted but the x86-64 linker does not seem to be able to resolve
the symbol if the table is in a.ltrans0.ltrans.o and the variable
or function is in a.ltrans1.ltrans.o

That's both host/x86-64 code; the linker might not see that the
table is used by a dynamic library – but still it should resolve
the links, shouldn't it?

Possibly, the 'externally_visible = 1' in my code is also a
read herring; it also works by using:
    TREE_PUBLIC (decl) = 1;
    gcc_assert (!node->offloadable);
    node->offloadable = 1;
and below
   if (node->offloadable)
     {
       node->offloadable = 0;
       validize_symbol_for_target (node);
       continue;
     }
Namely: PUBLIC + avoid calling promote_symbol.

> Note that ultimatively the desired visibility is determined by
> the linker and communicated via the resolution file to the WPA
> stage.  I'm not sure whether both host and offload code participate
> in the same link and thus if the offload tables are properly
> seen as being referenced

This could be the problem. The device part is linked by the
host/x86-64 linker – but the device's ".o" files are just linked
and not processed by 'ld. (In case of nvptx, they are host
compiled .o files which contain everything as strings with the
nvptx as text – to be passed to the JIT at startup.)

Note that *no* WPA/LTO is done on the device side – there only all
generated files are collected without any inter-file
optimizations. (Sufficient for the code generated by the program,
which is all in one file – but it still would be useful to
inline, e.g., libm functions.)

> (for a non-DSO symbols are usually _not_
> force-exported) - so, how is the offload table constructed?

First, the offload tables exist both on the host and on the
device(s). They have to be identical as otherwise the
association between variables and function is lost.

The symbols are added to offload_vars + offload_funcs.

In lto-cgraph.c's output_offload_tables there is the last chance
to remove now unused nodes — as once the tables are streamed
for device usage, they cannot be changed. Hence, there one
has
    node->force_output = 1;
[Unrelated: this prevents later optimizations, which still
could be done; cf. PR95622]


The table itself is written in omp-offload.c's omp_finish_file.

For the host, the constructor is constructed in
add_decls_addresses_to_decl_constructor, which does:
       CONSTRUCTOR_APPEND_ELT (v_ctor, NULL_TREE, addr);
       if (is_var)
         CONSTRUCTOR_APPEND_ELT (v_ctor, NULL_TREE, size);
and then in omp_finish_file:
       tree funcs_decl = build_decl (UNKNOWN_LOCATION, VAR_DECL,
                                     get_identifier (".offload_func_table"),
                                     funcs_decl_type);
       DECL_USER_ALIGN (funcs_decl) = DECL_USER_ALIGN (vars_decl) = 1;
       SET_DECL_ALIGN (funcs_decl, TYPE_ALIGN (funcs_decl_type));
       DECL_INITIAL (funcs_decl) = ctor_f;
       set_decl_section_name (funcs_decl, OFFLOAD_FUNC_TABLE_SECTION_NAME);
       varpool_node::finalize_decl (vars_decl);

Tobias

-----------------
Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / Germany
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Alexander Walter

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Patch] LTO: Force externally_visible for offload_vars/funcs (PR97179)
  2020-09-24  7:47             ` Tobias Burnus
@ 2020-09-24  8:03               ` Richard Biener
  2020-09-24  9:41                 ` Tobias Burnus
  0 siblings, 1 reply; 14+ messages in thread
From: Richard Biener @ 2020-09-24  8:03 UTC (permalink / raw)
  To: Tobias Burnus; +Cc: gcc-patches, Jakub Jelinek, Jan Hubicka

On Thu, 24 Sep 2020, Tobias Burnus wrote:

> On 9/24/20 9:03 AM, Richard Biener wrote:
> 
> > Hmm, but offload_vars and offload_funcs do not need to be exported
> > since they get stored into tables with addresses pointing to them
> > (and that table is exported).
> 
> Granted but the x86-64 linker does not seem to be able to resolve
> the symbol if the table is in a.ltrans0.ltrans.o and the variable
> or function is in a.ltrans1.ltrans.o
> 
> That's both host/x86-64 code; the linker might not see that the
> table is used by a dynamic library ? but still it should resolve
> the links, shouldn't it?
> 
> Possibly, the 'externally_visible = 1' in my code is also a
> read herring; it also works by using:
>    TREE_PUBLIC (decl) = 1;
>    gcc_assert (!node->offloadable);
>    node->offloadable = 1;
> and below
>   if (node->offloadable)
>     {
>       node->offloadable = 0;
>       validize_symbol_for_target (node);
>       continue;
>     }
> Namely: PUBLIC + avoid calling promote_symbol.
> 
> > Note that ultimatively the desired visibility is determined by
> > the linker and communicated via the resolution file to the WPA
> > stage.  I'm not sure whether both host and offload code participate
> > in the same link and thus if the offload tables are properly
> > seen as being referenced
> 
> This could be the problem. The device part is linked by the
> host/x86-64 linker ? but the device's ".o" files are just linked
> and not processed by 'ld. (In case of nvptx, they are host
> compiled .o files which contain everything as strings with the
> nvptx as text ? to be passed to the JIT at startup.)
> 
> Note that *no* WPA/LTO is done on the device side ? there only all
> generated files are collected without any inter-file
> optimizations. (Sufficient for the code generated by the program,
> which is all in one file ? but it still would be useful to
> inline, e.g., libm functions.)
> 
> > (for a non-DSO symbols are usually _not_
> > force-exported) - so, how is the offload table constructed?
> 
> First, the offload tables exist both on the host and on the
> device(s). They have to be identical as otherwise the
> association between variables and function is lost.
> 
> The symbols are added to offload_vars + offload_funcs.
> 
> In lto-cgraph.c's output_offload_tables there is the last chance
> to remove now unused nodes ? as once the tables are streamed
> for device usage, they cannot be changed. Hence, there one
> has
>    node->force_output = 1;
> [Unrelated: this prevents later optimizations, which still
> could be done; cf. PR95622]
> 
> 
> The table itself is written in omp-offload.c's omp_finish_file.

But this is called at LTRANS time only, in particular we seem
to stream the offload_funcs/vars array, marking streamed nodes
as force_output but we do not make the offload table visible
to the partitioner.  But force_output should make the
nodes not renamed.  But then output_offload_tables is called at
the very end and we likely do not stream the altered
force_output state.

So - can you try, in prune_offload_funcs, in addition to
setting DECL_PRESERVE_P, mark the cgraph node ->force_output
so this happens early?  I guess the same is needed for
variables (there's no prune_offloar_vars ...).

> For the host, the constructor is constructed in
> add_decls_addresses_to_decl_constructor, which does:
>       CONSTRUCTOR_APPEND_ELT (v_ctor, NULL_TREE, addr);
>       if (is_var)
>         CONSTRUCTOR_APPEND_ELT (v_ctor, NULL_TREE, size);
> and then in omp_finish_file:
>       tree funcs_decl = build_decl (UNKNOWN_LOCATION, VAR_DECL,
>                                     get_identifier (".offload_func_table"),
>                                     funcs_decl_type);
>       DECL_USER_ALIGN (funcs_decl) = DECL_USER_ALIGN (vars_decl) = 1;
>       SET_DECL_ALIGN (funcs_decl, TYPE_ALIGN (funcs_decl_type));
>       DECL_INITIAL (funcs_decl) = ctor_f;
>       set_decl_section_name (funcs_decl, OFFLOAD_FUNC_TABLE_SECTION_NAME);
>       varpool_node::finalize_decl (vars_decl);
> 
> Tobias
> 
> -----------------
> Mentor Graphics (Deutschland) GmbH, Arnulfstra?e 201, 80634 M?nchen / Germany
> Registergericht M?nchen HRB 106955, Gesch?ftsf?hrer: Thomas Heurung, Alexander
> Walter
> 
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imend

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Patch] LTO: Force externally_visible for offload_vars/funcs (PR97179)
  2020-09-24  8:03               ` Richard Biener
@ 2020-09-24  9:41                 ` Tobias Burnus
  2020-09-24  9:49                   ` Jakub Jelinek
  2020-09-24  9:50                   ` Richard Biener
  0 siblings, 2 replies; 14+ messages in thread
From: Tobias Burnus @ 2020-09-24  9:41 UTC (permalink / raw)
  To: Richard Biener, Tobias Burnus; +Cc: gcc-patches, Jakub Jelinek, Jan Hubicka

On 9/24/20 10:03 AM, Richard Biener wrote:

>> The symbols are added to offload_vars + offload_funcs.
>> In lto-cgraph.c's output_offload_tables there is the last chance
>> to remove now unused nodes ? as once the tables are streamed
>> for device usage, they cannot be changed. Hence, there one
>> has
>>     node->force_output = 1;
>> [Unrelated: this prevents later optimizations, which still
>> could be done; cf. PR95622]
>>
>>
>> The table itself is written in omp-offload.c's omp_finish_file.
> But this is called at LTRANS time only, in particular we seem
> to stream the offload_funcs/vars array, marking streamed nodes
> as force_output but we do not make the offload table visible
> to the partitioner.  But force_output should make the
> nodes not renamed.  But then output_offload_tables is called at
> the very end and we likely do not stream the altered
> force_output state.
>
> So - can you try, in prune_offload_funcs, in addition to
> setting DECL_PRESERVE_P, mark the cgraph node ->force_output
> so this happens early?  I guess the same is needed for
> variables (there's no prune_offloar_vars ...).

As it accesses global variables, I could do just the same
with the variables – but it did not seems to have an effect.

Following Jakub's suggestion, I also added
   __attribute__((used))
to the tree belonging to both tables in omp-offload.c's omp_finish
but that did not help, either.

I think both the 'used' and 'force_output' are red herrings:
after all, the tables and the referrenced funcs/vars are output;
the problem is 'just' that they end up in different ltrans
while not being public. – Thus, some property ia wrong
during building the cgraph or when it is partitioned into ltrans.

Any additional suggestion to try?

Tobias

-----------------
Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / Germany
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Alexander Walter

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Patch] LTO: Force externally_visible for offload_vars/funcs (PR97179)
  2020-09-24  9:41                 ` Tobias Burnus
@ 2020-09-24  9:49                   ` Jakub Jelinek
  2020-09-24  9:51                     ` Richard Biener
  2020-09-24  9:50                   ` Richard Biener
  1 sibling, 1 reply; 14+ messages in thread
From: Jakub Jelinek @ 2020-09-24  9:49 UTC (permalink / raw)
  To: Tobias Burnus; +Cc: Richard Biener, gcc-patches, Jan Hubicka

On Thu, Sep 24, 2020 at 11:41:00AM +0200, Tobias Burnus wrote:
> Following Jakub's suggestion, I also added
>   __attribute__((used))
> to the tree belonging to both tables in omp-offload.c's omp_finish
> but that did not help, either.

That is really DECL_PRESERVED_P, the attribute is turned into that, so no
need to have attribute around after setting it.
That is needed (but already done), but clearly not sufficient.
What we need to emulate is the effect of all those decls being referenced
from a single (preserved) initializer, which would need to refer to their
names too.  Except we don't really have such a var and initializer
constructed early enough probably.
Now, for vars with initializers I think there is
record_references_in_initializer to remember those references, so do we need
to emulate that behavior?
Or, see what effects it has on the partitioning, and if it means forcing all
the referenced decls that aren't TREE_PUBLIC into the same partition, do it
for the offloading funcs and vars too?

	Jakub


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Patch] LTO: Force externally_visible for offload_vars/funcs (PR97179)
  2020-09-24  9:41                 ` Tobias Burnus
  2020-09-24  9:49                   ` Jakub Jelinek
@ 2020-09-24  9:50                   ` Richard Biener
  1 sibling, 0 replies; 14+ messages in thread
From: Richard Biener @ 2020-09-24  9:50 UTC (permalink / raw)
  To: Tobias Burnus; +Cc: Richard Biener, Jakub Jelinek, gcc-patches, Jan Hubicka

On Thu, Sep 24, 2020 at 11:41 AM Tobias Burnus <tobias@codesourcery.com> wrote:
>
> On 9/24/20 10:03 AM, Richard Biener wrote:
>
> >> The symbols are added to offload_vars + offload_funcs.
> >> In lto-cgraph.c's output_offload_tables there is the last chance
> >> to remove now unused nodes ? as once the tables are streamed
> >> for device usage, they cannot be changed. Hence, there one
> >> has
> >>     node->force_output = 1;
> >> [Unrelated: this prevents later optimizations, which still
> >> could be done; cf. PR95622]
> >>
> >>
> >> The table itself is written in omp-offload.c's omp_finish_file.
> > But this is called at LTRANS time only, in particular we seem
> > to stream the offload_funcs/vars array, marking streamed nodes
> > as force_output but we do not make the offload table visible
> > to the partitioner.  But force_output should make the
> > nodes not renamed.  But then output_offload_tables is called at
> > the very end and we likely do not stream the altered
> > force_output state.
> >
> > So - can you try, in prune_offload_funcs, in addition to
> > setting DECL_PRESERVE_P, mark the cgraph node ->force_output
> > so this happens early?  I guess the same is needed for
> > variables (there's no prune_offloar_vars ...).
>
> As it accesses global variables, I could do just the same
> with the variables – but it did not seems to have an effect.
>
> Following Jakub's suggestion, I also added
>    __attribute__((used))
> to the tree belonging to both tables in omp-offload.c's omp_finish
> but that did not help, either.
>
> I think both the 'used' and 'force_output' are red herrings:
> after all, the tables and the referrenced funcs/vars are output;
> the problem is 'just' that they end up in different ltrans
> while not being public. – Thus, some property ia wrong
> during building the cgraph or when it is partitioned into ltrans.
>
> Any additional suggestion to try?

As I said the table itself is only created _after_ partitioning
so LTO doesn't see they are referenced from outside of their
LTRANS unit (in the unit that has the offload table).

I think we need to create the offload table during WPA
instead.

Richard.

> Tobias
>
> -----------------
> Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / Germany
> Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Alexander Walter

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Patch] LTO: Force externally_visible for offload_vars/funcs (PR97179)
  2020-09-24  9:49                   ` Jakub Jelinek
@ 2020-09-24  9:51                     ` Richard Biener
  0 siblings, 0 replies; 14+ messages in thread
From: Richard Biener @ 2020-09-24  9:51 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Tobias Burnus, gcc-patches, Richard Biener, Jan Hubicka

On Thu, Sep 24, 2020 at 11:50 AM Jakub Jelinek via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> On Thu, Sep 24, 2020 at 11:41:00AM +0200, Tobias Burnus wrote:
> > Following Jakub's suggestion, I also added
> >   __attribute__((used))
> > to the tree belonging to both tables in omp-offload.c's omp_finish
> > but that did not help, either.
>
> That is really DECL_PRESERVED_P, the attribute is turned into that, so no
> need to have attribute around after setting it.
> That is needed (but already done), but clearly not sufficient.
> What we need to emulate is the effect of all those decls being referenced
> from a single (preserved) initializer, which would need to refer to their
> names too.  Except we don't really have such a var and initializer
> constructed early enough probably.
> Now, for vars with initializers I think there is
> record_references_in_initializer to remember those references, so do we need
> to emulate that behavior?
> Or, see what effects it has on the partitioning, and if it means forcing all
> the referenced decls that aren't TREE_PUBLIC into the same partition, do it
> for the offloading funcs and vars too?

Create the offload table at WPA time so we get to see it during partitioning?

>         Jakub
>

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2020-09-24  9:51 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-09-23 12:53 [Patch] lto-wrapper.c: Use -flto-partition=none with offloading (PR97179) Tobias Burnus
2020-09-23 13:02 ` Jakub Jelinek
2020-09-23 13:09 ` Richard Biener
2020-09-23 13:10   ` Richard Biener
2020-09-23 14:23     ` [Patch] LTO: Force externally_visible for offload_vars/funcs (PR97179) (was: lto-wrapper.c: Use -flto-partition=none with offloading (PR97179)) Tobias Burnus
2020-09-23 15:47       ` [Patch] LTO: Force externally_visible for offload_vars/funcs (PR97179) Tobias Burnus
2020-09-23 21:29         ` Tobias Burnus
2020-09-24  7:03           ` Richard Biener
2020-09-24  7:47             ` Tobias Burnus
2020-09-24  8:03               ` Richard Biener
2020-09-24  9:41                 ` Tobias Burnus
2020-09-24  9:49                   ` Jakub Jelinek
2020-09-24  9:51                     ` Richard Biener
2020-09-24  9:50                   ` Richard Biener

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).