public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH][RFC] API extension for binutils (type of symbols).
@ 2020-03-09  9:30 Martin Liška
  2020-03-09 15:36 ` H.J. Lu
  0 siblings, 1 reply; 55+ messages in thread
From: Martin Liška @ 2020-03-09  9:30 UTC (permalink / raw)
  To: gcc-patches; +Cc: H.J. Lu, Richard Biener, Jan Hubicka

Hi.

With change of -fno-common default for GCC 10 we're facing serious
problem with LTO where one can't distinguish in between global symbols
and variables based on the output of nm:
https://sourceware.org/bugzilla/show_bug.cgi?id=25355

$ cat nm.c
char nm_test_var;

$ gcc-9 nm.c -c -fno-common
$ nm nm.o
0000000000000000 B nm_test_var

$ gcc-9 nm.c -c -fno-common -flto
$ nm nm.o
00000000 T nm_test_var

H.J. decided to implement quite heavy solution which is about usage of lto-wrapper
that takes a LTO object file and makes a final assembly file. The file is then
utilized with nm. That has some disadvantages:

- it's slow - using nm x.a can take very long time
- way of finding lto-wrapper is quite hard-coded to location of LTO plugin
- we face issues with multiple final object files:
   https://sourceware.org/bugzilla/show_bug.cgi?id=25640

That said, I'm suggesting to expect LTO plugin API to tell binutils whether
a symbol is variable or function. That should help us to mark global variables
with "D" in nm output.

I would like to note that even with -fcommon, the nm output for LTO bytecode is far
from perfect:

$ cat bss.c
int global_zero;
int global_one = 1;

$ gcc-9 bss.c -c  -flto
$ nm bss.o
00000000 T global_one
00000000 C global_zero

I believe in this case we can mark both symbols with D as a reasonable guess.

Thoughts?
Martin

gcc/ChangeLog:

2020-03-09  Martin Liska  <mliska@suse.cz>

	* lto-streamer-out.c (write_symbol): Stream
	symbol type.

include/ChangeLog:

2020-03-09  Martin Liska  <mliska@suse.cz>

	* lto-symtab.h (enum gcc_plugin_symbol_type): New.
	* plugin-api.h (struct ld_plugin_symbol): New member
	symbols_type.
	(enum ld_plugin_symbol_type): New.
	(enum ld_plugin_tag): Add new tag LDPT_GET_SYMBOLS_V4.

lto-plugin/ChangeLog:

2020-03-09  Martin Liska  <mliska@suse.cz>

	* lto-plugin.c (parse_table_entry): Parse symbol type.
---
  gcc/lto-streamer-out.c  |  2 ++
  include/lto-symtab.h    |  7 +++++++
  include/plugin-api.h    | 13 ++++++++++++-
  lto-plugin/lto-plugin.c | 12 ++++++++++++
  4 files changed, 33 insertions(+), 1 deletion(-)



^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH][RFC] API extension for binutils (type of symbols).
  2020-03-09  9:30 [PATCH][RFC] API extension for binutils (type of symbols) Martin Liška
@ 2020-03-09 15:36 ` H.J. Lu
  2020-03-09 16:56   ` Martin Liška
  0 siblings, 1 reply; 55+ messages in thread
From: H.J. Lu @ 2020-03-09 15:36 UTC (permalink / raw)
  To: Martin Liška; +Cc: GCC Patches, Richard Biener, Jan Hubicka

On Mon, Mar 9, 2020 at 2:30 AM Martin Liška <mliska@suse.cz> wrote:
>
> Hi.
>
> With change of -fno-common default for GCC 10 we're facing serious
> problem with LTO where one can't distinguish in between global symbols
> and variables based on the output of nm:
> https://sourceware.org/bugzilla/show_bug.cgi?id=25355
>
> $ cat nm.c
> char nm_test_var;
>
> $ gcc-9 nm.c -c -fno-common
> $ nm nm.o
> 0000000000000000 B nm_test_var
>
> $ gcc-9 nm.c -c -fno-common -flto
> $ nm nm.o
> 00000000 T nm_test_var
>
> H.J. decided to implement quite heavy solution which is about usage of lto-wrapper
> that takes a LTO object file and makes a final assembly file. The file is then
> utilized with nm. That has some disadvantages:
>
> - it's slow - using nm x.a can take very long time
> - way of finding lto-wrapper is quite hard-coded to location of LTO plugin
> - we face issues with multiple final object files:
>    https://sourceware.org/bugzilla/show_bug.cgi?id=25640
>
> That said, I'm suggesting to expect LTO plugin API to tell binutils whether
> a symbol is variable or function. That should help us to mark global variables
> with "D" in nm output.
>
> I would like to note that even with -fcommon, the nm output for LTO bytecode is far
> from perfect:
>
> $ cat bss.c
> int global_zero;
> int global_one = 1;
>
> $ gcc-9 bss.c -c  -flto
> $ nm bss.o
> 00000000 T global_one
> 00000000 C global_zero
>
> I believe in this case we can mark both symbols with D as a reasonable guess.
>
> Thoughts?
> Martin
>
> gcc/ChangeLog:
>
> 2020-03-09  Martin Liska  <mliska@suse.cz>
>
>         * lto-streamer-out.c (write_symbol): Stream
>         symbol type.
>
> include/ChangeLog:
>
> 2020-03-09  Martin Liska  <mliska@suse.cz>
>
>         * lto-symtab.h (enum gcc_plugin_symbol_type): New.
>         * plugin-api.h (struct ld_plugin_symbol): New member
>         symbols_type.
>         (enum ld_plugin_symbol_type): New.
>         (enum ld_plugin_tag): Add new tag LDPT_GET_SYMBOLS_V4.
>
> lto-plugin/ChangeLog:
>
> 2020-03-09  Martin Liska  <mliska@suse.cz>
>
>         * lto-plugin.c (parse_table_entry): Parse symbol type.
> ---
>   gcc/lto-streamer-out.c  |  2 ++
>   include/lto-symtab.h    |  7 +++++++
>   include/plugin-api.h    | 13 ++++++++++++-
>   lto-plugin/lto-plugin.c | 12 ++++++++++++
>   4 files changed, 33 insertions(+), 1 deletion(-)
>
>

We nee to support different variables, like TLS, data and bss variables.

-- 
H.J.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH][RFC] API extension for binutils (type of symbols).
  2020-03-09 15:36 ` H.J. Lu
@ 2020-03-09 16:56   ` Martin Liška
  2020-03-09 18:25     ` H.J. Lu
                       ` (2 more replies)
  0 siblings, 3 replies; 55+ messages in thread
From: Martin Liška @ 2020-03-09 16:56 UTC (permalink / raw)
  To: H.J. Lu; +Cc: GCC Patches, Richard Biener, Jan Hubicka

On 3/9/20 4:36 PM, H.J. Lu wrote:
> We nee to support different variables, like TLS, data and bss variables.

Why do we need TLS? Right now, it's not supported by nm. Or am I wrong?

About BSS and DATA I agree that it would be handy. I can theoretically
covered with code in get_variable_section/bss_initializer_p. But it's
quite logic and I'm not sure we should simulate it.

@Honza/Richi: Do you have any opinion about that?

Thanks,
Martin

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH][RFC] API extension for binutils (type of symbols).
  2020-03-09 16:56   ` Martin Liška
@ 2020-03-09 18:25     ` H.J. Lu
  2020-03-09 20:19       ` Jan Hubicka
  2020-03-09 19:45     ` Michael Matz
  2020-03-26 16:54     ` Jeff Law
  2 siblings, 1 reply; 55+ messages in thread
From: H.J. Lu @ 2020-03-09 18:25 UTC (permalink / raw)
  To: Martin Liška; +Cc: GCC Patches, Richard Biener, Jan Hubicka

On Mon, Mar 9, 2020 at 9:56 AM Martin Liška <mliska@suse.cz> wrote:
>
> On 3/9/20 4:36 PM, H.J. Lu wrote:
> > We nee to support different variables, like TLS, data and bss variables.
>
> Why do we need TLS? Right now, it's not supported by nm. Or am I wrong?

Since you are introducing symbol types, why not support TLS?

> About BSS and DATA I agree that it would be handy. I can theoretically
> covered with code in get_variable_section/bss_initializer_p. But it's
> quite logic and I'm not sure we should simulate it.
>
> @Honza/Richi: Do you have any opinion about that?
>
> Thanks,
> Martin



-- 
H.J.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH][RFC] API extension for binutils (type of symbols).
  2020-03-09 16:56   ` Martin Liška
  2020-03-09 18:25     ` H.J. Lu
@ 2020-03-09 19:45     ` Michael Matz
  2020-03-10  9:39       ` Martin Liška
  2020-03-26 16:54     ` Jeff Law
  2 siblings, 1 reply; 55+ messages in thread
From: Michael Matz @ 2020-03-09 19:45 UTC (permalink / raw)
  To: Martin Liška; +Cc: H.J. Lu, GCC Patches, Jan Hubicka

Hello,

On Mon, 9 Mar 2020, Martin Liška wrote:

> On 3/9/20 4:36 PM, H.J. Lu wrote:
> > We nee to support different variables, like TLS, data and bss variables.
> 
> Why do we need TLS? Right now, it's not supported by nm.

Of course it does.  It's the 'T' (or 't') character.  When you introduce 
symbol categories into the plugin system it would be advisable to include 
all we usually care about, and as the ELF categories are (roughly) a 
superset of everything we support, I'd say that should be the list to look 
at.  I.e. a mixture of visibility, locality (aka binding) and type:

   {object,function,common,tls}
 x {local,global,weak,unique}
 x {default,internal,hidden,protected}

That doesn't include symbols types section,file,ifunc or os or arch 
specific types or visibilities or bindings.  But it would probably not be 
the worst idea to simply encode what we need with ELF constants and names.  
While not all the world is ELF, all concepts we have can be mapped onto 
ELF.


Ciao,
Michael.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH][RFC] API extension for binutils (type of symbols).
  2020-03-09 18:25     ` H.J. Lu
@ 2020-03-09 20:19       ` Jan Hubicka
  2020-03-10 10:05         ` Martin Liška
  0 siblings, 1 reply; 55+ messages in thread
From: Jan Hubicka @ 2020-03-09 20:19 UTC (permalink / raw)
  To: H.J. Lu; +Cc: Martin Liška, GCC Patches, Richard Biener

> On Mon, Mar 9, 2020 at 9:56 AM Martin Liška <mliska@suse.cz> wrote:
> >
> > On 3/9/20 4:36 PM, H.J. Lu wrote:
> > > We nee to support different variables, like TLS, data and bss variables.
> >
> > Why do we need TLS? Right now, it's not supported by nm. Or am I wrong?
> 
> Since you are introducing symbol types, why not support TLS?
> 
> > About BSS and DATA I agree that it would be handy. I can theoretically
> > covered with code in get_variable_section/bss_initializer_p. But it's
> > quite logic and I'm not sure we should simulate it.

I think it should not be that hard to factor out the logic from
get_variable_section to return enum of what we want to do and then
have get_variale_section as a wrapper parsing this enum to actual
section.
> >
> > @Honza/Richi: Do you have any opinion about that?

I guess we indeed want to get as close to non-LTO nm behaviour as
possible. So we want to support them and perhaps think of .symtab
section file format that can be made backward compatible (such as having
attribute string for symbols where we can add new info in future in a
way that old plugins will still get info they want).

Of course IPA optimizations may migrate symbols around (say from data to
bss)/take them away/rename them, but with that we need to live. I would
expect most tools inspecting nm are interested in what will enter
linking not what will be in final output.

Since we discuss plugin extensions (and I do not want this to complicate
finishing Martin's patch).  Are we aware of other plugin limitations?
One thing that I consider unsafe is the way we produce local names when
we need to promote symbol to hidden due to partitining.  We add
.lto_priv, but that is not safe if we link with .o file that was
incrementally lto-optimized to target object file (this is reason why I
did not enabled WHOPR path for it). 

We may also want to inform lld and llvm's gold plugin maintainers about
intended changes.
Honza
> >
> > Thanks,
> > Martin
> 
> 
> 
> -- 
> H.J.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH][RFC] API extension for binutils (type of symbols).
  2020-03-09 19:45     ` Michael Matz
@ 2020-03-10  9:39       ` Martin Liška
  2020-03-10 12:20         ` Martin Liška
  2020-03-10 14:26         ` Michael Matz
  0 siblings, 2 replies; 55+ messages in thread
From: Martin Liška @ 2020-03-10  9:39 UTC (permalink / raw)
  To: Michael Matz; +Cc: H.J. Lu, GCC Patches, Jan Hubicka

On 3/9/20 8:45 PM, Michael Matz wrote:
> Hello,
> 
> On Mon, 9 Mar 2020, Martin Liška wrote:
> 
>> On 3/9/20 4:36 PM, H.J. Lu wrote:
>>> We nee to support different variables, like TLS, data and bss variables.
>>
>> Why do we need TLS? Right now, it's not supported by nm.
> 
> Of course it does.  It's the 'T' (or 't') character.

Thank you reply!

Are you sure about it?
$ gcc gcc/testsuite/gcc.target/i386/pr56564-3.c -c -fpic && nm pr56564-3.o
...
0000000000000000 D s
0000000000000010 D t

?

>  When you introduce
> symbol categories into the plugin system it would be advisable to include
> all we usually care about, and as the ELF categories are (roughly) a
> superset of everything we support, I'd say that should be the list to look
> at.  I.e. a mixture of visibility, locality (aka binding) and type:

I agree with that.

> 
>     {object,function,common,tls}

Here LDPK_COMMON is already handled.

>   x {local,global,weak,unique}
>   x {default,internal,hidden,protected}
> 
> That doesn't include symbols types section,file,ifunc or os or arch
> specific types or visibilities or bindings.  But it would probably not be
> the worst idea to simply encode what we need with ELF constants and names.
> While not all the world is ELF, all concepts we have can be mapped onto
> ELF.
> 
> 
> Ciao,
> Michael.
> 


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH][RFC] API extension for binutils (type of symbols).
  2020-03-09 20:19       ` Jan Hubicka
@ 2020-03-10 10:05         ` Martin Liška
  2020-03-10 10:49           ` Richard Biener
  0 siblings, 1 reply; 55+ messages in thread
From: Martin Liška @ 2020-03-10 10:05 UTC (permalink / raw)
  To: Jan Hubicka, H.J. Lu; +Cc: GCC Patches, Richard Biener

[-- Attachment #1: Type: text/plain, Size: 2541 bytes --]

On 3/9/20 9:19 PM, Jan Hubicka wrote:
>> On Mon, Mar 9, 2020 at 9:56 AM Martin Liška <mliska@suse.cz> wrote:
>>>
>>> On 3/9/20 4:36 PM, H.J. Lu wrote:
>>>> We nee to support different variables, like TLS, data and bss variables.
>>>
>>> Why do we need TLS? Right now, it's not supported by nm. Or am I wrong?
>>
>> Since you are introducing symbol types, why not support TLS?
>>
>>> About BSS and DATA I agree that it would be handy. I can theoretically
>>> covered with code in get_variable_section/bss_initializer_p. But it's
>>> quite logic and I'm not sure we should simulate it.
> 
> I think it should not be that hard to factor out the logic from
> get_variable_section to return enum of what we want to do and then
> have get_variale_section as a wrapper parsing this enum to actual
> section.

So it was easier that I expected and I'm sending updated version
of the patch.

>>>
>>> @Honza/Richi: Do you have any opinion about that?
> 
> I guess we indeed want to get as close to non-LTO nm behaviour as
> possible. So we want to support them and perhaps think of .symtab
> section file format that can be made backward compatible (such as having
> attribute string for symbols where we can add new info in future in a
> way that old plugins will still get info they want).

I like the idea. But it's probably next stage1 material. Or can you prepare
a patch?

> 
> Of course IPA optimizations may migrate symbols around (say from data to
> bss)/take them away/rename them, but with that we need to live. I would
> expect most tools inspecting nm are interested in what will enter
> linking not what will be in final output.

Yes, there are mostly used during configure script run where they commonly
do not take final linked executables/shared libs.

> 
> Since we discuss plugin extensions (and I do not want this to complicate
> finishing Martin's patch).

Please suggest that in another patch.

The current situation with binutils is bad because we can't build distribution
with -fno-common and LTO.

Martin

>  Are we aware of other plugin limitations?
> One thing that I consider unsafe is the way we produce local names when
> we need to promote symbol to hidden due to partitining.  We add
> .lto_priv, but that is not safe if we link with .o file that was
> incrementally lto-optimized to target object file (this is reason why I
> did not enabled WHOPR path for it).
> 
> We may also want to inform lld and llvm's gold plugin maintainers about
> intended changes.
> Honza
>>>
>>> Thanks,
>>> Martin
>>
>>
>>
>> -- 
>> H.J.


[-- Attachment #2: 0001-API-extension-for-binutils-type-of-symbols.patch --]
[-- Type: text/x-patch, Size: 4008 bytes --]

From 4214743fc011fd8900a89166759a5511d8da5da2 Mon Sep 17 00:00:00 2001
From: Martin Liska <mliska@suse.cz>
Date: Fri, 6 Mar 2020 18:09:35 +0100
Subject: [PATCH] API extension for binutils (type of symbols).

gcc/ChangeLog:

2020-03-09  Martin Liska  <mliska@suse.cz>

	* lto-streamer-out.c (write_symbol): Stream
	symbol type.

include/ChangeLog:

2020-03-09  Martin Liska  <mliska@suse.cz>

	* lto-symtab.h (enum gcc_plugin_symbol_type): New.
	* plugin-api.h (struct ld_plugin_symbol): New member
	symbols_type.
	(enum ld_plugin_symbol_type): New.
	(enum ld_plugin_tag): Add new tag LDPT_GET_SYMBOLS_V4.

lto-plugin/ChangeLog:

2020-03-09  Martin Liska  <mliska@suse.cz>

	* lto-plugin.c (parse_table_entry): Parse symbol type.
---
 gcc/lto-streamer-out.c  | 14 ++++++++++++++
 include/lto-symtab.h    |  8 ++++++++
 include/plugin-api.h    | 14 +++++++++++++-
 lto-plugin/lto-plugin.c | 13 +++++++++++++
 4 files changed, 48 insertions(+), 1 deletion(-)

diff --git a/gcc/lto-streamer-out.c b/gcc/lto-streamer-out.c
index cea5e71cffb..ead606eb665 100644
--- a/gcc/lto-streamer-out.c
+++ b/gcc/lto-streamer-out.c
@@ -45,6 +45,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "print-tree.h"
 #include "tree-dfa.h"
 #include "file-prefix-map.h" /* remap_debug_filename()  */
+#include "output.h"
 
 
 static void lto_write_tree (struct output_block*, tree, bool);
@@ -2773,6 +2774,19 @@ write_symbol (struct streamer_tree_cache_d *cache,
   lto_write_data (&c, 1);
   c = (unsigned char) visibility;
   lto_write_data (&c, 1);
+
+  gcc_plugin_symbol_type st;
+  if (TREE_CODE (t) == VAR_DECL)
+    {
+      section *s = get_variable_section (t, false);
+      st = (s->common.flags & SECTION_BSS
+	    ? GCCST_VARIABLE_BSS : GCCST_VARIABLE_DATA);
+    }
+  else
+    st = GCCST_FUNCTION;
+
+  c = (unsigned char) st;
+  lto_write_data (&c, 1);
   lto_write_data (&size, 8);
   lto_write_data (&slot_num, 4);
 }
diff --git a/include/lto-symtab.h b/include/lto-symtab.h
index 0ce0de10121..901bc3585c2 100644
--- a/include/lto-symtab.h
+++ b/include/lto-symtab.h
@@ -38,4 +38,12 @@ enum gcc_plugin_symbol_visibility
     GCCPV_HIDDEN
   };
 
+enum gcc_plugin_symbol_type
+{
+  GCCST_UNKNOWN,
+  GCCST_FUNCTION,
+  GCCST_VARIABLE_DATA,
+  GCCST_VARIABLE_BSS
+};
+
 #endif /* GCC_LTO_SYMTAB_H  */
diff --git a/include/plugin-api.h b/include/plugin-api.h
index 09e1202df07..794a2dcc4ee 100644
--- a/include/plugin-api.h
+++ b/include/plugin-api.h
@@ -92,6 +92,7 @@ struct ld_plugin_symbol
   uint64_t size;
   char *comdat_key;
   int resolution;
+  int symbol_type;
 };
 
 /* An object's section.  */
@@ -123,6 +124,16 @@ enum ld_plugin_symbol_visibility
   LDPV_HIDDEN
 };
 
+/* The type of the symbol.  */
+
+enum ld_plugin_symbol_type
+{
+  LDST_UNKNOWN,
+  LDST_FUNCTION,
+  LDST_VARIABLE_DATA,
+  LDST_VARIABLE_BSS
+};
+
 /* How a symbol is resolved.  */
 
 enum ld_plugin_symbol_resolution
@@ -431,7 +442,8 @@ enum ld_plugin_tag
   LDPT_GET_INPUT_SECTION_ALIGNMENT = 29,
   LDPT_GET_INPUT_SECTION_SIZE = 30,
   LDPT_REGISTER_NEW_INPUT_HOOK = 31,
-  LDPT_GET_WRAP_SYMBOLS = 32
+  LDPT_GET_WRAP_SYMBOLS = 32,
+  LDPT_GET_SYMBOLS_V4 = 33,
 };
 
 /* The plugin transfer vector.  */
diff --git a/lto-plugin/lto-plugin.c b/lto-plugin/lto-plugin.c
index c307fc871bf..230846c1d65 100644
--- a/lto-plugin/lto-plugin.c
+++ b/lto-plugin/lto-plugin.c
@@ -252,6 +252,14 @@ parse_table_entry (char *p, struct ld_plugin_symbol *entry,
       LDPV_HIDDEN
     };
 
+  enum ld_plugin_symbol_type symbol_types[] =
+    {
+      LDST_UNKNOWN,
+      LDST_FUNCTION,
+      LDST_VARIABLE_DATA,
+      LDST_VARIABLE_BSS
+    };
+
   switch (sym_style)
     {
     case ss_win32:
@@ -296,6 +304,11 @@ parse_table_entry (char *p, struct ld_plugin_symbol *entry,
   entry->visibility = translate_visibility[t];
   p++;
 
+  t = *p;
+  check (t <= 3, LDPL_FATAL, "invalid symbol type found");
+  entry->symbol_type = symbol_types[t];
+  p++;
+
   memcpy (&entry->size, p, sizeof (uint64_t));
   p += 8;
 
-- 
2.25.1


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH][RFC] API extension for binutils (type of symbols).
  2020-03-10 10:05         ` Martin Liška
@ 2020-03-10 10:49           ` Richard Biener
  2020-03-10 11:09             ` Jan Hubicka
  0 siblings, 1 reply; 55+ messages in thread
From: Richard Biener @ 2020-03-10 10:49 UTC (permalink / raw)
  To: Martin Liška; +Cc: Jan Hubicka, H.J. Lu, GCC Patches

On Tue, Mar 10, 2020 at 11:05 AM Martin Liška <mliska@suse.cz> wrote:
>
> On 3/9/20 9:19 PM, Jan Hubicka wrote:
> >> On Mon, Mar 9, 2020 at 9:56 AM Martin Liška <mliska@suse.cz> wrote:
> >>>
> >>> On 3/9/20 4:36 PM, H.J. Lu wrote:
> >>>> We nee to support different variables, like TLS, data and bss variables.
> >>>
> >>> Why do we need TLS? Right now, it's not supported by nm. Or am I wrong?
> >>
> >> Since you are introducing symbol types, why not support TLS?
> >>
> >>> About BSS and DATA I agree that it would be handy. I can theoretically
> >>> covered with code in get_variable_section/bss_initializer_p. But it's
> >>> quite logic and I'm not sure we should simulate it.
> >
> > I think it should not be that hard to factor out the logic from
> > get_variable_section to return enum of what we want to do and then
> > have get_variale_section as a wrapper parsing this enum to actual
> > section.
>
> So it was easier that I expected and I'm sending updated version
> of the patch.
>
> >>>
> >>> @Honza/Richi: Do you have any opinion about that?
> >
> > I guess we indeed want to get as close to non-LTO nm behaviour as
> > possible. So we want to support them and perhaps think of .symtab
> > section file format that can be made backward compatible (such as having
> > attribute string for symbols where we can add new info in future in a
> > way that old plugins will still get info they want).
>
> I like the idea. But it's probably next stage1 material. Or can you prepare
> a patch?

I think what's important is that the LTO plugin needs to understand
the old and the new version since there's only one for auto-loading.

The other missing feature of the linker plugin API is file claiming
which should be a on a section basis instead - but that's a different
part of the API and not related to symbol tables.  Enhancing that
part of the API would allow to elide the LTO debug copying ...

> >
> > Of course IPA optimizations may migrate symbols around (say from data to
> > bss)/take them away/rename them, but with that we need to live. I would
> > expect most tools inspecting nm are interested in what will enter
> > linking not what will be in final output.
>
> Yes, there are mostly used during configure script run where they commonly
> do not take final linked executables/shared libs.
>
> >
> > Since we discuss plugin extensions (and I do not want this to complicate
> > finishing Martin's patch).
>
> Please suggest that in another patch.
>
> The current situation with binutils is bad because we can't build distribution
> with -fno-common and LTO.
>
> Martin
>
> >  Are we aware of other plugin limitations?
> > One thing that I consider unsafe is the way we produce local names when
> > we need to promote symbol to hidden due to partitining.  We add
> > .lto_priv, but that is not safe if we link with .o file that was
> > incrementally lto-optimized to target object file (this is reason why I
> > did not enabled WHOPR path for it).
> >
> > We may also want to inform lld and llvm's gold plugin maintainers about
> > intended changes.
> > Honza
> >>>
> >>> Thanks,
> >>> Martin
> >>
> >>
> >>
> >> --
> >> H.J.
>

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH][RFC] API extension for binutils (type of symbols).
  2020-03-10 10:49           ` Richard Biener
@ 2020-03-10 11:09             ` Jan Hubicka
  2020-03-10 11:24               ` Richard Biener
  0 siblings, 1 reply; 55+ messages in thread
From: Jan Hubicka @ 2020-03-10 11:09 UTC (permalink / raw)
  To: Richard Biener; +Cc: Martin Liška, GCC Patches

> > >>> @Honza/Richi: Do you have any opinion about that?
> > >
> > > I guess we indeed want to get as close to non-LTO nm behaviour as
> > > possible. So we want to support them and perhaps think of .symtab
> > > section file format that can be made backward compatible (such as having
> > > attribute string for symbols where we can add new info in future in a
> > > way that old plugins will still get info they want).
> >
> > I like the idea. But it's probably next stage1 material. Or can you prepare
> > a patch?
> 
> I think what's important is that the LTO plugin needs to understand
> the old and the new version since there's only one for auto-loading.
> 
> The other missing feature of the linker plugin API is file claiming
> which should be a on a section basis instead - but that's a different
> part of the API and not related to symbol tables.  Enhancing that
> part of the API would allow to elide the LTO debug copying ...

Thinking of it, it seems to me that we do not need to break
compatibility with existing plugins.  We could keep existing .symtab
section the way it is implemented right now
and add additional data to new .symtab_ext section so existing plugins
will work as expeted.

We could tag symtab_ext by a version string and keep adding extensions
of extensions in the future being compatible with old plugins.

Indeed that would help situation user already has distro provided plugin
in the search path but compiles its own gcc10.

Honza

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH][RFC] API extension for binutils (type of symbols).
  2020-03-10 11:09             ` Jan Hubicka
@ 2020-03-10 11:24               ` Richard Biener
  2020-03-10 12:07                 ` Martin Liška
  0 siblings, 1 reply; 55+ messages in thread
From: Richard Biener @ 2020-03-10 11:24 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: Martin Liška, GCC Patches

On Tue, Mar 10, 2020 at 12:09 PM Jan Hubicka <hubicka@ucw.cz> wrote:
>
> > > >>> @Honza/Richi: Do you have any opinion about that?
> > > >
> > > > I guess we indeed want to get as close to non-LTO nm behaviour as
> > > > possible. So we want to support them and perhaps think of .symtab
> > > > section file format that can be made backward compatible (such as having
> > > > attribute string for symbols where we can add new info in future in a
> > > > way that old plugins will still get info they want).
> > >
> > > I like the idea. But it's probably next stage1 material. Or can you prepare
> > > a patch?
> >
> > I think what's important is that the LTO plugin needs to understand
> > the old and the new version since there's only one for auto-loading.
> >
> > The other missing feature of the linker plugin API is file claiming
> > which should be a on a section basis instead - but that's a different
> > part of the API and not related to symbol tables.  Enhancing that
> > part of the API would allow to elide the LTO debug copying ...
>
> Thinking of it, it seems to me that we do not need to break
> compatibility with existing plugins.  We could keep existing .symtab
> section the way it is implemented right now
> and add additional data to new .symtab_ext section so existing plugins
> will work as expeted.
>
> We could tag symtab_ext by a version string and keep adding extensions
> of extensions in the future being compatible with old plugins.

Not sure how symtab is encoded right now but we also could have

entry1: symbol <current data>
entry2: ^ext-ver-n <extra data>
entry3: ^ext-ver-m <extra data>
entry4: symbol <current data>

and ^ext-ver-X being escape byte(s) denoting we're providing
additional data for the preceeding symbol.  Not sure if there's
something conveniently available in the current encoding that
would make older plugins skip an entry ;)

> Indeed that would help situation user already has distro provided plugin
> in the search path but compiles its own gcc10.
>
> Honza

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH][RFC] API extension for binutils (type of symbols).
  2020-03-10 11:24               ` Richard Biener
@ 2020-03-10 12:07                 ` Martin Liška
  2020-03-11  9:19                   ` Martin Liška
  0 siblings, 1 reply; 55+ messages in thread
From: Martin Liška @ 2020-03-10 12:07 UTC (permalink / raw)
  To: Richard Biener, Jan Hubicka; +Cc: GCC Patches

[-- Attachment #1: Type: text/plain, Size: 303 bytes --]

On 3/10/20 12:24 PM, Richard Biener wrote:
> Not sure how symtab is encoded right now but we also could have

Ok, right now I don't see symtab entry much extensible.

But what am I suggesting is to parse LTO bytecode version and then
process conditional parsing of lto_symtab section.

Thoughts?
Martin

[-- Attachment #2: 0001-API-extension-for-binutils-type-of-symbols.patch --]
[-- Type: text/x-patch, Size: 5924 bytes --]

From 017dd9cba9a0222104682bef094d9f5057d2c9ae Mon Sep 17 00:00:00 2001
From: Martin Liska <mliska@suse.cz>
Date: Fri, 6 Mar 2020 18:09:35 +0100
Subject: [PATCH] API extension for binutils (type of symbols).

gcc/ChangeLog:

2020-03-09  Martin Liska  <mliska@suse.cz>

	* lto-streamer-out.c (write_symbol): Stream
	symbol type.

include/ChangeLog:

2020-03-09  Martin Liska  <mliska@suse.cz>

	* lto-symtab.h (enum gcc_plugin_symbol_type): New.
	* plugin-api.h (struct ld_plugin_symbol): New member
	symbols_type.
	(enum ld_plugin_symbol_type): New.
	(enum ld_plugin_tag): Add new tag LDPT_GET_SYMBOLS_V4.

lto-plugin/ChangeLog:

2020-03-09  Martin Liska  <mliska@suse.cz>

	* lto-plugin.c (parse_table_entry): Parse symbol type.
	(LTO_LTO_PREFIX): New.
	(LTO_LTO_PREFIX_LEN): New.
	(struct lto_section): New.
---
 gcc/lto-streamer-out.c  | 14 ++++++++++++
 include/lto-symtab.h    |  8 +++++++
 include/plugin-api.h    | 14 +++++++++++-
 lto-plugin/lto-plugin.c | 48 ++++++++++++++++++++++++++++++++++++++++-
 4 files changed, 82 insertions(+), 2 deletions(-)

diff --git a/gcc/lto-streamer-out.c b/gcc/lto-streamer-out.c
index cea5e71cffb..ead606eb665 100644
--- a/gcc/lto-streamer-out.c
+++ b/gcc/lto-streamer-out.c
@@ -45,6 +45,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "print-tree.h"
 #include "tree-dfa.h"
 #include "file-prefix-map.h" /* remap_debug_filename()  */
+#include "output.h"
 
 
 static void lto_write_tree (struct output_block*, tree, bool);
@@ -2773,6 +2774,19 @@ write_symbol (struct streamer_tree_cache_d *cache,
   lto_write_data (&c, 1);
   c = (unsigned char) visibility;
   lto_write_data (&c, 1);
+
+  gcc_plugin_symbol_type st;
+  if (TREE_CODE (t) == VAR_DECL)
+    {
+      section *s = get_variable_section (t, false);
+      st = (s->common.flags & SECTION_BSS
+	    ? GCCST_VARIABLE_BSS : GCCST_VARIABLE_DATA);
+    }
+  else
+    st = GCCST_FUNCTION;
+
+  c = (unsigned char) st;
+  lto_write_data (&c, 1);
   lto_write_data (&size, 8);
   lto_write_data (&slot_num, 4);
 }
diff --git a/include/lto-symtab.h b/include/lto-symtab.h
index 0ce0de10121..901bc3585c2 100644
--- a/include/lto-symtab.h
+++ b/include/lto-symtab.h
@@ -38,4 +38,12 @@ enum gcc_plugin_symbol_visibility
     GCCPV_HIDDEN
   };
 
+enum gcc_plugin_symbol_type
+{
+  GCCST_UNKNOWN,
+  GCCST_FUNCTION,
+  GCCST_VARIABLE_DATA,
+  GCCST_VARIABLE_BSS
+};
+
 #endif /* GCC_LTO_SYMTAB_H  */
diff --git a/include/plugin-api.h b/include/plugin-api.h
index 09e1202df07..794a2dcc4ee 100644
--- a/include/plugin-api.h
+++ b/include/plugin-api.h
@@ -92,6 +92,7 @@ struct ld_plugin_symbol
   uint64_t size;
   char *comdat_key;
   int resolution;
+  int symbol_type;
 };
 
 /* An object's section.  */
@@ -123,6 +124,16 @@ enum ld_plugin_symbol_visibility
   LDPV_HIDDEN
 };
 
+/* The type of the symbol.  */
+
+enum ld_plugin_symbol_type
+{
+  LDST_UNKNOWN,
+  LDST_FUNCTION,
+  LDST_VARIABLE_DATA,
+  LDST_VARIABLE_BSS
+};
+
 /* How a symbol is resolved.  */
 
 enum ld_plugin_symbol_resolution
@@ -431,7 +442,8 @@ enum ld_plugin_tag
   LDPT_GET_INPUT_SECTION_ALIGNMENT = 29,
   LDPT_GET_INPUT_SECTION_SIZE = 30,
   LDPT_REGISTER_NEW_INPUT_HOOK = 31,
-  LDPT_GET_WRAP_SYMBOLS = 32
+  LDPT_GET_WRAP_SYMBOLS = 32,
+  LDPT_GET_SYMBOLS_V4 = 33,
 };
 
 /* The plugin transfer vector.  */
diff --git a/lto-plugin/lto-plugin.c b/lto-plugin/lto-plugin.c
index c307fc871bf..33afae9afb6 100644
--- a/lto-plugin/lto-plugin.c
+++ b/lto-plugin/lto-plugin.c
@@ -90,6 +90,8 @@ along with this program; see the file COPYING3.  If not see
 
 #define LTO_SECTION_PREFIX	".gnu.lto_.symtab"
 #define LTO_SECTION_PREFIX_LEN	(sizeof (LTO_SECTION_PREFIX) - 1)
+#define LTO_LTO_PREFIX		".gnu.lto_.lto"
+#define LTO_LTO_PREFIX_LEN	(sizeof (LTO_LTO_PREFIX) - 1)
 #define OFFLOAD_SECTION		".gnu.offload_lto_.opts"
 #define OFFLOAD_SECTION_LEN	(sizeof (OFFLOAD_SECTION) - 1)
 
@@ -221,6 +223,20 @@ check_1 (int gate, enum ld_plugin_level level, const char *text)
     }
 }
 
+/* Structure that represents LTO ELF section with information
+   about the format.  */
+
+struct lto_section
+ {
+   int16_t major_version;
+   int16_t minor_version;
+   unsigned char slim_object: 1;
+   unsigned char compression: 4;
+   int32_t reserved0: 27;
+};
+
+struct lto_section lto_header;
+
 /* This little wrapper allows check to be called with a non-integer
    first argument, such as a pointer that must be non-NULL.  We can't
    use c99 bool type to coerce it into range, so we explicitly test.  */
@@ -252,6 +268,14 @@ parse_table_entry (char *p, struct ld_plugin_symbol *entry,
       LDPV_HIDDEN
     };
 
+  enum ld_plugin_symbol_type symbol_types[] =
+    {
+      LDST_UNKNOWN,
+      LDST_FUNCTION,
+      LDST_VARIABLE_DATA,
+      LDST_VARIABLE_BSS
+    };
+
   switch (sym_style)
     {
     case ss_win32:
@@ -296,6 +320,15 @@ parse_table_entry (char *p, struct ld_plugin_symbol *entry,
   entry->visibility = translate_visibility[t];
   p++;
 
+  /* Symbol type was added in GCC 10.1 (LTO version 9.0).  */
+  if (lto_header.major_version >= 9)
+    {
+      t = *p;
+      check (t <= 3, LDPL_FATAL, "invalid symbol type found");
+      entry->symbol_type = symbol_types[t];
+      p++;
+    }
+
   memcpy (&entry->size, p, sizeof (uint64_t));
   p += 8;
 
@@ -951,7 +984,20 @@ process_symtab (void *data, const char *name, off_t offset, off_t length)
 {
   struct plugin_objfile *obj = (struct plugin_objfile *)data;
   char *s;
-  char *secdatastart, *secdata;
+  char *secdatastart = NULL, *secdata;
+
+  if (strncmp (name, LTO_LTO_PREFIX, LTO_LTO_PREFIX_LEN) == 0)
+    {
+      if (offset != lseek (obj->file->fd, offset, SEEK_SET))
+	goto err;
+
+      ssize_t got = read (obj->file->fd, &lto_header,
+			  sizeof (struct lto_section));
+      if (got != sizeof (struct lto_section))
+	goto err;
+
+      return 1;
+    }
 
   if (strncmp (name, LTO_SECTION_PREFIX, LTO_SECTION_PREFIX_LEN) != 0)
     return 1;
-- 
2.25.1


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH][RFC] API extension for binutils (type of symbols).
  2020-03-10  9:39       ` Martin Liška
@ 2020-03-10 12:20         ` Martin Liška
  2020-03-10 14:26         ` Michael Matz
  1 sibling, 0 replies; 55+ messages in thread
From: Martin Liška @ 2020-03-10 12:20 UTC (permalink / raw)
  To: Michael Matz; +Cc: H.J. Lu, GCC Patches, Jan Hubicka

On 3/10/20 10:39 AM, Martin Liška wrote:
> 
> Are you sure about it?
> $ gcc gcc/testsuite/gcc.target/i386/pr56564-3.c -c -fpic && nm pr56564-3.o
> ...
> 0000000000000000 D s
> 0000000000000010 D t
> 
> ?

A nicer example:

$ cat tls.c
__thread struct S { long a, b; } s = { 0, 0 };
__thread char t[16] = { 7 };

$ gcc tls.c -c && nm -B tls.o
0000000000000000 B s
0000000000000000 D t

Well, TLS seems to me an orthogonal problem. Similarly other special symbols like
.func, .symver, alias attribute, ...

Martin

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH][RFC] API extension for binutils (type of symbols).
  2020-03-10  9:39       ` Martin Liška
  2020-03-10 12:20         ` Martin Liška
@ 2020-03-10 14:26         ` Michael Matz
  1 sibling, 0 replies; 55+ messages in thread
From: Michael Matz @ 2020-03-10 14:26 UTC (permalink / raw)
  To: Martin Liška; +Cc: H.J. Lu, GCC Patches, Jan Hubicka

Hello,

On Tue, 10 Mar 2020, Martin Liška wrote:

> >>> We nee to support different variables, like TLS, data and bss variables.
> >>
> >> Why do we need TLS? Right now, it's not supported by nm.
> > 
> > Of course it does.  It's the 'T' (or 't') character.
> 
> Thank you reply!
> 
> Are you sure about it?

Err, I bungled in between writing emails, you are right, nm(1) in BSD mode 
doesn't make a difference for TLS symbols.  (And of course T/t are for 
text, aka code, symbols).

Doesn't invalidate the rest of my email, though.


Ciao,
Michael.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH][RFC] API extension for binutils (type of symbols).
  2020-03-10 12:07                 ` Martin Liška
@ 2020-03-11  9:19                   ` Martin Liška
  2020-03-11 10:22                     ` Richard Biener
  0 siblings, 1 reply; 55+ messages in thread
From: Martin Liška @ 2020-03-11  9:19 UTC (permalink / raw)
  To: Richard Biener, Jan Hubicka; +Cc: GCC Patches

[-- Attachment #1: Type: text/plain, Size: 4012 bytes --]

On 3/10/20 1:07 PM, Martin Liška wrote:
> On 3/10/20 12:24 PM, Richard Biener wrote:
>> Not sure how symtab is encoded right now but we also could have
> 
> Ok, right now I don't see symtab entry much extensible.
> 
> But what am I suggesting is to parse LTO bytecode version and then
> process conditional parsing of lto_symtab section.
> 
> Thoughts?
> Martin

So as H.J. correctly pointed I can't add new symbol_type into ld_plugin_symbol struct.
It would make ABI change as correctly identified by abidiff:

abidiff /tmp/before.o /tmp/after.o
Functions changes summary: 0 Removed, 1 Changed, 0 Added function
Variables changes summary: 0 Removed, 0 Changed, 0 Added variable
Function symbols changes summary: 0 Removed, 0 Added function symbol not referenced by debug info
Variable symbols changes summary: 0 Removed, 1 Added variable symbol not referenced by debug info

1 function with some indirect sub-type change:

   [C]'function ld_plugin_status onload(ld_plugin_tv*)' at lto-plugin.c:1275:1 has some indirect sub-type changes:
     parameter 1 of type 'ld_plugin_tv*' has sub-type changes:
       in pointed to type 'struct ld_plugin_tv' at plugin-api.h:451:1:
         type size hasn't changed
         1 data member changes (1 filtered):
          type of 'union {int tv_val; const char* tv_string; ld_plugin_register_claim_file tv_register_claim_file; ld_plugin_register_all_symbols_read tv_register_all_symbols_read; ld_plugin_register_cleanup tv_register_cleanup; ld_plugin_add_symbols tv_add_symbols; ld_plugin_get_symbols tv_get_symbols; ld_plugin_add_input_file tv_add_input_file; ld_plugin_message tv_message; ld_plugin_get_input_file tv_get_input_file; ld_plugin_get_view tv_get_view; ld_plugin_release_input_file tv_release_input_file; ld_plugin_add_input_library tv_add_input_library; ld_plugin_set_extra_library_path tv_set_extra_library_path; ld_plugin_get_input_section_count tv_get_input_section_count; ld_plugin_get_input_section_type tv_get_input_section_type; ld_plugin_get_input_section_name tv_get_input_section_name; ld_plugin_get_input_section_contents tv_get_input_section_contents; ld_plugin_update_section_order tv_update_section_order; ld_plugin_allow_section_ordering tv_allow_section_ordering; ld_plugin_allow_unique_segment_for_sections tv_allow_unique_segment_for_sections; ld_plugin_unique_segment_for_sections tv_unique_segment_for_sections; ld_plugin_get_input_section_alignment tv_get_input_section_alignment; ld_plugin_get_input_section_size tv_get_input_section_size; ld_plugin_register_new_input tv_register_new_input; ld_plugin_get_wrap_symbols tv_get_wrap_symbols;} ld_plugin_tv::tv_u' changed:
            type size hasn't changed
            1 data member changes (1 filtered):
             type of 'ld_plugin_add_symbols tv_add_symbols' changed:
               underlying type 'enum ld_plugin_status (void*, int, const ld_plugin_symbol*)*' changed:
                 in pointed to type 'function type enum ld_plugin_status (void*, int, const ld_plugin_symbol*)':
                   parameter 3 of type 'const ld_plugin_symbol*' has sub-type changes:
                     in pointed to type 'const ld_plugin_symbol':
                       in unqualified underlying type 'struct ld_plugin_symbol' at plugin-api.h:86:1:
                         type size changed from 256 to 288 (in bits)
                         1 data member insertion:
                           'int ld_plugin_symbol::symbol_type', at offset 256 (in bits) at plugin-api.h:95:1

So that I need to come up with ld_plugin_symbol_v2. It brings more challenges: one has 2 parallel symbol
tables:

struct plugin_symtab
{
...
   struct ld_plugin_symbol_v2 *syms;
   struct ld_plugin_symbol *syms_v1;
...
};

and the information of these should by aligned.

The patch can survive lto.exp and I would like to ask H.J. to write bintuils counterpart that will
utilize the new LDPT_GET_SYMBOLS_V4, LDPT_ADD_SYMBOLS_V2.

Thoughts?
Martin



[-- Attachment #2: 0001-API-extension-for-binutils-type-of-symbols.patch --]
[-- Type: text/x-patch, Size: 16402 bytes --]

From 1271e9cc487847c1aa07e5bc0470e8c5b71900e9 Mon Sep 17 00:00:00 2001
From: Martin Liska <mliska@suse.cz>
Date: Fri, 6 Mar 2020 18:09:35 +0100
Subject: [PATCH] API extension for binutils (type of symbols).

gcc/ChangeLog:

2020-03-11  Martin Liska  <mliska@suse.cz>

	* lto-streamer-out.c (write_symbol): Stream type
	of symbol (function, variable - bss/data).

include/ChangeLog:

2020-03-11  Martin Liska  <mliska@suse.cz>

	* lto-symtab.h (enum gcc_plugin_symbol_type):
	New.
	* plugin-api.h (struct ld_plugin_symbol_v2): New
	ld_plugin_symbol structure with added symbol_type.
	(enum ld_plugin_symbol_type): New.
	(ld_plugin_add_symbols_v2): New.
	(ld_plugin_get_symbols_v2): New.
	(ld_plugin_tag): Add LDPT_GET_SYMBOLS_V4
	and LDPT_ADD_SYMBOLS_V2.
	(struct ld_plugin_tv): Add tv_add_symbols_v2
	and tv_get_symbols_v4.

lto-plugin/ChangeLog:

2020-03-11  Martin Liska  <mliska@suse.cz>

	* lto-plugin.c (LTO_LTO_PREFIX): New.
	(LTO_LTO_PREFIX_LEN): Likewise.
	(struct plugin_symtab): Make syms of new
	type ld_plugin_symbol_v2 and add syms_v1
	for older API hooks.
	(struct lto_section): New.
	(parse_table_entry): Change type to ld_plugin_symbol_v2
	and conditionally (based on LTO bytecode) parse symbol
	type.
	(translate): Change type to ld_plugin_symbol_v2.
	(fill_up_older_syms): New function.
	(free_1): Release also syms_v1.
	(write_resolution): Support new version of get_symbols.
	(eq_sym): Change type to ld_plugin_symbol_v2.
	(hash_sym): Likewise.
	(symbol_strength): Likewise.
	(resolve_conflicts): Likewise.
	(process_symtab): Parse LTO_LTO_PREFIX section.
	(claim_file_handler): Support new version of add_symbols.
	(onload): Fill up new hooks.
---
 gcc/lto-streamer-out.c  |  14 ++++
 include/lto-symtab.h    |   8 +++
 include/plugin-api.h    |  42 +++++++++++-
 lto-plugin/lto-plugin.c | 148 ++++++++++++++++++++++++++++++++++------
 4 files changed, 190 insertions(+), 22 deletions(-)

diff --git a/gcc/lto-streamer-out.c b/gcc/lto-streamer-out.c
index cea5e71cffb..ead606eb665 100644
--- a/gcc/lto-streamer-out.c
+++ b/gcc/lto-streamer-out.c
@@ -45,6 +45,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "print-tree.h"
 #include "tree-dfa.h"
 #include "file-prefix-map.h" /* remap_debug_filename()  */
+#include "output.h"
 
 
 static void lto_write_tree (struct output_block*, tree, bool);
@@ -2773,6 +2774,19 @@ write_symbol (struct streamer_tree_cache_d *cache,
   lto_write_data (&c, 1);
   c = (unsigned char) visibility;
   lto_write_data (&c, 1);
+
+  gcc_plugin_symbol_type st;
+  if (TREE_CODE (t) == VAR_DECL)
+    {
+      section *s = get_variable_section (t, false);
+      st = (s->common.flags & SECTION_BSS
+	    ? GCCST_VARIABLE_BSS : GCCST_VARIABLE_DATA);
+    }
+  else
+    st = GCCST_FUNCTION;
+
+  c = (unsigned char) st;
+  lto_write_data (&c, 1);
   lto_write_data (&size, 8);
   lto_write_data (&slot_num, 4);
 }
diff --git a/include/lto-symtab.h b/include/lto-symtab.h
index 0ce0de10121..901bc3585c2 100644
--- a/include/lto-symtab.h
+++ b/include/lto-symtab.h
@@ -38,4 +38,12 @@ enum gcc_plugin_symbol_visibility
     GCCPV_HIDDEN
   };
 
+enum gcc_plugin_symbol_type
+{
+  GCCST_UNKNOWN,
+  GCCST_FUNCTION,
+  GCCST_VARIABLE_DATA,
+  GCCST_VARIABLE_BSS
+};
+
 #endif /* GCC_LTO_SYMTAB_H  */
diff --git a/include/plugin-api.h b/include/plugin-api.h
index 09e1202df07..123c4a58a6d 100644
--- a/include/plugin-api.h
+++ b/include/plugin-api.h
@@ -94,6 +94,21 @@ struct ld_plugin_symbol
   int resolution;
 };
 
+/* A symbol belonging to an input file managed by the plugin library
+   (version 2).  */
+
+struct ld_plugin_symbol_v2
+{
+  char *name;
+  char *version;
+  int def;
+  int visibility;
+  uint64_t size;
+  char *comdat_key;
+  int resolution;
+  int symbol_type;
+};
+
 /* An object's section.  */
 
 struct ld_plugin_section
@@ -123,6 +138,16 @@ enum ld_plugin_symbol_visibility
   LDPV_HIDDEN
 };
 
+/* The type of the symbol.  */
+
+enum ld_plugin_symbol_type
+{
+  LDST_UNKNOWN,
+  LDST_FUNCTION,
+  LDST_VARIABLE_DATA,
+  LDST_VARIABLE_BSS
+};
+
 /* How a symbol is resolved.  */
 
 enum ld_plugin_symbol_resolution
@@ -210,6 +235,12 @@ enum ld_plugin_status
 (*ld_plugin_add_symbols) (void *handle, int nsyms,
                           const struct ld_plugin_symbol *syms);
 
+typedef
+enum ld_plugin_status
+(*ld_plugin_add_symbols_v2) (void *handle, int nsyms,
+			     const struct ld_plugin_symbol_v2 *syms);
+
+
 /* The linker's interface for getting the input file information with
    an open (possibly re-opened) file descriptor.  */
 
@@ -235,6 +266,11 @@ enum ld_plugin_status
 (*ld_plugin_get_symbols) (const void *handle, int nsyms,
                           struct ld_plugin_symbol *syms);
 
+typedef
+enum ld_plugin_status
+(*ld_plugin_get_symbols_v2) (const void *handle, int nsyms,
+			     struct ld_plugin_symbol_v2 *syms);
+
 /* The linker's interface for adding a compiled input file.  */
 
 typedef
@@ -431,7 +467,9 @@ enum ld_plugin_tag
   LDPT_GET_INPUT_SECTION_ALIGNMENT = 29,
   LDPT_GET_INPUT_SECTION_SIZE = 30,
   LDPT_REGISTER_NEW_INPUT_HOOK = 31,
-  LDPT_GET_WRAP_SYMBOLS = 32
+  LDPT_GET_WRAP_SYMBOLS = 32,
+  LDPT_GET_SYMBOLS_V4 = 33,
+  LDPT_ADD_SYMBOLS_V2 = 34,
 };
 
 /* The plugin transfer vector.  */
@@ -447,7 +485,9 @@ struct ld_plugin_tv
     ld_plugin_register_all_symbols_read tv_register_all_symbols_read;
     ld_plugin_register_cleanup tv_register_cleanup;
     ld_plugin_add_symbols tv_add_symbols;
+    ld_plugin_add_symbols_v2 tv_add_symbols_v2;
     ld_plugin_get_symbols tv_get_symbols;
+    ld_plugin_get_symbols_v2 tv_get_symbols_v4;
     ld_plugin_add_input_file tv_add_input_file;
     ld_plugin_message tv_message;
     ld_plugin_get_input_file tv_get_input_file;
diff --git a/lto-plugin/lto-plugin.c b/lto-plugin/lto-plugin.c
index c307fc871bf..b1b4516ec1b 100644
--- a/lto-plugin/lto-plugin.c
+++ b/lto-plugin/lto-plugin.c
@@ -90,6 +90,8 @@ along with this program; see the file COPYING3.  If not see
 
 #define LTO_SECTION_PREFIX	".gnu.lto_.symtab"
 #define LTO_SECTION_PREFIX_LEN	(sizeof (LTO_SECTION_PREFIX) - 1)
+#define LTO_LTO_PREFIX		".gnu.lto_.lto"
+#define LTO_LTO_PREFIX_LEN	(sizeof (LTO_LTO_PREFIX) - 1)
 #define OFFLOAD_SECTION		".gnu.offload_lto_.opts"
 #define OFFLOAD_SECTION_LEN	(sizeof (OFFLOAD_SECTION) - 1)
 
@@ -109,7 +111,8 @@ struct plugin_symtab
 {
   int nsyms;
   struct sym_aux *aux;
-  struct ld_plugin_symbol *syms;
+  struct ld_plugin_symbol_v2 *syms;
+  struct ld_plugin_symbol *syms_v1;
   unsigned long long id;
 };
 
@@ -155,11 +158,13 @@ static char *arguments_file_name;
 static ld_plugin_register_claim_file register_claim_file;
 static ld_plugin_register_all_symbols_read register_all_symbols_read;
 static ld_plugin_get_symbols get_symbols, get_symbols_v2;
+static ld_plugin_get_symbols_v2 get_symbols_v4;
 static ld_plugin_register_cleanup register_cleanup;
 static ld_plugin_add_input_file add_input_file;
 static ld_plugin_add_input_library add_input_library;
 static ld_plugin_message message;
 static ld_plugin_add_symbols add_symbols;
+static ld_plugin_add_symbols_v2 add_symbols_v2;
 
 static struct plugin_file_info *claimed_files = NULL;
 static unsigned int num_claimed_files = 0;
@@ -221,6 +226,20 @@ check_1 (int gate, enum ld_plugin_level level, const char *text)
     }
 }
 
+/* Structure that represents LTO ELF section with information
+   about the format.  */
+
+struct lto_section
+ {
+   int16_t major_version;
+   int16_t minor_version;
+   unsigned char slim_object: 1;
+   unsigned char compression: 4;
+   int32_t reserved0: 27;
+};
+
+struct lto_section lto_header;
+
 /* This little wrapper allows check to be called with a non-integer
    first argument, such as a pointer that must be non-NULL.  We can't
    use c99 bool type to coerce it into range, so we explicitly test.  */
@@ -231,7 +250,7 @@ check_1 (int gate, enum ld_plugin_level level, const char *text)
    Returns the address of the next entry. */
 
 static char *
-parse_table_entry (char *p, struct ld_plugin_symbol *entry, 
+parse_table_entry (char *p, struct ld_plugin_symbol_v2 *entry,
 		   struct sym_aux *aux)
 {
   unsigned char t;
@@ -252,6 +271,14 @@ parse_table_entry (char *p, struct ld_plugin_symbol *entry,
       LDPV_HIDDEN
     };
 
+  enum ld_plugin_symbol_type symbol_types[] =
+    {
+      LDST_UNKNOWN,
+      LDST_FUNCTION,
+      LDST_VARIABLE_DATA,
+      LDST_VARIABLE_BSS
+    };
+
   switch (sym_style)
     {
     case ss_win32:
@@ -296,6 +323,15 @@ parse_table_entry (char *p, struct ld_plugin_symbol *entry,
   entry->visibility = translate_visibility[t];
   p++;
 
+  /* Symbol type was added in GCC 10.1 (LTO version 9.0).  */
+  if (lto_header.major_version >= 9)
+    {
+      t = *p;
+      check (t <= 3, LDPL_FATAL, "invalid symbol type found");
+      entry->symbol_type = symbol_types[t];
+      p++;
+    }
+
   memcpy (&entry->size, p, sizeof (uint64_t));
   p += 8;
 
@@ -316,14 +352,14 @@ static void
 translate (char *data, char *end, struct plugin_symtab *out)
 {
   struct sym_aux *aux;
-  struct ld_plugin_symbol *syms = NULL;
+  struct ld_plugin_symbol_v2 *syms = NULL;
   int n, len;
 
   /* This overestimates the output buffer sizes, but at least 
      the algorithm is O(1) now. */
 
   len = (end - data)/8 + out->nsyms + 1;
-  syms = xrealloc (out->syms, len * sizeof (struct ld_plugin_symbol));
+  syms = xrealloc (out->syms, len * sizeof (struct ld_plugin_symbol_v2));
   aux = xrealloc (out->aux, len * sizeof (struct sym_aux));
   
   for (n = out->nsyms; data < end; n++) 
@@ -339,6 +375,33 @@ translate (char *data, char *end, struct plugin_symtab *out)
   out->aux = aux;
 }
 
+/* Copy information from syms to syms_v1 for older hooks.  */
+
+static void
+fill_up_older_syms (struct plugin_symtab *out)
+{
+  if (out->syms_v1)
+    return;
+
+  struct ld_plugin_symbol *syms
+    = xrealloc (out->syms_v1, out->nsyms * sizeof (struct ld_plugin_symbol));
+  for (unsigned i = 0; i < out->nsyms; i++)
+    {
+      struct ld_plugin_symbol *old = &syms[i];
+      struct ld_plugin_symbol_v2 *new = &out->syms[i];
+
+      old->name = new->name;
+      old->version = new->version;
+      old->def = new->def;
+      old->visibility = new->visibility;
+      old->size = new->size;
+      old->comdat_key = new->comdat_key;
+      old->resolution = new->resolution;
+    }
+
+  out->syms_v1 = syms;
+}
+
 /* Free all memory that is no longer needed after writing the symbol
    resolution. */
 
@@ -353,12 +416,18 @@ free_1 (struct plugin_file_info *files, unsigned num_files)
       unsigned int j;
       for (j = 0; j < symtab->nsyms; j++)
 	{
-	  struct ld_plugin_symbol *s = &symtab->syms[j];
+	  struct ld_plugin_symbol_v2 *s = &symtab->syms[j];
 	  free (s->name);
 	  free (s->comdat_key);
 	}
       free (symtab->syms);
       symtab->syms = NULL;
+
+      if (symtab->syms_v1 != NULL)
+	{
+	  free (symtab->syms_v1);
+	  symtab->syms_v1 = NULL;
+	}
     }
 }
 
@@ -478,7 +547,7 @@ free_symtab (struct plugin_symtab *symtab)
 static void
 write_resolution (void)
 {
-  unsigned int i;
+  unsigned int i, j;
   FILE *f;
 
   check (resolution_file, LDPL_FATAL, "resolution file not specified");
@@ -491,14 +560,23 @@ write_resolution (void)
     {
       struct plugin_file_info *info = &claimed_files[i];
       struct plugin_symtab *symtab = &info->symtab;
-      struct ld_plugin_symbol *syms = symtab->syms;
 
       /* Version 2 of API supports IRONLY_EXP resolution that is
          accepted by GCC-4.7 and newer.  */
-      if (get_symbols_v2)
-        get_symbols_v2 (info->handle, symtab->nsyms, syms);
+      if (get_symbols_v4)
+	get_symbols_v4 (info->handle, symtab->nsyms, symtab->syms);
       else
-        get_symbols (info->handle, symtab->nsyms, syms);
+	{
+	  fill_up_older_syms (symtab);
+	  if (get_symbols_v2)
+	    get_symbols_v2 (info->handle, symtab->nsyms, symtab->syms_v1);
+	  else
+	    get_symbols (info->handle, symtab->nsyms, symtab->syms_v1);
+
+	  /* Update resolution from syms_v1 to syms.  */
+	  for (j = 0; j < symtab->nsyms; j++)
+	    symtab->syms[j].resolution = symtab->syms_v1[j].resolution;
+	}
 
       finish_conflict_resolution (symtab, &info->conflicts);
 
@@ -827,8 +905,8 @@ cleanup_handler (void)
 
 static int eq_sym (const void *a, const void *b)
 {
-  const struct ld_plugin_symbol *as = (const struct ld_plugin_symbol *)a;
-  const struct ld_plugin_symbol *bs = (const struct ld_plugin_symbol *)b;
+  const struct ld_plugin_symbol_v2 *as = (const struct ld_plugin_symbol_v2 *)a;
+  const struct ld_plugin_symbol_v2 *bs = (const struct ld_plugin_symbol_v2 *)b;
 
   return !strcmp (as->name, bs->name);
 }
@@ -837,14 +915,14 @@ static int eq_sym (const void *a, const void *b)
 
 static hashval_t hash_sym (const void *a)
 {
-  const struct ld_plugin_symbol *as = (const struct ld_plugin_symbol *)a;
+  const struct ld_plugin_symbol_v2 *as = (const struct ld_plugin_symbol_v2 *)a;
 
   return htab_hash_string (as->name);
 }
 
 /* Determine how strong a symbol is */
 
-static int symbol_strength (struct ld_plugin_symbol *s)
+static int symbol_strength (struct ld_plugin_symbol_v2 *s)
 {
   switch (s->def) 
     { 
@@ -885,14 +963,14 @@ resolve_conflicts (struct plugin_symtab *t, struct plugin_symtab *conflicts)
   int outlen;
 
   outlen = t->nsyms;
-  conflicts->syms = xmalloc (sizeof (struct ld_plugin_symbol) * outlen);
+  conflicts->syms = xmalloc (sizeof (struct ld_plugin_symbol_v2) * outlen);
   conflicts->aux = xmalloc (sizeof (struct sym_aux) * outlen);
 
   /* Move all duplicate symbols into the auxiliary conflicts table. */
   out = 0;
   for (i = 0; i < t->nsyms; i++) 
     {
-      struct ld_plugin_symbol *s = &t->syms[i];
+      struct ld_plugin_symbol_v2 *s = &t->syms[i];
       struct sym_aux *aux = &t->aux[i];
       void **slot;
 
@@ -900,13 +978,14 @@ resolve_conflicts (struct plugin_symtab *t, struct plugin_symtab *conflicts)
       if (*slot != NULL)
 	{
 	  int cnf;
-	  struct ld_plugin_symbol *orig = (struct ld_plugin_symbol *)*slot;
+	  struct ld_plugin_symbol_v2
+	    *orig = (struct ld_plugin_symbol_v2 *)*slot;
 	  struct sym_aux *orig_aux = &t->aux[orig - t->syms];
 
 	  /* Always let the linker resolve the strongest symbol */
 	  if (symbol_strength (orig) < symbol_strength (s)) 
 	    {
-	      SWAP (struct ld_plugin_symbol, *orig, *s);
+	      SWAP (struct ld_plugin_symbol_v2, *orig, *s);
 	      SWAP (uint32_t, orig_aux->slot, aux->slot);
 	      SWAP (unsigned long long, orig_aux->id, aux->id);
 	      /* Don't swap conflict chain pointer */
@@ -951,7 +1030,20 @@ process_symtab (void *data, const char *name, off_t offset, off_t length)
 {
   struct plugin_objfile *obj = (struct plugin_objfile *)data;
   char *s;
-  char *secdatastart, *secdata;
+  char *secdatastart = NULL, *secdata;
+
+  if (strncmp (name, LTO_LTO_PREFIX, LTO_LTO_PREFIX_LEN) == 0)
+    {
+      if (offset != lseek (obj->file->fd, offset, SEEK_SET))
+	goto err;
+
+      ssize_t got = read (obj->file->fd, &lto_header,
+			  sizeof (struct lto_section));
+      if (got != sizeof (struct lto_section))
+	goto err;
+
+      return 1;
+    }
 
   if (strncmp (name, LTO_SECTION_PREFIX, LTO_SECTION_PREFIX_LEN) != 0)
     return 1;
@@ -1080,8 +1172,16 @@ claim_file_handler (const struct ld_plugin_input_file *file, int *claimed)
 
   if (obj.found > 0)
     {
-      status = add_symbols (file->handle, lto_file.symtab.nsyms,
-			    lto_file.symtab.syms);
+      if (add_symbols_v2)
+	status = add_symbols_v2 (file->handle, lto_file.symtab.nsyms,
+				 lto_file.symtab.syms);
+      else
+	{
+	  fill_up_older_syms (&lto_file.symtab);
+	  status = add_symbols (file->handle, lto_file.symtab.nsyms,
+				lto_file.symtab.syms_v1);
+	}
+
       check (status == LDPS_OK, LDPL_FATAL, "could not add symbols");
 
       num_claimed_files++;
@@ -1242,12 +1342,18 @@ onload (struct ld_plugin_tv *tv)
 	case LDPT_REGISTER_CLAIM_FILE_HOOK:
 	  register_claim_file = p->tv_u.tv_register_claim_file;
 	  break;
+	case LDPT_ADD_SYMBOLS_V2:
+	  add_symbols_v2 = p->tv_u.tv_add_symbols_v2;
+	  break;
 	case LDPT_ADD_SYMBOLS:
 	  add_symbols = p->tv_u.tv_add_symbols;
 	  break;
 	case LDPT_REGISTER_ALL_SYMBOLS_READ_HOOK:
 	  register_all_symbols_read = p->tv_u.tv_register_all_symbols_read;
 	  break;
+	case LDPT_GET_SYMBOLS_V4:
+	  get_symbols_v4 = p->tv_u.tv_get_symbols_v4;
+	  break;
 	case LDPT_GET_SYMBOLS_V2:
 	  get_symbols_v2 = p->tv_u.tv_get_symbols;
 	  break;
-- 
2.25.1


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH][RFC] API extension for binutils (type of symbols).
  2020-03-11  9:19                   ` Martin Liška
@ 2020-03-11 10:22                     ` Richard Biener
  2020-03-11 10:30                       ` Richard Biener
  2020-03-11 12:22                       ` Martin Liška
  0 siblings, 2 replies; 55+ messages in thread
From: Richard Biener @ 2020-03-11 10:22 UTC (permalink / raw)
  To: Martin Liška; +Cc: Jan Hubicka, GCC Patches

On Wed, Mar 11, 2020 at 10:19 AM Martin Liška <mliska@suse.cz> wrote:
>
> On 3/10/20 1:07 PM, Martin Liška wrote:
> > On 3/10/20 12:24 PM, Richard Biener wrote:
> >> Not sure how symtab is encoded right now but we also could have
> >
> > Ok, right now I don't see symtab entry much extensible.
> >
> > But what am I suggesting is to parse LTO bytecode version and then
> > process conditional parsing of lto_symtab section.
> >
> > Thoughts?
> > Martin
>
> So as H.J. correctly pointed I can't add new symbol_type into ld_plugin_symbol struct.
> It would make ABI change as correctly identified by abidiff:
>
> abidiff /tmp/before.o /tmp/after.o
> Functions changes summary: 0 Removed, 1 Changed, 0 Added function
> Variables changes summary: 0 Removed, 0 Changed, 0 Added variable
> Function symbols changes summary: 0 Removed, 0 Added function symbol not referenced by debug info
> Variable symbols changes summary: 0 Removed, 1 Added variable symbol not referenced by debug info
>
> 1 function with some indirect sub-type change:
>
>    [C]'function ld_plugin_status onload(ld_plugin_tv*)' at lto-plugin.c:1275:1 has some indirect sub-type changes:
>      parameter 1 of type 'ld_plugin_tv*' has sub-type changes:
>        in pointed to type 'struct ld_plugin_tv' at plugin-api.h:451:1:
>          type size hasn't changed
>          1 data member changes (1 filtered):
>           type of 'union {int tv_val; const char* tv_string; ld_plugin_register_claim_file tv_register_claim_file; ld_plugin_register_all_symbols_read tv_register_all_symbols_read; ld_plugin_register_cleanup tv_register_cleanup; ld_plugin_add_symbols tv_add_symbols; ld_plugin_get_symbols tv_get_symbols; ld_plugin_add_input_file tv_add_input_file; ld_plugin_message tv_message; ld_plugin_get_input_file tv_get_input_file; ld_plugin_get_view tv_get_view; ld_plugin_release_input_file tv_release_input_file; ld_plugin_add_input_library tv_add_input_library; ld_plugin_set_extra_library_path tv_set_extra_library_path; ld_plugin_get_input_section_count tv_get_input_section_count; ld_plugin_get_input_section_type tv_get_input_section_type; ld_plugin_get_input_section_name tv_get_input_section_name; ld_plugin_get_input_section_contents tv_get_input_section_contents; ld_plugin_update_section_order tv_update_section_order; ld_plugin_allow_section_ordering tv_allow_section_ordering; ld_plugin_allow_unique_segment_for_sections tv_allow_unique_segment_for_sections; ld_plugin_unique_segment_for_sections tv_unique_segment_for_sections; ld_plugin_get_input_section_alignment tv_get_input_section_alignment; ld_plugin_get_input_section_size tv_get_input_section_size; ld_plugin_register_new_input tv_register_new_input; ld_plugin_get_wrap_symbols tv_get_wrap_symbols;} ld_plugin_tv::tv_u' changed:
>             type size hasn't changed
>             1 data member changes (1 filtered):
>              type of 'ld_plugin_add_symbols tv_add_symbols' changed:
>                underlying type 'enum ld_plugin_status (void*, int, const ld_plugin_symbol*)*' changed:
>                  in pointed to type 'function type enum ld_plugin_status (void*, int, const ld_plugin_symbol*)':
>                    parameter 3 of type 'const ld_plugin_symbol*' has sub-type changes:
>                      in pointed to type 'const ld_plugin_symbol':
>                        in unqualified underlying type 'struct ld_plugin_symbol' at plugin-api.h:86:1:
>                          type size changed from 256 to 288 (in bits)
>                          1 data member insertion:
>                            'int ld_plugin_symbol::symbol_type', at offset 256 (in bits) at plugin-api.h:95:1
>
> So that I need to come up with ld_plugin_symbol_v2. It brings more challenges: one has 2 parallel symbol
> tables:
>
> struct plugin_symtab
> {
> ...
>    struct ld_plugin_symbol_v2 *syms;
>    struct ld_plugin_symbol *syms_v1;
> ...
> };
>
> and the information of these should by aligned.
>
> The patch can survive lto.exp and I would like to ask H.J. to write bintuils counterpart that will
> utilize the new LDPT_GET_SYMBOLS_V4, LDPT_ADD_SYMBOLS_V2.
>
> Thoughts?

Can't we simply have _V4/V2 use the upper half of
ld_plugin_symbol::def?  If the linker
then requests _V4 but the plugin cannot cope it could still "use" the
data but get
LDST_UNKNOWN (zero) there.

IMHO LDST_VARIABLE_BSS is "misplaced"?  "BSS" is the section of the variable.
If we want to encode more of ELF it should be LDST_OBJECT and LDST_FUNC.
Note there's also rodata vs data info that would be missing in case
we'd want tools
like readelf -s dump the symbol table of the IL part of an object.  It
looks like
nm can also distinguish rodata from data ("R", "r" vs "d") and "small object"
data sections (not sure what's that about).  It seems nm cannot distinguish
symbols in mergeable string sections (it dumps "R" for me there).  So intead
of mangling everything into enum ld_plugin_symbol_type should we instead
add a

enum ld_plugin_symbol_special_section
{
  LDSSS_DEFAULT,
  LDSSS_BSS,
  LDSSS_RODATA,
  ...
}

where LDSSS_DEFAULT means .text for FUNC and .data for OBJECT?
LDSSS_COMDAT might also apply but there's already the comdat_key
member which makes this info implicitely available.

Richard.

> Martin
>
>

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH][RFC] API extension for binutils (type of symbols).
  2020-03-11 10:22                     ` Richard Biener
@ 2020-03-11 10:30                       ` Richard Biener
  2020-03-11 12:22                       ` Martin Liška
  1 sibling, 0 replies; 55+ messages in thread
From: Richard Biener @ 2020-03-11 10:30 UTC (permalink / raw)
  To: Martin Liška; +Cc: Jan Hubicka, GCC Patches

On Wed, Mar 11, 2020 at 11:22 AM Richard Biener
<richard.guenther@gmail.com> wrote:
>
> On Wed, Mar 11, 2020 at 10:19 AM Martin Liška <mliska@suse.cz> wrote:
> >
> > On 3/10/20 1:07 PM, Martin Liška wrote:
> > > On 3/10/20 12:24 PM, Richard Biener wrote:
> > >> Not sure how symtab is encoded right now but we also could have
> > >
> > > Ok, right now I don't see symtab entry much extensible.
> > >
> > > But what am I suggesting is to parse LTO bytecode version and then
> > > process conditional parsing of lto_symtab section.
> > >
> > > Thoughts?
> > > Martin
> >
> > So as H.J. correctly pointed I can't add new symbol_type into ld_plugin_symbol struct.
> > It would make ABI change as correctly identified by abidiff:
> >
> > abidiff /tmp/before.o /tmp/after.o
> > Functions changes summary: 0 Removed, 1 Changed, 0 Added function
> > Variables changes summary: 0 Removed, 0 Changed, 0 Added variable
> > Function symbols changes summary: 0 Removed, 0 Added function symbol not referenced by debug info
> > Variable symbols changes summary: 0 Removed, 1 Added variable symbol not referenced by debug info
> >
> > 1 function with some indirect sub-type change:
> >
> >    [C]'function ld_plugin_status onload(ld_plugin_tv*)' at lto-plugin.c:1275:1 has some indirect sub-type changes:
> >      parameter 1 of type 'ld_plugin_tv*' has sub-type changes:
> >        in pointed to type 'struct ld_plugin_tv' at plugin-api.h:451:1:
> >          type size hasn't changed
> >          1 data member changes (1 filtered):
> >           type of 'union {int tv_val; const char* tv_string; ld_plugin_register_claim_file tv_register_claim_file; ld_plugin_register_all_symbols_read tv_register_all_symbols_read; ld_plugin_register_cleanup tv_register_cleanup; ld_plugin_add_symbols tv_add_symbols; ld_plugin_get_symbols tv_get_symbols; ld_plugin_add_input_file tv_add_input_file; ld_plugin_message tv_message; ld_plugin_get_input_file tv_get_input_file; ld_plugin_get_view tv_get_view; ld_plugin_release_input_file tv_release_input_file; ld_plugin_add_input_library tv_add_input_library; ld_plugin_set_extra_library_path tv_set_extra_library_path; ld_plugin_get_input_section_count tv_get_input_section_count; ld_plugin_get_input_section_type tv_get_input_section_type; ld_plugin_get_input_section_name tv_get_input_section_name; ld_plugin_get_input_section_contents tv_get_input_section_contents; ld_plugin_update_section_order tv_update_section_order; ld_plugin_allow_section_ordering tv_allow_section_ordering; ld_plugin_allow_unique_segment_for_sections tv_allow_unique_segment_for_sections; ld_plugin_unique_segment_for_sections tv_unique_segment_for_sections; ld_plugin_get_input_section_alignment tv_get_input_section_alignment; ld_plugin_get_input_section_size tv_get_input_section_size; ld_plugin_register_new_input tv_register_new_input; ld_plugin_get_wrap_symbols tv_get_wrap_symbols;} ld_plugin_tv::tv_u' changed:
> >             type size hasn't changed
> >             1 data member changes (1 filtered):
> >              type of 'ld_plugin_add_symbols tv_add_symbols' changed:
> >                underlying type 'enum ld_plugin_status (void*, int, const ld_plugin_symbol*)*' changed:
> >                  in pointed to type 'function type enum ld_plugin_status (void*, int, const ld_plugin_symbol*)':
> >                    parameter 3 of type 'const ld_plugin_symbol*' has sub-type changes:
> >                      in pointed to type 'const ld_plugin_symbol':
> >                        in unqualified underlying type 'struct ld_plugin_symbol' at plugin-api.h:86:1:
> >                          type size changed from 256 to 288 (in bits)
> >                          1 data member insertion:
> >                            'int ld_plugin_symbol::symbol_type', at offset 256 (in bits) at plugin-api.h:95:1
> >
> > So that I need to come up with ld_plugin_symbol_v2. It brings more challenges: one has 2 parallel symbol
> > tables:
> >
> > struct plugin_symtab
> > {
> > ...
> >    struct ld_plugin_symbol_v2 *syms;
> >    struct ld_plugin_symbol *syms_v1;
> > ...
> > };
> >
> > and the information of these should by aligned.
> >
> > The patch can survive lto.exp and I would like to ask H.J. to write bintuils counterpart that will
> > utilize the new LDPT_GET_SYMBOLS_V4, LDPT_ADD_SYMBOLS_V2.
> >
> > Thoughts?
>
> Can't we simply have _V4/V2 use the upper half of
> ld_plugin_symbol::def?  If the linker
> then requests _V4 but the plugin cannot cope it could still "use" the
> data but get
> LDST_UNKNOWN (zero) there.
>
> IMHO LDST_VARIABLE_BSS is "misplaced"?  "BSS" is the section of the variable.
> If we want to encode more of ELF it should be LDST_OBJECT and LDST_FUNC.
> Note there's also rodata vs data info that would be missing in case
> we'd want tools
> like readelf -s dump the symbol table of the IL part of an object.  It
> looks like
> nm can also distinguish rodata from data ("R", "r" vs "d") and "small object"
> data sections (not sure what's that about).  It seems nm cannot distinguish
> symbols in mergeable string sections (it dumps "R" for me there).  So intead
> of mangling everything into enum ld_plugin_symbol_type should we instead
> add a
>
> enum ld_plugin_symbol_special_section
> {
>   LDSSS_DEFAULT,
>   LDSSS_BSS,
>   LDSSS_RODATA,
>   ...
> }
>
> where LDSSS_DEFAULT means .text for FUNC and .data for OBJECT?
> LDSSS_COMDAT might also apply but there's already the comdat_key
> member which makes this info implicitely available.

I also think we're trying to solve an issue here that is better solved on
the other side - the configury failing because they are inspecting LTO IL
instead of real object files.

Richard.

> Richard.
>
> > Martin
> >
> >

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH][RFC] API extension for binutils (type of symbols).
  2020-03-11 10:22                     ` Richard Biener
  2020-03-11 10:30                       ` Richard Biener
@ 2020-03-11 12:22                       ` Martin Liška
  2020-03-11 12:51                         ` Richard Biener
  1 sibling, 1 reply; 55+ messages in thread
From: Martin Liška @ 2020-03-11 12:22 UTC (permalink / raw)
  To: Richard Biener; +Cc: Jan Hubicka, GCC Patches

On 3/11/20 11:22 AM, Richard Biener wrote:
> On Wed, Mar 11, 2020 at 10:19 AM Martin Liška <mliska@suse.cz> wrote:
>>
>> On 3/10/20 1:07 PM, Martin Liška wrote:
>>> On 3/10/20 12:24 PM, Richard Biener wrote:
>>>> Not sure how symtab is encoded right now but we also could have
>>>
>>> Ok, right now I don't see symtab entry much extensible.
>>>
>>> But what am I suggesting is to parse LTO bytecode version and then
>>> process conditional parsing of lto_symtab section.
>>>
>>> Thoughts?
>>> Martin
>>
>> So as H.J. correctly pointed I can't add new symbol_type into ld_plugin_symbol struct.
>> It would make ABI change as correctly identified by abidiff:
>>
>> abidiff /tmp/before.o /tmp/after.o
>> Functions changes summary: 0 Removed, 1 Changed, 0 Added function
>> Variables changes summary: 0 Removed, 0 Changed, 0 Added variable
>> Function symbols changes summary: 0 Removed, 0 Added function symbol not referenced by debug info
>> Variable symbols changes summary: 0 Removed, 1 Added variable symbol not referenced by debug info
>>
>> 1 function with some indirect sub-type change:
>>
>>     [C]'function ld_plugin_status onload(ld_plugin_tv*)' at lto-plugin.c:1275:1 has some indirect sub-type changes:
>>       parameter 1 of type 'ld_plugin_tv*' has sub-type changes:
>>         in pointed to type 'struct ld_plugin_tv' at plugin-api.h:451:1:
>>           type size hasn't changed
>>           1 data member changes (1 filtered):
>>            type of 'union {int tv_val; const char* tv_string; ld_plugin_register_claim_file tv_register_claim_file; ld_plugin_register_all_symbols_read tv_register_all_symbols_read; ld_plugin_register_cleanup tv_register_cleanup; ld_plugin_add_symbols tv_add_symbols; ld_plugin_get_symbols tv_get_symbols; ld_plugin_add_input_file tv_add_input_file; ld_plugin_message tv_message; ld_plugin_get_input_file tv_get_input_file; ld_plugin_get_view tv_get_view; ld_plugin_release_input_file tv_release_input_file; ld_plugin_add_input_library tv_add_input_library; ld_plugin_set_extra_library_path tv_set_extra_library_path; ld_plugin_get_input_section_count tv_get_input_section_count; ld_plugin_get_input_section_type tv_get_input_section_type; ld_plugin_get_input_section_name tv_get_input_section_name; ld_plugin_get_input_section_contents tv_get_input_section_contents; ld_plugin_update_section_order tv_update_section_order; ld_plugin_allow_section_ordering tv_allow_section_ordering; ld_plugin_allow_unique_segment_for_sections tv_allow_unique_segment_for_sections; ld_plugin_unique_segment_for_sections tv_unique_segment_for_sections; ld_plugin_get_input_section_alignment tv_get_input_section_alignment; ld_plugin_get_input_section_size tv_get_input_section_size; ld_plugin_register_new_input tv_register_new_input; ld_plugin_get_wrap_symbols tv_get_wrap_symbols;} ld_plugin_tv::tv_u' changed:
>>              type size hasn't changed
>>              1 data member changes (1 filtered):
>>               type of 'ld_plugin_add_symbols tv_add_symbols' changed:
>>                 underlying type 'enum ld_plugin_status (void*, int, const ld_plugin_symbol*)*' changed:
>>                   in pointed to type 'function type enum ld_plugin_status (void*, int, const ld_plugin_symbol*)':
>>                     parameter 3 of type 'const ld_plugin_symbol*' has sub-type changes:
>>                       in pointed to type 'const ld_plugin_symbol':
>>                         in unqualified underlying type 'struct ld_plugin_symbol' at plugin-api.h:86:1:
>>                           type size changed from 256 to 288 (in bits)
>>                           1 data member insertion:
>>                             'int ld_plugin_symbol::symbol_type', at offset 256 (in bits) at plugin-api.h:95:1
>>
>> So that I need to come up with ld_plugin_symbol_v2. It brings more challenges: one has 2 parallel symbol
>> tables:
>>
>> struct plugin_symtab
>> {
>> ...
>>     struct ld_plugin_symbol_v2 *syms;
>>     struct ld_plugin_symbol *syms_v1;
>> ...
>> };
>>
>> and the information of these should by aligned.
>>
>> The patch can survive lto.exp and I would like to ask H.J. to write bintuils counterpart that will
>> utilize the new LDPT_GET_SYMBOLS_V4, LDPT_ADD_SYMBOLS_V2.
>>
>> Thoughts?
> 
> Can't we simply have _V4/V2 use the upper half of
> ld_plugin_symbol::def?  If the linker
> then requests _V4 but the plugin cannot cope it could still "use" the
> data but get
> LDST_UNKNOWN (zero) there.

Can be possible, but it's hack a bit. The plugin has a mechanisms for versioning
and this change does not align with the idea.

> 
> IMHO LDST_VARIABLE_BSS is "misplaced"?  "BSS" is the section of the variable.

Yes.

> If we want to encode more of ELF it should be LDST_OBJECT and LDST_FUNC.
> Note there's also rodata vs data info that would be missing in case
> we'd want tools
> like readelf -s dump the symbol table of the IL part of an object.  It
> looks like
> nm can also distinguish rodata from data ("R", "r" vs "d") and "small object"
> data sections (not sure what's that about).  It seems nm cannot distinguish
> symbols in mergeable string sections (it dumps "R" for me there).  So intead
> of mangling everything into enum ld_plugin_symbol_type should we instead
> add a
> 
> enum ld_plugin_symbol_special_section
> {
>    LDSSS_DEFAULT,
>    LDSSS_BSS,
>    LDSSS_RODATA,
>    ...
> }

Which maps to what we have for:

/* Information that is provided by all instances of the section type.  */
struct GTY(()) section_common {
   /* The set of SECTION_* flags that apply to this section.  */
   unsigned int flags;
};

#define SECTION_CODE	 0x00100	/* contains code */
...
#define SECTION_BSS	 0x02000	/* contains zeros only */
...
#define SECTION_TLS	 0x40000	/* contains thread-local storage */
...
#define SECTION_COMMON   0x800000	/* contains common data */
#define SECTION_RELRO	 0x1000000	/* data is readonly after relocation processing */
...

Anyway, that would be another type which we need in ld_plugin_symbol.

Martin

> 
> where LDSSS_DEFAULT means .text for FUNC and .data for OBJECT?
> LDSSS_COMDAT might also apply but there's already the comdat_key
> member which makes this info implicitely available.
> 
> Richard.
> 
>> Martin
>>
>>


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH][RFC] API extension for binutils (type of symbols).
  2020-03-11 12:22                       ` Martin Liška
@ 2020-03-11 12:51                         ` Richard Biener
  2020-03-11 13:07                           ` Martin Liška
  2020-03-11 14:15                           ` Martin Liška
  0 siblings, 2 replies; 55+ messages in thread
From: Richard Biener @ 2020-03-11 12:51 UTC (permalink / raw)
  To: Martin Liška; +Cc: Jan Hubicka, GCC Patches

On Wed, Mar 11, 2020 at 1:22 PM Martin Liška <mliska@suse.cz> wrote:
>
> On 3/11/20 11:22 AM, Richard Biener wrote:
> > On Wed, Mar 11, 2020 at 10:19 AM Martin Liška <mliska@suse.cz> wrote:
> >>
> >> On 3/10/20 1:07 PM, Martin Liška wrote:
> >>> On 3/10/20 12:24 PM, Richard Biener wrote:
> >>>> Not sure how symtab is encoded right now but we also could have
> >>>
> >>> Ok, right now I don't see symtab entry much extensible.
> >>>
> >>> But what am I suggesting is to parse LTO bytecode version and then
> >>> process conditional parsing of lto_symtab section.
> >>>
> >>> Thoughts?
> >>> Martin
> >>
> >> So as H.J. correctly pointed I can't add new symbol_type into ld_plugin_symbol struct.
> >> It would make ABI change as correctly identified by abidiff:
> >>
> >> abidiff /tmp/before.o /tmp/after.o
> >> Functions changes summary: 0 Removed, 1 Changed, 0 Added function
> >> Variables changes summary: 0 Removed, 0 Changed, 0 Added variable
> >> Function symbols changes summary: 0 Removed, 0 Added function symbol not referenced by debug info
> >> Variable symbols changes summary: 0 Removed, 1 Added variable symbol not referenced by debug info
> >>
> >> 1 function with some indirect sub-type change:
> >>
> >>     [C]'function ld_plugin_status onload(ld_plugin_tv*)' at lto-plugin.c:1275:1 has some indirect sub-type changes:
> >>       parameter 1 of type 'ld_plugin_tv*' has sub-type changes:
> >>         in pointed to type 'struct ld_plugin_tv' at plugin-api.h:451:1:
> >>           type size hasn't changed
> >>           1 data member changes (1 filtered):
> >>            type of 'union {int tv_val; const char* tv_string; ld_plugin_register_claim_file tv_register_claim_file; ld_plugin_register_all_symbols_read tv_register_all_symbols_read; ld_plugin_register_cleanup tv_register_cleanup; ld_plugin_add_symbols tv_add_symbols; ld_plugin_get_symbols tv_get_symbols; ld_plugin_add_input_file tv_add_input_file; ld_plugin_message tv_message; ld_plugin_get_input_file tv_get_input_file; ld_plugin_get_view tv_get_view; ld_plugin_release_input_file tv_release_input_file; ld_plugin_add_input_library tv_add_input_library; ld_plugin_set_extra_library_path tv_set_extra_library_path; ld_plugin_get_input_section_count tv_get_input_section_count; ld_plugin_get_input_section_type tv_get_input_section_type; ld_plugin_get_input_section_name tv_get_input_section_name; ld_plugin_get_input_section_contents tv_get_input_section_contents; ld_plugin_update_section_order tv_update_section_order; ld_plugin_allow_section_ordering tv_allow_section_ordering; ld_plugin_allow_unique_segment_for_sections tv_allow_unique_segment_for_sections; ld_plugin_unique_segment_for_sections tv_unique_segment_for_sections; ld_plugin_get_input_section_alignment tv_get_input_section_alignment; ld_plugin_get_input_section_size tv_get_input_section_size; ld_plugin_register_new_input tv_register_new_input; ld_plugin_get_wrap_symbols tv_get_wrap_symbols;} ld_plugin_tv::tv_u' changed:
> >>              type size hasn't changed
> >>              1 data member changes (1 filtered):
> >>               type of 'ld_plugin_add_symbols tv_add_symbols' changed:
> >>                 underlying type 'enum ld_plugin_status (void*, int, const ld_plugin_symbol*)*' changed:
> >>                   in pointed to type 'function type enum ld_plugin_status (void*, int, const ld_plugin_symbol*)':
> >>                     parameter 3 of type 'const ld_plugin_symbol*' has sub-type changes:
> >>                       in pointed to type 'const ld_plugin_symbol':
> >>                         in unqualified underlying type 'struct ld_plugin_symbol' at plugin-api.h:86:1:
> >>                           type size changed from 256 to 288 (in bits)
> >>                           1 data member insertion:
> >>                             'int ld_plugin_symbol::symbol_type', at offset 256 (in bits) at plugin-api.h:95:1
> >>
> >> So that I need to come up with ld_plugin_symbol_v2. It brings more challenges: one has 2 parallel symbol
> >> tables:
> >>
> >> struct plugin_symtab
> >> {
> >> ...
> >>     struct ld_plugin_symbol_v2 *syms;
> >>     struct ld_plugin_symbol *syms_v1;
> >> ...
> >> };
> >>
> >> and the information of these should by aligned.
> >>
> >> The patch can survive lto.exp and I would like to ask H.J. to write bintuils counterpart that will
> >> utilize the new LDPT_GET_SYMBOLS_V4, LDPT_ADD_SYMBOLS_V2.
> >>
> >> Thoughts?
> >
> > Can't we simply have _V4/V2 use the upper half of
> > ld_plugin_symbol::def?  If the linker
> > then requests _V4 but the plugin cannot cope it could still "use" the
> > data but get
> > LDST_UNKNOWN (zero) there.
>
> Can be possible, but it's hack a bit. The plugin has a mechanisms for versioning
> and this change does not align with the idea.

I'm not sure I understand the versioning, we should aim at something where
an updated plugin can talk to old and new ld and where a new ld can also talk
to an old plugin.  That requires an arbitration which I don't see implemented?

Splitting an existing field isn't hackish IMHO.  I guess even
explicitely changing it
to one short and two char fields would be OK.

Is there a comprehensive list of plugins out in the wild using the LD
plugin API?

Note we also have to bring in gold folks (not sure if lld also
implements the same
plugin API)

> >
> > IMHO LDST_VARIABLE_BSS is "misplaced"?  "BSS" is the section of the variable.
>
> Yes.
>
> > If we want to encode more of ELF it should be LDST_OBJECT and LDST_FUNC.
> > Note there's also rodata vs data info that would be missing in case
> > we'd want tools
> > like readelf -s dump the symbol table of the IL part of an object.  It
> > looks like
> > nm can also distinguish rodata from data ("R", "r" vs "d") and "small object"
> > data sections (not sure what's that about).  It seems nm cannot distinguish
> > symbols in mergeable string sections (it dumps "R" for me there).  So intead
> > of mangling everything into enum ld_plugin_symbol_type should we instead
> > add a
> >
> > enum ld_plugin_symbol_special_section
> > {
> >    LDSSS_DEFAULT,
> >    LDSSS_BSS,
> >    LDSSS_RODATA,
> >    ...
> > }
>
> Which maps to what we have for:
>
> /* Information that is provided by all instances of the section type.  */
> struct GTY(()) section_common {
>    /* The set of SECTION_* flags that apply to this section.  */
>    unsigned int flags;
> };
>
> #define SECTION_CODE     0x00100        /* contains code */
> ...
> #define SECTION_BSS      0x02000        /* contains zeros only */
> ...
> #define SECTION_TLS      0x40000        /* contains thread-local storage */
> ...
> #define SECTION_COMMON   0x800000       /* contains common data */
> #define SECTION_RELRO    0x1000000      /* data is readonly after relocation processing */
> ...
>
> Anyway, that would be another type which we need in ld_plugin_symbol.
>
> Martin
>
> >
> > where LDSSS_DEFAULT means .text for FUNC and .data for OBJECT?
> > LDSSS_COMDAT might also apply but there's already the comdat_key
> > member which makes this info implicitely available.
> >
> > Richard.
> >
> >> Martin
> >>
> >>
>

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH][RFC] API extension for binutils (type of symbols).
  2020-03-11 12:51                         ` Richard Biener
@ 2020-03-11 13:07                           ` Martin Liška
  2020-03-11 13:24                             ` H.J. Lu
  2020-03-11 14:16                             ` Alexander Monakov
  2020-03-11 14:15                           ` Martin Liška
  1 sibling, 2 replies; 55+ messages in thread
From: Martin Liška @ 2020-03-11 13:07 UTC (permalink / raw)
  To: Richard Biener; +Cc: Jan Hubicka, GCC Patches, Alexander Monakov

On 3/11/20 1:51 PM, Richard Biener wrote:
> I'm not sure I understand the versioning, we should aim at something where
> an updated plugin can talk to old and new ld and where a new ld can also talk
> to an old plugin.  That requires an arbitration which I don't see implemented?

So ld-new will set new callbacks tv_add_symbols_v2 and tv_get_symbols_v4, these contain
ld_plugin_symbol_v2 with filled ::symbol_type. ld-new will check for he new API with
LDPT_GET_SYMBOLS_V4, LDPT_ADD_SYMBOLS_V2 and will tv_add_symbols and tv_get_symbols callbacks.

ld-old will use the old tv_add_symbols and tv_get_symbols callback.

One another level of compatibility is the LTO bytecode, where new plugin should
handle also old LTO bytecode. That's done with parsing of LTO bytecode version
(lto_section).

> 
> Splitting an existing field isn't hackish IMHO.  I guess even
> explicitely changing it
> to one short and two char fields would be OK.

Then I'm fine with that as well. Am I right that the split order
will depend of the endianess?

> 
> Is there a comprehensive list of plugins out in the wild using the LD
> plugin API?

I know only about:
$ ls /usr/lib/bfd-plugins
liblto_plugin.so.0.0.0  LLVMgold.so

and I know about Alexander Monakov (some dead code elimination plug-in).

> 
> Note we also have to bring in gold folks (not sure if lld also
> implements the same
> plugin API)

I don't see how can they be affected?

Martin



^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH][RFC] API extension for binutils (type of symbols).
  2020-03-11 13:07                           ` Martin Liška
@ 2020-03-11 13:24                             ` H.J. Lu
  2020-03-11 13:28                               ` Martin Liška
  2020-03-11 14:16                             ` Alexander Monakov
  1 sibling, 1 reply; 55+ messages in thread
From: H.J. Lu @ 2020-03-11 13:24 UTC (permalink / raw)
  To: Martin Liška
  Cc: Richard Biener, Alexander Monakov, GCC Patches, Jan Hubicka

On Wed, Mar 11, 2020 at 6:09 AM Martin Liška <mliska@suse.cz> wrote:
>
> On 3/11/20 1:51 PM, Richard Biener wrote:
> > I'm not sure I understand the versioning, we should aim at something where
> > an updated plugin can talk to old and new ld and where a new ld can also talk
> > to an old plugin.  That requires an arbitration which I don't see implemented?
>
> So ld-new will set new callbacks tv_add_symbols_v2 and tv_get_symbols_v4, these contain

ld-new don't have to use the new interface since it isn't needed.


-- 
H.J.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH][RFC] API extension for binutils (type of symbols).
  2020-03-11 13:24                             ` H.J. Lu
@ 2020-03-11 13:28                               ` Martin Liška
  0 siblings, 0 replies; 55+ messages in thread
From: Martin Liška @ 2020-03-11 13:28 UTC (permalink / raw)
  To: H.J. Lu; +Cc: Richard Biener, Alexander Monakov, GCC Patches, Jan Hubicka

On 3/11/20 2:24 PM, H.J. Lu wrote:
> ld-new don't have to use the new interface since it isn't needed.

Yeah. We talk about nm that should utilize the new API, right?

Martin

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH][RFC] API extension for binutils (type of symbols).
  2020-03-11 12:51                         ` Richard Biener
  2020-03-11 13:07                           ` Martin Liška
@ 2020-03-11 14:15                           ` Martin Liška
  2020-03-12 10:36                             ` Strange read for simple_object_elf_find_sections for .a file Martin Liška
  2020-03-12 12:32                             ` [PATCH][RFC] API extension for binutils (type of symbols) Martin Liška
  1 sibling, 2 replies; 55+ messages in thread
From: Martin Liška @ 2020-03-11 14:15 UTC (permalink / raw)
  To: Richard Biener; +Cc: Jan Hubicka, GCC Patches

[-- Attachment #1: Type: text/plain, Size: 327 bytes --]

On 3/11/20 1:51 PM, Richard Biener wrote:
> Splitting an existing field isn't hackish IMHO.  I guess even
> explicitely changing it
> to one short and two char fields would be OK.

Good, I'm doing that in the updated version of that patch.
H.J. is right now working on the corresponding binutils counterpart.

Thoughts?
Martin

[-- Attachment #2: 0001-API-extension-for-binutils-type-of-symbols.patch --]
[-- Type: text/x-patch, Size: 9013 bytes --]

From 8a2e73edb53f26a1b6a543e2e2027b2661a48f94 Mon Sep 17 00:00:00 2001
From: Martin Liska <mliska@suse.cz>
Date: Fri, 6 Mar 2020 18:09:35 +0100
Subject: [PATCH] API extension for binutils (type of symbols).

---
 gcc/lto-streamer-out.c  | 12 +++++
 include/lto-symtab.h    | 12 +++++
 include/plugin-api.h    | 30 ++++++++++++-
 lto-plugin/lto-plugin.c | 99 +++++++++++++++++++++++++++++++++++++----
 4 files changed, 142 insertions(+), 11 deletions(-)

diff --git a/gcc/lto-streamer-out.c b/gcc/lto-streamer-out.c
index cea5e71cffb..b212be35b3d 100644
--- a/gcc/lto-streamer-out.c
+++ b/gcc/lto-streamer-out.c
@@ -45,6 +45,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "print-tree.h"
 #include "tree-dfa.h"
 #include "file-prefix-map.h" /* remap_debug_filename()  */
+#include "output.h"
 
 
 static void lto_write_tree (struct output_block*, tree, bool);
@@ -2773,6 +2774,17 @@ write_symbol (struct streamer_tree_cache_d *cache,
   lto_write_data (&c, 1);
   c = (unsigned char) visibility;
   lto_write_data (&c, 1);
+  c = ((unsigned char) TREE_CODE (t) == VAR_DECL
+       ? GCCST_VARIABLE : GCCST_FUNCTION);
+  lto_write_data (&c, 1);
+  unsigned char section_flags = 0;
+  if (TREE_CODE (t) == VAR_DECL)
+    {
+      section *s = get_variable_section (t, false);
+      if (s->common.flags & SECTION_BSS)
+	section_flags |= GCCSSS_BSS;
+    }
+  lto_write_data (&section_flags, 1);
   lto_write_data (&size, 8);
   lto_write_data (&slot_num, 4);
 }
diff --git a/include/lto-symtab.h b/include/lto-symtab.h
index 0ce0de10121..47f0ff27df8 100644
--- a/include/lto-symtab.h
+++ b/include/lto-symtab.h
@@ -38,4 +38,16 @@ enum gcc_plugin_symbol_visibility
     GCCPV_HIDDEN
   };
 
+enum gcc_plugin_symbol_type
+{
+  GCCST_UNKNOWN,
+  GCCST_FUNCTION,
+  GCCST_VARIABLE,
+};
+
+enum gcc_plugin_symbol_section_flags
+{
+  GCCSSS_BSS = 1
+};
+
 #endif /* GCC_LTO_SYMTAB_H  */
diff --git a/include/plugin-api.h b/include/plugin-api.h
index 09e1202df07..804684449cf 100644
--- a/include/plugin-api.h
+++ b/include/plugin-api.h
@@ -87,7 +87,17 @@ struct ld_plugin_symbol
 {
   char *name;
   char *version;
-  int def;
+#ifdef __BIG_ENDIAN__
+  char unused;
+  char section_flags;
+  char symbol_type;
+  char def;
+#else
+  char def;
+  char symbol_type;
+  char section_flags;
+  char unused;
+#endif
   int visibility;
   uint64_t size;
   char *comdat_key;
@@ -123,6 +133,20 @@ enum ld_plugin_symbol_visibility
   LDPV_HIDDEN
 };
 
+/* The type of the symbol.  */
+
+enum ld_plugin_symbol_type
+{
+  LDST_UNKNOWN,
+  LDST_FUNCTION,
+  LDST_VARIABLE,
+};
+
+enum ld_plugin_symbol_section_flags
+{
+  LDSSS_BSS = 1
+};
+
 /* How a symbol is resolved.  */
 
 enum ld_plugin_symbol_resolution
@@ -431,7 +455,9 @@ enum ld_plugin_tag
   LDPT_GET_INPUT_SECTION_ALIGNMENT = 29,
   LDPT_GET_INPUT_SECTION_SIZE = 30,
   LDPT_REGISTER_NEW_INPUT_HOOK = 31,
-  LDPT_GET_WRAP_SYMBOLS = 32
+  LDPT_GET_WRAP_SYMBOLS = 32,
+  LDPT_ADD_SYMBOLS_V2 = 33,
+  LDPT_GET_SYMBOLS_V4 = 34,
 };
 
 /* The plugin transfer vector.  */
diff --git a/lto-plugin/lto-plugin.c b/lto-plugin/lto-plugin.c
index c307fc871bf..bce9f183049 100644
--- a/lto-plugin/lto-plugin.c
+++ b/lto-plugin/lto-plugin.c
@@ -90,6 +90,8 @@ along with this program; see the file COPYING3.  If not see
 
 #define LTO_SECTION_PREFIX	".gnu.lto_.symtab"
 #define LTO_SECTION_PREFIX_LEN	(sizeof (LTO_SECTION_PREFIX) - 1)
+#define LTO_LTO_PREFIX		".gnu.lto_.lto"
+#define LTO_LTO_PREFIX_LEN	(sizeof (LTO_LTO_PREFIX) - 1)
 #define OFFLOAD_SECTION		".gnu.offload_lto_.opts"
 #define OFFLOAD_SECTION_LEN	(sizeof (OFFLOAD_SECTION) - 1)
 
@@ -154,12 +156,12 @@ enum symbol_style
 static char *arguments_file_name;
 static ld_plugin_register_claim_file register_claim_file;
 static ld_plugin_register_all_symbols_read register_all_symbols_read;
-static ld_plugin_get_symbols get_symbols, get_symbols_v2;
+static ld_plugin_get_symbols get_symbols, get_symbols_v2, get_symbols_v4;
 static ld_plugin_register_cleanup register_cleanup;
 static ld_plugin_add_input_file add_input_file;
 static ld_plugin_add_input_library add_input_library;
 static ld_plugin_message message;
-static ld_plugin_add_symbols add_symbols;
+static ld_plugin_add_symbols add_symbols, add_symbols_v2;
 
 static struct plugin_file_info *claimed_files = NULL;
 static unsigned int num_claimed_files = 0;
@@ -221,6 +223,20 @@ check_1 (int gate, enum ld_plugin_level level, const char *text)
     }
 }
 
+/* Structure that represents LTO ELF section with information
+   about the format.  */
+
+struct lto_section
+ {
+   int16_t major_version;
+   int16_t minor_version;
+   unsigned char slim_object: 1;
+   unsigned char compression: 4;
+   int32_t reserved0: 27;
+};
+
+struct lto_section lto_header;
+
 /* This little wrapper allows check to be called with a non-integer
    first argument, such as a pointer that must be non-NULL.  We can't
    use c99 bool type to coerce it into range, so we explicitly test.  */
@@ -252,6 +268,13 @@ parse_table_entry (char *p, struct ld_plugin_symbol *entry,
       LDPV_HIDDEN
     };
 
+  enum ld_plugin_symbol_type symbol_types[] =
+    {
+      LDST_UNKNOWN,
+      LDST_FUNCTION,
+      LDST_VARIABLE,
+    };
+
   switch (sym_style)
     {
     case ss_win32:
@@ -296,6 +319,19 @@ parse_table_entry (char *p, struct ld_plugin_symbol *entry,
   entry->visibility = translate_visibility[t];
   p++;
 
+  /* Symbol type was added in GCC 10.1 (LTO version 9.0).  */
+  if (lto_header.major_version >= 9)
+    {
+      t = *p;
+      check (t <= 3, LDPL_FATAL, "invalid symbol type found");
+      entry->symbol_type = symbol_types[t];
+      p++;
+      entry->section_flags = *p;
+      p++;
+    }
+
+  entry->unused = 0;
+
   memcpy (&entry->size, p, sizeof (uint64_t));
   p += 8;
 
@@ -431,7 +467,7 @@ finish_conflict_resolution (struct plugin_symtab *symtab,
 
   for (i = 0; i < symtab->nsyms; i++)
     { 
-      int resolution = LDPR_UNKNOWN;
+      char resolution = LDPR_UNKNOWN;
 
       if (symtab->aux[i].next_conflict == -1)
 	continue;
@@ -473,6 +509,19 @@ free_symtab (struct plugin_symtab *symtab)
   symtab->aux = NULL;
 }
 
+/* Clear new flags symbol_type and section_flags
+   from symbol table entries.  */
+
+static void
+clear_new_symtab_flags (struct plugin_symtab *out)
+{
+  for (unsigned i = 0; i < out->nsyms; i++)
+    {
+      out->syms[i].symbol_type = 0;
+      out->syms[i].section_flags = 0;
+    }
+}
+
 /*  Writes the relocations to disk. */
 
 static void
@@ -495,10 +544,16 @@ write_resolution (void)
 
       /* Version 2 of API supports IRONLY_EXP resolution that is
          accepted by GCC-4.7 and newer.  */
-      if (get_symbols_v2)
-        get_symbols_v2 (info->handle, symtab->nsyms, syms);
+      if (get_symbols_v4)
+	get_symbols_v4 (info->handle, symtab->nsyms, syms);
       else
-        get_symbols (info->handle, symtab->nsyms, syms);
+	{
+	  clear_new_symtab_flags (symtab);
+	  if (get_symbols_v2)
+	    get_symbols_v2 (info->handle, symtab->nsyms, syms);
+	  else
+	    get_symbols (info->handle, symtab->nsyms, syms);
+	}
 
       finish_conflict_resolution (symtab, &info->conflicts);
 
@@ -951,7 +1006,20 @@ process_symtab (void *data, const char *name, off_t offset, off_t length)
 {
   struct plugin_objfile *obj = (struct plugin_objfile *)data;
   char *s;
-  char *secdatastart, *secdata;
+  char *secdatastart = NULL, *secdata;
+
+  if (strncmp (name, LTO_LTO_PREFIX, LTO_LTO_PREFIX_LEN) == 0)
+    {
+      if (offset != lseek (obj->file->fd, offset, SEEK_SET))
+	goto err;
+
+      ssize_t got = read (obj->file->fd, &lto_header,
+			  sizeof (struct lto_section));
+      if (got != sizeof (struct lto_section))
+	goto err;
+
+      return 1;
+    }
 
   if (strncmp (name, LTO_SECTION_PREFIX, LTO_SECTION_PREFIX_LEN) != 0)
     return 1;
@@ -1080,8 +1148,15 @@ claim_file_handler (const struct ld_plugin_input_file *file, int *claimed)
 
   if (obj.found > 0)
     {
-      status = add_symbols (file->handle, lto_file.symtab.nsyms,
-			    lto_file.symtab.syms);
+      if (add_symbols_v2)
+	status = add_symbols_v2 (file->handle, lto_file.symtab.nsyms,
+				 lto_file.symtab.syms);
+      else
+	{
+	  clear_new_symtab_flags (&lto_file.symtab);
+	  status = add_symbols (file->handle, lto_file.symtab.nsyms,
+				lto_file.symtab.syms);
+	}
       check (status == LDPS_OK, LDPL_FATAL, "could not add symbols");
 
       num_claimed_files++;
@@ -1242,12 +1317,18 @@ onload (struct ld_plugin_tv *tv)
 	case LDPT_REGISTER_CLAIM_FILE_HOOK:
 	  register_claim_file = p->tv_u.tv_register_claim_file;
 	  break;
+	case LDPT_ADD_SYMBOLS_V2:
+	  add_symbols_v2 = p->tv_u.tv_add_symbols;
+	  break;
 	case LDPT_ADD_SYMBOLS:
 	  add_symbols = p->tv_u.tv_add_symbols;
 	  break;
 	case LDPT_REGISTER_ALL_SYMBOLS_READ_HOOK:
 	  register_all_symbols_read = p->tv_u.tv_register_all_symbols_read;
 	  break;
+	case LDPT_GET_SYMBOLS_V4:
+	  get_symbols_v4 = p->tv_u.tv_get_symbols;
+	  break;
 	case LDPT_GET_SYMBOLS_V2:
 	  get_symbols_v2 = p->tv_u.tv_get_symbols;
 	  break;
-- 
2.25.1


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH][RFC] API extension for binutils (type of symbols).
  2020-03-11 13:07                           ` Martin Liška
  2020-03-11 13:24                             ` H.J. Lu
@ 2020-03-11 14:16                             ` Alexander Monakov
  1 sibling, 0 replies; 55+ messages in thread
From: Alexander Monakov @ 2020-03-11 14:16 UTC (permalink / raw)
  To: Martin Liška; +Cc: Richard Biener, Jan Hubicka, GCC Patches

On Wed, 11 Mar 2020, Martin Liška wrote:

> > Is there a comprehensive list of plugins out in the wild using the LD
> > plugin API?
> 
> I know only about:
> $ ls /usr/lib/bfd-plugins
> liblto_plugin.so.0.0.0  LLVMgold.so
> 
> and I know about Alexander Monakov (some dead code elimination plug-in).

I don't recall seeing any other plugin when working on our "debloating"
stuff, but of course I suppose people could have used the plugin API
for experimentation.  I don't think a comprehensive list could exist.

> > Note we also have to bring in gold folks (not sure if lld also
> > implements the same plugin API)
> 
> I don't see how can they be affected?

As discussed downthread the other linkers could freely ignore the
new callback, but fwiw LLD has no plans to implement the plugin-api.h
interface: https://bugs.llvm.org/show_bug.cgi?id=42446

Alexander

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Strange read for simple_object_elf_find_sections for .a file
  2020-03-11 14:15                           ` Martin Liška
@ 2020-03-12 10:36                             ` Martin Liška
  2020-03-12 10:39                               ` Martin Liška
  2020-03-12 12:32                             ` [PATCH][RFC] API extension for binutils (type of symbols) Martin Liška
  1 sibling, 1 reply; 55+ messages in thread
From: Martin Liška @ 2020-03-12 10:36 UTC (permalink / raw)
  To: Richard Biener; +Cc: GCC Patches, Jan Hubicka

[-- Attachment #1: Type: text/plain, Size: 585 bytes --]

Hi.

I noticed a strange error during parsing of .gnu.lto_.lto section
in LTO plugin:

$ cat bss.c
int global_zero;
int global_one = 1;

int main() { return 0; }

$ gcc -flto bss.c -c
$ gcc bss.o
read version 9.0

while:

$ ar r x.a bss.o
ar: creating x.a

$ gcc x.a
read version 13617.13368

$ objdump -s -j .gnu.lto_.lto.655b2ec59df60b19 x.a
In archive x.a:

bss.o:     file format elf64-x86-64

Contents of section .gnu.lto_.lto.655b2ec59df60b19:
  0000 09000000 01000100                    ........

which is the same as objdump of bss.o

Any idea why that happens?
Thanks,
Martin

[-- Attachment #2: 0001-Test-x.patch --]
[-- Type: text/x-patch, Size: 2733 bytes --]

From f12831ebf6ed68b569f6d0c6a9243dfa15ba3b40 Mon Sep 17 00:00:00 2001
From: Martin Liska <mliska@suse.cz>
Date: Thu, 12 Mar 2020 11:30:17 +0100
Subject: [PATCH] Test x.

---
 lto-plugin/lto-plugin.c | 57 +++++++++++++++++++++++++++++++++++++++--
 1 file changed, 55 insertions(+), 2 deletions(-)

diff --git a/lto-plugin/lto-plugin.c b/lto-plugin/lto-plugin.c
index c307fc871bf..fbe2c6885f0 100644
--- a/lto-plugin/lto-plugin.c
+++ b/lto-plugin/lto-plugin.c
@@ -90,6 +90,8 @@ along with this program; see the file COPYING3.  If not see
 
 #define LTO_SECTION_PREFIX	".gnu.lto_.symtab"
 #define LTO_SECTION_PREFIX_LEN	(sizeof (LTO_SECTION_PREFIX) - 1)
+#define LTO_LTO_PREFIX         ".gnu.lto_.lto"
+#define LTO_LTO_PREFIX_LEN     (sizeof (LTO_LTO_PREFIX) - 1)
 #define OFFLOAD_SECTION		".gnu.offload_lto_.opts"
 #define OFFLOAD_SECTION_LEN	(sizeof (OFFLOAD_SECTION) - 1)
 
@@ -1010,6 +1012,52 @@ process_offload_section (void *data, const char *name, off_t offset, off_t len)
   return 1;
 }
 
+/* Structure that represents LTO ELF section with information
+   about the format.  */
+
+struct lto_section
+ {
+   int16_t major_version;
+   int16_t minor_version;
+   unsigned char slim_object: 1;
+   unsigned char compression: 4;
+   int32_t reserved0: 27;
+};
+
+struct lto_section lto_header;
+
+
+/* Find and parse .lto section of an object file.  */
+
+static int
+process_lto_section (void *data, const char *name, off_t offset, off_t len)
+{
+  struct plugin_objfile *obj = (struct plugin_objfile *)data;
+  if (strncmp (name, LTO_LTO_PREFIX, LTO_LTO_PREFIX_LEN) == 0)
+    {
+      if (offset != lseek (obj->file->fd, offset, SEEK_SET))
+	goto err;
+
+      ssize_t got = read (obj->file->fd, &lto_header,
+			  sizeof (struct lto_section));
+      if (got != sizeof (struct lto_section))
+	goto err;
+
+      fprintf (stderr, "read version %d.%d\n", lto_header.major_version,
+	       lto_header.minor_version);
+
+      return 0;
+    }
+
+  return 1;
+
+err:
+  if (message)
+    message (LDPL_FATAL, "%s: corrupt object file", obj->file->name);
+  obj->found = 0;
+  return 0;
+}
+
 /* Callback used by gold to check if the plugin will claim FILE. Writes
    the result in CLAIMED. */
 
@@ -1055,8 +1103,13 @@ claim_file_handler (const struct ld_plugin_input_file *file, int *claimed)
   if (!obj.objfile && !err)
     goto err;
 
-  if (obj.objfile)
-    errmsg = simple_object_find_sections (obj.objfile, process_symtab, &obj, &err);
+   if (obj.objfile)
+    {
+      errmsg = simple_object_find_sections (obj.objfile, process_symtab, &obj, &err);
+
+      if (!errmsg)
+	errmsg = simple_object_find_sections (obj.objfile, process_lto_section, &obj, &err);
+    }
 
   if (!obj.objfile || errmsg)
     {
-- 
2.25.1


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: Strange read for simple_object_elf_find_sections for .a file
  2020-03-12 10:36                             ` Strange read for simple_object_elf_find_sections for .a file Martin Liška
@ 2020-03-12 10:39                               ` Martin Liška
  0 siblings, 0 replies; 55+ messages in thread
From: Martin Liška @ 2020-03-12 10:39 UTC (permalink / raw)
  To: Richard Biener; +Cc: GCC Patches, Jan Hubicka

The answer is simple:


diff --git a/lto-plugin/lto-plugin.c b/lto-plugin/lto-plugin.c
index fbe2c6885f0..83f1ac38d37 100644
--- a/lto-plugin/lto-plugin.c
+++ b/lto-plugin/lto-plugin.c
@@ -1033,6 +1033,7 @@ static int
  process_lto_section (void *data, const char *name, off_t offset, off_t len)
  {
    struct plugin_objfile *obj = (struct plugin_objfile *)data;
+  offset += obj->file->offset;
    if (strncmp (name, LTO_LTO_PREFIX, LTO_LTO_PREFIX_LEN) == 0)
      {
        if (offset != lseek (obj->file->fd, offset, SEEK_SET))

Sorry for the noise.
Martin

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH][RFC] API extension for binutils (type of symbols).
  2020-03-11 14:15                           ` Martin Liška
  2020-03-12 10:36                             ` Strange read for simple_object_elf_find_sections for .a file Martin Liška
@ 2020-03-12 12:32                             ` Martin Liška
  2020-03-12 12:55                               ` Jan Hubicka
  2020-03-12 13:31                               ` Martin Liška
  1 sibling, 2 replies; 55+ messages in thread
From: Martin Liška @ 2020-03-12 12:32 UTC (permalink / raw)
  To: Richard Biener; +Cc: GCC Patches, Jan Hubicka, H.J. Lu

[-- Attachment #1: Type: text/plain, Size: 1084 bytes --]

Hi.

I'm sending extended version of the patch where I address Honza's note
about backward compatibility. So I add a next section _lto.extension_symtab
where I put new bits. With that, new LTO bytecode can be opened with older
LTO plugin.

Moreover, I've also tested lto.exp tests and binutils master. And I can confirm
that patches work with H.J.'s counterpart:
https://gitlab.com/x86-gcc/gcc/-/tree/users/hjl/plugin/master

$ cat bss.c
int global_zero;
int global_one = 1;

int main() { return 0; }

$ gcc -c -flto bss.c

Old bintuils (with new plugin):

$ ./binutils/nm-new /tmp/bss.o --plugin /home/marxin/Programming/gcc2/objdir/gcc/liblto_plugin.so.0.0.0
00000000 T global_one
00000000 T global_zero
00000000 T main

H.J.'s binutils (with new plugin):

$ ./binutils/nm-new /tmp/bss.o --plugin /home/marxin/Programming/gcc2/objdir/gcc/liblto_plugin.so.0.0.0
00000000 D global_one
00000000 B global_zero
00000000 T main

System binutils (with old plugin):

$ nm /tmp/bss.o
00000000 T global_one
00000000 T global_zero
00000000 T main

Hope I'm done now.
Thoughts?

Thanks,
Martin

[-- Attachment #2: 0002-API-extension-for-binutils-type-of-symbols.patch --]
[-- Type: text/x-patch, Size: 14454 bytes --]

From 8968bbdf627dc26056f1cb344e253eedae0efca1 Mon Sep 17 00:00:00 2001
From: Martin Liska <mliska@suse.cz>
Date: Fri, 6 Mar 2020 18:09:35 +0100
Subject: [PATCH 2/2] API extension for binutils (type of symbols).

gcc/ChangeLog:

2020-03-12  Martin Liska  <mliska@suse.cz>

	* lto-section-in.c: Add extension_symtab.
	* lto-streamer-out.c (write_symbol_extension_info):
	New.
	(produce_symtab_extension): New.
	(produce_asm_for_decls): Stream also produce_symtab_extension.
	* lto-streamer.h (enum lto_section_type): New section.

include/ChangeLog:

2020-03-12  Martin Liska  <mliska@suse.cz>

	* lto-symtab.h (enum gcc_plugin_symbol_type): New.
	(enum gcc_plugin_symbol_section_flags): Likewise.

lto-plugin/ChangeLog:

2020-03-12  Martin Liska  <mliska@suse.cz>

	* lto-plugin.c (LTO_SECTION_PREFIX): Rename to
	...
	(LTO_SYMTAB_PREFIX): ... this.
	(LTO_SECTION_PREFIX_LEN): Rename to ...
	(LTO_SYMTAB_PREFIX_LEN): ... this.
	(LTO_SYMTAB_EXT_PREFIX): New.
	(LTO_SYMTAB_EXT_PREFIX_LEN): New.
	(LTO_LTO_PREFIX): New.
	(LTO_LTO_PREFIX_LEN): New.
	(parse_table_entry): Fill up unused to zero.
	(parse_table_entry_extension): New.
	(parse_symtab_extension): New.
	(finish_conflict_resolution): Change type
	for resolution.
	(clear_new_symtab_flags): New.
	(write_resolution): Support new get_symbols_v4.
	(process_symtab): Use new macro name.
	(process_symtab_extension): New.
	(claim_file_handler): Parse also process_symtab_extension.
	(onload): Call new add_symbols_v2.
---
 gcc/lto-section-in.c    |   3 +-
 gcc/lto-streamer-out.c  |  64 +++++++++++++++-
 gcc/lto-streamer.h      |   1 +
 include/lto-symtab.h    |  12 +++
 lto-plugin/lto-plugin.c | 165 +++++++++++++++++++++++++++++++++++-----
 5 files changed, 226 insertions(+), 19 deletions(-)

diff --git a/gcc/lto-section-in.c b/gcc/lto-section-in.c
index c17dd69dbdd..7bf59c513fc 100644
--- a/gcc/lto-section-in.c
+++ b/gcc/lto-section-in.c
@@ -54,7 +54,8 @@ const char *lto_section_name[LTO_N_SECTION_TYPES] =
   "mode_table",
   "hsa",
   "lto",
-  "ipa_sra"
+  "ipa_sra",
+  "extension_symtab"
 };
 
 /* Hooks so that the ipa passes can call into the lto front end to get
diff --git a/gcc/lto-streamer-out.c b/gcc/lto-streamer-out.c
index cea5e71cffb..2fd20252ba5 100644
--- a/gcc/lto-streamer-out.c
+++ b/gcc/lto-streamer-out.c
@@ -45,6 +45,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "print-tree.h"
 #include "tree-dfa.h"
 #include "file-prefix-map.h" /* remap_debug_filename()  */
+#include "output.h"
 
 
 static void lto_write_tree (struct output_block*, tree, bool);
@@ -2777,6 +2778,25 @@ write_symbol (struct streamer_tree_cache_d *cache,
   lto_write_data (&slot_num, 4);
 }
 
+/* Write extension information for symbols (symbol type, section flags).  */
+
+static void
+write_symbol_extension_info (tree t)
+{
+  unsigned char c;
+  c = ((unsigned char) TREE_CODE (t) == VAR_DECL
+       ? GCCST_VARIABLE : GCCST_FUNCTION);
+  lto_write_data (&c, 1);
+  unsigned char section_flags = 0;
+  if (TREE_CODE (t) == VAR_DECL)
+    {
+      section *s = get_variable_section (t, false);
+      if (s->common.flags & SECTION_BSS)
+	section_flags |= GCCSSS_BSS;
+    }
+  lto_write_data (&section_flags, 1);
+}
+
 /* Write an IL symbol table to OB.
    SET and VSET are cgraph/varpool node sets we are outputting.  */
 
@@ -2818,6 +2838,45 @@ produce_symtab (struct output_block *ob)
   lto_end_section ();
 }
 
+/* Write an IL symbol table extension to OB.
+   SET and VSET are cgraph/varpool node sets we are outputting.  */
+
+static void
+produce_symtab_extension (struct output_block *ob)
+{
+  char *section_name = lto_get_section_name (LTO_section_symtab_extension,
+					     NULL, 0, NULL);
+  lto_symtab_encoder_t encoder = ob->decl_state->symtab_node_encoder;
+  lto_symtab_encoder_iterator lsei;
+
+  lto_begin_section (section_name, false);
+  free (section_name);
+
+  /* Write the symbol table.
+     First write everything defined and then all declarations.
+     This is necessary to handle cases where we have duplicated symbols.  */
+  for (lsei = lsei_start (encoder);
+       !lsei_end_p (lsei); lsei_next (&lsei))
+    {
+      symtab_node *node = lsei_node (lsei);
+
+      if (DECL_EXTERNAL (node->decl) || !node->output_to_lto_symbol_table_p ())
+	continue;
+      write_symbol_extension_info (node->decl);
+    }
+  for (lsei = lsei_start (encoder);
+       !lsei_end_p (lsei); lsei_next (&lsei))
+    {
+      symtab_node *node = lsei_node (lsei);
+
+      if (!DECL_EXTERNAL (node->decl) || !node->output_to_lto_symbol_table_p ())
+	continue;
+      write_symbol_extension_info (node->decl);
+    }
+
+  lto_end_section ();
+}
+
 
 /* Init the streamer_mode_table for output, where we collect info on what
    machine_mode values have been streamed.  */
@@ -3001,7 +3060,10 @@ produce_asm_for_decls (void)
   /* Write the symbol table.  It is used by linker to determine dependencies
      and thus we can skip it for WPA.  */
   if (!flag_wpa)
-    produce_symtab (ob);
+    {
+      produce_symtab (ob);
+      produce_symtab_extension (ob);
+    }
 
   /* Write command line opts.  */
   lto_write_options ();
diff --git a/gcc/lto-streamer.h b/gcc/lto-streamer.h
index 25bf6c468f7..4f82b439360 100644
--- a/gcc/lto-streamer.h
+++ b/gcc/lto-streamer.h
@@ -236,6 +236,7 @@ enum lto_section_type
   LTO_section_ipa_hsa,
   LTO_section_lto,
   LTO_section_ipa_sra,
+  LTO_section_symtab_extension,
   LTO_N_SECTION_TYPES		/* Must be last.  */
 };
 
diff --git a/include/lto-symtab.h b/include/lto-symtab.h
index 0ce0de10121..47f0ff27df8 100644
--- a/include/lto-symtab.h
+++ b/include/lto-symtab.h
@@ -38,4 +38,16 @@ enum gcc_plugin_symbol_visibility
     GCCPV_HIDDEN
   };
 
+enum gcc_plugin_symbol_type
+{
+  GCCST_UNKNOWN,
+  GCCST_FUNCTION,
+  GCCST_VARIABLE,
+};
+
+enum gcc_plugin_symbol_section_flags
+{
+  GCCSSS_BSS = 1
+};
+
 #endif /* GCC_LTO_SYMTAB_H  */
diff --git a/lto-plugin/lto-plugin.c b/lto-plugin/lto-plugin.c
index c307fc871bf..78c8672f160 100644
--- a/lto-plugin/lto-plugin.c
+++ b/lto-plugin/lto-plugin.c
@@ -88,10 +88,14 @@ along with this program; see the file COPYING3.  If not see
 
 /* LTO magic section name.  */
 
-#define LTO_SECTION_PREFIX	".gnu.lto_.symtab"
-#define LTO_SECTION_PREFIX_LEN	(sizeof (LTO_SECTION_PREFIX) - 1)
-#define OFFLOAD_SECTION		".gnu.offload_lto_.opts"
-#define OFFLOAD_SECTION_LEN	(sizeof (OFFLOAD_SECTION) - 1)
+#define LTO_SYMTAB_PREFIX	    ".gnu.lto_.symtab"
+#define LTO_SYMTAB_PREFIX_LEN	    (sizeof (LTO_SYMTAB_PREFIX) - 1)
+#define LTO_SYMTAB_EXT_PREFIX	    ".gnu.lto_.extension_symtab"
+#define LTO_SYMTAB_EXT_PREFIX_LEN   (sizeof (LTO_SYMTAB_EXT_PREFIX) - 1)
+#define LTO_LTO_PREFIX		    ".gnu.lto_.lto"
+#define LTO_LTO_PREFIX_LEN	    (sizeof (LTO_LTO_PREFIX) - 1)
+#define OFFLOAD_SECTION		    ".gnu.offload_lto_.opts"
+#define OFFLOAD_SECTION_LEN	    (sizeof (OFFLOAD_SECTION) - 1)
 
 /* The part of the symbol table the plugin has to keep track of. Note that we
    must keep SYMS until all_symbols_read is called to give the linker time to
@@ -154,12 +158,12 @@ enum symbol_style
 static char *arguments_file_name;
 static ld_plugin_register_claim_file register_claim_file;
 static ld_plugin_register_all_symbols_read register_all_symbols_read;
-static ld_plugin_get_symbols get_symbols, get_symbols_v2;
+static ld_plugin_get_symbols get_symbols, get_symbols_v2, get_symbols_v4;
 static ld_plugin_register_cleanup register_cleanup;
 static ld_plugin_add_input_file add_input_file;
 static ld_plugin_add_input_library add_input_library;
 static ld_plugin_message message;
-static ld_plugin_add_symbols add_symbols;
+static ld_plugin_add_symbols add_symbols, add_symbols_v2;
 
 static struct plugin_file_info *claimed_files = NULL;
 static unsigned int num_claimed_files = 0;
@@ -231,7 +235,7 @@ check_1 (int gate, enum ld_plugin_level level, const char *text)
    Returns the address of the next entry. */
 
 static char *
-parse_table_entry (char *p, struct ld_plugin_symbol *entry, 
+parse_table_entry (char *p, struct ld_plugin_symbol *entry,
 		   struct sym_aux *aux)
 {
   unsigned char t;
@@ -296,6 +300,8 @@ parse_table_entry (char *p, struct ld_plugin_symbol *entry,
   entry->visibility = translate_visibility[t];
   p++;
 
+  entry->unused = 0;
+
   memcpy (&entry->size, p, sizeof (uint64_t));
   p += 8;
 
@@ -309,6 +315,32 @@ parse_table_entry (char *p, struct ld_plugin_symbol *entry,
   return p;
 }
 
+/* Parse an entry of the IL symbol table. The data to be parsed is pointed
+   by P and the result is written in ENTRY. The slot number is stored in SLOT.
+   Returns the address of the next entry. */
+
+static char *
+parse_table_entry_extension (char *p, struct ld_plugin_symbol *entry)
+{
+  unsigned char t;
+  enum ld_plugin_symbol_type symbol_types[] =
+    {
+      LDST_UNKNOWN,
+      LDST_FUNCTION,
+      LDST_VARIABLE,
+    };
+
+  t = *p;
+  check (t <= 3, LDPL_FATAL, "invalid symbol type found");
+  entry->symbol_type = symbol_types[t];
+  p++;
+  entry->section_flags = *p;
+  p++;
+
+  return p;
+}
+
+
 /* Translate the IL symbol table located between DATA and END. Append the
    slots and symbols to OUT. */
 
@@ -339,6 +371,13 @@ translate (char *data, char *end, struct plugin_symtab *out)
   out->aux = aux;
 }
 
+static void
+parse_symtab_extension (char *data, char *end, struct plugin_symtab *out)
+{
+  for (unsigned i = 0; i < out->nsyms; i++)
+    data = parse_table_entry_extension (data, &out->syms[i]);
+}
+
 /* Free all memory that is no longer needed after writing the symbol
    resolution. */
 
@@ -431,7 +470,7 @@ finish_conflict_resolution (struct plugin_symtab *symtab,
 
   for (i = 0; i < symtab->nsyms; i++)
     { 
-      int resolution = LDPR_UNKNOWN;
+      char resolution = LDPR_UNKNOWN;
 
       if (symtab->aux[i].next_conflict == -1)
 	continue;
@@ -473,6 +512,19 @@ free_symtab (struct plugin_symtab *symtab)
   symtab->aux = NULL;
 }
 
+/* Clear new flags symbol_type and section_flags
+   from symbol table entries.  */
+
+static void
+clear_new_symtab_flags (struct plugin_symtab *out)
+{
+  for (unsigned i = 0; i < out->nsyms; i++)
+    {
+      out->syms[i].symbol_type = 0;
+      out->syms[i].section_flags = 0;
+    }
+}
+
 /*  Writes the relocations to disk. */
 
 static void
@@ -495,10 +547,16 @@ write_resolution (void)
 
       /* Version 2 of API supports IRONLY_EXP resolution that is
          accepted by GCC-4.7 and newer.  */
-      if (get_symbols_v2)
-        get_symbols_v2 (info->handle, symtab->nsyms, syms);
+      if (get_symbols_v4)
+	get_symbols_v4 (info->handle, symtab->nsyms, syms);
       else
-        get_symbols (info->handle, symtab->nsyms, syms);
+	{
+	  clear_new_symtab_flags (symtab);
+	  if (get_symbols_v2)
+	    get_symbols_v2 (info->handle, symtab->nsyms, syms);
+	  else
+	    get_symbols (info->handle, symtab->nsyms, syms);
+	}
 
       finish_conflict_resolution (symtab, &info->conflicts);
 
@@ -951,9 +1009,9 @@ process_symtab (void *data, const char *name, off_t offset, off_t length)
 {
   struct plugin_objfile *obj = (struct plugin_objfile *)data;
   char *s;
-  char *secdatastart, *secdata;
+  char *secdatastart = NULL, *secdata;
 
-  if (strncmp (name, LTO_SECTION_PREFIX, LTO_SECTION_PREFIX_LEN) != 0)
+  if (strncmp (name, LTO_SYMTAB_PREFIX, LTO_SYMTAB_PREFIX_LEN) != 0)
     return 1;
 
   s = strrchr (name, '.');
@@ -995,6 +1053,59 @@ err:
   return 0;
 }
 
+/* Process one section of an object file.  */
+
+static int
+process_symtab_extension (void *data, const char *name, off_t offset,
+			  off_t length)
+{
+  struct plugin_objfile *obj = (struct plugin_objfile *)data;
+  char *s;
+  char *secdatastart = NULL, *secdata;
+
+  if (strncmp (name, LTO_SYMTAB_EXT_PREFIX, LTO_SYMTAB_EXT_PREFIX_LEN) != 0)
+    return 1;
+
+  s = strrchr (name, '.');
+  if (s)
+    sscanf (s, ".%" PRI_LL "x", &obj->out->id);
+  secdata = secdatastart = xmalloc (length);
+  offset += obj->file->offset;
+  if (offset != lseek (obj->file->fd, offset, SEEK_SET))
+    goto err;
+
+  do
+    {
+      ssize_t got = read (obj->file->fd, secdata, length);
+      if (got == 0)
+	break;
+      else if (got > 0)
+	{
+	  secdata += got;
+	  length -= got;
+	}
+      else if (errno != EINTR)
+	goto err;
+    }
+  while (length > 0);
+  if (length > 0)
+    goto err;
+
+  parse_symtab_extension (secdatastart, secdata, obj->out);
+  obj->found++;
+  free (secdatastart);
+  return 1;
+
+err:
+  if (message)
+    message (LDPL_FATAL, "%s: corrupt object file", obj->file->name);
+  /* Force claim_file_handler to abandon this file.  */
+  obj->found = 0;
+  free (secdatastart);
+  return 0;
+}
+
+
 /* Find an offload section of an object file.  */
 
 static int
@@ -1055,8 +1166,15 @@ claim_file_handler (const struct ld_plugin_input_file *file, int *claimed)
   if (!obj.objfile && !err)
     goto err;
 
-  if (obj.objfile)
-    errmsg = simple_object_find_sections (obj.objfile, process_symtab, &obj, &err);
+   if (obj.objfile)
+    {
+      errmsg = simple_object_find_sections (obj.objfile, process_symtab, &obj,
+					    &err);
+      if (!errmsg)
+	errmsg = simple_object_find_sections (obj.objfile,
+					      process_symtab_extension,
+					      &obj, &err);
+    }
 
   if (!obj.objfile || errmsg)
     {
@@ -1080,8 +1198,15 @@ claim_file_handler (const struct ld_plugin_input_file *file, int *claimed)
 
   if (obj.found > 0)
     {
-      status = add_symbols (file->handle, lto_file.symtab.nsyms,
-			    lto_file.symtab.syms);
+      if (add_symbols_v2)
+	status = add_symbols_v2 (file->handle, lto_file.symtab.nsyms,
+				 lto_file.symtab.syms);
+      else
+	{
+	  clear_new_symtab_flags (&lto_file.symtab);
+	  status = add_symbols (file->handle, lto_file.symtab.nsyms,
+				lto_file.symtab.syms);
+	}
       check (status == LDPS_OK, LDPL_FATAL, "could not add symbols");
 
       num_claimed_files++;
@@ -1242,12 +1367,18 @@ onload (struct ld_plugin_tv *tv)
 	case LDPT_REGISTER_CLAIM_FILE_HOOK:
 	  register_claim_file = p->tv_u.tv_register_claim_file;
 	  break;
+	case LDPT_ADD_SYMBOLS_V2:
+	  add_symbols_v2 = p->tv_u.tv_add_symbols;
+	  break;
 	case LDPT_ADD_SYMBOLS:
 	  add_symbols = p->tv_u.tv_add_symbols;
 	  break;
 	case LDPT_REGISTER_ALL_SYMBOLS_READ_HOOK:
 	  register_all_symbols_read = p->tv_u.tv_register_all_symbols_read;
 	  break;
+	case LDPT_GET_SYMBOLS_V4:
+	  get_symbols_v4 = p->tv_u.tv_get_symbols;
+	  break;
 	case LDPT_GET_SYMBOLS_V2:
 	  get_symbols_v2 = p->tv_u.tv_get_symbols;
 	  break;
-- 
2.25.1


[-- Attachment #3: 0001-Update-include-plugin-api.h.patch --]
[-- Type: text/x-patch, Size: 1747 bytes --]

From e9b12e59d1ce3042f528ed4d9cfdbdb22270dfaf Mon Sep 17 00:00:00 2001
From: Martin Liska <mliska@suse.cz>
Date: Thu, 12 Mar 2020 13:15:48 +0100
Subject: [PATCH 1/2] Update include/plugin-api.h.

include/ChangeLog:

2020-03-12  Martin Liska  <mliska@suse.cz>

	* plugin-api.h (struct ld_plugin_symbol): Split
	int def into 4 char fields.
	(enum ld_plugin_symbol_type): New.
	(enum ld_plugin_symbol_section_flags): New.
	(enum ld_plugin_tag): Add LDPT_GET_SYMBOLS_V and
	LDPT_GET_SYMBOLS_V4.
---
 include/plugin-api.h | 30 ++++++++++++++++++++++++++++--
 1 file changed, 28 insertions(+), 2 deletions(-)

diff --git a/include/plugin-api.h b/include/plugin-api.h
index 09e1202df07..804684449cf 100644
--- a/include/plugin-api.h
+++ b/include/plugin-api.h
@@ -87,7 +87,17 @@ struct ld_plugin_symbol
 {
   char *name;
   char *version;
-  int def;
+#ifdef __BIG_ENDIAN__
+  char unused;
+  char section_flags;
+  char symbol_type;
+  char def;
+#else
+  char def;
+  char symbol_type;
+  char section_flags;
+  char unused;
+#endif
   int visibility;
   uint64_t size;
   char *comdat_key;
@@ -123,6 +133,20 @@ enum ld_plugin_symbol_visibility
   LDPV_HIDDEN
 };
 
+/* The type of the symbol.  */
+
+enum ld_plugin_symbol_type
+{
+  LDST_UNKNOWN,
+  LDST_FUNCTION,
+  LDST_VARIABLE,
+};
+
+enum ld_plugin_symbol_section_flags
+{
+  LDSSS_BSS = 1
+};
+
 /* How a symbol is resolved.  */
 
 enum ld_plugin_symbol_resolution
@@ -431,7 +455,9 @@ enum ld_plugin_tag
   LDPT_GET_INPUT_SECTION_ALIGNMENT = 29,
   LDPT_GET_INPUT_SECTION_SIZE = 30,
   LDPT_REGISTER_NEW_INPUT_HOOK = 31,
-  LDPT_GET_WRAP_SYMBOLS = 32
+  LDPT_GET_WRAP_SYMBOLS = 32,
+  LDPT_ADD_SYMBOLS_V2 = 33,
+  LDPT_GET_SYMBOLS_V4 = 34,
 };
 
 /* The plugin transfer vector.  */
-- 
2.25.1


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH][RFC] API extension for binutils (type of symbols).
  2020-03-12 12:32                             ` [PATCH][RFC] API extension for binutils (type of symbols) Martin Liška
@ 2020-03-12 12:55                               ` Jan Hubicka
  2020-03-12 13:10                                 ` Martin Liška
  2020-03-12 13:31                               ` Martin Liška
  1 sibling, 1 reply; 55+ messages in thread
From: Jan Hubicka @ 2020-03-12 12:55 UTC (permalink / raw)
  To: Martin Liška; +Cc: Richard Biener, GCC Patches, H.J. Lu

> gcc/ChangeLog:
> 
> 2020-03-12  Martin Liska  <mliska@suse.cz>
> 
> 	* lto-section-in.c: Add extension_symtab.
> 	* lto-streamer-out.c (write_symbol_extension_info):
> 	New.
> 	(produce_symtab_extension): New.
> 	(produce_asm_for_decls): Stream also produce_symtab_extension.
> 	* lto-streamer.h (enum lto_section_type): New section.
> 
> include/ChangeLog:
> 
> 2020-03-12  Martin Liska  <mliska@suse.cz>
> 
> 	* lto-symtab.h (enum gcc_plugin_symbol_type): New.
> 	(enum gcc_plugin_symbol_section_flags): Likewise.
> 
> lto-plugin/ChangeLog:
> 
> 2020-03-12  Martin Liska  <mliska@suse.cz>
> 
> 	* lto-plugin.c (LTO_SECTION_PREFIX): Rename to
> 	...
> 	(LTO_SYMTAB_PREFIX): ... this.
> 	(LTO_SECTION_PREFIX_LEN): Rename to ...
> 	(LTO_SYMTAB_PREFIX_LEN): ... this.
> 	(LTO_SYMTAB_EXT_PREFIX): New.
> 	(LTO_SYMTAB_EXT_PREFIX_LEN): New.
> 	(LTO_LTO_PREFIX): New.
> 	(LTO_LTO_PREFIX_LEN): New.
> 	(parse_table_entry): Fill up unused to zero.
> 	(parse_table_entry_extension): New.
> 	(parse_symtab_extension): New.
> 	(finish_conflict_resolution): Change type
> 	for resolution.
> 	(clear_new_symtab_flags): New.
> 	(write_resolution): Support new get_symbols_v4.
> 	(process_symtab): Use new macro name.
> 	(process_symtab_extension): New.
> 	(claim_file_handler): Parse also process_symtab_extension.
> 	(onload): Call new add_symbols_v2.
Hi,
thanks for updating the patch.  Two minor nits
> ---
>  gcc/lto-section-in.c    |   3 +-
>  gcc/lto-streamer-out.c  |  64 +++++++++++++++-
>  gcc/lto-streamer.h      |   1 +
>  include/lto-symtab.h    |  12 +++
>  lto-plugin/lto-plugin.c | 165 +++++++++++++++++++++++++++++++++++-----
>  5 files changed, 226 insertions(+), 19 deletions(-)
> 
> diff --git a/gcc/lto-section-in.c b/gcc/lto-section-in.c
> index c17dd69dbdd..7bf59c513fc 100644
> --- a/gcc/lto-section-in.c
> +++ b/gcc/lto-section-in.c
> @@ -54,7 +54,8 @@ const char *lto_section_name[LTO_N_SECTION_TYPES] =
>    "mode_table",
>    "hsa",
>    "lto",
> -  "ipa_sra"
> +  "ipa_sra",
> +  "extension_symtab"
I would call it symtab_ext so alphabetically it is coupled with symtab.
I wonder if moving it up in the enum would make it physically appear
next to .symtab in object file so we save a bit of i/o?
> +static void
> +produce_symtab_extension (struct output_block *ob)
> +{
> +  char *section_name = lto_get_section_name (LTO_section_symtab_extension,
> +					     NULL, 0, NULL);
> +  lto_symtab_encoder_t encoder = ob->decl_state->symtab_node_encoder;
> +  lto_symtab_encoder_iterator lsei;
> +
> +  lto_begin_section (section_name, false);
> +  free (section_name);
> +
> +  /* Write the symbol table.
> +     First write everything defined and then all declarations.
> +     This is necessary to handle cases where we have duplicated symbols.  */
> +  for (lsei = lsei_start (encoder);
> +       !lsei_end_p (lsei); lsei_next (&lsei))
> +    {
> +      symtab_node *node = lsei_node (lsei);
> +
> +      if (DECL_EXTERNAL (node->decl) || !node->output_to_lto_symbol_table_p ())
> +	continue;
> +      write_symbol_extension_info (node->decl);
> +    }
> +  for (lsei = lsei_start (encoder);
> +       !lsei_end_p (lsei); lsei_next (&lsei))
> +    {
> +      symtab_node *node = lsei_node (lsei);
> +
> +      if (!DECL_EXTERNAL (node->decl) || !node->output_to_lto_symbol_table_p ())
> +	continue;
> +      write_symbol_extension_info (node->decl);
> +    }
> +
> +  lto_end_section ();
> +}

It seems fragile to me to duplicate the loops that needs to be kept in
sync because breaking that will likely get bit suprising outcome (i.e.
bootstrap will work since we do not care about symbol types). Perhaps it
would be more robust to simply push the extension bits into a vector in
write_symbol for later streaming.  Or at least add comment that loops
needs to be kept in sync.
> diff --git a/include/plugin-api.h b/include/plugin-api.h
> index 09e1202df07..804684449cf 100644
> --- a/include/plugin-api.h
> +++ b/include/plugin-api.h
> @@ -87,7 +87,17 @@ struct ld_plugin_symbol
>  {
>    char *name;
>    char *version;
> -  int def;
> +#ifdef __BIG_ENDIAN__
> +  char unused;
> +  char section_flags;
> +  char symbol_type;
> +  char def;
> +#else
> +  char def;
> +  char symbol_type;
> +  char section_flags;
> +  char unused;
> +#endif
Perhaps at least drop a comment why this is done this way (i.e.
compatibility with V3 version of API) so in 10 years we know?

Bit more systematic way would be to add a new hook to query extended
properties of a given symbol. I.e. something like
void *get_symbol_property (symbol, enum property)

Honza

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH][RFC] API extension for binutils (type of symbols).
  2020-03-12 12:55                               ` Jan Hubicka
@ 2020-03-12 13:10                                 ` Martin Liška
  2020-03-12 13:15                                   ` Richard Biener
  0 siblings, 1 reply; 55+ messages in thread
From: Martin Liška @ 2020-03-12 13:10 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: Richard Biener, GCC Patches, H.J. Lu

On 3/12/20 1:55 PM, Jan Hubicka wrote:
>> gcc/ChangeLog:
>>
>> 2020-03-12  Martin Liska  <mliska@suse.cz>
>>
>> 	* lto-section-in.c: Add extension_symtab.
>> 	* lto-streamer-out.c (write_symbol_extension_info):
>> 	New.
>> 	(produce_symtab_extension): New.
>> 	(produce_asm_for_decls): Stream also produce_symtab_extension.
>> 	* lto-streamer.h (enum lto_section_type): New section.
>>
>> include/ChangeLog:
>>
>> 2020-03-12  Martin Liska  <mliska@suse.cz>
>>
>> 	* lto-symtab.h (enum gcc_plugin_symbol_type): New.
>> 	(enum gcc_plugin_symbol_section_flags): Likewise.
>>
>> lto-plugin/ChangeLog:
>>
>> 2020-03-12  Martin Liska  <mliska@suse.cz>
>>
>> 	* lto-plugin.c (LTO_SECTION_PREFIX): Rename to
>> 	...
>> 	(LTO_SYMTAB_PREFIX): ... this.
>> 	(LTO_SECTION_PREFIX_LEN): Rename to ...
>> 	(LTO_SYMTAB_PREFIX_LEN): ... this.
>> 	(LTO_SYMTAB_EXT_PREFIX): New.
>> 	(LTO_SYMTAB_EXT_PREFIX_LEN): New.
>> 	(LTO_LTO_PREFIX): New.
>> 	(LTO_LTO_PREFIX_LEN): New.
>> 	(parse_table_entry): Fill up unused to zero.
>> 	(parse_table_entry_extension): New.
>> 	(parse_symtab_extension): New.
>> 	(finish_conflict_resolution): Change type
>> 	for resolution.
>> 	(clear_new_symtab_flags): New.
>> 	(write_resolution): Support new get_symbols_v4.
>> 	(process_symtab): Use new macro name.
>> 	(process_symtab_extension): New.
>> 	(claim_file_handler): Parse also process_symtab_extension.
>> 	(onload): Call new add_symbols_v2.
> Hi,
> thanks for updating the patch.  Two minor nits
>> ---
>>   gcc/lto-section-in.c    |   3 +-
>>   gcc/lto-streamer-out.c  |  64 +++++++++++++++-
>>   gcc/lto-streamer.h      |   1 +
>>   include/lto-symtab.h    |  12 +++
>>   lto-plugin/lto-plugin.c | 165 +++++++++++++++++++++++++++++++++++-----
>>   5 files changed, 226 insertions(+), 19 deletions(-)
>>
>> diff --git a/gcc/lto-section-in.c b/gcc/lto-section-in.c
>> index c17dd69dbdd..7bf59c513fc 100644
>> --- a/gcc/lto-section-in.c
>> +++ b/gcc/lto-section-in.c
>> @@ -54,7 +54,8 @@ const char *lto_section_name[LTO_N_SECTION_TYPES] =
>>     "mode_table",
>>     "hsa",
>>     "lto",
>> -  "ipa_sra"
>> +  "ipa_sra",
>> +  "extension_symtab"
> I would call it symtab_ext so alphabetically it is coupled with symtab.

It's problematic because of this collision where we search for prefix:

if (strncmp (name, LTO_SYMTAB_PREFIX, LTO_SYMTAB_PREFIX_LEN) != 0)

Note that the section names have suffix:
.gnu.lto_.symtab.48f1e54b1c048b0f

> I wonder if moving it up in the enum would make it physically appear
> next to .symtab in object file so we save a bit of i/o?

It's not about names order, but streaming order. Which should be fine:

$ readelf -SW /tmp/bss.o
...
   [10] .gnu.lto_.decls.48f1e54b1c048b0f PROGBITS        0000000000000000 0001dd 0002de 00   E  0   0  1
   [11] .gnu.lto_.symtab.48f1e54b1c048b0f PROGBITS        0000000000000000 0004bb 000049 00   E  0   0  1
   [12] .gnu.lto_.extension_symtab.48f1e54b1c048b0f PROGBITS        0000000000000000 000504 000006 00   E  0   0  1
   [13] .gnu.lto_.opts    PROGBITS        0000000000000000 00050a 000051 00   E  0   0  1
...


>> +static void
>> +produce_symtab_extension (struct output_block *ob)
>> +{
>> +  char *section_name = lto_get_section_name (LTO_section_symtab_extension,
>> +					     NULL, 0, NULL);
>> +  lto_symtab_encoder_t encoder = ob->decl_state->symtab_node_encoder;
>> +  lto_symtab_encoder_iterator lsei;
>> +
>> +  lto_begin_section (section_name, false);
>> +  free (section_name);
>> +
>> +  /* Write the symbol table.
>> +     First write everything defined and then all declarations.
>> +     This is necessary to handle cases where we have duplicated symbols.  */
>> +  for (lsei = lsei_start (encoder);
>> +       !lsei_end_p (lsei); lsei_next (&lsei))
>> +    {
>> +      symtab_node *node = lsei_node (lsei);
>> +
>> +      if (DECL_EXTERNAL (node->decl) || !node->output_to_lto_symbol_table_p ())
>> +	continue;
>> +      write_symbol_extension_info (node->decl);
>> +    }
>> +  for (lsei = lsei_start (encoder);
>> +       !lsei_end_p (lsei); lsei_next (&lsei))
>> +    {
>> +      symtab_node *node = lsei_node (lsei);
>> +
>> +      if (!DECL_EXTERNAL (node->decl) || !node->output_to_lto_symbol_table_p ())
>> +	continue;
>> +      write_symbol_extension_info (node->decl);
>> +    }
>> +
>> +  lto_end_section ();
>> +}
> 
> It seems fragile to me to duplicate the loops that needs to be kept in
> sync because breaking that will likely get bit suprising outcome (i.e.
> bootstrap will work since we do not care about symbol types). Perhaps it
> would be more robust to simply push the extension bits into a vector in
> write_symbol for later streaming.  Or at least add comment that loops
> needs to be kept in sync.

I'll add comment.

>> diff --git a/include/plugin-api.h b/include/plugin-api.h
>> index 09e1202df07..804684449cf 100644
>> --- a/include/plugin-api.h
>> +++ b/include/plugin-api.h
>> @@ -87,7 +87,17 @@ struct ld_plugin_symbol
>>   {
>>     char *name;
>>     char *version;
>> -  int def;
>> +#ifdef __BIG_ENDIAN__
>> +  char unused;
>> +  char section_flags;
>> +  char symbol_type;
>> +  char def;
>> +#else
>> +  char def;
>> +  char symbol_type;
>> +  char section_flags;
>> +  char unused;
>> +#endif
> Perhaps at least drop a comment why this is done this way (i.e.
> compatibility with V3 version of API) so in 10 years we know?

Likewise here.

> 
> Bit more systematic way would be to add a new hook to query extended
> properties of a given symbol. I.e. something like
> void *get_symbol_property (symbol, enum property)

I know, but it would complicate code on both consumer and LTO plugin side.

Martin

> 
> Honza
> 


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH][RFC] API extension for binutils (type of symbols).
  2020-03-12 13:10                                 ` Martin Liška
@ 2020-03-12 13:15                                   ` Richard Biener
  2020-03-12 13:24                                     ` Martin Liška
  0 siblings, 1 reply; 55+ messages in thread
From: Richard Biener @ 2020-03-12 13:15 UTC (permalink / raw)
  To: Martin Liška; +Cc: Jan Hubicka, GCC Patches, H.J. Lu

On Thu, Mar 12, 2020 at 2:11 PM Martin Liška <mliska@suse.cz> wrote:
>
> On 3/12/20 1:55 PM, Jan Hubicka wrote:
> >> gcc/ChangeLog:
> >>
> >> 2020-03-12  Martin Liska  <mliska@suse.cz>
> >>
> >>      * lto-section-in.c: Add extension_symtab.
> >>      * lto-streamer-out.c (write_symbol_extension_info):
> >>      New.
> >>      (produce_symtab_extension): New.
> >>      (produce_asm_for_decls): Stream also produce_symtab_extension.
> >>      * lto-streamer.h (enum lto_section_type): New section.
> >>
> >> include/ChangeLog:
> >>
> >> 2020-03-12  Martin Liska  <mliska@suse.cz>
> >>
> >>      * lto-symtab.h (enum gcc_plugin_symbol_type): New.
> >>      (enum gcc_plugin_symbol_section_flags): Likewise.
> >>
> >> lto-plugin/ChangeLog:
> >>
> >> 2020-03-12  Martin Liska  <mliska@suse.cz>
> >>
> >>      * lto-plugin.c (LTO_SECTION_PREFIX): Rename to
> >>      ...
> >>      (LTO_SYMTAB_PREFIX): ... this.
> >>      (LTO_SECTION_PREFIX_LEN): Rename to ...
> >>      (LTO_SYMTAB_PREFIX_LEN): ... this.
> >>      (LTO_SYMTAB_EXT_PREFIX): New.
> >>      (LTO_SYMTAB_EXT_PREFIX_LEN): New.
> >>      (LTO_LTO_PREFIX): New.
> >>      (LTO_LTO_PREFIX_LEN): New.
> >>      (parse_table_entry): Fill up unused to zero.
> >>      (parse_table_entry_extension): New.
> >>      (parse_symtab_extension): New.
> >>      (finish_conflict_resolution): Change type
> >>      for resolution.
> >>      (clear_new_symtab_flags): New.
> >>      (write_resolution): Support new get_symbols_v4.
> >>      (process_symtab): Use new macro name.
> >>      (process_symtab_extension): New.
> >>      (claim_file_handler): Parse also process_symtab_extension.
> >>      (onload): Call new add_symbols_v2.
> > Hi,
> > thanks for updating the patch.  Two minor nits
> >> ---
> >>   gcc/lto-section-in.c    |   3 +-
> >>   gcc/lto-streamer-out.c  |  64 +++++++++++++++-
> >>   gcc/lto-streamer.h      |   1 +
> >>   include/lto-symtab.h    |  12 +++
> >>   lto-plugin/lto-plugin.c | 165 +++++++++++++++++++++++++++++++++++-----
> >>   5 files changed, 226 insertions(+), 19 deletions(-)
> >>
> >> diff --git a/gcc/lto-section-in.c b/gcc/lto-section-in.c
> >> index c17dd69dbdd..7bf59c513fc 100644
> >> --- a/gcc/lto-section-in.c
> >> +++ b/gcc/lto-section-in.c
> >> @@ -54,7 +54,8 @@ const char *lto_section_name[LTO_N_SECTION_TYPES] =
> >>     "mode_table",
> >>     "hsa",
> >>     "lto",
> >> -  "ipa_sra"
> >> +  "ipa_sra",
> >> +  "extension_symtab"
> > I would call it symtab_ext so alphabetically it is coupled with symtab.

Maybe ext_symtab then, proper enlish for the long form would be
extended_symtab I guess.

> It's problematic because of this collision where we search for prefix:
>
> if (strncmp (name, LTO_SYMTAB_PREFIX, LTO_SYMTAB_PREFIX_LEN) != 0)
>
> Note that the section names have suffix:
> .gnu.lto_.symtab.48f1e54b1c048b0f
>
> > I wonder if moving it up in the enum would make it physically appear
> > next to .symtab in object file so we save a bit of i/o?
>
> It's not about names order, but streaming order. Which should be fine:
>
> $ readelf -SW /tmp/bss.o
> ...
>    [10] .gnu.lto_.decls.48f1e54b1c048b0f PROGBITS        0000000000000000 0001dd 0002de 00   E  0   0  1
>    [11] .gnu.lto_.symtab.48f1e54b1c048b0f PROGBITS        0000000000000000 0004bb 000049 00   E  0   0  1
>    [12] .gnu.lto_.extension_symtab.48f1e54b1c048b0f PROGBITS        0000000000000000 000504 000006 00   E  0   0  1
>    [13] .gnu.lto_.opts    PROGBITS        0000000000000000 00050a 000051 00   E  0   0  1
> ...
>
>
> >> +static void
> >> +produce_symtab_extension (struct output_block *ob)
> >> +{
> >> +  char *section_name = lto_get_section_name (LTO_section_symtab_extension,
> >> +                                         NULL, 0, NULL);
> >> +  lto_symtab_encoder_t encoder = ob->decl_state->symtab_node_encoder;
> >> +  lto_symtab_encoder_iterator lsei;
> >> +
> >> +  lto_begin_section (section_name, false);
> >> +  free (section_name);
> >> +
> >> +  /* Write the symbol table.
> >> +     First write everything defined and then all declarations.
> >> +     This is necessary to handle cases where we have duplicated symbols.  */
> >> +  for (lsei = lsei_start (encoder);
> >> +       !lsei_end_p (lsei); lsei_next (&lsei))
> >> +    {
> >> +      symtab_node *node = lsei_node (lsei);
> >> +
> >> +      if (DECL_EXTERNAL (node->decl) || !node->output_to_lto_symbol_table_p ())
> >> +    continue;
> >> +      write_symbol_extension_info (node->decl);
> >> +    }
> >> +  for (lsei = lsei_start (encoder);
> >> +       !lsei_end_p (lsei); lsei_next (&lsei))
> >> +    {
> >> +      symtab_node *node = lsei_node (lsei);
> >> +
> >> +      if (!DECL_EXTERNAL (node->decl) || !node->output_to_lto_symbol_table_p ())
> >> +    continue;
> >> +      write_symbol_extension_info (node->decl);
> >> +    }
> >> +
> >> +  lto_end_section ();
> >> +}
> >
> > It seems fragile to me to duplicate the loops that needs to be kept in
> > sync because breaking that will likely get bit suprising outcome (i.e.
> > bootstrap will work since we do not care about symbol types). Perhaps it
> > would be more robust to simply push the extension bits into a vector in
> > write_symbol for later streaming.  Or at least add comment that loops
> > needs to be kept in sync.
>
> I'll add comment.
>
> >> diff --git a/include/plugin-api.h b/include/plugin-api.h
> >> index 09e1202df07..804684449cf 100644
> >> --- a/include/plugin-api.h
> >> +++ b/include/plugin-api.h
> >> @@ -87,7 +87,17 @@ struct ld_plugin_symbol
> >>   {
> >>     char *name;
> >>     char *version;
> >> -  int def;
> >> +#ifdef __BIG_ENDIAN__
> >> +  char unused;
> >> +  char section_flags;
> >> +  char symbol_type;
> >> +  char def;
> >> +#else

#elif defined __LITTLE_ENDIAN__

> >> +  char def;
> >> +  char symbol_type;
> >> +  char section_flags;
> >> +  char unused;

#else
#error ...
#endif

better be safe than sorry ;)

> >> +#endif
> > Perhaps at least drop a comment why this is done this way (i.e.
> > compatibility with V3 version of API) so in 10 years we know?
>
> Likewise here.
>
> >
> > Bit more systematic way would be to add a new hook to query extended
> > properties of a given symbol. I.e. something like
> > void *get_symbol_property (symbol, enum property)
>
> I know, but it would complicate code on both consumer and LTO plugin side.
>
> Martin
>
> >
> > Honza
> >
>

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH][RFC] API extension for binutils (type of symbols).
  2020-03-12 13:15                                   ` Richard Biener
@ 2020-03-12 13:24                                     ` Martin Liška
  0 siblings, 0 replies; 55+ messages in thread
From: Martin Liška @ 2020-03-12 13:24 UTC (permalink / raw)
  To: Richard Biener; +Cc: Jan Hubicka, GCC Patches, H.J. Lu

On 3/12/20 2:15 PM, Richard Biener wrote:
> #elif defined __LITTLE_ENDIAN__

Hmm, the macro is not defined. Even a simple test-case shows that:

$ cat le.c
#ifdef __LITTLE_ENDIAN__
asdfa
#endif

$ gcc le.c -c
[no error]

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH][RFC] API extension for binutils (type of symbols).
  2020-03-12 12:32                             ` [PATCH][RFC] API extension for binutils (type of symbols) Martin Liška
  2020-03-12 12:55                               ` Jan Hubicka
@ 2020-03-12 13:31                               ` Martin Liška
  2020-03-16 11:12                                 ` H.J. Lu
  2020-03-17 23:27                                 ` Jan Hubicka
  1 sibling, 2 replies; 55+ messages in thread
From: Martin Liška @ 2020-03-12 13:31 UTC (permalink / raw)
  To: Richard Biener; +Cc: GCC Patches, Jan Hubicka

[-- Attachment #1: Type: text/plain, Size: 240 bytes --]

Hi.

There's updated version of the patch.
Changes from the previous version:
- comment added to ld_plugin_symbol
- new section renamed to ext_symtab
- assert added for loop iterations in produce_symtab and produce_symtab_extension

Martin

[-- Attachment #2: 0002-API-extension-for-binutils-type-of-symbols.patch --]
[-- Type: text/x-patch, Size: 15444 bytes --]

From ad24565cfb3166fdd9995381b25c1f558c7bcf8c Mon Sep 17 00:00:00 2001
From: Martin Liska <mliska@suse.cz>
Date: Fri, 6 Mar 2020 18:09:35 +0100
Subject: [PATCH 2/2] API extension for binutils (type of symbols).

gcc/ChangeLog:

2020-03-12  Martin Liska  <mliska@suse.cz>

	* lto-section-in.c: Add extension_symtab.
	* lto-streamer-out.c (write_symbol_extension_info):
	New.
	(produce_symtab_extension): New.
	(produce_asm_for_decls): Stream also produce_symtab_extension.
	* lto-streamer.h (enum lto_section_type): New section.

include/ChangeLog:

2020-03-12  Martin Liska  <mliska@suse.cz>

	* lto-symtab.h (enum gcc_plugin_symbol_type): New.
	(enum gcc_plugin_symbol_section_flags): Likewise.

lto-plugin/ChangeLog:

2020-03-12  Martin Liska  <mliska@suse.cz>

	* lto-plugin.c (LTO_SECTION_PREFIX): Rename to
	...
	(LTO_SYMTAB_PREFIX): ... this.
	(LTO_SECTION_PREFIX_LEN): Rename to ...
	(LTO_SYMTAB_PREFIX_LEN): ... this.
	(LTO_SYMTAB_EXT_PREFIX): New.
	(LTO_SYMTAB_EXT_PREFIX_LEN): New.
	(LTO_LTO_PREFIX): New.
	(LTO_LTO_PREFIX_LEN): New.
	(parse_table_entry): Fill up unused to zero.
	(parse_table_entry_extension): New.
	(parse_symtab_extension): New.
	(finish_conflict_resolution): Change type
	for resolution.
	(clear_new_symtab_flags): New.
	(write_resolution): Support new get_symbols_v4.
	(process_symtab): Use new macro name.
	(process_symtab_extension): New.
	(claim_file_handler): Parse also process_symtab_extension.
	(onload): Call new add_symbols_v2.
---
 gcc/lto-section-in.c    |   3 +-
 gcc/lto-streamer-out.c  |  76 +++++++++++++++++-
 gcc/lto-streamer.h      |   1 +
 include/lto-symtab.h    |  12 +++
 lto-plugin/lto-plugin.c | 165 +++++++++++++++++++++++++++++++++++-----
 5 files changed, 237 insertions(+), 20 deletions(-)

diff --git a/gcc/lto-section-in.c b/gcc/lto-section-in.c
index c17dd69dbdd..78b015be696 100644
--- a/gcc/lto-section-in.c
+++ b/gcc/lto-section-in.c
@@ -54,7 +54,8 @@ const char *lto_section_name[LTO_N_SECTION_TYPES] =
   "mode_table",
   "hsa",
   "lto",
-  "ipa_sra"
+  "ipa_sra",
+  "ext_symtab"
 };
 
 /* Hooks so that the ipa passes can call into the lto front end to get
diff --git a/gcc/lto-streamer-out.c b/gcc/lto-streamer-out.c
index cea5e71cffb..e459ec91856 100644
--- a/gcc/lto-streamer-out.c
+++ b/gcc/lto-streamer-out.c
@@ -45,6 +45,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "print-tree.h"
 #include "tree-dfa.h"
 #include "file-prefix-map.h" /* remap_debug_filename()  */
+#include "output.h"
 
 
 static void lto_write_tree (struct output_block*, tree, bool);
@@ -2777,12 +2778,32 @@ write_symbol (struct streamer_tree_cache_d *cache,
   lto_write_data (&slot_num, 4);
 }
 
+/* Write extension information for symbols (symbol type, section flags).  */
+
+static void
+write_symbol_extension_info (tree t)
+{
+  unsigned char c;
+  c = ((unsigned char) TREE_CODE (t) == VAR_DECL
+       ? GCCST_VARIABLE : GCCST_FUNCTION);
+  lto_write_data (&c, 1);
+  unsigned char section_flags = 0;
+  if (TREE_CODE (t) == VAR_DECL)
+    {
+      section *s = get_variable_section (t, false);
+      if (s->common.flags & SECTION_BSS)
+	section_flags |= GCCSSS_BSS;
+    }
+  lto_write_data (&section_flags, 1);
+}
+
 /* Write an IL symbol table to OB.
    SET and VSET are cgraph/varpool node sets we are outputting.  */
 
-static void
+static unsigned int
 produce_symtab (struct output_block *ob)
 {
+  unsigned int streamed_symbols = 0;
   struct streamer_tree_cache_d *cache = ob->writer_cache;
   char *section_name = lto_get_section_name (LTO_section_symtab, NULL, 0, NULL);
   lto_symtab_encoder_t encoder = ob->decl_state->symtab_node_encoder;
@@ -2804,6 +2825,7 @@ produce_symtab (struct output_block *ob)
       if (DECL_EXTERNAL (node->decl) || !node->output_to_lto_symbol_table_p ())
 	continue;
       write_symbol (cache, node->decl, &seen, false);
+      ++streamed_symbols;
     }
   for (lsei = lsei_start (encoder);
        !lsei_end_p (lsei); lsei_next (&lsei))
@@ -2813,8 +2835,55 @@ produce_symtab (struct output_block *ob)
       if (!DECL_EXTERNAL (node->decl) || !node->output_to_lto_symbol_table_p ())
 	continue;
       write_symbol (cache, node->decl, &seen, false);
+      ++streamed_symbols;
+    }
+
+  lto_end_section ();
+
+  return streamed_symbols;
+}
+
+/* Write an IL symbol table extension to OB.
+   SET and VSET are cgraph/varpool node sets we are outputting.  */
+
+static void
+produce_symtab_extension (struct output_block *ob,
+			  unsigned int previous_streamed_symbols)
+{
+  unsigned int streamed_symbols = 0;
+  char *section_name = lto_get_section_name (LTO_section_symtab_extension,
+					     NULL, 0, NULL);
+  lto_symtab_encoder_t encoder = ob->decl_state->symtab_node_encoder;
+  lto_symtab_encoder_iterator lsei;
+
+  lto_begin_section (section_name, false);
+  free (section_name);
+
+  /* Write the symbol table.
+     First write everything defined and then all declarations.
+     This is necessary to handle cases where we have duplicated symbols.  */
+  for (lsei = lsei_start (encoder);
+       !lsei_end_p (lsei); lsei_next (&lsei))
+    {
+      symtab_node *node = lsei_node (lsei);
+
+      if (DECL_EXTERNAL (node->decl) || !node->output_to_lto_symbol_table_p ())
+	continue;
+      write_symbol_extension_info (node->decl);
+      ++streamed_symbols;
     }
+  for (lsei = lsei_start (encoder);
+       !lsei_end_p (lsei); lsei_next (&lsei))
+    {
+      symtab_node *node = lsei_node (lsei);
 
+      if (!DECL_EXTERNAL (node->decl) || !node->output_to_lto_symbol_table_p ())
+	continue;
+      write_symbol_extension_info (node->decl);
+      ++streamed_symbols;
+    }
+
+  gcc_assert (previous_streamed_symbols == streamed_symbols);
   lto_end_section ();
 }
 
@@ -3001,7 +3070,10 @@ produce_asm_for_decls (void)
   /* Write the symbol table.  It is used by linker to determine dependencies
      and thus we can skip it for WPA.  */
   if (!flag_wpa)
-    produce_symtab (ob);
+    {
+      unsigned int streamed_symbols = produce_symtab (ob);
+      produce_symtab_extension (ob, streamed_symbols);
+    }
 
   /* Write command line opts.  */
   lto_write_options ();
diff --git a/gcc/lto-streamer.h b/gcc/lto-streamer.h
index 25bf6c468f7..4f82b439360 100644
--- a/gcc/lto-streamer.h
+++ b/gcc/lto-streamer.h
@@ -236,6 +236,7 @@ enum lto_section_type
   LTO_section_ipa_hsa,
   LTO_section_lto,
   LTO_section_ipa_sra,
+  LTO_section_symtab_extension,
   LTO_N_SECTION_TYPES		/* Must be last.  */
 };
 
diff --git a/include/lto-symtab.h b/include/lto-symtab.h
index 0ce0de10121..47f0ff27df8 100644
--- a/include/lto-symtab.h
+++ b/include/lto-symtab.h
@@ -38,4 +38,16 @@ enum gcc_plugin_symbol_visibility
     GCCPV_HIDDEN
   };
 
+enum gcc_plugin_symbol_type
+{
+  GCCST_UNKNOWN,
+  GCCST_FUNCTION,
+  GCCST_VARIABLE,
+};
+
+enum gcc_plugin_symbol_section_flags
+{
+  GCCSSS_BSS = 1
+};
+
 #endif /* GCC_LTO_SYMTAB_H  */
diff --git a/lto-plugin/lto-plugin.c b/lto-plugin/lto-plugin.c
index c307fc871bf..f1f93921ba4 100644
--- a/lto-plugin/lto-plugin.c
+++ b/lto-plugin/lto-plugin.c
@@ -88,10 +88,14 @@ along with this program; see the file COPYING3.  If not see
 
 /* LTO magic section name.  */
 
-#define LTO_SECTION_PREFIX	".gnu.lto_.symtab"
-#define LTO_SECTION_PREFIX_LEN	(sizeof (LTO_SECTION_PREFIX) - 1)
-#define OFFLOAD_SECTION		".gnu.offload_lto_.opts"
-#define OFFLOAD_SECTION_LEN	(sizeof (OFFLOAD_SECTION) - 1)
+#define LTO_SYMTAB_PREFIX	    ".gnu.lto_.symtab"
+#define LTO_SYMTAB_PREFIX_LEN	    (sizeof (LTO_SYMTAB_PREFIX) - 1)
+#define LTO_SYMTAB_EXT_PREFIX	    ".gnu.lto_.ext_symtab"
+#define LTO_SYMTAB_EXT_PREFIX_LEN   (sizeof (LTO_SYMTAB_EXT_PREFIX) - 1)
+#define LTO_LTO_PREFIX		    ".gnu.lto_.lto"
+#define LTO_LTO_PREFIX_LEN	    (sizeof (LTO_LTO_PREFIX) - 1)
+#define OFFLOAD_SECTION		    ".gnu.offload_lto_.opts"
+#define OFFLOAD_SECTION_LEN	    (sizeof (OFFLOAD_SECTION) - 1)
 
 /* The part of the symbol table the plugin has to keep track of. Note that we
    must keep SYMS until all_symbols_read is called to give the linker time to
@@ -154,12 +158,12 @@ enum symbol_style
 static char *arguments_file_name;
 static ld_plugin_register_claim_file register_claim_file;
 static ld_plugin_register_all_symbols_read register_all_symbols_read;
-static ld_plugin_get_symbols get_symbols, get_symbols_v2;
+static ld_plugin_get_symbols get_symbols, get_symbols_v2, get_symbols_v4;
 static ld_plugin_register_cleanup register_cleanup;
 static ld_plugin_add_input_file add_input_file;
 static ld_plugin_add_input_library add_input_library;
 static ld_plugin_message message;
-static ld_plugin_add_symbols add_symbols;
+static ld_plugin_add_symbols add_symbols, add_symbols_v2;
 
 static struct plugin_file_info *claimed_files = NULL;
 static unsigned int num_claimed_files = 0;
@@ -231,7 +235,7 @@ check_1 (int gate, enum ld_plugin_level level, const char *text)
    Returns the address of the next entry. */
 
 static char *
-parse_table_entry (char *p, struct ld_plugin_symbol *entry, 
+parse_table_entry (char *p, struct ld_plugin_symbol *entry,
 		   struct sym_aux *aux)
 {
   unsigned char t;
@@ -296,6 +300,8 @@ parse_table_entry (char *p, struct ld_plugin_symbol *entry,
   entry->visibility = translate_visibility[t];
   p++;
 
+  entry->unused = 0;
+
   memcpy (&entry->size, p, sizeof (uint64_t));
   p += 8;
 
@@ -309,6 +315,32 @@ parse_table_entry (char *p, struct ld_plugin_symbol *entry,
   return p;
 }
 
+/* Parse an entry of the IL symbol table. The data to be parsed is pointed
+   by P and the result is written in ENTRY. The slot number is stored in SLOT.
+   Returns the address of the next entry. */
+
+static char *
+parse_table_entry_extension (char *p, struct ld_plugin_symbol *entry)
+{
+  unsigned char t;
+  enum ld_plugin_symbol_type symbol_types[] =
+    {
+      LDST_UNKNOWN,
+      LDST_FUNCTION,
+      LDST_VARIABLE,
+    };
+
+  t = *p;
+  check (t <= 3, LDPL_FATAL, "invalid symbol type found");
+  entry->symbol_type = symbol_types[t];
+  p++;
+  entry->section_flags = *p;
+  p++;
+
+  return p;
+}
+
+
 /* Translate the IL symbol table located between DATA and END. Append the
    slots and symbols to OUT. */
 
@@ -339,6 +371,13 @@ translate (char *data, char *end, struct plugin_symtab *out)
   out->aux = aux;
 }
 
+static void
+parse_symtab_extension (char *data, char *end, struct plugin_symtab *out)
+{
+  for (unsigned i = 0; i < out->nsyms; i++)
+    data = parse_table_entry_extension (data, &out->syms[i]);
+}
+
 /* Free all memory that is no longer needed after writing the symbol
    resolution. */
 
@@ -431,7 +470,7 @@ finish_conflict_resolution (struct plugin_symtab *symtab,
 
   for (i = 0; i < symtab->nsyms; i++)
     { 
-      int resolution = LDPR_UNKNOWN;
+      char resolution = LDPR_UNKNOWN;
 
       if (symtab->aux[i].next_conflict == -1)
 	continue;
@@ -473,6 +512,19 @@ free_symtab (struct plugin_symtab *symtab)
   symtab->aux = NULL;
 }
 
+/* Clear new flags symbol_type and section_flags
+   from symbol table entries.  */
+
+static void
+clear_new_symtab_flags (struct plugin_symtab *out)
+{
+  for (unsigned i = 0; i < out->nsyms; i++)
+    {
+      out->syms[i].symbol_type = 0;
+      out->syms[i].section_flags = 0;
+    }
+}
+
 /*  Writes the relocations to disk. */
 
 static void
@@ -495,10 +547,16 @@ write_resolution (void)
 
       /* Version 2 of API supports IRONLY_EXP resolution that is
          accepted by GCC-4.7 and newer.  */
-      if (get_symbols_v2)
-        get_symbols_v2 (info->handle, symtab->nsyms, syms);
+      if (get_symbols_v4)
+	get_symbols_v4 (info->handle, symtab->nsyms, syms);
       else
-        get_symbols (info->handle, symtab->nsyms, syms);
+	{
+	  clear_new_symtab_flags (symtab);
+	  if (get_symbols_v2)
+	    get_symbols_v2 (info->handle, symtab->nsyms, syms);
+	  else
+	    get_symbols (info->handle, symtab->nsyms, syms);
+	}
 
       finish_conflict_resolution (symtab, &info->conflicts);
 
@@ -951,9 +1009,9 @@ process_symtab (void *data, const char *name, off_t offset, off_t length)
 {
   struct plugin_objfile *obj = (struct plugin_objfile *)data;
   char *s;
-  char *secdatastart, *secdata;
+  char *secdatastart = NULL, *secdata;
 
-  if (strncmp (name, LTO_SECTION_PREFIX, LTO_SECTION_PREFIX_LEN) != 0)
+  if (strncmp (name, LTO_SYMTAB_PREFIX, LTO_SYMTAB_PREFIX_LEN) != 0)
     return 1;
 
   s = strrchr (name, '.');
@@ -995,6 +1053,59 @@ err:
   return 0;
 }
 
+/* Process one section of an object file.  */
+
+static int
+process_symtab_extension (void *data, const char *name, off_t offset,
+			  off_t length)
+{
+  struct plugin_objfile *obj = (struct plugin_objfile *)data;
+  char *s;
+  char *secdatastart = NULL, *secdata;
+
+  if (strncmp (name, LTO_SYMTAB_EXT_PREFIX, LTO_SYMTAB_EXT_PREFIX_LEN) != 0)
+    return 1;
+
+  s = strrchr (name, '.');
+  if (s)
+    sscanf (s, ".%" PRI_LL "x", &obj->out->id);
+  secdata = secdatastart = xmalloc (length);
+  offset += obj->file->offset;
+  if (offset != lseek (obj->file->fd, offset, SEEK_SET))
+    goto err;
+
+  do
+    {
+      ssize_t got = read (obj->file->fd, secdata, length);
+      if (got == 0)
+	break;
+      else if (got > 0)
+	{
+	  secdata += got;
+	  length -= got;
+	}
+      else if (errno != EINTR)
+	goto err;
+    }
+  while (length > 0);
+  if (length > 0)
+    goto err;
+
+  parse_symtab_extension (secdatastart, secdata, obj->out);
+  obj->found++;
+  free (secdatastart);
+  return 1;
+
+err:
+  if (message)
+    message (LDPL_FATAL, "%s: corrupt object file", obj->file->name);
+  /* Force claim_file_handler to abandon this file.  */
+  obj->found = 0;
+  free (secdatastart);
+  return 0;
+}
+
+
 /* Find an offload section of an object file.  */
 
 static int
@@ -1055,8 +1166,15 @@ claim_file_handler (const struct ld_plugin_input_file *file, int *claimed)
   if (!obj.objfile && !err)
     goto err;
 
-  if (obj.objfile)
-    errmsg = simple_object_find_sections (obj.objfile, process_symtab, &obj, &err);
+   if (obj.objfile)
+    {
+      errmsg = simple_object_find_sections (obj.objfile, process_symtab, &obj,
+					    &err);
+      if (!errmsg)
+	errmsg = simple_object_find_sections (obj.objfile,
+					      process_symtab_extension,
+					      &obj, &err);
+    }
 
   if (!obj.objfile || errmsg)
     {
@@ -1080,8 +1198,15 @@ claim_file_handler (const struct ld_plugin_input_file *file, int *claimed)
 
   if (obj.found > 0)
     {
-      status = add_symbols (file->handle, lto_file.symtab.nsyms,
-			    lto_file.symtab.syms);
+      if (add_symbols_v2)
+	status = add_symbols_v2 (file->handle, lto_file.symtab.nsyms,
+				 lto_file.symtab.syms);
+      else
+	{
+	  clear_new_symtab_flags (&lto_file.symtab);
+	  status = add_symbols (file->handle, lto_file.symtab.nsyms,
+				lto_file.symtab.syms);
+	}
       check (status == LDPS_OK, LDPL_FATAL, "could not add symbols");
 
       num_claimed_files++;
@@ -1242,12 +1367,18 @@ onload (struct ld_plugin_tv *tv)
 	case LDPT_REGISTER_CLAIM_FILE_HOOK:
 	  register_claim_file = p->tv_u.tv_register_claim_file;
 	  break;
+	case LDPT_ADD_SYMBOLS_V2:
+	  add_symbols_v2 = p->tv_u.tv_add_symbols;
+	  break;
 	case LDPT_ADD_SYMBOLS:
 	  add_symbols = p->tv_u.tv_add_symbols;
 	  break;
 	case LDPT_REGISTER_ALL_SYMBOLS_READ_HOOK:
 	  register_all_symbols_read = p->tv_u.tv_register_all_symbols_read;
 	  break;
+	case LDPT_GET_SYMBOLS_V4:
+	  get_symbols_v4 = p->tv_u.tv_get_symbols;
+	  break;
 	case LDPT_GET_SYMBOLS_V2:
 	  get_symbols_v2 = p->tv_u.tv_get_symbols;
 	  break;
-- 
2.25.1


[-- Attachment #3: 0001-Update-include-plugin-api.h.patch --]
[-- Type: text/x-patch, Size: 1801 bytes --]

From a69c62106c62e96df40493b96adc8b7cf13a4506 Mon Sep 17 00:00:00 2001
From: Martin Liska <mliska@suse.cz>
Date: Thu, 12 Mar 2020 13:15:48 +0100
Subject: [PATCH 1/2] Update include/plugin-api.h.

include/ChangeLog:

2020-03-12  Martin Liska  <mliska@suse.cz>

	* plugin-api.h (struct ld_plugin_symbol): Split
	int def into 4 char fields.
	(enum ld_plugin_symbol_type): New.
	(enum ld_plugin_symbol_section_flags): New.
	(enum ld_plugin_tag): Add LDPT_GET_SYMBOLS_V and
	LDPT_GET_SYMBOLS_V4.
---
 include/plugin-api.h | 31 +++++++++++++++++++++++++++++--
 1 file changed, 29 insertions(+), 2 deletions(-)

diff --git a/include/plugin-api.h b/include/plugin-api.h
index 09e1202df07..c1290497ecf 100644
--- a/include/plugin-api.h
+++ b/include/plugin-api.h
@@ -87,7 +87,18 @@ struct ld_plugin_symbol
 {
   char *name;
   char *version;
-  int def;
+  /* This is for compatibility with older ABIs.  */
+#ifdef __BIG_ENDIAN__
+  char unused;
+  char section_flags;
+  char symbol_type;
+  char def;
+#else
+  char def;
+  char symbol_type;
+  char section_flags;
+  char unused;
+#endif
   int visibility;
   uint64_t size;
   char *comdat_key;
@@ -123,6 +134,20 @@ enum ld_plugin_symbol_visibility
   LDPV_HIDDEN
 };
 
+/* The type of the symbol.  */
+
+enum ld_plugin_symbol_type
+{
+  LDST_UNKNOWN,
+  LDST_FUNCTION,
+  LDST_VARIABLE,
+};
+
+enum ld_plugin_symbol_section_flags
+{
+  LDSSS_BSS = 1
+};
+
 /* How a symbol is resolved.  */
 
 enum ld_plugin_symbol_resolution
@@ -431,7 +456,9 @@ enum ld_plugin_tag
   LDPT_GET_INPUT_SECTION_ALIGNMENT = 29,
   LDPT_GET_INPUT_SECTION_SIZE = 30,
   LDPT_REGISTER_NEW_INPUT_HOOK = 31,
-  LDPT_GET_WRAP_SYMBOLS = 32
+  LDPT_GET_WRAP_SYMBOLS = 32,
+  LDPT_ADD_SYMBOLS_V2 = 33,
+  LDPT_GET_SYMBOLS_V4 = 34,
 };
 
 /* The plugin transfer vector.  */
-- 
2.25.1


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH][RFC] API extension for binutils (type of symbols).
  2020-03-12 13:31                               ` Martin Liška
@ 2020-03-16 11:12                                 ` H.J. Lu
  2020-03-16 13:50                                   ` Martin Liška
  2020-03-17 23:27                                 ` Jan Hubicka
  1 sibling, 1 reply; 55+ messages in thread
From: H.J. Lu @ 2020-03-16 11:12 UTC (permalink / raw)
  To: Martin Liška; +Cc: Richard Biener, GCC Patches, Jan Hubicka

On Thu, Mar 12, 2020 at 7:54 AM Martin Liška <mliska@suse.cz> wrote:
>
> Hi.
>
> There's updated version of the patch.
> Changes from the previous version:
> - comment added to ld_plugin_symbol
> - new section renamed to ext_symtab
> - assert added for loop iterations in produce_symtab and produce_symtab_extension
>
> Martin

2020-03-12  Martin Liska  <mliska@suse.cz>

* plugin-api.h (struct ld_plugin_symbol): Split
int def into 4 char fields.
(enum ld_plugin_symbol_type): New.
(enum ld_plugin_symbol_section_flags): New.
(enum ld_plugin_tag): Add LDPT_GET_SYMBOLS_V and
                                           ^^^^^^^^^^^^^^^ Typo,
should be LDPT_ADD_SYMBOLS_V2.
LDPT_GET_SYMBOLS_V4.


-- 
H.J.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH][RFC] API extension for binutils (type of symbols).
  2020-03-16 11:12                                 ` H.J. Lu
@ 2020-03-16 13:50                                   ` Martin Liška
  2020-03-16 14:34                                     ` H.J. Lu
  0 siblings, 1 reply; 55+ messages in thread
From: Martin Liška @ 2020-03-16 13:50 UTC (permalink / raw)
  To: H.J. Lu; +Cc: Richard Biener, GCC Patches, Jan Hubicka

On 3/16/20 12:12 PM, H.J. Lu wrote:
> (enum ld_plugin_tag): Add LDPT_GET_SYMBOLS_V and
>                                             ^^^^^^^^^^^^^^^ Typo,

Thank you. I fixed that in my development branch.

Martin

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH][RFC] API extension for binutils (type of symbols).
  2020-03-16 13:50                                   ` Martin Liška
@ 2020-03-16 14:34                                     ` H.J. Lu
  2020-03-16 14:38                                       ` Martin Liška
  0 siblings, 1 reply; 55+ messages in thread
From: H.J. Lu @ 2020-03-16 14:34 UTC (permalink / raw)
  To: Martin Liška; +Cc: Richard Biener, GCC Patches, Jan Hubicka

On Mon, Mar 16, 2020 at 6:50 AM Martin Liška <mliska@suse.cz> wrote:
>
> On 3/16/20 12:12 PM, H.J. Lu wrote:
> > (enum ld_plugin_tag): Add LDPT_GET_SYMBOLS_V and
> >                                             ^^^^^^^^^^^^^^^ Typo,
>
> Thank you. I fixed that in my development branch.

Still not right.  It should be

    2020-03-12  Martin Liska  <mliska@suse.cz>

            PR binutils/25640
            * plugin-api.h (struct ld_plugin_symbol): Split int def into 4
            char fields.
            (enum ld_plugin_symbol_type): New.
            (enum ld_plugin_symbol_section_flags): New.
            (enum ld_plugin_tag): Add LDPT_ADD_SYMBOLS_V2 and
            LDPT_GET_SYMBOLS_V4.

It is LDPT_ADD_SYMBOLS_V2, not LDPT_GET_SYMBOLS_V2.

-- 
H.J.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH][RFC] API extension for binutils (type of symbols).
  2020-03-16 14:34                                     ` H.J. Lu
@ 2020-03-16 14:38                                       ` Martin Liška
  0 siblings, 0 replies; 55+ messages in thread
From: Martin Liška @ 2020-03-16 14:38 UTC (permalink / raw)
  To: H.J. Lu; +Cc: Richard Biener, GCC Patches, Jan Hubicka

On 3/16/20 3:34 PM, H.J. Lu wrote:
> It is LDPT_ADD_SYMBOLS_V2, not LDPT_GET_SYMBOLS_V2.

Sorry. It's fixed now.

Martin

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH][RFC] API extension for binutils (type of symbols).
  2020-03-12 13:31                               ` Martin Liška
  2020-03-16 11:12                                 ` H.J. Lu
@ 2020-03-17 23:27                                 ` Jan Hubicka
  2020-03-18  8:52                                   ` Martin Liška
  1 sibling, 1 reply; 55+ messages in thread
From: Jan Hubicka @ 2020-03-17 23:27 UTC (permalink / raw)
  To: Martin Liška; +Cc: Richard Biener, GCC Patches

> Hi.
> 
> There's updated version of the patch.
> Changes from the previous version:
> - comment added to ld_plugin_symbol
> - new section renamed to ext_symtab
> - assert added for loop iterations in produce_symtab and produce_symtab_extension
Hi,
I hope this is last version of the patch.
> 
> 2020-03-12  Martin Liska  <mliska@suse.cz>
> 
> 	* lto-section-in.c: Add extension_symtab.
ext_symtab  :)
> diff --git a/gcc/lto-section-in.c b/gcc/lto-section-in.c
> index c17dd69dbdd..78b015be696 100644
> --- a/gcc/lto-section-in.c
> +++ b/gcc/lto-section-in.c
> @@ -54,7 +54,8 @@ const char *lto_section_name[LTO_N_SECTION_TYPES] =
>    "mode_table",
>    "hsa",
>    "lto",
> -  "ipa_sra"
> +  "ipa_sra",
> +  "ext_symtab"
I would move ext_symtab next to symtab so the sections remains at least
bit reasonably ordered.
>  
> +/* Write extension information for symbols (symbol type, section flags).  */
> +
> +static void
> +write_symbol_extension_info (tree t)
> +{
> +  unsigned char c;
Do we still use vertical whitespace after decls per GNU coding style?
> diff --git a/gcc/lto-streamer.h b/gcc/lto-streamer.h
> index 25bf6c468f7..4f82b439360 100644
> --- a/gcc/lto-streamer.h
> +++ b/gcc/lto-streamer.h
> @@ -236,6 +236,7 @@ enum lto_section_type
>    LTO_section_ipa_hsa,
>    LTO_section_lto,
>    LTO_section_ipa_sra,
> +  LTO_section_symtab_extension,
I guess symtab_ext to match the actual section name?
>    LTO_N_SECTION_TYPES		/* Must be last.  */
>  };
>  
> diff --git a/include/lto-symtab.h b/include/lto-symtab.h
> index 0ce0de10121..47f0ff27df8 100644
> --- a/include/lto-symtab.h
> +++ b/include/lto-symtab.h
> @@ -38,4 +38,16 @@ enum gcc_plugin_symbol_visibility
>      GCCPV_HIDDEN
>    };
>  
> +enum gcc_plugin_symbol_type
> +{
> +  GCCST_UNKNOWN,
> +  GCCST_FUNCTION,
> +  GCCST_VARIABLE,
> +};
> +
> +enum gcc_plugin_symbol_section_flags
> +{
> +  GCCSSS_BSS = 1
> +};

Probably comments here?
> +
>  #endif /* GCC_LTO_SYMTAB_H  */
> +/* Parse an entry of the IL symbol table. The data to be parsed is pointed
> +   by P and the result is written in ENTRY. The slot number is stored in SLOT.
> +   Returns the address of the next entry. */
> +
> +static char *
> +parse_table_entry_extension (char *p, struct ld_plugin_symbol *entry)
> +{
> +  unsigned char t;
> +  enum ld_plugin_symbol_type symbol_types[] =
> +    {
> +      LDST_UNKNOWN,
> +      LDST_FUNCTION,
> +      LDST_VARIABLE,
> +    };
> +
> +  t = *p;
> +  check (t <= 3, LDPL_FATAL, "invalid symbol type found");
> +  entry->symbol_type = symbol_types[t];
> +  p++;
> +  entry->section_flags = *p;
> +  p++;
> +
> +  return p;
> +}

I think we have chance to make some plan for future extensions without
introducing too many additional sections.

Currently there are 2 bytes per entry, while only 3 bits are actively
used of them.  If we invent next flag to pass we can use unused bits
however we need a way to indicate to plugin that the bit is defined.
This could be done by a simple version byte at the beggining of
ext_symtab section which will be 0 now and once we define extra bits we
bump it up to 1.

It is not that important given that even empty file results in 2k LTO
object file, but I think it would be nicer in longer run.
> +  /* This is for compatibility with older ABIs.  */
Perhaps say here that this ABI defined only "int def;"

The patch look good to me. Thanks for the work!
Honza
> +#ifdef __BIG_ENDIAN__
> +  char unused;
> +  char section_flags;
> +  char symbol_type;
> +  char def;
> +#else
> +  char def;
> +  char symbol_type;
> +  char section_flags;
> +  char unused;
> +#endif
>    int visibility;
>    uint64_t size;
>    char *comdat_key;
> @@ -123,6 +134,20 @@ enum ld_plugin_symbol_visibility
>    LDPV_HIDDEN
>  };
>  
> +/* The type of the symbol.  */
> +
> +enum ld_plugin_symbol_type
> +{
> +  LDST_UNKNOWN,
> +  LDST_FUNCTION,
> +  LDST_VARIABLE,
> +};
> +
> +enum ld_plugin_symbol_section_flags
> +{
> +  LDSSS_BSS = 1
> +};
> +
>  /* How a symbol is resolved.  */
>  
>  enum ld_plugin_symbol_resolution
> @@ -431,7 +456,9 @@ enum ld_plugin_tag
>    LDPT_GET_INPUT_SECTION_ALIGNMENT = 29,
>    LDPT_GET_INPUT_SECTION_SIZE = 30,
>    LDPT_REGISTER_NEW_INPUT_HOOK = 31,
> -  LDPT_GET_WRAP_SYMBOLS = 32
> +  LDPT_GET_WRAP_SYMBOLS = 32,
> +  LDPT_ADD_SYMBOLS_V2 = 33,
> +  LDPT_GET_SYMBOLS_V4 = 34,
>  };
>  
>  /* The plugin transfer vector.  */
> -- 
> 2.25.1
> 


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH][RFC] API extension for binutils (type of symbols).
  2020-03-17 23:27                                 ` Jan Hubicka
@ 2020-03-18  8:52                                   ` Martin Liška
  2020-03-18  8:53                                     ` [PATCH] Bump LTO bytecode version Martin Liška
  2020-03-19  9:12                                     ` [PATCH][RFC] API extension for binutils (type of symbols) Richard Biener
  0 siblings, 2 replies; 55+ messages in thread
From: Martin Liška @ 2020-03-18  8:52 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: Richard Biener, GCC Patches

[-- Attachment #1: Type: text/plain, Size: 5211 bytes --]

On 3/18/20 12:27 AM, Jan Hubicka wrote:
>> Hi.
>>
>> There's updated version of the patch.
>> Changes from the previous version:
>> - comment added to ld_plugin_symbol
>> - new section renamed to ext_symtab
>> - assert added for loop iterations in produce_symtab and produce_symtab_extension
> Hi,
> I hope this is last version of the patch.

Hello.

Yes.

>>
>> 2020-03-12  Martin Liska  <mliska@suse.cz>
>>
>> 	* lto-section-in.c: Add extension_symtab.
> ext_symtab  :)

Fixed.

>> diff --git a/gcc/lto-section-in.c b/gcc/lto-section-in.c
>> index c17dd69dbdd..78b015be696 100644
>> --- a/gcc/lto-section-in.c
>> +++ b/gcc/lto-section-in.c
>> @@ -54,7 +54,8 @@ const char *lto_section_name[LTO_N_SECTION_TYPES] =
>>     "mode_table",
>>     "hsa",
>>     "lto",
>> -  "ipa_sra"
>> +  "ipa_sra",
>> +  "ext_symtab"
> I would move ext_symtab next to symtab so the sections remains at least
> bit reasonably ordered.

Ok, I'll adjust it and I will send a separate patch where we bump LTO_major_version.

>>   
>> +/* Write extension information for symbols (symbol type, section flags).  */
>> +
>> +static void
>> +write_symbol_extension_info (tree t)
>> +{
>> +  unsigned char c;
> Do we still use vertical whitespace after decls per GNU coding style?

Dunno. This seems to me like a nit.

>> diff --git a/gcc/lto-streamer.h b/gcc/lto-streamer.h
>> index 25bf6c468f7..4f82b439360 100644
>> --- a/gcc/lto-streamer.h
>> +++ b/gcc/lto-streamer.h
>> @@ -236,6 +236,7 @@ enum lto_section_type
>>     LTO_section_ipa_hsa,
>>     LTO_section_lto,
>>     LTO_section_ipa_sra,
>> +  LTO_section_symtab_extension,
> I guess symtab_ext to match the actual section name?

No. See e.g.   LTO_section_jump_functions - "jmpfuncs". We want to have more descriptive
enum names.

>>     LTO_N_SECTION_TYPES		/* Must be last.  */
>>   };
>>   
>> diff --git a/include/lto-symtab.h b/include/lto-symtab.h
>> index 0ce0de10121..47f0ff27df8 100644
>> --- a/include/lto-symtab.h
>> +++ b/include/lto-symtab.h
>> @@ -38,4 +38,16 @@ enum gcc_plugin_symbol_visibility
>>       GCCPV_HIDDEN
>>     };
>>   
>> +enum gcc_plugin_symbol_type
>> +{
>> +  GCCST_UNKNOWN,
>> +  GCCST_FUNCTION,
>> +  GCCST_VARIABLE,
>> +};
>> +
>> +enum gcc_plugin_symbol_section_flags
>> +{
>> +  GCCSSS_BSS = 1
>> +};
> 
> Probably comments here?

No. There are just shadow copy of enum types from plugin-api.h which
are documented.

>> +
>>   #endif /* GCC_LTO_SYMTAB_H  */
>> +/* Parse an entry of the IL symbol table. The data to be parsed is pointed
>> +   by P and the result is written in ENTRY. The slot number is stored in SLOT.
>> +   Returns the address of the next entry. */
>> +
>> +static char *
>> +parse_table_entry_extension (char *p, struct ld_plugin_symbol *entry)
>> +{
>> +  unsigned char t;
>> +  enum ld_plugin_symbol_type symbol_types[] =
>> +    {
>> +      LDST_UNKNOWN,
>> +      LDST_FUNCTION,
>> +      LDST_VARIABLE,
>> +    };
>> +
>> +  t = *p;
>> +  check (t <= 3, LDPL_FATAL, "invalid symbol type found");
>> +  entry->symbol_type = symbol_types[t];
>> +  p++;
>> +  entry->section_flags = *p;
>> +  p++;
>> +
>> +  return p;
>> +}
> 
> I think we have chance to make some plan for future extensions without
> introducing too many additional sections.
> 
> Currently there are 2 bytes per entry, while only 3 bits are actively
> used of them.  If we invent next flag to pass we can use unused bits
> however we need a way to indicate to plugin that the bit is defined.
> This could be done by a simple version byte at the beggining of
> ext_symtab section which will be 0 now and once we define extra bits we
> bump it up to 1.

I like the suggested change, it can help us in the future.

> 
> It is not that important given that even empty file results in 2k LTO
> object file, but I think it would be nicer in longer run.
>> +  /* This is for compatibility with older ABIs.  */
> Perhaps say here that this ABI defined only "int def;"

Good point.

> 
> The patch look good to me. Thanks for the work!

Thanks. I'm sending updated patch that I've just tested on lto.exp and
both binutils master and HJ's branch that utilizes the new API.

Martin

> Honza
>> +#ifdef __BIG_ENDIAN__
>> +  char unused;
>> +  char section_flags;
>> +  char symbol_type;
>> +  char def;
>> +#else
>> +  char def;
>> +  char symbol_type;
>> +  char section_flags;
>> +  char unused;
>> +#endif
>>     int visibility;
>>     uint64_t size;
>>     char *comdat_key;
>> @@ -123,6 +134,20 @@ enum ld_plugin_symbol_visibility
>>     LDPV_HIDDEN
>>   };
>>   
>> +/* The type of the symbol.  */
>> +
>> +enum ld_plugin_symbol_type
>> +{
>> +  LDST_UNKNOWN,
>> +  LDST_FUNCTION,
>> +  LDST_VARIABLE,
>> +};
>> +
>> +enum ld_plugin_symbol_section_flags
>> +{
>> +  LDSSS_BSS = 1
>> +};
>> +
>>   /* How a symbol is resolved.  */
>>   
>>   enum ld_plugin_symbol_resolution
>> @@ -431,7 +456,9 @@ enum ld_plugin_tag
>>     LDPT_GET_INPUT_SECTION_ALIGNMENT = 29,
>>     LDPT_GET_INPUT_SECTION_SIZE = 30,
>>     LDPT_REGISTER_NEW_INPUT_HOOK = 31,
>> -  LDPT_GET_WRAP_SYMBOLS = 32
>> +  LDPT_GET_WRAP_SYMBOLS = 32,
>> +  LDPT_ADD_SYMBOLS_V2 = 33,
>> +  LDPT_GET_SYMBOLS_V4 = 34,
>>   };
>>   
>>   /* The plugin transfer vector.  */
>> -- 
>> 2.25.1
>>
> 


[-- Attachment #2: 0002-API-extension-for-binutils-type-of-symbols.patch --]
[-- Type: text/x-patch, Size: 15785 bytes --]

From 492e7dc5b5f792b2e9f92b5fc77e47fe9ee98da7 Mon Sep 17 00:00:00 2001
From: Martin Liska <mliska@suse.cz>
Date: Fri, 6 Mar 2020 18:09:35 +0100
Subject: [PATCH 2/3] API extension for binutils (type of symbols).

gcc/ChangeLog:

2020-03-12  Martin Liska  <mliska@suse.cz>

	* lto-section-in.c: Add ext_symtab.
	* lto-streamer-out.c (write_symbol_extension_info): New.
	(produce_symtab_extension): New.
	(produce_asm_for_decls): Stream also produce_symtab_extension.
	* lto-streamer.h (enum lto_section_type): New section.

include/ChangeLog:

2020-03-12  Martin Liska  <mliska@suse.cz>

	* lto-symtab.h (enum gcc_plugin_symbol_type): New.
	(enum gcc_plugin_symbol_section_flags): Likewise.

lto-plugin/ChangeLog:

2020-03-12  Martin Liska  <mliska@suse.cz>

	* lto-plugin.c (LTO_SECTION_PREFIX): Rename to ...
	(LTO_SYMTAB_PREFIX): ... this.
	(LTO_SECTION_PREFIX_LEN): Rename to ...
	(LTO_SYMTAB_PREFIX_LEN): ... this.
	(LTO_SYMTAB_EXT_PREFIX): New.
	(LTO_SYMTAB_EXT_PREFIX_LEN): New.
	(LTO_LTO_PREFIX): New.
	(LTO_LTO_PREFIX_LEN): New.
	(parse_table_entry): Fill up unused to zero.
	(parse_table_entry_extension): New.
	(parse_symtab_extension): New.
	(finish_conflict_resolution): Change type
	for resolution.
	(clear_new_symtab_flags): New.
	(write_resolution): Support new get_symbols_v4.
	(process_symtab): Use new macro name.
	(process_symtab_extension): New.
	(claim_file_handler): Parse also process_symtab_extension.
	(onload): Call new add_symbols_v2.
---
 gcc/lto-section-in.c    |   1 +
 gcc/lto-streamer-out.c  |  82 +++++++++++++++++-
 gcc/lto-streamer.h      |   1 +
 include/lto-symtab.h    |  12 +++
 lto-plugin/lto-plugin.c | 178 ++++++++++++++++++++++++++++++++++++----
 5 files changed, 255 insertions(+), 19 deletions(-)

diff --git a/gcc/lto-section-in.c b/gcc/lto-section-in.c
index c17dd69dbdd..0923a8c0746 100644
--- a/gcc/lto-section-in.c
+++ b/gcc/lto-section-in.c
@@ -38,6 +38,7 @@ const char *lto_section_name[LTO_N_SECTION_TYPES] =
   "function_body",
   "statics",
   "symtab",
+  "ext_symtab",
   "refs",
   "asm",
   "jmpfuncs",
diff --git a/gcc/lto-streamer-out.c b/gcc/lto-streamer-out.c
index cea5e71cffb..34f02e4dda2 100644
--- a/gcc/lto-streamer-out.c
+++ b/gcc/lto-streamer-out.c
@@ -45,6 +45,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "print-tree.h"
 #include "tree-dfa.h"
 #include "file-prefix-map.h" /* remap_debug_filename()  */
+#include "output.h"
 
 
 static void lto_write_tree (struct output_block*, tree, bool);
@@ -2777,12 +2778,32 @@ write_symbol (struct streamer_tree_cache_d *cache,
   lto_write_data (&slot_num, 4);
 }
 
+/* Write extension information for symbols (symbol type, section flags).  */
+
+static void
+write_symbol_extension_info (tree t)
+{
+  unsigned char c;
+  c = ((unsigned char) TREE_CODE (t) == VAR_DECL
+       ? GCCST_VARIABLE : GCCST_FUNCTION);
+  lto_write_data (&c, 1);
+  unsigned char section_flags = 0;
+  if (TREE_CODE (t) == VAR_DECL)
+    {
+      section *s = get_variable_section (t, false);
+      if (s->common.flags & SECTION_BSS)
+	section_flags |= GCCSSS_BSS;
+    }
+  lto_write_data (&section_flags, 1);
+}
+
 /* Write an IL symbol table to OB.
    SET and VSET are cgraph/varpool node sets we are outputting.  */
 
-static void
+static unsigned int
 produce_symtab (struct output_block *ob)
 {
+  unsigned int streamed_symbols = 0;
   struct streamer_tree_cache_d *cache = ob->writer_cache;
   char *section_name = lto_get_section_name (LTO_section_symtab, NULL, 0, NULL);
   lto_symtab_encoder_t encoder = ob->decl_state->symtab_node_encoder;
@@ -2804,6 +2825,7 @@ produce_symtab (struct output_block *ob)
       if (DECL_EXTERNAL (node->decl) || !node->output_to_lto_symbol_table_p ())
 	continue;
       write_symbol (cache, node->decl, &seen, false);
+      ++streamed_symbols;
     }
   for (lsei = lsei_start (encoder);
        !lsei_end_p (lsei); lsei_next (&lsei))
@@ -2813,8 +2835,61 @@ produce_symtab (struct output_block *ob)
       if (!DECL_EXTERNAL (node->decl) || !node->output_to_lto_symbol_table_p ())
 	continue;
       write_symbol (cache, node->decl, &seen, false);
+      ++streamed_symbols;
+    }
+
+  lto_end_section ();
+
+  return streamed_symbols;
+}
+
+/* Symtab extension version.  */
+#define LTO_SYMTAB_EXTENSION_VERSION 1
+
+/* Write an IL symbol table extension to OB.
+   SET and VSET are cgraph/varpool node sets we are outputting.  */
+
+static void
+produce_symtab_extension (struct output_block *ob,
+			  unsigned int previous_streamed_symbols)
+{
+  unsigned int streamed_symbols = 0;
+  char *section_name = lto_get_section_name (LTO_section_symtab_extension,
+					     NULL, 0, NULL);
+  lto_symtab_encoder_t encoder = ob->decl_state->symtab_node_encoder;
+  lto_symtab_encoder_iterator lsei;
+
+  lto_begin_section (section_name, false);
+  free (section_name);
+
+  unsigned char version = LTO_SYMTAB_EXTENSION_VERSION;
+  lto_write_data (&version, 1);
+
+  /* Write the symbol table.
+     First write everything defined and then all declarations.
+     This is necessary to handle cases where we have duplicated symbols.  */
+  for (lsei = lsei_start (encoder);
+       !lsei_end_p (lsei); lsei_next (&lsei))
+    {
+      symtab_node *node = lsei_node (lsei);
+
+      if (DECL_EXTERNAL (node->decl) || !node->output_to_lto_symbol_table_p ())
+	continue;
+      write_symbol_extension_info (node->decl);
+      ++streamed_symbols;
+    }
+  for (lsei = lsei_start (encoder);
+       !lsei_end_p (lsei); lsei_next (&lsei))
+    {
+      symtab_node *node = lsei_node (lsei);
+
+      if (!DECL_EXTERNAL (node->decl) || !node->output_to_lto_symbol_table_p ())
+	continue;
+      write_symbol_extension_info (node->decl);
+      ++streamed_symbols;
     }
 
+  gcc_assert (previous_streamed_symbols == streamed_symbols);
   lto_end_section ();
 }
 
@@ -3001,7 +3076,10 @@ produce_asm_for_decls (void)
   /* Write the symbol table.  It is used by linker to determine dependencies
      and thus we can skip it for WPA.  */
   if (!flag_wpa)
-    produce_symtab (ob);
+    {
+      unsigned int streamed_symbols = produce_symtab (ob);
+      produce_symtab_extension (ob, streamed_symbols);
+    }
 
   /* Write command line opts.  */
   lto_write_options ();
diff --git a/gcc/lto-streamer.h b/gcc/lto-streamer.h
index 25bf6c468f7..76aa6fe34b8 100644
--- a/gcc/lto-streamer.h
+++ b/gcc/lto-streamer.h
@@ -219,6 +219,7 @@ enum lto_section_type
   LTO_section_function_body,
   LTO_section_static_initializer,
   LTO_section_symtab,
+  LTO_section_symtab_extension,
   LTO_section_refs,
   LTO_section_asm,
   LTO_section_jump_functions,
diff --git a/include/lto-symtab.h b/include/lto-symtab.h
index 0ce0de10121..47f0ff27df8 100644
--- a/include/lto-symtab.h
+++ b/include/lto-symtab.h
@@ -38,4 +38,16 @@ enum gcc_plugin_symbol_visibility
     GCCPV_HIDDEN
   };
 
+enum gcc_plugin_symbol_type
+{
+  GCCST_UNKNOWN,
+  GCCST_FUNCTION,
+  GCCST_VARIABLE,
+};
+
+enum gcc_plugin_symbol_section_flags
+{
+  GCCSSS_BSS = 1
+};
+
 #endif /* GCC_LTO_SYMTAB_H  */
diff --git a/lto-plugin/lto-plugin.c b/lto-plugin/lto-plugin.c
index c307fc871bf..a17f61374e2 100644
--- a/lto-plugin/lto-plugin.c
+++ b/lto-plugin/lto-plugin.c
@@ -88,10 +88,14 @@ along with this program; see the file COPYING3.  If not see
 
 /* LTO magic section name.  */
 
-#define LTO_SECTION_PREFIX	".gnu.lto_.symtab"
-#define LTO_SECTION_PREFIX_LEN	(sizeof (LTO_SECTION_PREFIX) - 1)
-#define OFFLOAD_SECTION		".gnu.offload_lto_.opts"
-#define OFFLOAD_SECTION_LEN	(sizeof (OFFLOAD_SECTION) - 1)
+#define LTO_SYMTAB_PREFIX	    ".gnu.lto_.symtab"
+#define LTO_SYMTAB_PREFIX_LEN	    (sizeof (LTO_SYMTAB_PREFIX) - 1)
+#define LTO_SYMTAB_EXT_PREFIX	    ".gnu.lto_.ext_symtab"
+#define LTO_SYMTAB_EXT_PREFIX_LEN   (sizeof (LTO_SYMTAB_EXT_PREFIX) - 1)
+#define LTO_LTO_PREFIX		    ".gnu.lto_.lto"
+#define LTO_LTO_PREFIX_LEN	    (sizeof (LTO_LTO_PREFIX) - 1)
+#define OFFLOAD_SECTION		    ".gnu.offload_lto_.opts"
+#define OFFLOAD_SECTION_LEN	    (sizeof (OFFLOAD_SECTION) - 1)
 
 /* The part of the symbol table the plugin has to keep track of. Note that we
    must keep SYMS until all_symbols_read is called to give the linker time to
@@ -154,12 +158,12 @@ enum symbol_style
 static char *arguments_file_name;
 static ld_plugin_register_claim_file register_claim_file;
 static ld_plugin_register_all_symbols_read register_all_symbols_read;
-static ld_plugin_get_symbols get_symbols, get_symbols_v2;
+static ld_plugin_get_symbols get_symbols, get_symbols_v2, get_symbols_v4;
 static ld_plugin_register_cleanup register_cleanup;
 static ld_plugin_add_input_file add_input_file;
 static ld_plugin_add_input_library add_input_library;
 static ld_plugin_message message;
-static ld_plugin_add_symbols add_symbols;
+static ld_plugin_add_symbols add_symbols, add_symbols_v2;
 
 static struct plugin_file_info *claimed_files = NULL;
 static unsigned int num_claimed_files = 0;
@@ -231,7 +235,7 @@ check_1 (int gate, enum ld_plugin_level level, const char *text)
    Returns the address of the next entry. */
 
 static char *
-parse_table_entry (char *p, struct ld_plugin_symbol *entry, 
+parse_table_entry (char *p, struct ld_plugin_symbol *entry,
 		   struct sym_aux *aux)
 {
   unsigned char t;
@@ -296,6 +300,8 @@ parse_table_entry (char *p, struct ld_plugin_symbol *entry,
   entry->visibility = translate_visibility[t];
   p++;
 
+  entry->unused = 0;
+
   memcpy (&entry->size, p, sizeof (uint64_t));
   p += 8;
 
@@ -309,6 +315,32 @@ parse_table_entry (char *p, struct ld_plugin_symbol *entry,
   return p;
 }
 
+/* Parse an entry of the IL symbol table. The data to be parsed is pointed
+   by P and the result is written in ENTRY. The slot number is stored in SLOT.
+   Returns the address of the next entry. */
+
+static char *
+parse_table_entry_extension (char *p, struct ld_plugin_symbol *entry)
+{
+  unsigned char t;
+  enum ld_plugin_symbol_type symbol_types[] =
+    {
+      LDST_UNKNOWN,
+      LDST_FUNCTION,
+      LDST_VARIABLE,
+    };
+
+  t = *p;
+  check (t <= 3, LDPL_FATAL, "invalid symbol type found");
+  entry->symbol_type = symbol_types[t];
+  p++;
+  entry->section_flags = *p;
+  p++;
+
+  return p;
+}
+
+
 /* Translate the IL symbol table located between DATA and END. Append the
    slots and symbols to OUT. */
 
@@ -339,6 +371,24 @@ translate (char *data, char *end, struct plugin_symtab *out)
   out->aux = aux;
 }
 
+static void
+parse_symtab_extension (char *data, char *end, struct plugin_symtab *out)
+{
+  unsigned i;
+
+  unsigned char version = *data;
+  data++;
+
+  /* Version 1 contains the following data per entry:
+     - symbol_type
+     - section_flags
+     .  */
+
+  if (version == 1)
+    for (i = 0; i < out->nsyms; i++)
+      data = parse_table_entry_extension (data, &out->syms[i]);
+}
+
 /* Free all memory that is no longer needed after writing the symbol
    resolution. */
 
@@ -431,7 +481,7 @@ finish_conflict_resolution (struct plugin_symtab *symtab,
 
   for (i = 0; i < symtab->nsyms; i++)
     { 
-      int resolution = LDPR_UNKNOWN;
+      char resolution = LDPR_UNKNOWN;
 
       if (symtab->aux[i].next_conflict == -1)
 	continue;
@@ -473,6 +523,21 @@ free_symtab (struct plugin_symtab *symtab)
   symtab->aux = NULL;
 }
 
+/* Clear new flags symbol_type and section_flags
+   from symbol table entries.  */
+
+static void
+clear_new_symtab_flags (struct plugin_symtab *out)
+{
+  unsigned i;
+
+  for (i = 0; i < out->nsyms; i++)
+    {
+      out->syms[i].symbol_type = 0;
+      out->syms[i].section_flags = 0;
+    }
+}
+
 /*  Writes the relocations to disk. */
 
 static void
@@ -495,10 +560,16 @@ write_resolution (void)
 
       /* Version 2 of API supports IRONLY_EXP resolution that is
          accepted by GCC-4.7 and newer.  */
-      if (get_symbols_v2)
-        get_symbols_v2 (info->handle, symtab->nsyms, syms);
+      if (get_symbols_v4)
+	get_symbols_v4 (info->handle, symtab->nsyms, syms);
       else
-        get_symbols (info->handle, symtab->nsyms, syms);
+	{
+	  clear_new_symtab_flags (symtab);
+	  if (get_symbols_v2)
+	    get_symbols_v2 (info->handle, symtab->nsyms, syms);
+	  else
+	    get_symbols (info->handle, symtab->nsyms, syms);
+	}
 
       finish_conflict_resolution (symtab, &info->conflicts);
 
@@ -951,9 +1022,9 @@ process_symtab (void *data, const char *name, off_t offset, off_t length)
 {
   struct plugin_objfile *obj = (struct plugin_objfile *)data;
   char *s;
-  char *secdatastart, *secdata;
+  char *secdatastart = NULL, *secdata;
 
-  if (strncmp (name, LTO_SECTION_PREFIX, LTO_SECTION_PREFIX_LEN) != 0)
+  if (strncmp (name, LTO_SYMTAB_PREFIX, LTO_SYMTAB_PREFIX_LEN) != 0)
     return 1;
 
   s = strrchr (name, '.');
@@ -995,6 +1066,59 @@ err:
   return 0;
 }
 
+/* Process one section of an object file.  */
+
+static int
+process_symtab_extension (void *data, const char *name, off_t offset,
+			  off_t length)
+{
+  struct plugin_objfile *obj = (struct plugin_objfile *)data;
+  char *s;
+  char *secdatastart = NULL, *secdata;
+
+  if (strncmp (name, LTO_SYMTAB_EXT_PREFIX, LTO_SYMTAB_EXT_PREFIX_LEN) != 0)
+    return 1;
+
+  s = strrchr (name, '.');
+  if (s)
+    sscanf (s, ".%" PRI_LL "x", &obj->out->id);
+  secdata = secdatastart = xmalloc (length);
+  offset += obj->file->offset;
+  if (offset != lseek (obj->file->fd, offset, SEEK_SET))
+    goto err;
+
+  do
+    {
+      ssize_t got = read (obj->file->fd, secdata, length);
+      if (got == 0)
+	break;
+      else if (got > 0)
+	{
+	  secdata += got;
+	  length -= got;
+	}
+      else if (errno != EINTR)
+	goto err;
+    }
+  while (length > 0);
+  if (length > 0)
+    goto err;
+
+  parse_symtab_extension (secdatastart, secdata, obj->out);
+  obj->found++;
+  free (secdatastart);
+  return 1;
+
+err:
+  if (message)
+    message (LDPL_FATAL, "%s: corrupt object file", obj->file->name);
+  /* Force claim_file_handler to abandon this file.  */
+  obj->found = 0;
+  free (secdatastart);
+  return 0;
+}
+
+
 /* Find an offload section of an object file.  */
 
 static int
@@ -1055,8 +1179,15 @@ claim_file_handler (const struct ld_plugin_input_file *file, int *claimed)
   if (!obj.objfile && !err)
     goto err;
 
-  if (obj.objfile)
-    errmsg = simple_object_find_sections (obj.objfile, process_symtab, &obj, &err);
+   if (obj.objfile)
+    {
+      errmsg = simple_object_find_sections (obj.objfile, process_symtab, &obj,
+					    &err);
+      if (!errmsg)
+	errmsg = simple_object_find_sections (obj.objfile,
+					      process_symtab_extension,
+					      &obj, &err);
+    }
 
   if (!obj.objfile || errmsg)
     {
@@ -1080,8 +1211,15 @@ claim_file_handler (const struct ld_plugin_input_file *file, int *claimed)
 
   if (obj.found > 0)
     {
-      status = add_symbols (file->handle, lto_file.symtab.nsyms,
-			    lto_file.symtab.syms);
+      if (add_symbols_v2)
+	status = add_symbols_v2 (file->handle, lto_file.symtab.nsyms,
+				 lto_file.symtab.syms);
+      else
+	{
+	  clear_new_symtab_flags (&lto_file.symtab);
+	  status = add_symbols (file->handle, lto_file.symtab.nsyms,
+				lto_file.symtab.syms);
+	}
       check (status == LDPS_OK, LDPL_FATAL, "could not add symbols");
 
       num_claimed_files++;
@@ -1242,12 +1380,18 @@ onload (struct ld_plugin_tv *tv)
 	case LDPT_REGISTER_CLAIM_FILE_HOOK:
 	  register_claim_file = p->tv_u.tv_register_claim_file;
 	  break;
+	case LDPT_ADD_SYMBOLS_V2:
+	  add_symbols_v2 = p->tv_u.tv_add_symbols;
+	  break;
 	case LDPT_ADD_SYMBOLS:
 	  add_symbols = p->tv_u.tv_add_symbols;
 	  break;
 	case LDPT_REGISTER_ALL_SYMBOLS_READ_HOOK:
 	  register_all_symbols_read = p->tv_u.tv_register_all_symbols_read;
 	  break;
+	case LDPT_GET_SYMBOLS_V4:
+	  get_symbols_v4 = p->tv_u.tv_get_symbols;
+	  break;
 	case LDPT_GET_SYMBOLS_V2:
 	  get_symbols_v2 = p->tv_u.tv_get_symbols;
 	  break;
-- 
2.25.1


[-- Attachment #3: 0001-Update-include-plugin-api.h.patch --]
[-- Type: text/x-patch, Size: 1850 bytes --]

From 72868074bcb5f402d1ee8ef713d85cd81fd1711c Mon Sep 17 00:00:00 2001
From: Martin Liska <mliska@suse.cz>
Date: Thu, 12 Mar 2020 13:15:48 +0100
Subject: [PATCH 1/3] Update include/plugin-api.h.

include/ChangeLog:

2020-03-12  Martin Liska  <mliska@suse.cz>

	* plugin-api.h (struct ld_plugin_symbol): Split
	int def into 4 char fields.
	(enum ld_plugin_symbol_type): New.
	(enum ld_plugin_symbol_section_flags): New.
	(enum ld_plugin_tag): Add LDPT_ADD_SYMBOLS_V2 and
	LDPT_GET_SYMBOLS_V4.
---
 include/plugin-api.h | 32 ++++++++++++++++++++++++++++++--
 1 file changed, 30 insertions(+), 2 deletions(-)

diff --git a/include/plugin-api.h b/include/plugin-api.h
index 09e1202df07..88676a63e5d 100644
--- a/include/plugin-api.h
+++ b/include/plugin-api.h
@@ -87,7 +87,19 @@ struct ld_plugin_symbol
 {
   char *name;
   char *version;
-  int def;
+  /* This is for compatibility with older ABIs.  The older ABI defined
+     only 'def' field.  */
+#ifdef __BIG_ENDIAN__
+  char unused;
+  char section_flags;
+  char symbol_type;
+  char def;
+#else
+  char def;
+  char symbol_type;
+  char section_flags;
+  char unused;
+#endif
   int visibility;
   uint64_t size;
   char *comdat_key;
@@ -123,6 +135,20 @@ enum ld_plugin_symbol_visibility
   LDPV_HIDDEN
 };
 
+/* The type of the symbol.  */
+
+enum ld_plugin_symbol_type
+{
+  LDST_UNKNOWN,
+  LDST_FUNCTION,
+  LDST_VARIABLE,
+};
+
+enum ld_plugin_symbol_section_flags
+{
+  LDSSS_BSS = 1
+};
+
 /* How a symbol is resolved.  */
 
 enum ld_plugin_symbol_resolution
@@ -431,7 +457,9 @@ enum ld_plugin_tag
   LDPT_GET_INPUT_SECTION_ALIGNMENT = 29,
   LDPT_GET_INPUT_SECTION_SIZE = 30,
   LDPT_REGISTER_NEW_INPUT_HOOK = 31,
-  LDPT_GET_WRAP_SYMBOLS = 32
+  LDPT_GET_WRAP_SYMBOLS = 32,
+  LDPT_ADD_SYMBOLS_V2 = 33,
+  LDPT_GET_SYMBOLS_V4 = 34,
 };
 
 /* The plugin transfer vector.  */
-- 
2.25.1


^ permalink raw reply	[flat|nested] 55+ messages in thread

* [PATCH] Bump LTO bytecode version.
  2020-03-18  8:52                                   ` Martin Liška
@ 2020-03-18  8:53                                     ` Martin Liška
  2020-03-18  8:56                                       ` Richard Biener
  2020-03-19  9:12                                     ` [PATCH][RFC] API extension for binutils (type of symbols) Richard Biener
  1 sibling, 1 reply; 55+ messages in thread
From: Martin Liška @ 2020-03-18  8:53 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: GCC Patches

[-- Attachment #1: Type: text/plain, Size: 108 bytes --]

Hi.

I would like to bump LTO bytecode version for the upcoming GCC 10.1 release.

Ready for master?
Martin

[-- Attachment #2: 0003-Bump-LTO-bytecode-version.patch --]
[-- Type: text/x-patch, Size: 865 bytes --]

From b48f4187e11da79d1b0a932b120eeee2f882defc Mon Sep 17 00:00:00 2001
From: Martin Liska <mliska@suse.cz>
Date: Wed, 18 Mar 2020 09:40:24 +0100
Subject: [PATCH 3/3] Bump LTO bytecode version.

gcc/ChangeLog:

2020-03-18  Martin Liska  <mliska@suse.cz>

	* lto-streamer.h (LTO_major_version): Bump to 10.
---
 gcc/lto-streamer.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/lto-streamer.h b/gcc/lto-streamer.h
index 76aa6fe34b8..9c1c7539462 100644
--- a/gcc/lto-streamer.h
+++ b/gcc/lto-streamer.h
@@ -120,7 +120,7 @@ along with GCC; see the file COPYING3.  If not see
      String are represented in the table as pairs, a length in ULEB128
      form followed by the data for the string.  */
 
-#define LTO_major_version 9
+#define LTO_major_version 10
 #define LTO_minor_version 0
 
 typedef unsigned char	lto_decl_flags_t;
-- 
2.25.1


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH] Bump LTO bytecode version.
  2020-03-18  8:53                                     ` [PATCH] Bump LTO bytecode version Martin Liška
@ 2020-03-18  8:56                                       ` Richard Biener
  2020-03-18  9:00                                         ` Martin Liška
  0 siblings, 1 reply; 55+ messages in thread
From: Richard Biener @ 2020-03-18  8:56 UTC (permalink / raw)
  To: Martin Liška; +Cc: Jan Hubicka, GCC Patches

On Wed, Mar 18, 2020 at 9:54 AM Martin Liška <mliska@suse.cz> wrote:
>
> Hi.
>
> I would like to bump LTO bytecode version for the upcoming GCC 10.1 release.
>
> Ready for master?

Um, is there any recent change warranting it?  The version is already different
from GCC 9s and I'd rather wait until we're closer to the actual release?  Note
the LTO major doesn't match the GCC major ...

Richard.

> Martin

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH] Bump LTO bytecode version.
  2020-03-18  8:56                                       ` Richard Biener
@ 2020-03-18  9:00                                         ` Martin Liška
  2020-03-18  9:34                                           ` Richard Biener
  0 siblings, 1 reply; 55+ messages in thread
From: Martin Liška @ 2020-03-18  9:00 UTC (permalink / raw)
  To: Richard Biener; +Cc: Jan Hubicka, GCC Patches

On 3/18/20 9:56 AM, Richard Biener wrote:
> On Wed, Mar 18, 2020 at 9:54 AM Martin Liška <mliska@suse.cz> wrote:
>>
>> Hi.
>>
>> I would like to bump LTO bytecode version for the upcoming GCC 10.1 release.
>>
>> Ready for master?
> 
> Um, is there any recent change warranting it?

The API extension reshuffles lto_section_type enum values.

>  The version is already different
> from GCC 9s and I'd rather wait until we're closer to the actual release?  Note
> the LTO major doesn't match the GCC major ...

But yes, the last change happened in:

commit 86c23d9314c4081c13ebf629fd3393de4e316bf6
Author: Jakub Jelinek <jakub@redhat.com>
Date:   Thu May 16 11:30:41 2019 +0200

     * lto-streamer.h (LTO_major_version): Bump to 9.
     
     From-SVN: r271284

which is right after stage1 opened.
Is it fine break LTO bytecode during the development of a new release?

Martin

> 
> Richard.
> 
>> Martin


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH] Bump LTO bytecode version.
  2020-03-18  9:00                                         ` Martin Liška
@ 2020-03-18  9:34                                           ` Richard Biener
  2020-03-18 10:20                                             ` Martin Liška
  0 siblings, 1 reply; 55+ messages in thread
From: Richard Biener @ 2020-03-18  9:34 UTC (permalink / raw)
  To: Martin Liška; +Cc: Jan Hubicka, GCC Patches

On Wed, Mar 18, 2020 at 10:00 AM Martin Liška <mliska@suse.cz> wrote:
>
> On 3/18/20 9:56 AM, Richard Biener wrote:
> > On Wed, Mar 18, 2020 at 9:54 AM Martin Liška <mliska@suse.cz> wrote:
> >>
> >> Hi.
> >>
> >> I would like to bump LTO bytecode version for the upcoming GCC 10.1 release.
> >>
> >> Ready for master?
> >
> > Um, is there any recent change warranting it?
>
> The API extension reshuffles lto_section_type enum values.
>
> >  The version is already different
> > from GCC 9s and I'd rather wait until we're closer to the actual release?  Note
> > the LTO major doesn't match the GCC major ...
>
> But yes, the last change happened in:
>
> commit 86c23d9314c4081c13ebf629fd3393de4e316bf6
> Author: Jakub Jelinek <jakub@redhat.com>
> Date:   Thu May 16 11:30:41 2019 +0200
>
>      * lto-streamer.h (LTO_major_version): Bump to 9.
>
>      From-SVN: r271284
>
> which is right after stage1 opened.
> Is it fine break LTO bytecode during the development of a new release?

Yes, we don't really bump everytime we change something.

Richard.

> Martin
>
> >
> > Richard.
> >
> >> Martin
>

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH] Bump LTO bytecode version.
  2020-03-18  9:34                                           ` Richard Biener
@ 2020-03-18 10:20                                             ` Martin Liška
  0 siblings, 0 replies; 55+ messages in thread
From: Martin Liška @ 2020-03-18 10:20 UTC (permalink / raw)
  To: Richard Biener; +Cc: Jan Hubicka, GCC Patches

On 3/18/20 10:34 AM, Richard Biener wrote:
> Yes, we don't really bump everytime we change something.

Fine, then please forget about the patch.

Martin

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH][RFC] API extension for binutils (type of symbols).
  2020-03-18  8:52                                   ` Martin Liška
  2020-03-18  8:53                                     ` [PATCH] Bump LTO bytecode version Martin Liška
@ 2020-03-19  9:12                                     ` Richard Biener
  2020-03-19 15:00                                       ` Martin Liška
  1 sibling, 1 reply; 55+ messages in thread
From: Richard Biener @ 2020-03-19  9:12 UTC (permalink / raw)
  To: Martin Liška; +Cc: Jan Hubicka, GCC Patches

On Wed, Mar 18, 2020 at 9:52 AM Martin Liška <mliska@suse.cz> wrote:
>
> On 3/18/20 12:27 AM, Jan Hubicka wrote:
> >> Hi.
> >>
> >> There's updated version of the patch.
> >> Changes from the previous version:
> >> - comment added to ld_plugin_symbol
> >> - new section renamed to ext_symtab
> >> - assert added for loop iterations in produce_symtab and produce_symtab_extension
> > Hi,
> > I hope this is last version of the patch.
>
> Hello.
>
> Yes.
>
> >>
> >> 2020-03-12  Martin Liska  <mliska@suse.cz>
> >>
> >>      * lto-section-in.c: Add extension_symtab.
> > ext_symtab  :)
>
> Fixed.
>
> >> diff --git a/gcc/lto-section-in.c b/gcc/lto-section-in.c
> >> index c17dd69dbdd..78b015be696 100644
> >> --- a/gcc/lto-section-in.c
> >> +++ b/gcc/lto-section-in.c
> >> @@ -54,7 +54,8 @@ const char *lto_section_name[LTO_N_SECTION_TYPES] =
> >>     "mode_table",
> >>     "hsa",
> >>     "lto",
> >> -  "ipa_sra"
> >> +  "ipa_sra",
> >> +  "ext_symtab"
> > I would move ext_symtab next to symtab so the sections remains at least
> > bit reasonably ordered.
>
> Ok, I'll adjust it and I will send a separate patch where we bump LTO_major_version.
>
> >>
> >> +/* Write extension information for symbols (symbol type, section flags).  */
> >> +
> >> +static void
> >> +write_symbol_extension_info (tree t)
> >> +{
> >> +  unsigned char c;
> > Do we still use vertical whitespace after decls per GNU coding style?
>
> Dunno. This seems to me like a nit.
>
> >> diff --git a/gcc/lto-streamer.h b/gcc/lto-streamer.h
> >> index 25bf6c468f7..4f82b439360 100644
> >> --- a/gcc/lto-streamer.h
> >> +++ b/gcc/lto-streamer.h
> >> @@ -236,6 +236,7 @@ enum lto_section_type
> >>     LTO_section_ipa_hsa,
> >>     LTO_section_lto,
> >>     LTO_section_ipa_sra,
> >> +  LTO_section_symtab_extension,
> > I guess symtab_ext to match the actual section name?
>
> No. See e.g.   LTO_section_jump_functions - "jmpfuncs". We want to have more descriptive
> enum names.
>
> >>     LTO_N_SECTION_TYPES              /* Must be last.  */
> >>   };
> >>
> >> diff --git a/include/lto-symtab.h b/include/lto-symtab.h
> >> index 0ce0de10121..47f0ff27df8 100644
> >> --- a/include/lto-symtab.h
> >> +++ b/include/lto-symtab.h
> >> @@ -38,4 +38,16 @@ enum gcc_plugin_symbol_visibility
> >>       GCCPV_HIDDEN
> >>     };
> >>
> >> +enum gcc_plugin_symbol_type
> >> +{
> >> +  GCCST_UNKNOWN,
> >> +  GCCST_FUNCTION,
> >> +  GCCST_VARIABLE,
> >> +};
> >> +
> >> +enum gcc_plugin_symbol_section_flags
> >> +{
> >> +  GCCSSS_BSS = 1
> >> +};
> >
> > Probably comments here?
>
> No. There are just shadow copy of enum types from plugin-api.h which
> are documented.
>
> >> +
> >>   #endif /* GCC_LTO_SYMTAB_H  */
> >> +/* Parse an entry of the IL symbol table. The data to be parsed is pointed
> >> +   by P and the result is written in ENTRY. The slot number is stored in SLOT.
> >> +   Returns the address of the next entry. */
> >> +
> >> +static char *
> >> +parse_table_entry_extension (char *p, struct ld_plugin_symbol *entry)
> >> +{
> >> +  unsigned char t;
> >> +  enum ld_plugin_symbol_type symbol_types[] =
> >> +    {
> >> +      LDST_UNKNOWN,
> >> +      LDST_FUNCTION,
> >> +      LDST_VARIABLE,
> >> +    };
> >> +
> >> +  t = *p;
> >> +  check (t <= 3, LDPL_FATAL, "invalid symbol type found");
> >> +  entry->symbol_type = symbol_types[t];
> >> +  p++;
> >> +  entry->section_flags = *p;
> >> +  p++;
> >> +
> >> +  return p;
> >> +}
> >
> > I think we have chance to make some plan for future extensions without
> > introducing too many additional sections.
> >
> > Currently there are 2 bytes per entry, while only 3 bits are actively
> > used of them.  If we invent next flag to pass we can use unused bits
> > however we need a way to indicate to plugin that the bit is defined.
> > This could be done by a simple version byte at the beggining of
> > ext_symtab section which will be 0 now and once we define extra bits we
> > bump it up to 1.
>
> I like the suggested change, it can help us in the future.
>
> >
> > It is not that important given that even empty file results in 2k LTO
> > object file, but I think it would be nicer in longer run.
> >> +  /* This is for compatibility with older ABIs.  */
> > Perhaps say here that this ABI defined only "int def;"
>
> Good point.
>
> >
> > The patch look good to me. Thanks for the work!
>
> Thanks. I'm sending updated patch that I've just tested on lto.exp and
> both binutils master and HJ's branch that utilizes the new API.

@@ -495,10 +560,16 @@ write_resolution (void)

       /* Version 2 of API supports IRONLY_EXP resolution that is
          accepted by GCC-4.7 and newer.  */
-      if (get_symbols_v2)
-        get_symbols_v2 (info->handle, symtab->nsyms, syms);
+      if (get_symbols_v4)
+       get_symbols_v4 (info->handle, symtab->nsyms, syms);
       else
-        get_symbols (info->handle, symtab->nsyms, syms);
+       {
+         clear_new_symtab_flags (symtab);

can you instead just avoid parsing the ext symtab?

+         if (get_symbols_v2)
+           get_symbols_v2 (info->handle, symtab->nsyms, syms);
+         else
+           get_symbols (info->handle, symtab->nsyms, syms);
+       }

I guess this also points to the fact that LDs symbol resolution
can't tell GCC it chose "BSS" (from a non-IL object) or that
it chose a variable or function.

@@ -296,6 +300,8 @@ parse_table_entry (char *p, struct ld_plugin_symbol *entry,
   entry->visibility = translate_visibility[t];
   p++;

+  entry->unused = 0;
+
   memcpy (&entry->size, p, sizeof (uint64_t));
   p += 8;

isn't that either not enough or too much clearing?
I'd have expected

 entry->unused = entry->section_flags = entry->symbol_type = 0;

_before_

  t = *p;
  check (t <= 4, LDPL_FATAL, "invalid symbol kind found");
  entry->def = translate_kind[t];
  p++;

?

+enum ld_plugin_symbol_section_flags
+{
+  LDSSS_BSS = 1
+};
+

please add a symbolic name for the value zero,
may I suggest LDSSS_DEFAULT?  I see you've
settled on symbol_section_flags rather than
section_kind (BSS is not really a flag...).

Otherwise looks OK to me.

Thanks,
Richard.

> Martin
>
> > Honza
> >> +#ifdef __BIG_ENDIAN__
> >> +  char unused;
> >> +  char section_flags;
> >> +  char symbol_type;
> >> +  char def;
> >> +#else
> >> +  char def;
> >> +  char symbol_type;
> >> +  char section_flags;
> >> +  char unused;
> >> +#endif
> >>     int visibility;
> >>     uint64_t size;
> >>     char *comdat_key;
> >> @@ -123,6 +134,20 @@ enum ld_plugin_symbol_visibility
> >>     LDPV_HIDDEN
> >>   };
> >>
> >> +/* The type of the symbol.  */
> >> +
> >> +enum ld_plugin_symbol_type
> >> +{
> >> +  LDST_UNKNOWN,
> >> +  LDST_FUNCTION,
> >> +  LDST_VARIABLE,
> >> +};
> >> +
> >> +enum ld_plugin_symbol_section_flags
> >> +{
> >> +  LDSSS_BSS = 1
> >> +};
> >> +
> >>   /* How a symbol is resolved.  */
> >>
> >>   enum ld_plugin_symbol_resolution
> >> @@ -431,7 +456,9 @@ enum ld_plugin_tag
> >>     LDPT_GET_INPUT_SECTION_ALIGNMENT = 29,
> >>     LDPT_GET_INPUT_SECTION_SIZE = 30,
> >>     LDPT_REGISTER_NEW_INPUT_HOOK = 31,
> >> -  LDPT_GET_WRAP_SYMBOLS = 32
> >> +  LDPT_GET_WRAP_SYMBOLS = 32,
> >> +  LDPT_ADD_SYMBOLS_V2 = 33,
> >> +  LDPT_GET_SYMBOLS_V4 = 34,
> >>   };
> >>
> >>   /* The plugin transfer vector.  */
> >> --
> >> 2.25.1
> >>
> >
>

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH][RFC] API extension for binutils (type of symbols).
  2020-03-19  9:12                                     ` [PATCH][RFC] API extension for binutils (type of symbols) Richard Biener
@ 2020-03-19 15:00                                       ` Martin Liška
  2020-03-19 15:46                                         ` Richard Biener
  0 siblings, 1 reply; 55+ messages in thread
From: Martin Liška @ 2020-03-19 15:00 UTC (permalink / raw)
  To: Richard Biener; +Cc: Jan Hubicka, GCC Patches

[-- Attachment #1: Type: text/plain, Size: 7879 bytes --]

On 3/19/20 10:12 AM, Richard Biener wrote:
> On Wed, Mar 18, 2020 at 9:52 AM Martin Liška <mliska@suse.cz> wrote:
>>
>> On 3/18/20 12:27 AM, Jan Hubicka wrote:
>>>> Hi.
>>>>
>>>> There's updated version of the patch.
>>>> Changes from the previous version:
>>>> - comment added to ld_plugin_symbol
>>>> - new section renamed to ext_symtab
>>>> - assert added for loop iterations in produce_symtab and produce_symtab_extension
>>> Hi,
>>> I hope this is last version of the patch.
>>
>> Hello.
>>
>> Yes.
>>
>>>>
>>>> 2020-03-12  Martin Liska  <mliska@suse.cz>
>>>>
>>>>       * lto-section-in.c: Add extension_symtab.
>>> ext_symtab  :)
>>
>> Fixed.
>>
>>>> diff --git a/gcc/lto-section-in.c b/gcc/lto-section-in.c
>>>> index c17dd69dbdd..78b015be696 100644
>>>> --- a/gcc/lto-section-in.c
>>>> +++ b/gcc/lto-section-in.c
>>>> @@ -54,7 +54,8 @@ const char *lto_section_name[LTO_N_SECTION_TYPES] =
>>>>      "mode_table",
>>>>      "hsa",
>>>>      "lto",
>>>> -  "ipa_sra"
>>>> +  "ipa_sra",
>>>> +  "ext_symtab"
>>> I would move ext_symtab next to symtab so the sections remains at least
>>> bit reasonably ordered.
>>
>> Ok, I'll adjust it and I will send a separate patch where we bump LTO_major_version.
>>
>>>>
>>>> +/* Write extension information for symbols (symbol type, section flags).  */
>>>> +
>>>> +static void
>>>> +write_symbol_extension_info (tree t)
>>>> +{
>>>> +  unsigned char c;
>>> Do we still use vertical whitespace after decls per GNU coding style?
>>
>> Dunno. This seems to me like a nit.
>>
>>>> diff --git a/gcc/lto-streamer.h b/gcc/lto-streamer.h
>>>> index 25bf6c468f7..4f82b439360 100644
>>>> --- a/gcc/lto-streamer.h
>>>> +++ b/gcc/lto-streamer.h
>>>> @@ -236,6 +236,7 @@ enum lto_section_type
>>>>      LTO_section_ipa_hsa,
>>>>      LTO_section_lto,
>>>>      LTO_section_ipa_sra,
>>>> +  LTO_section_symtab_extension,
>>> I guess symtab_ext to match the actual section name?
>>
>> No. See e.g.   LTO_section_jump_functions - "jmpfuncs". We want to have more descriptive
>> enum names.
>>
>>>>      LTO_N_SECTION_TYPES              /* Must be last.  */
>>>>    };
>>>>
>>>> diff --git a/include/lto-symtab.h b/include/lto-symtab.h
>>>> index 0ce0de10121..47f0ff27df8 100644
>>>> --- a/include/lto-symtab.h
>>>> +++ b/include/lto-symtab.h
>>>> @@ -38,4 +38,16 @@ enum gcc_plugin_symbol_visibility
>>>>        GCCPV_HIDDEN
>>>>      };
>>>>
>>>> +enum gcc_plugin_symbol_type
>>>> +{
>>>> +  GCCST_UNKNOWN,
>>>> +  GCCST_FUNCTION,
>>>> +  GCCST_VARIABLE,
>>>> +};
>>>> +
>>>> +enum gcc_plugin_symbol_section_flags
>>>> +{
>>>> +  GCCSSS_BSS = 1
>>>> +};
>>>
>>> Probably comments here?
>>
>> No. There are just shadow copy of enum types from plugin-api.h which
>> are documented.
>>
>>>> +
>>>>    #endif /* GCC_LTO_SYMTAB_H  */
>>>> +/* Parse an entry of the IL symbol table. The data to be parsed is pointed
>>>> +   by P and the result is written in ENTRY. The slot number is stored in SLOT.
>>>> +   Returns the address of the next entry. */
>>>> +
>>>> +static char *
>>>> +parse_table_entry_extension (char *p, struct ld_plugin_symbol *entry)
>>>> +{
>>>> +  unsigned char t;
>>>> +  enum ld_plugin_symbol_type symbol_types[] =
>>>> +    {
>>>> +      LDST_UNKNOWN,
>>>> +      LDST_FUNCTION,
>>>> +      LDST_VARIABLE,
>>>> +    };
>>>> +
>>>> +  t = *p;
>>>> +  check (t <= 3, LDPL_FATAL, "invalid symbol type found");
>>>> +  entry->symbol_type = symbol_types[t];
>>>> +  p++;
>>>> +  entry->section_flags = *p;
>>>> +  p++;
>>>> +
>>>> +  return p;
>>>> +}
>>>
>>> I think we have chance to make some plan for future extensions without
>>> introducing too many additional sections.
>>>
>>> Currently there are 2 bytes per entry, while only 3 bits are actively
>>> used of them.  If we invent next flag to pass we can use unused bits
>>> however we need a way to indicate to plugin that the bit is defined.
>>> This could be done by a simple version byte at the beggining of
>>> ext_symtab section which will be 0 now and once we define extra bits we
>>> bump it up to 1.
>>
>> I like the suggested change, it can help us in the future.
>>
>>>
>>> It is not that important given that even empty file results in 2k LTO
>>> object file, but I think it would be nicer in longer run.
>>>> +  /* This is for compatibility with older ABIs.  */
>>> Perhaps say here that this ABI defined only "int def;"
>>
>> Good point.
>>
>>>
>>> The patch look good to me. Thanks for the work!
>>
>> Thanks. I'm sending updated patch that I've just tested on lto.exp and
>> both binutils master and HJ's branch that utilizes the new API.
> 
> @@ -495,10 +560,16 @@ write_resolution (void)
> 
>         /* Version 2 of API supports IRONLY_EXP resolution that is
>            accepted by GCC-4.7 and newer.  */
> -      if (get_symbols_v2)
> -        get_symbols_v2 (info->handle, symtab->nsyms, syms);
> +      if (get_symbols_v4)
> +       get_symbols_v4 (info->handle, symtab->nsyms, syms);
>         else
> -        get_symbols (info->handle, symtab->nsyms, syms);
> +       {
> +         clear_new_symtab_flags (symtab);
> 
> can you instead just avoid parsing the ext symtab?

Yes, I simplified the changes and I bet we'll only need one new hook get_symbols_v2.
Then we can base parsing of the external symtab on that.

> 
> +         if (get_symbols_v2)
> +           get_symbols_v2 (info->handle, symtab->nsyms, syms);
> +         else
> +           get_symbols (info->handle, symtab->nsyms, syms);
> +       }
> 
> I guess this also points to the fact that LDs symbol resolution
> can't tell GCC it chose "BSS" (from a non-IL object) or that
> it chose a variable or function.
> 
> @@ -296,6 +300,8 @@ parse_table_entry (char *p, struct ld_plugin_symbol *entry,
>     entry->visibility = translate_visibility[t];
>     p++;
> 
> +  entry->unused = 0;
> +
>     memcpy (&entry->size, p, sizeof (uint64_t));
>     p += 8;
> 
> isn't that either not enough or too much clearing?
> I'd have expected
> 
>   entry->unused = entry->section_flags = entry->symbol_type = 0;
> 
> _before_
> 
>    t = *p;
>    check (t <= 4, LDPL_FATAL, "invalid symbol kind found");
>    entry->def = translate_kind[t];
>    p++;
> 
> ?

Yes, that's better.

> 
> +enum ld_plugin_symbol_section_flags
> +{
> +  LDSSS_BSS = 1
> +};
> +
> 
> please add a symbolic name for the value zero,
> may I suggest LDSSS_DEFAULT?  I see you've
> settled on symbol_section_flags rather than
> section_kind (BSS is not really a flag...).

I renamed that to ld_plugin_symbol_section_kind.

May I install the 2 patches now? It's again tested on lto.exp
and both old binutils and new HJ's binutils patch and it works as expected.

Martin

> 
> Otherwise looks OK to me.
> 
> Thanks,
> Richard.
> 
>> Martin
>>
>>> Honza
>>>> +#ifdef __BIG_ENDIAN__
>>>> +  char unused;
>>>> +  char section_flags;
>>>> +  char symbol_type;
>>>> +  char def;
>>>> +#else
>>>> +  char def;
>>>> +  char symbol_type;
>>>> +  char section_flags;
>>>> +  char unused;
>>>> +#endif
>>>>      int visibility;
>>>>      uint64_t size;
>>>>      char *comdat_key;
>>>> @@ -123,6 +134,20 @@ enum ld_plugin_symbol_visibility
>>>>      LDPV_HIDDEN
>>>>    };
>>>>
>>>> +/* The type of the symbol.  */
>>>> +
>>>> +enum ld_plugin_symbol_type
>>>> +{
>>>> +  LDST_UNKNOWN,
>>>> +  LDST_FUNCTION,
>>>> +  LDST_VARIABLE,
>>>> +};
>>>> +
>>>> +enum ld_plugin_symbol_section_flags
>>>> +{
>>>> +  LDSSS_BSS = 1
>>>> +};
>>>> +
>>>>    /* How a symbol is resolved.  */
>>>>
>>>>    enum ld_plugin_symbol_resolution
>>>> @@ -431,7 +456,9 @@ enum ld_plugin_tag
>>>>      LDPT_GET_INPUT_SECTION_ALIGNMENT = 29,
>>>>      LDPT_GET_INPUT_SECTION_SIZE = 30,
>>>>      LDPT_REGISTER_NEW_INPUT_HOOK = 31,
>>>> -  LDPT_GET_WRAP_SYMBOLS = 32
>>>> +  LDPT_GET_WRAP_SYMBOLS = 32,
>>>> +  LDPT_ADD_SYMBOLS_V2 = 33,
>>>> +  LDPT_GET_SYMBOLS_V4 = 34,
>>>>    };
>>>>
>>>>    /* The plugin transfer vector.  */
>>>> --
>>>> 2.25.1
>>>>
>>>
>>


[-- Attachment #2: 0002-API-extension-for-binutils-type-of-symbols.patch --]
[-- Type: text/x-patch, Size: 13735 bytes --]

From 668d229ed363f97d03fb62f6a159876fc4b975e0 Mon Sep 17 00:00:00 2001
From: Martin Liska <mliska@suse.cz>
Date: Fri, 6 Mar 2020 18:09:35 +0100
Subject: [PATCH 2/2] API extension for binutils (type of symbols).

gcc/ChangeLog:

2020-03-12  Martin Liska  <mliska@suse.cz>

	* lto-section-in.c: Add ext_symtab.
	* lto-streamer-out.c (write_symbol_extension_info): New.
	(produce_symtab_extension): New.
	(produce_asm_for_decls): Stream also produce_symtab_extension.
	* lto-streamer.h (enum lto_section_type): New section.

include/ChangeLog:

2020-03-12  Martin Liska  <mliska@suse.cz>

	* lto-symtab.h (enum gcc_plugin_symbol_type): New.
	(enum gcc_plugin_symbol_section_kind): Likewise.

lto-plugin/ChangeLog:

2020-03-12  Martin Liska  <mliska@suse.cz>

	* lto-plugin.c (LTO_SECTION_PREFIX): Rename to ...
	(LTO_SYMTAB_PREFIX): ... this.
	(LTO_SECTION_PREFIX_LEN): Rename to ...
	(LTO_SYMTAB_PREFIX_LEN): ... this.
	(LTO_SYMTAB_EXT_PREFIX): New.
	(LTO_SYMTAB_EXT_PREFIX_LEN): New.
	(LTO_LTO_PREFIX): New.
	(LTO_LTO_PREFIX_LEN): New.
	(parse_table_entry): Fill up unused to zero.
	(parse_table_entry_extension): New.
	(parse_symtab_extension): New.
	(finish_conflict_resolution): Change type
	for resolution.
	(process_symtab): Use new macro name.
	(process_symtab_extension): New.
	(claim_file_handler): Parse also process_symtab_extension.
	(onload): Call new add_symbols_v2.
---
 gcc/lto-section-in.c    |   1 +
 gcc/lto-streamer-out.c  |  82 ++++++++++++++++++++++-
 gcc/lto-streamer.h      |   1 +
 include/lto-symtab.h    |  13 ++++
 lto-plugin/lto-plugin.c | 141 ++++++++++++++++++++++++++++++++++++----
 5 files changed, 225 insertions(+), 13 deletions(-)

diff --git a/gcc/lto-section-in.c b/gcc/lto-section-in.c
index c17dd69dbdd..0923a8c0746 100644
--- a/gcc/lto-section-in.c
+++ b/gcc/lto-section-in.c
@@ -38,6 +38,7 @@ const char *lto_section_name[LTO_N_SECTION_TYPES] =
   "function_body",
   "statics",
   "symtab",
+  "ext_symtab",
   "refs",
   "asm",
   "jmpfuncs",
diff --git a/gcc/lto-streamer-out.c b/gcc/lto-streamer-out.c
index cea5e71cffb..a219c1d0dd1 100644
--- a/gcc/lto-streamer-out.c
+++ b/gcc/lto-streamer-out.c
@@ -45,6 +45,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "print-tree.h"
 #include "tree-dfa.h"
 #include "file-prefix-map.h" /* remap_debug_filename()  */
+#include "output.h"
 
 
 static void lto_write_tree (struct output_block*, tree, bool);
@@ -2777,12 +2778,32 @@ write_symbol (struct streamer_tree_cache_d *cache,
   lto_write_data (&slot_num, 4);
 }
 
+/* Write extension information for symbols (symbol type, section flags).  */
+
+static void
+write_symbol_extension_info (tree t)
+{
+  unsigned char c;
+  c = ((unsigned char) TREE_CODE (t) == VAR_DECL
+       ? GCCST_VARIABLE : GCCST_FUNCTION);
+  lto_write_data (&c, 1);
+  unsigned char section_kind = 0;
+  if (TREE_CODE (t) == VAR_DECL)
+    {
+      section *s = get_variable_section (t, false);
+      if (s->common.flags & SECTION_BSS)
+	section_kind |= GCCSSK_BSS;
+    }
+  lto_write_data (&section_kind, 1);
+}
+
 /* Write an IL symbol table to OB.
    SET and VSET are cgraph/varpool node sets we are outputting.  */
 
-static void
+static unsigned int
 produce_symtab (struct output_block *ob)
 {
+  unsigned int streamed_symbols = 0;
   struct streamer_tree_cache_d *cache = ob->writer_cache;
   char *section_name = lto_get_section_name (LTO_section_symtab, NULL, 0, NULL);
   lto_symtab_encoder_t encoder = ob->decl_state->symtab_node_encoder;
@@ -2804,6 +2825,7 @@ produce_symtab (struct output_block *ob)
       if (DECL_EXTERNAL (node->decl) || !node->output_to_lto_symbol_table_p ())
 	continue;
       write_symbol (cache, node->decl, &seen, false);
+      ++streamed_symbols;
     }
   for (lsei = lsei_start (encoder);
        !lsei_end_p (lsei); lsei_next (&lsei))
@@ -2813,8 +2835,61 @@ produce_symtab (struct output_block *ob)
       if (!DECL_EXTERNAL (node->decl) || !node->output_to_lto_symbol_table_p ())
 	continue;
       write_symbol (cache, node->decl, &seen, false);
+      ++streamed_symbols;
+    }
+
+  lto_end_section ();
+
+  return streamed_symbols;
+}
+
+/* Symtab extension version.  */
+#define LTO_SYMTAB_EXTENSION_VERSION 1
+
+/* Write an IL symbol table extension to OB.
+   SET and VSET are cgraph/varpool node sets we are outputting.  */
+
+static void
+produce_symtab_extension (struct output_block *ob,
+			  unsigned int previous_streamed_symbols)
+{
+  unsigned int streamed_symbols = 0;
+  char *section_name = lto_get_section_name (LTO_section_symtab_extension,
+					     NULL, 0, NULL);
+  lto_symtab_encoder_t encoder = ob->decl_state->symtab_node_encoder;
+  lto_symtab_encoder_iterator lsei;
+
+  lto_begin_section (section_name, false);
+  free (section_name);
+
+  unsigned char version = LTO_SYMTAB_EXTENSION_VERSION;
+  lto_write_data (&version, 1);
+
+  /* Write the symbol table.
+     First write everything defined and then all declarations.
+     This is necessary to handle cases where we have duplicated symbols.  */
+  for (lsei = lsei_start (encoder);
+       !lsei_end_p (lsei); lsei_next (&lsei))
+    {
+      symtab_node *node = lsei_node (lsei);
+
+      if (DECL_EXTERNAL (node->decl) || !node->output_to_lto_symbol_table_p ())
+	continue;
+      write_symbol_extension_info (node->decl);
+      ++streamed_symbols;
+    }
+  for (lsei = lsei_start (encoder);
+       !lsei_end_p (lsei); lsei_next (&lsei))
+    {
+      symtab_node *node = lsei_node (lsei);
+
+      if (!DECL_EXTERNAL (node->decl) || !node->output_to_lto_symbol_table_p ())
+	continue;
+      write_symbol_extension_info (node->decl);
+      ++streamed_symbols;
     }
 
+  gcc_assert (previous_streamed_symbols == streamed_symbols);
   lto_end_section ();
 }
 
@@ -3001,7 +3076,10 @@ produce_asm_for_decls (void)
   /* Write the symbol table.  It is used by linker to determine dependencies
      and thus we can skip it for WPA.  */
   if (!flag_wpa)
-    produce_symtab (ob);
+    {
+      unsigned int streamed_symbols = produce_symtab (ob);
+      produce_symtab_extension (ob, streamed_symbols);
+    }
 
   /* Write command line opts.  */
   lto_write_options ();
diff --git a/gcc/lto-streamer.h b/gcc/lto-streamer.h
index 25bf6c468f7..76aa6fe34b8 100644
--- a/gcc/lto-streamer.h
+++ b/gcc/lto-streamer.h
@@ -219,6 +219,7 @@ enum lto_section_type
   LTO_section_function_body,
   LTO_section_static_initializer,
   LTO_section_symtab,
+  LTO_section_symtab_extension,
   LTO_section_refs,
   LTO_section_asm,
   LTO_section_jump_functions,
diff --git a/include/lto-symtab.h b/include/lto-symtab.h
index 0ce0de10121..ef2e35f19c3 100644
--- a/include/lto-symtab.h
+++ b/include/lto-symtab.h
@@ -38,4 +38,17 @@ enum gcc_plugin_symbol_visibility
     GCCPV_HIDDEN
   };
 
+enum gcc_plugin_symbol_type
+{
+  GCCST_UNKNOWN,
+  GCCST_FUNCTION,
+  GCCST_VARIABLE,
+};
+
+enum gcc_plugin_symbol_section_kind
+{
+  GCCSSK_DEFAULT,
+  GCCSSK_BSS
+};
+
 #endif /* GCC_LTO_SYMTAB_H  */
diff --git a/lto-plugin/lto-plugin.c b/lto-plugin/lto-plugin.c
index c307fc871bf..ca6c84a1ffd 100644
--- a/lto-plugin/lto-plugin.c
+++ b/lto-plugin/lto-plugin.c
@@ -88,10 +88,14 @@ along with this program; see the file COPYING3.  If not see
 
 /* LTO magic section name.  */
 
-#define LTO_SECTION_PREFIX	".gnu.lto_.symtab"
-#define LTO_SECTION_PREFIX_LEN	(sizeof (LTO_SECTION_PREFIX) - 1)
-#define OFFLOAD_SECTION		".gnu.offload_lto_.opts"
-#define OFFLOAD_SECTION_LEN	(sizeof (OFFLOAD_SECTION) - 1)
+#define LTO_SYMTAB_PREFIX	    ".gnu.lto_.symtab"
+#define LTO_SYMTAB_PREFIX_LEN	    (sizeof (LTO_SYMTAB_PREFIX) - 1)
+#define LTO_SYMTAB_EXT_PREFIX	    ".gnu.lto_.ext_symtab"
+#define LTO_SYMTAB_EXT_PREFIX_LEN   (sizeof (LTO_SYMTAB_EXT_PREFIX) - 1)
+#define LTO_LTO_PREFIX		    ".gnu.lto_.lto"
+#define LTO_LTO_PREFIX_LEN	    (sizeof (LTO_LTO_PREFIX) - 1)
+#define OFFLOAD_SECTION		    ".gnu.offload_lto_.opts"
+#define OFFLOAD_SECTION_LEN	    (sizeof (OFFLOAD_SECTION) - 1)
 
 /* The part of the symbol table the plugin has to keep track of. Note that we
    must keep SYMS until all_symbols_read is called to give the linker time to
@@ -159,7 +163,7 @@ static ld_plugin_register_cleanup register_cleanup;
 static ld_plugin_add_input_file add_input_file;
 static ld_plugin_add_input_library add_input_library;
 static ld_plugin_message message;
-static ld_plugin_add_symbols add_symbols;
+static ld_plugin_add_symbols add_symbols, add_symbols_v2;
 
 static struct plugin_file_info *claimed_files = NULL;
 static unsigned int num_claimed_files = 0;
@@ -286,6 +290,8 @@ parse_table_entry (char *p, struct ld_plugin_symbol *entry,
   else
     entry->comdat_key = xstrdup (entry->comdat_key);
 
+  entry->unused = entry->section_kind = entry->symbol_type = 0;
+
   t = *p;
   check (t <= 4, LDPL_FATAL, "invalid symbol kind found");
   entry->def = translate_kind[t];
@@ -309,6 +315,32 @@ parse_table_entry (char *p, struct ld_plugin_symbol *entry,
   return p;
 }
 
+/* Parse an entry of the IL symbol table. The data to be parsed is pointed
+   by P and the result is written in ENTRY. The slot number is stored in SLOT.
+   Returns the address of the next entry. */
+
+static char *
+parse_table_entry_extension (char *p, struct ld_plugin_symbol *entry)
+{
+  unsigned char t;
+  enum ld_plugin_symbol_type symbol_types[] =
+    {
+      LDST_UNKNOWN,
+      LDST_FUNCTION,
+      LDST_VARIABLE,
+    };
+
+  t = *p;
+  check (t <= 2, LDPL_FATAL, "invalid symbol type found");
+  entry->symbol_type = symbol_types[t];
+  p++;
+  entry->section_kind = *p;
+  p++;
+
+  return p;
+}
+
+
 /* Translate the IL symbol table located between DATA and END. Append the
    slots and symbols to OUT. */
 
@@ -339,6 +371,24 @@ translate (char *data, char *end, struct plugin_symtab *out)
   out->aux = aux;
 }
 
+static void
+parse_symtab_extension (char *data, char *end, struct plugin_symtab *out)
+{
+  unsigned i;
+
+  unsigned char version = *data;
+  data++;
+
+  /* Version 1 contains the following data per entry:
+     - symbol_type
+     - section_kind
+     .  */
+
+  if (version == 1)
+    for (i = 0; i < out->nsyms; i++)
+      data = parse_table_entry_extension (data, &out->syms[i]);
+}
+
 /* Free all memory that is no longer needed after writing the symbol
    resolution. */
 
@@ -431,7 +481,7 @@ finish_conflict_resolution (struct plugin_symtab *symtab,
 
   for (i = 0; i < symtab->nsyms; i++)
     { 
-      int resolution = LDPR_UNKNOWN;
+      char resolution = LDPR_UNKNOWN;
 
       if (symtab->aux[i].next_conflict == -1)
 	continue;
@@ -953,7 +1003,7 @@ process_symtab (void *data, const char *name, off_t offset, off_t length)
   char *s;
   char *secdatastart, *secdata;
 
-  if (strncmp (name, LTO_SECTION_PREFIX, LTO_SECTION_PREFIX_LEN) != 0)
+  if (strncmp (name, LTO_SYMTAB_PREFIX, LTO_SYMTAB_PREFIX_LEN) != 0)
     return 1;
 
   s = strrchr (name, '.');
@@ -995,6 +1045,59 @@ err:
   return 0;
 }
 
+/* Process one section of an object file.  */
+
+static int
+process_symtab_extension (void *data, const char *name, off_t offset,
+			  off_t length)
+{
+  struct plugin_objfile *obj = (struct plugin_objfile *)data;
+  char *s;
+  char *secdatastart, *secdata;
+
+  if (strncmp (name, LTO_SYMTAB_EXT_PREFIX, LTO_SYMTAB_EXT_PREFIX_LEN) != 0)
+    return 1;
+
+  s = strrchr (name, '.');
+  if (s)
+    sscanf (s, ".%" PRI_LL "x", &obj->out->id);
+  secdata = secdatastart = xmalloc (length);
+  offset += obj->file->offset;
+  if (offset != lseek (obj->file->fd, offset, SEEK_SET))
+    goto err;
+
+  do
+    {
+      ssize_t got = read (obj->file->fd, secdata, length);
+      if (got == 0)
+	break;
+      else if (got > 0)
+	{
+	  secdata += got;
+	  length -= got;
+	}
+      else if (errno != EINTR)
+	goto err;
+    }
+  while (length > 0);
+  if (length > 0)
+    goto err;
+
+  parse_symtab_extension (secdatastart, secdata, obj->out);
+  obj->found++;
+  free (secdatastart);
+  return 1;
+
+err:
+  if (message)
+    message (LDPL_FATAL, "%s: corrupt object file", obj->file->name);
+  /* Force claim_file_handler to abandon this file.  */
+  obj->found = 0;
+  free (secdatastart);
+  return 0;
+}
+
+
 /* Find an offload section of an object file.  */
 
 static int
@@ -1055,8 +1158,17 @@ claim_file_handler (const struct ld_plugin_input_file *file, int *claimed)
   if (!obj.objfile && !err)
     goto err;
 
-  if (obj.objfile)
-    errmsg = simple_object_find_sections (obj.objfile, process_symtab, &obj, &err);
+   if (obj.objfile)
+    {
+      errmsg = simple_object_find_sections (obj.objfile, process_symtab, &obj,
+					    &err);
+      /*  Parsing symtab extension should be done only for add_symbols_v2 and
+	  later versions.  */
+      if (!errmsg && add_symbols_v2 != NULL)
+	errmsg = simple_object_find_sections (obj.objfile,
+					      process_symtab_extension,
+					      &obj, &err);
+    }
 
   if (!obj.objfile || errmsg)
     {
@@ -1080,8 +1192,12 @@ claim_file_handler (const struct ld_plugin_input_file *file, int *claimed)
 
   if (obj.found > 0)
     {
-      status = add_symbols (file->handle, lto_file.symtab.nsyms,
-			    lto_file.symtab.syms);
+      if (add_symbols_v2)
+	status = add_symbols_v2 (file->handle, lto_file.symtab.nsyms,
+				 lto_file.symtab.syms);
+      else
+	status = add_symbols (file->handle, lto_file.symtab.nsyms,
+			      lto_file.symtab.syms);
       check (status == LDPS_OK, LDPL_FATAL, "could not add symbols");
 
       num_claimed_files++;
@@ -1242,6 +1358,9 @@ onload (struct ld_plugin_tv *tv)
 	case LDPT_REGISTER_CLAIM_FILE_HOOK:
 	  register_claim_file = p->tv_u.tv_register_claim_file;
 	  break;
+	case LDPT_ADD_SYMBOLS_V2:
+	  add_symbols_v2 = p->tv_u.tv_add_symbols;
+	  break;
 	case LDPT_ADD_SYMBOLS:
 	  add_symbols = p->tv_u.tv_add_symbols;
 	  break;
-- 
2.25.1


[-- Attachment #3: 0001-Update-include-plugin-api.h.patch --]
[-- Type: text/x-patch, Size: 1805 bytes --]

From a0e10ea38d988eef85ee2514be265a40e8416cac Mon Sep 17 00:00:00 2001
From: Martin Liska <mliska@suse.cz>
Date: Thu, 12 Mar 2020 13:15:48 +0100
Subject: [PATCH 1/2] Update include/plugin-api.h.

include/ChangeLog:

2020-03-12  Martin Liska  <mliska@suse.cz>

	* plugin-api.h (struct ld_plugin_symbol): Split
	int def into 4 char fields.
	(enum ld_plugin_symbol_type): New.
	(enum ld_plugin_symbol_section_kind): New.
	(enum ld_plugin_tag): Add LDPT_ADD_SYMBOLS_V2.
---
 include/plugin-api.h | 32 ++++++++++++++++++++++++++++++--
 1 file changed, 30 insertions(+), 2 deletions(-)

diff --git a/include/plugin-api.h b/include/plugin-api.h
index 09e1202df07..f0f9667bf3b 100644
--- a/include/plugin-api.h
+++ b/include/plugin-api.h
@@ -87,7 +87,19 @@ struct ld_plugin_symbol
 {
   char *name;
   char *version;
-  int def;
+  /* This is for compatibility with older ABIs.  The older ABI defined
+     only 'def' field.  */
+#ifdef __BIG_ENDIAN__
+  char unused;
+  char section_kind;
+  char symbol_type;
+  char def;
+#else
+  char def;
+  char symbol_type;
+  char section_kind;
+  char unused;
+#endif
   int visibility;
   uint64_t size;
   char *comdat_key;
@@ -123,6 +135,21 @@ enum ld_plugin_symbol_visibility
   LDPV_HIDDEN
 };
 
+/* The type of the symbol.  */
+
+enum ld_plugin_symbol_type
+{
+  LDST_UNKNOWN,
+  LDST_FUNCTION,
+  LDST_VARIABLE,
+};
+
+enum ld_plugin_symbol_section_kind
+{
+  LDSSK_DEFAULT,
+  LDSSK_BSS
+};
+
 /* How a symbol is resolved.  */
 
 enum ld_plugin_symbol_resolution
@@ -431,7 +458,8 @@ enum ld_plugin_tag
   LDPT_GET_INPUT_SECTION_ALIGNMENT = 29,
   LDPT_GET_INPUT_SECTION_SIZE = 30,
   LDPT_REGISTER_NEW_INPUT_HOOK = 31,
-  LDPT_GET_WRAP_SYMBOLS = 32
+  LDPT_GET_WRAP_SYMBOLS = 32,
+  LDPT_ADD_SYMBOLS_V2 = 33
 };
 
 /* The plugin transfer vector.  */
-- 
2.25.1


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH][RFC] API extension for binutils (type of symbols).
  2020-03-19 15:00                                       ` Martin Liška
@ 2020-03-19 15:46                                         ` Richard Biener
  2020-03-19 15:50                                           ` H.J. Lu
  0 siblings, 1 reply; 55+ messages in thread
From: Richard Biener @ 2020-03-19 15:46 UTC (permalink / raw)
  To: Martin Liška, Binutils; +Cc: Jan Hubicka, GCC Patches, H. J. Lu

On Thu, Mar 19, 2020 at 4:00 PM Martin Liška <mliska@suse.cz> wrote:
>
> On 3/19/20 10:12 AM, Richard Biener wrote:
> > On Wed, Mar 18, 2020 at 9:52 AM Martin Liška <mliska@suse.cz> wrote:
> >>
> >> On 3/18/20 12:27 AM, Jan Hubicka wrote:
> >>>> Hi.
> >>>>
> >>>> There's updated version of the patch.
> >>>> Changes from the previous version:
> >>>> - comment added to ld_plugin_symbol
> >>>> - new section renamed to ext_symtab
> >>>> - assert added for loop iterations in produce_symtab and produce_symtab_extension
> >>> Hi,
> >>> I hope this is last version of the patch.
> >>
> >> Hello.
> >>
> >> Yes.
> >>
> >>>>
> >>>> 2020-03-12  Martin Liska  <mliska@suse.cz>
> >>>>
> >>>>       * lto-section-in.c: Add extension_symtab.
> >>> ext_symtab  :)
> >>
> >> Fixed.
> >>
> >>>> diff --git a/gcc/lto-section-in.c b/gcc/lto-section-in.c
> >>>> index c17dd69dbdd..78b015be696 100644
> >>>> --- a/gcc/lto-section-in.c
> >>>> +++ b/gcc/lto-section-in.c
> >>>> @@ -54,7 +54,8 @@ const char *lto_section_name[LTO_N_SECTION_TYPES] =
> >>>>      "mode_table",
> >>>>      "hsa",
> >>>>      "lto",
> >>>> -  "ipa_sra"
> >>>> +  "ipa_sra",
> >>>> +  "ext_symtab"
> >>> I would move ext_symtab next to symtab so the sections remains at least
> >>> bit reasonably ordered.
> >>
> >> Ok, I'll adjust it and I will send a separate patch where we bump LTO_major_version.
> >>
> >>>>
> >>>> +/* Write extension information for symbols (symbol type, section flags).  */
> >>>> +
> >>>> +static void
> >>>> +write_symbol_extension_info (tree t)
> >>>> +{
> >>>> +  unsigned char c;
> >>> Do we still use vertical whitespace after decls per GNU coding style?
> >>
> >> Dunno. This seems to me like a nit.
> >>
> >>>> diff --git a/gcc/lto-streamer.h b/gcc/lto-streamer.h
> >>>> index 25bf6c468f7..4f82b439360 100644
> >>>> --- a/gcc/lto-streamer.h
> >>>> +++ b/gcc/lto-streamer.h
> >>>> @@ -236,6 +236,7 @@ enum lto_section_type
> >>>>      LTO_section_ipa_hsa,
> >>>>      LTO_section_lto,
> >>>>      LTO_section_ipa_sra,
> >>>> +  LTO_section_symtab_extension,
> >>> I guess symtab_ext to match the actual section name?
> >>
> >> No. See e.g.   LTO_section_jump_functions - "jmpfuncs". We want to have more descriptive
> >> enum names.
> >>
> >>>>      LTO_N_SECTION_TYPES              /* Must be last.  */
> >>>>    };
> >>>>
> >>>> diff --git a/include/lto-symtab.h b/include/lto-symtab.h
> >>>> index 0ce0de10121..47f0ff27df8 100644
> >>>> --- a/include/lto-symtab.h
> >>>> +++ b/include/lto-symtab.h
> >>>> @@ -38,4 +38,16 @@ enum gcc_plugin_symbol_visibility
> >>>>        GCCPV_HIDDEN
> >>>>      };
> >>>>
> >>>> +enum gcc_plugin_symbol_type
> >>>> +{
> >>>> +  GCCST_UNKNOWN,
> >>>> +  GCCST_FUNCTION,
> >>>> +  GCCST_VARIABLE,
> >>>> +};
> >>>> +
> >>>> +enum gcc_plugin_symbol_section_flags
> >>>> +{
> >>>> +  GCCSSS_BSS = 1
> >>>> +};
> >>>
> >>> Probably comments here?
> >>
> >> No. There are just shadow copy of enum types from plugin-api.h which
> >> are documented.
> >>
> >>>> +
> >>>>    #endif /* GCC_LTO_SYMTAB_H  */
> >>>> +/* Parse an entry of the IL symbol table. The data to be parsed is pointed
> >>>> +   by P and the result is written in ENTRY. The slot number is stored in SLOT.
> >>>> +   Returns the address of the next entry. */
> >>>> +
> >>>> +static char *
> >>>> +parse_table_entry_extension (char *p, struct ld_plugin_symbol *entry)
> >>>> +{
> >>>> +  unsigned char t;
> >>>> +  enum ld_plugin_symbol_type symbol_types[] =
> >>>> +    {
> >>>> +      LDST_UNKNOWN,
> >>>> +      LDST_FUNCTION,
> >>>> +      LDST_VARIABLE,
> >>>> +    };
> >>>> +
> >>>> +  t = *p;
> >>>> +  check (t <= 3, LDPL_FATAL, "invalid symbol type found");
> >>>> +  entry->symbol_type = symbol_types[t];
> >>>> +  p++;
> >>>> +  entry->section_flags = *p;
> >>>> +  p++;
> >>>> +
> >>>> +  return p;
> >>>> +}
> >>>
> >>> I think we have chance to make some plan for future extensions without
> >>> introducing too many additional sections.
> >>>
> >>> Currently there are 2 bytes per entry, while only 3 bits are actively
> >>> used of them.  If we invent next flag to pass we can use unused bits
> >>> however we need a way to indicate to plugin that the bit is defined.
> >>> This could be done by a simple version byte at the beggining of
> >>> ext_symtab section which will be 0 now and once we define extra bits we
> >>> bump it up to 1.
> >>
> >> I like the suggested change, it can help us in the future.
> >>
> >>>
> >>> It is not that important given that even empty file results in 2k LTO
> >>> object file, but I think it would be nicer in longer run.
> >>>> +  /* This is for compatibility with older ABIs.  */
> >>> Perhaps say here that this ABI defined only "int def;"
> >>
> >> Good point.
> >>
> >>>
> >>> The patch look good to me. Thanks for the work!
> >>
> >> Thanks. I'm sending updated patch that I've just tested on lto.exp and
> >> both binutils master and HJ's branch that utilizes the new API.
> >
> > @@ -495,10 +560,16 @@ write_resolution (void)
> >
> >         /* Version 2 of API supports IRONLY_EXP resolution that is
> >            accepted by GCC-4.7 and newer.  */
> > -      if (get_symbols_v2)
> > -        get_symbols_v2 (info->handle, symtab->nsyms, syms);
> > +      if (get_symbols_v4)
> > +       get_symbols_v4 (info->handle, symtab->nsyms, syms);
> >         else
> > -        get_symbols (info->handle, symtab->nsyms, syms);
> > +       {
> > +         clear_new_symtab_flags (symtab);
> >
> > can you instead just avoid parsing the ext symtab?
>
> Yes, I simplified the changes and I bet we'll only need one new hook get_symbols_v2.
> Then we can base parsing of the external symtab on that.
>
> >
> > +         if (get_symbols_v2)
> > +           get_symbols_v2 (info->handle, symtab->nsyms, syms);
> > +         else
> > +           get_symbols (info->handle, symtab->nsyms, syms);
> > +       }
> >
> > I guess this also points to the fact that LDs symbol resolution
> > can't tell GCC it chose "BSS" (from a non-IL object) or that
> > it chose a variable or function.
> >
> > @@ -296,6 +300,8 @@ parse_table_entry (char *p, struct ld_plugin_symbol *entry,
> >     entry->visibility = translate_visibility[t];
> >     p++;
> >
> > +  entry->unused = 0;
> > +
> >     memcpy (&entry->size, p, sizeof (uint64_t));
> >     p += 8;
> >
> > isn't that either not enough or too much clearing?
> > I'd have expected
> >
> >   entry->unused = entry->section_flags = entry->symbol_type = 0;
> >
> > _before_
> >
> >    t = *p;
> >    check (t <= 4, LDPL_FATAL, "invalid symbol kind found");
> >    entry->def = translate_kind[t];
> >    p++;
> >
> > ?
>
> Yes, that's better.
>
> >
> > +enum ld_plugin_symbol_section_flags
> > +{
> > +  LDSSS_BSS = 1
> > +};
> > +
> >
> > please add a symbolic name for the value zero,
> > may I suggest LDSSS_DEFAULT?  I see you've
> > settled on symbol_section_flags rather than
> > section_kind (BSS is not really a flag...).
>
> I renamed that to ld_plugin_symbol_section_kind.
>
> May I install the 2 patches now? It's again tested on lto.exp
> and both old binutils and new HJ's binutils patch and it works as expected.

OK if HJ is happy, lto-symtab.h is owned by binutils I think and existing
users might need to be adjusted to clear the unused fields now that the
width of 'def' changed.

Richard.

> Martin
>
> >
> > Otherwise looks OK to me.
> >
> > Thanks,
> > Richard.
> >
> >> Martin
> >>
> >>> Honza
> >>>> +#ifdef __BIG_ENDIAN__
> >>>> +  char unused;
> >>>> +  char section_flags;
> >>>> +  char symbol_type;
> >>>> +  char def;
> >>>> +#else
> >>>> +  char def;
> >>>> +  char symbol_type;
> >>>> +  char section_flags;
> >>>> +  char unused;
> >>>> +#endif
> >>>>      int visibility;
> >>>>      uint64_t size;
> >>>>      char *comdat_key;
> >>>> @@ -123,6 +134,20 @@ enum ld_plugin_symbol_visibility
> >>>>      LDPV_HIDDEN
> >>>>    };
> >>>>
> >>>> +/* The type of the symbol.  */
> >>>> +
> >>>> +enum ld_plugin_symbol_type
> >>>> +{
> >>>> +  LDST_UNKNOWN,
> >>>> +  LDST_FUNCTION,
> >>>> +  LDST_VARIABLE,
> >>>> +};
> >>>> +
> >>>> +enum ld_plugin_symbol_section_flags
> >>>> +{
> >>>> +  LDSSS_BSS = 1
> >>>> +};
> >>>> +
> >>>>    /* How a symbol is resolved.  */
> >>>>
> >>>>    enum ld_plugin_symbol_resolution
> >>>> @@ -431,7 +456,9 @@ enum ld_plugin_tag
> >>>>      LDPT_GET_INPUT_SECTION_ALIGNMENT = 29,
> >>>>      LDPT_GET_INPUT_SECTION_SIZE = 30,
> >>>>      LDPT_REGISTER_NEW_INPUT_HOOK = 31,
> >>>> -  LDPT_GET_WRAP_SYMBOLS = 32
> >>>> +  LDPT_GET_WRAP_SYMBOLS = 32,
> >>>> +  LDPT_ADD_SYMBOLS_V2 = 33,
> >>>> +  LDPT_GET_SYMBOLS_V4 = 34,
> >>>>    };
> >>>>
> >>>>    /* The plugin transfer vector.  */
> >>>> --
> >>>> 2.25.1
> >>>>
> >>>
> >>
>

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH][RFC] API extension for binutils (type of symbols).
  2020-03-19 15:46                                         ` Richard Biener
@ 2020-03-19 15:50                                           ` H.J. Lu
  2020-03-19 16:00                                             ` Martin Liška
  0 siblings, 1 reply; 55+ messages in thread
From: H.J. Lu @ 2020-03-19 15:50 UTC (permalink / raw)
  To: Richard Biener; +Cc: Martin Liška, Binutils, Jan Hubicka, GCC Patches

On Thu, Mar 19, 2020 at 8:46 AM Richard Biener
<richard.guenther@gmail.com> wrote:
>
> On Thu, Mar 19, 2020 at 4:00 PM Martin Liška <mliska@suse.cz> wrote:
> >
> > On 3/19/20 10:12 AM, Richard Biener wrote:
> > > On Wed, Mar 18, 2020 at 9:52 AM Martin Liška <mliska@suse.cz> wrote:
> > >>
> > >> On 3/18/20 12:27 AM, Jan Hubicka wrote:
> > >>>> Hi.
> > >>>>
> > >>>> There's updated version of the patch.
> > >>>> Changes from the previous version:
> > >>>> - comment added to ld_plugin_symbol
> > >>>> - new section renamed to ext_symtab
> > >>>> - assert added for loop iterations in produce_symtab and produce_symtab_extension
> > >>> Hi,
> > >>> I hope this is last version of the patch.
> > >>
> > >> Hello.
> > >>
> > >> Yes.
> > >>
> > >>>>
> > >>>> 2020-03-12  Martin Liska  <mliska@suse.cz>
> > >>>>
> > >>>>       * lto-section-in.c: Add extension_symtab.
> > >>> ext_symtab  :)
> > >>
> > >> Fixed.
> > >>
> > >>>> diff --git a/gcc/lto-section-in.c b/gcc/lto-section-in.c
> > >>>> index c17dd69dbdd..78b015be696 100644
> > >>>> --- a/gcc/lto-section-in.c
> > >>>> +++ b/gcc/lto-section-in.c
> > >>>> @@ -54,7 +54,8 @@ const char *lto_section_name[LTO_N_SECTION_TYPES] =
> > >>>>      "mode_table",
> > >>>>      "hsa",
> > >>>>      "lto",
> > >>>> -  "ipa_sra"
> > >>>> +  "ipa_sra",
> > >>>> +  "ext_symtab"
> > >>> I would move ext_symtab next to symtab so the sections remains at least
> > >>> bit reasonably ordered.
> > >>
> > >> Ok, I'll adjust it and I will send a separate patch where we bump LTO_major_version.
> > >>
> > >>>>
> > >>>> +/* Write extension information for symbols (symbol type, section flags).  */
> > >>>> +
> > >>>> +static void
> > >>>> +write_symbol_extension_info (tree t)
> > >>>> +{
> > >>>> +  unsigned char c;
> > >>> Do we still use vertical whitespace after decls per GNU coding style?
> > >>
> > >> Dunno. This seems to me like a nit.
> > >>
> > >>>> diff --git a/gcc/lto-streamer.h b/gcc/lto-streamer.h
> > >>>> index 25bf6c468f7..4f82b439360 100644
> > >>>> --- a/gcc/lto-streamer.h
> > >>>> +++ b/gcc/lto-streamer.h
> > >>>> @@ -236,6 +236,7 @@ enum lto_section_type
> > >>>>      LTO_section_ipa_hsa,
> > >>>>      LTO_section_lto,
> > >>>>      LTO_section_ipa_sra,
> > >>>> +  LTO_section_symtab_extension,
> > >>> I guess symtab_ext to match the actual section name?
> > >>
> > >> No. See e.g.   LTO_section_jump_functions - "jmpfuncs". We want to have more descriptive
> > >> enum names.
> > >>
> > >>>>      LTO_N_SECTION_TYPES              /* Must be last.  */
> > >>>>    };
> > >>>>
> > >>>> diff --git a/include/lto-symtab.h b/include/lto-symtab.h
> > >>>> index 0ce0de10121..47f0ff27df8 100644
> > >>>> --- a/include/lto-symtab.h
> > >>>> +++ b/include/lto-symtab.h
> > >>>> @@ -38,4 +38,16 @@ enum gcc_plugin_symbol_visibility
> > >>>>        GCCPV_HIDDEN
> > >>>>      };
> > >>>>
> > >>>> +enum gcc_plugin_symbol_type
> > >>>> +{
> > >>>> +  GCCST_UNKNOWN,
> > >>>> +  GCCST_FUNCTION,
> > >>>> +  GCCST_VARIABLE,
> > >>>> +};
> > >>>> +
> > >>>> +enum gcc_plugin_symbol_section_flags
> > >>>> +{
> > >>>> +  GCCSSS_BSS = 1
> > >>>> +};
> > >>>
> > >>> Probably comments here?
> > >>
> > >> No. There are just shadow copy of enum types from plugin-api.h which
> > >> are documented.
> > >>
> > >>>> +
> > >>>>    #endif /* GCC_LTO_SYMTAB_H  */
> > >>>> +/* Parse an entry of the IL symbol table. The data to be parsed is pointed
> > >>>> +   by P and the result is written in ENTRY. The slot number is stored in SLOT.
> > >>>> +   Returns the address of the next entry. */
> > >>>> +
> > >>>> +static char *
> > >>>> +parse_table_entry_extension (char *p, struct ld_plugin_symbol *entry)
> > >>>> +{
> > >>>> +  unsigned char t;
> > >>>> +  enum ld_plugin_symbol_type symbol_types[] =
> > >>>> +    {
> > >>>> +      LDST_UNKNOWN,
> > >>>> +      LDST_FUNCTION,
> > >>>> +      LDST_VARIABLE,
> > >>>> +    };
> > >>>> +
> > >>>> +  t = *p;
> > >>>> +  check (t <= 3, LDPL_FATAL, "invalid symbol type found");
> > >>>> +  entry->symbol_type = symbol_types[t];
> > >>>> +  p++;
> > >>>> +  entry->section_flags = *p;
> > >>>> +  p++;
> > >>>> +
> > >>>> +  return p;
> > >>>> +}
> > >>>
> > >>> I think we have chance to make some plan for future extensions without
> > >>> introducing too many additional sections.
> > >>>
> > >>> Currently there are 2 bytes per entry, while only 3 bits are actively
> > >>> used of them.  If we invent next flag to pass we can use unused bits
> > >>> however we need a way to indicate to plugin that the bit is defined.
> > >>> This could be done by a simple version byte at the beggining of
> > >>> ext_symtab section which will be 0 now and once we define extra bits we
> > >>> bump it up to 1.
> > >>
> > >> I like the suggested change, it can help us in the future.
> > >>
> > >>>
> > >>> It is not that important given that even empty file results in 2k LTO
> > >>> object file, but I think it would be nicer in longer run.
> > >>>> +  /* This is for compatibility with older ABIs.  */
> > >>> Perhaps say here that this ABI defined only "int def;"
> > >>
> > >> Good point.
> > >>
> > >>>
> > >>> The patch look good to me. Thanks for the work!
> > >>
> > >> Thanks. I'm sending updated patch that I've just tested on lto.exp and
> > >> both binutils master and HJ's branch that utilizes the new API.
> > >
> > > @@ -495,10 +560,16 @@ write_resolution (void)
> > >
> > >         /* Version 2 of API supports IRONLY_EXP resolution that is
> > >            accepted by GCC-4.7 and newer.  */
> > > -      if (get_symbols_v2)
> > > -        get_symbols_v2 (info->handle, symtab->nsyms, syms);
> > > +      if (get_symbols_v4)
> > > +       get_symbols_v4 (info->handle, symtab->nsyms, syms);
> > >         else
> > > -        get_symbols (info->handle, symtab->nsyms, syms);
> > > +       {
> > > +         clear_new_symtab_flags (symtab);
> > >
> > > can you instead just avoid parsing the ext symtab?
> >
> > Yes, I simplified the changes and I bet we'll only need one new hook get_symbols_v2.
> > Then we can base parsing of the external symtab on that.
> >
> > >
> > > +         if (get_symbols_v2)
> > > +           get_symbols_v2 (info->handle, symtab->nsyms, syms);
> > > +         else
> > > +           get_symbols (info->handle, symtab->nsyms, syms);
> > > +       }
> > >
> > > I guess this also points to the fact that LDs symbol resolution
> > > can't tell GCC it chose "BSS" (from a non-IL object) or that
> > > it chose a variable or function.
> > >
> > > @@ -296,6 +300,8 @@ parse_table_entry (char *p, struct ld_plugin_symbol *entry,
> > >     entry->visibility = translate_visibility[t];
> > >     p++;
> > >
> > > +  entry->unused = 0;
> > > +
> > >     memcpy (&entry->size, p, sizeof (uint64_t));
> > >     p += 8;
> > >
> > > isn't that either not enough or too much clearing?
> > > I'd have expected
> > >
> > >   entry->unused = entry->section_flags = entry->symbol_type = 0;
> > >
> > > _before_
> > >
> > >    t = *p;
> > >    check (t <= 4, LDPL_FATAL, "invalid symbol kind found");
> > >    entry->def = translate_kind[t];
> > >    p++;
> > >
> > > ?
> >
> > Yes, that's better.
> >
> > >
> > > +enum ld_plugin_symbol_section_flags
> > > +{
> > > +  LDSSS_BSS = 1
> > > +};
> > > +
> > >
> > > please add a symbolic name for the value zero,
> > > may I suggest LDSSS_DEFAULT?  I see you've
> > > settled on symbol_section_flags rather than
> > > section_kind (BSS is not really a flag...).
> >
> > I renamed that to ld_plugin_symbol_section_kind.
> >
> > May I install the 2 patches now? It's again tested on lto.exp
> > and both old binutils and new HJ's binutils patch and it works as expected.
>
> OK if HJ is happy, lto-symtab.h is owned by binutils I think and existing
> users might need to be adjusted to clear the unused fields now that the
> width of 'def' changed.
>
> Richard.

I like it and I will take case of binutils side.

Thanks.

-- 
H.J.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH][RFC] API extension for binutils (type of symbols).
  2020-03-19 15:50                                           ` H.J. Lu
@ 2020-03-19 16:00                                             ` Martin Liška
  2020-03-19 16:51                                               ` H.J. Lu
  0 siblings, 1 reply; 55+ messages in thread
From: Martin Liška @ 2020-03-19 16:00 UTC (permalink / raw)
  To: H.J. Lu, Richard Biener; +Cc: Binutils, Jan Hubicka, GCC Patches

On 3/19/20 4:50 PM, H.J. Lu wrote:
> I like it and I will take case of binutils side.
> 
> Thanks.

Great. I've just installed the 2 patches to master.

Martin

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH][RFC] API extension for binutils (type of symbols).
  2020-03-19 16:00                                             ` Martin Liška
@ 2020-03-19 16:51                                               ` H.J. Lu
  2020-03-19 19:56                                                 ` Richard Biener
  0 siblings, 1 reply; 55+ messages in thread
From: H.J. Lu @ 2020-03-19 16:51 UTC (permalink / raw)
  To: Martin Liška; +Cc: Richard Biener, Binutils, Jan Hubicka, GCC Patches

On Thu, Mar 19, 2020 at 9:00 AM Martin Liška <mliska@suse.cz> wrote:
>
> On 3/19/20 4:50 PM, H.J. Lu wrote:
> > I like it and I will take case of binutils side.
> >
> > Thanks.
>
> Great. I've just installed the 2 patches to master.
>

Here is the binutils patch:

https://sourceware.org/pipermail/binutils/2020-March/110295.html

-- 
H.J.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH][RFC] API extension for binutils (type of symbols).
  2020-03-19 16:51                                               ` H.J. Lu
@ 2020-03-19 19:56                                                 ` Richard Biener
  2020-03-19 20:07                                                   ` H.J. Lu
  0 siblings, 1 reply; 55+ messages in thread
From: Richard Biener @ 2020-03-19 19:56 UTC (permalink / raw)
  To: H.J. Lu, Martin Liška; +Cc: Binutils, Jan Hubicka, GCC Patches

On March 19, 2020 5:51:14 PM GMT+01:00, "H.J. Lu" <hjl.tools@gmail.com> wrote:
>On Thu, Mar 19, 2020 at 9:00 AM Martin Liška <mliska@suse.cz> wrote:
>>
>> On 3/19/20 4:50 PM, H.J. Lu wrote:
>> > I like it and I will take case of binutils side.
>> >
>> > Thanks.
>>
>> Great. I've just installed the 2 patches to master.
>>
>
>Here is the binutils patch:
>
>https://sourceware.org/pipermail/binutils/2020-March/110295.html

If the plugin advertises v2 support but the LTO IL files lack symtab_ext data what happens? 

Richard. 


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH][RFC] API extension for binutils (type of symbols).
  2020-03-19 19:56                                                 ` Richard Biener
@ 2020-03-19 20:07                                                   ` H.J. Lu
  0 siblings, 0 replies; 55+ messages in thread
From: H.J. Lu @ 2020-03-19 20:07 UTC (permalink / raw)
  To: Richard Biener; +Cc: Martin Liška, Binutils, Jan Hubicka, GCC Patches

On Thu, Mar 19, 2020 at 12:56 PM Richard Biener
<richard.guenther@gmail.com> wrote:
>
> On March 19, 2020 5:51:14 PM GMT+01:00, "H.J. Lu" <hjl.tools@gmail.com> wrote:
> >On Thu, Mar 19, 2020 at 9:00 AM Martin Liška <mliska@suse.cz> wrote:
> >>
> >> On 3/19/20 4:50 PM, H.J. Lu wrote:
> >> > I like it and I will take case of binutils side.
> >> >
> >> > Thanks.
> >>
> >> Great. I've just installed the 2 patches to master.
> >>
> >
> >Here is the binutils patch:
> >
> >https://sourceware.org/pipermail/binutils/2020-March/110295.html
>
> If the plugin advertises v2 support but the LTO IL files lack symtab_ext data what happens?

Did you mean symbol_type == LDST_UNKNOWN? In that case, you get the
text symbol for everything.


-- 
H.J.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH][RFC] API extension for binutils (type of symbols).
  2020-03-09 16:56   ` Martin Liška
  2020-03-09 18:25     ` H.J. Lu
  2020-03-09 19:45     ` Michael Matz
@ 2020-03-26 16:54     ` Jeff Law
  2020-03-27  9:10       ` Martin Liška
  2 siblings, 1 reply; 55+ messages in thread
From: Jeff Law @ 2020-03-26 16:54 UTC (permalink / raw)
  To: Martin Liška, H.J. Lu; +Cc: GCC Patches, Jan Hubicka

On Mon, 2020-03-09 at 17:56 +0100, Martin Liška wrote:
> On 3/9/20 4:36 PM, H.J. Lu wrote:
> > We nee to support different variables, like TLS, data and bss variables.
> 
> Why do we need TLS? Right now, it's not supported by nm. Or am I wrong?
> 
> About BSS and DATA I agree that it would be handy. I can theoretically
> covered with code in get_variable_section/bss_initializer_p. But it's
> quite logic and I'm not sure we should simulate it.
> 
> @Honza/Richi: Do you have any opinion about that?
Or could we just fix the damn broken configure scripts?  Isn't that what's
driving all this nonsense?

The HP-UX bits for this in configure work correctly.    I verified that months
ago and even have a sed script which will take a broken configure script and fix
it.

jeff


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH][RFC] API extension for binutils (type of symbols).
  2020-03-26 16:54     ` Jeff Law
@ 2020-03-27  9:10       ` Martin Liška
  2020-03-27 14:21         ` Jeff Law
  0 siblings, 1 reply; 55+ messages in thread
From: Martin Liška @ 2020-03-27  9:10 UTC (permalink / raw)
  To: law, H.J. Lu; +Cc: GCC Patches, Jan Hubicka

On 3/26/20 5:54 PM, Jeff Law wrote:
> On Mon, 2020-03-09 at 17:56 +0100, Martin Liška wrote:
>> On 3/9/20 4:36 PM, H.J. Lu wrote:
>>> We nee to support different variables, like TLS, data and bss variables.
>>
>> Why do we need TLS? Right now, it's not supported by nm. Or am I wrong?
>>
>> About BSS and DATA I agree that it would be handy. I can theoretically
>> covered with code in get_variable_section/bss_initializer_p. But it's
>> quite logic and I'm not sure we should simulate it.
>>
>> @Honza/Richi: Do you have any opinion about that?
> Or could we just fix the damn broken configure scripts?  Isn't that what's
> driving all this nonsense?

Hello.

Yes. Apparently libtool people were unable to come up with a reasonable
workaround for that. One can read technical details in this PR:
https://sourceware.org/bugzilla/show_bug.cgi?id=25355

> 
> The HP-UX bits for this in configure work correctly.    I verified that months
> ago and even have a sed script which will take a broken configure script and fix
> it.

How many packages will you have to fix?

The current LTO plug-in extension will work fine with gcc10 + latest bintuils release
branch. That's a viable solution for us openSUSE.

Martin

> 
> jeff
> 


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH][RFC] API extension for binutils (type of symbols).
  2020-03-27  9:10       ` Martin Liška
@ 2020-03-27 14:21         ` Jeff Law
  2020-03-27 15:21           ` Martin Liška
  0 siblings, 1 reply; 55+ messages in thread
From: Jeff Law @ 2020-03-27 14:21 UTC (permalink / raw)
  To: Martin Liška, H.J. Lu; +Cc: GCC Patches, Jan Hubicka

On Fri, 2020-03-27 at 10:10 +0100, Martin Liška wrote:
> On 3/26/20 5:54 PM, Jeff Law wrote:
> > On Mon, 2020-03-09 at 17:56 +0100, Martin Liška wrote:
> > > On 3/9/20 4:36 PM, H.J. Lu wrote:
> > > > We nee to support different variables, like TLS, data and bss variables.
> > > 
> > > Why do we need TLS? Right now, it's not supported by nm. Or am I wrong?
> > > 
> > > About BSS and DATA I agree that it would be handy. I can theoretically
> > > covered with code in get_variable_section/bss_initializer_p. But it's
> > > quite logic and I'm not sure we should simulate it.
> > > 
> > > @Honza/Richi: Do you have any opinion about that?
> > Or could we just fix the damn broken configure scripts?  Isn't that what's
> > driving all this nonsense?
> 
> Hello.
> 
> Yes. Apparently libtool people were unable to come up with a reasonable
> workaround for that. One can read technical details in this PR:
> https://sourceware.org/bugzilla/show_bug.cgi?id=25355
What's funny is that discussion doesn't mention the HP-UX bits which handle this
correctly.  However, I think my tests were all done without the common symbol
handling change we made late last year.  So that may complicate things.


> 
> > The HP-UX bits for this in configure work correctly.    I verified that
> > months
> > ago and even have a sed script which will take a broken configure script and
> > fix
> > it.
> 
> How many packages will you have to fix?
IIRC it's less than a half-dozen.  I found them by capturing the generated
config.h files with and without LTO enabled and comparing them.  I don't offhand
know if the differences caused the packages to mis-behave or not -- we just
failed any build where those config.h files differed.

> 
> The current LTO plug-in extension will work fine with gcc10 + latest bintuils
> release
> branch. That's a viable solution for us openSUSE.
If it's working, then great.  But I worry about having to keep binutils/gcc
consistent.  In fact, what brought me into the conversation again was them
getting out of sync in the last few days and binutils tests started failing.

jeff


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH][RFC] API extension for binutils (type of symbols).
  2020-03-27 14:21         ` Jeff Law
@ 2020-03-27 15:21           ` Martin Liška
  0 siblings, 0 replies; 55+ messages in thread
From: Martin Liška @ 2020-03-27 15:21 UTC (permalink / raw)
  To: law, H.J. Lu; +Cc: GCC Patches, Jan Hubicka

On 3/27/20 3:21 PM, Jeff Law wrote:
> On Fri, 2020-03-27 at 10:10 +0100, Martin Liška wrote:
>> On 3/26/20 5:54 PM, Jeff Law wrote:
>>> On Mon, 2020-03-09 at 17:56 +0100, Martin Liška wrote:
>>>> On 3/9/20 4:36 PM, H.J. Lu wrote:
>>>>> We nee to support different variables, like TLS, data and bss variables.
>>>>
>>>> Why do we need TLS? Right now, it's not supported by nm. Or am I wrong?
>>>>
>>>> About BSS and DATA I agree that it would be handy. I can theoretically
>>>> covered with code in get_variable_section/bss_initializer_p. But it's
>>>> quite logic and I'm not sure we should simulate it.
>>>>
>>>> @Honza/Richi: Do you have any opinion about that?
>>> Or could we just fix the damn broken configure scripts?  Isn't that what's
>>> driving all this nonsense?
>>
>> Hello.
>>
>> Yes. Apparently libtool people were unable to come up with a reasonable
>> workaround for that. One can read technical details in this PR:
>> https://sourceware.org/bugzilla/show_bug.cgi?id=25355
> What's funny is that discussion doesn't mention the HP-UX bits which handle this
> correctly.  However, I think my tests were all done without the common symbol
> handling change we made late last year.  So that may complicate things.

Note that the switching to -fno-common was actually the trigger of the issues.
Before the change, one can easily disambiguate global vars and functions based
on the LDPK_COMMON symbol_kind.

> 
> 
>>
>>> The HP-UX bits for this in configure work correctly.    I verified that
>>> months
>>> ago and even have a sed script which will take a broken configure script and
>>> fix
>>> it.
>>
>> How many packages will you have to fix?
> IIRC it's less than a half-dozen.  I found them by capturing the generated
> config.h files with and without LTO enabled and comparing them.  I don't offhand
> know if the differences caused the packages to mis-behave or not -- we just
> failed any build where those config.h files differed.
> 
>>
>> The current LTO plug-in extension will work fine with gcc10 + latest bintuils
>> release
>> branch. That's a viable solution for us openSUSE.
> If it's working, then great.  But I worry about having to keep binutils/gcc
> consistent.  In fact, what brought me into the conversation again was them
> getting out of sync in the last few days and binutils tests started failing.

It was probably caused by H.J.'s attempt to use lto-wrapper for nm (for LTO bytecode)
which had various issues.

Martin

> 
> jeff
> 


^ permalink raw reply	[flat|nested] 55+ messages in thread

end of thread, other threads:[~2020-03-27 15:21 UTC | newest]

Thread overview: 55+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-03-09  9:30 [PATCH][RFC] API extension for binutils (type of symbols) Martin Liška
2020-03-09 15:36 ` H.J. Lu
2020-03-09 16:56   ` Martin Liška
2020-03-09 18:25     ` H.J. Lu
2020-03-09 20:19       ` Jan Hubicka
2020-03-10 10:05         ` Martin Liška
2020-03-10 10:49           ` Richard Biener
2020-03-10 11:09             ` Jan Hubicka
2020-03-10 11:24               ` Richard Biener
2020-03-10 12:07                 ` Martin Liška
2020-03-11  9:19                   ` Martin Liška
2020-03-11 10:22                     ` Richard Biener
2020-03-11 10:30                       ` Richard Biener
2020-03-11 12:22                       ` Martin Liška
2020-03-11 12:51                         ` Richard Biener
2020-03-11 13:07                           ` Martin Liška
2020-03-11 13:24                             ` H.J. Lu
2020-03-11 13:28                               ` Martin Liška
2020-03-11 14:16                             ` Alexander Monakov
2020-03-11 14:15                           ` Martin Liška
2020-03-12 10:36                             ` Strange read for simple_object_elf_find_sections for .a file Martin Liška
2020-03-12 10:39                               ` Martin Liška
2020-03-12 12:32                             ` [PATCH][RFC] API extension for binutils (type of symbols) Martin Liška
2020-03-12 12:55                               ` Jan Hubicka
2020-03-12 13:10                                 ` Martin Liška
2020-03-12 13:15                                   ` Richard Biener
2020-03-12 13:24                                     ` Martin Liška
2020-03-12 13:31                               ` Martin Liška
2020-03-16 11:12                                 ` H.J. Lu
2020-03-16 13:50                                   ` Martin Liška
2020-03-16 14:34                                     ` H.J. Lu
2020-03-16 14:38                                       ` Martin Liška
2020-03-17 23:27                                 ` Jan Hubicka
2020-03-18  8:52                                   ` Martin Liška
2020-03-18  8:53                                     ` [PATCH] Bump LTO bytecode version Martin Liška
2020-03-18  8:56                                       ` Richard Biener
2020-03-18  9:00                                         ` Martin Liška
2020-03-18  9:34                                           ` Richard Biener
2020-03-18 10:20                                             ` Martin Liška
2020-03-19  9:12                                     ` [PATCH][RFC] API extension for binutils (type of symbols) Richard Biener
2020-03-19 15:00                                       ` Martin Liška
2020-03-19 15:46                                         ` Richard Biener
2020-03-19 15:50                                           ` H.J. Lu
2020-03-19 16:00                                             ` Martin Liška
2020-03-19 16:51                                               ` H.J. Lu
2020-03-19 19:56                                                 ` Richard Biener
2020-03-19 20:07                                                   ` H.J. Lu
2020-03-09 19:45     ` Michael Matz
2020-03-10  9:39       ` Martin Liška
2020-03-10 12:20         ` Martin Liška
2020-03-10 14:26         ` Michael Matz
2020-03-26 16:54     ` Jeff Law
2020-03-27  9:10       ` Martin Liška
2020-03-27 14:21         ` Jeff Law
2020-03-27 15:21           ` Martin Liška

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).