public inbox for gnu-gabi@sourceware.org
* [RFC] Proposal for new ELF extension - "Symbol meta-information"
@ 2020-08-31 11:58 Jozef Lawrynowicz
From: Jozef Lawrynowicz @ 2020-08-31 11:58 UTC (permalink / raw)
  To: gnu-gabi


Hi,

I'd like to get some feedback on whether this "Symbol Meta-Information" ELF
extension would be considered for inclusion in the GNU gABI.

"Symbol Meta-Information" is a new mechanism used to describe additional
information about symbols.

I proposed it for inclusion in the ELF gABI; however, it was rejected because the
type of meta-information supported is not generic enough to be applicable to all
targets.
You can read that discussion here:
https://groups.google.com/forum/#!topic/generic-abi/QPgYf3-_Iyw

I wonder if it is appropriate for inclusion in the GNU gABI or if we should just
add it to a processor-specific ABI. However, it is being used downstream for
MSP430 and ARM targets, in GCC and Clang/LLVM respectively. So we'd like to have
it standardized somewhere generic so that different targets and toolchains can
align on it.

HTML versions of the full proposal and ELF implementation details are available
here:
http://www.mittosystems.com/metainfo/elf-symbol-meta-information-proposal.html
http://www.mittosystems.com/metainfo/elf-symbol-meta-information-implementation.html

I've attached nicely formatted PDF versions of the documents, but perhaps the
mailing list will scrub them. There are links to the PDF versions on the
web pages above.

Below is also a plain-text version of the documents.

Note that I haven't amended any of the specific details to make the
functionality "operating system" specific; it is still just proposed as
a generic ELF extension.

I look forward to hearing your thoughts,
Jozef Lawrynowicz

------------

ELF Symbol Meta-Information

Developed by Todd Snider (Texas Instruments)
in consultation with Jozef Lawrynowicz

Written by Jozef Lawrynowicz

August 2020

Table of Contents

1 Introduction
2 Background
  2.1 Motivation
  2.2 Alternative vehicles for symbol meta-information implementation
3 Design
  Abbreviations
  3.1 Symbol Meta-Information Table
  3.2 Symbol Meta-Information Table Entries
  3.3 Symbol Meta-Information Values
    3.3.1 Restrictions on applying symbol meta-information types to symbols
    3.3.2 SMT_NOINIT use case
    3.3.3 SMT_PRINTF_FMT use case
    3.3.4 Considerations for placement of SMT_LOCATION meta-information symbols (locsyms)
    3.3.5 Initialization of locsyms at program startup
4 Using Symbol Meta-Information
  4.1 Usage example
5 Conclusion
  5.1 Symbol meta-information benefits
  5.2 Symbol meta-information as an extension to the ELF gABI


1 Introduction

Here we propose a new mechanism for describing additional 
information about ELF symbols, called Symbol Meta-Information.

Symbol Meta-Information is intended to solve the problem of how 
the compiler or assembler can communicate information about 
symbols, not supported by existing ELF constructs, to downstream 
tools such as the linker and other consumers of ELF files. These 
consumers can then change how they handle the symbols, based on 
the supplementary information.

A new ELF special section named .symtab_meta enumerates which 
symbols have meta-information, the type of meta-information, and 
the associated value of that meta-information.

The use of attributes set on symbol declarations in the source 
code provides the programmer with a simple interface to the new 
functionality.

Symbol meta-information is designed to be extensible, with plenty 
of room for new types of meta-information to be added, and 
flexible, as the value of the meta-information can take on any 
format.

2 Background

2.1 Motivation

The modular nature of toolchain components means that 
communicating information from the source code through the build 
process to downstream tools is not always straightforward. Of 
course, this is partly why formats like ELF exist, but when those 
formats reach the limit of what they can precisely describe, 
programmers search for alternative solutions.

Placing code and data into special named sections is the most 
common method used to make the linker handle specific symbols in 
some non-standard way. A modified linker script with knowledge of 
these special sections can then be used to apply specific 
properties to the sections, such as saving them from garbage 
collection or placing them at specific memory addresses.

However, it can be inconvenient for programmers to modify linker 
scripts:

• Entire applications can be written without consideration for 
  the linker script, its existence perhaps acknowledged by the 
  programmer but otherwise being an opaque part of the build 
  process. The programmer may therefore lack knowledge of the 
  syntax of the linker script, or the ability to leverage the 
  full breadth of functionality available to achieve what they 
  want.

• In the context of embedded microcontrollers, linker scripts 
  provided by semiconductor manufacturers are usually specific to 
  a particular device, describing a unique combination of the 
  memory map, peripheral register addresses, vector table etc.

  – Modifying linker scripts can therefore be bothersome when an 
    application targets different devices, each with a unique 
    linker script, or when linker script updates from 
    semiconductor manufacturers require merging of downstream and 
    upstream changes.

• Linker scripts can have a large amount of boilerplate code, and 
  modifications to this boilerplate, as a side-effect of the 
  handling of any new special sections, can be error-prone.

Another way to supply additional information about a symbol is to 
give the symbol itself a special name. This requires the ELF file 
consumer program to have knowledge of the special name, and may 
not be desirable if it interferes with the way the symbol would 
be handled if it had its original name. Furthermore, since the 
gABI offers no way to standardize special names that give code 
and data symbols some unique meaning, there are likely to be 
inconsistencies between processors and vendors in how toolchains 
support this mechanism.

2.2 Alternative vehicles for symbol meta-information 
  implementation

We acknowledge some existing constructs which could be used to 
supply additional information about ELF symbols, and describe why 
they are unsuitable vehicles for the proposed symbol 
meta-information functionality.

New symbol types or bindings

• If a type of symbol meta-information implied only one existing 
  symbol type or binding attribute, then the meta-information 
  type could be implemented as a new type or binding. However, 
  since the proposed symbol meta-information types support 
  symbols with different types and different bindings, this 
  approach would not work.

• There are only 3 remaining “slots” for generic symbol types and 
  it is desirable to have more than 3 new types of symbol 
  meta-information. There are further reserved ranges for 
  operating system-specific and processor-specific types, but it 
  would not be appropriate to use these for new types which have 
  generic use.

• Fundamentally, symbol meta-information supplies additional 
  information about symbols, and does not change the intrinsic 
  type of a symbol.

st_other member of symbol table entry

• st_other is only 8 bits in size and is used as a bit-mask. Bits 
  0 and 1 are reserved, with an additional proposal currently 
  pending to reserve bit 2 as well. The remaining bits 3-7 have 
  not been officially reserved but are all in use by a variety of 
  targets. Therefore, there are no remaining bits which can be 
  used without creating a conflict with some target or operating 
  system.

• There is no standard way to provide supplemental information 
  which gives a non-boolean value for the st_other field. Further 
  modifications, such as the creation of a special section, would 
  be required to provide non-boolean values to accompany the 
  st_other value.

Solaris SymInfo

• Solaris SymInfo specifically targets dynamic symbols, and the 
  proposed functionality should be available to targets which do 
  not support the concept of dynamic linking. SymInfo “types” are 
  flags that can be augmented by extracting a value from the 
  .dynamic section.

  – The .dynamic section is identified by the sh_info field of 
    the section header, and could arguably be repurposed to point 
    to some other section in cases when there are no dynamic 
    symbols with SymInfo entries. However, this behavior would 
    not be well defined when there is also a .dynamic section in 
    the file.

• The si_flags field, which describes the properties of the 
  associated symbol, is a half-word, which is 16 bits wide in both 
  32-bit and 64-bit ELF. Since the flags are implemented as a 
  bit-mask with 10 types already implemented, only 6 bits remain 
  for further types. This
  is unlikely to be enough room for all current and future 
  meta-information types, especially once factoring in any 
  additional vendor or processor-specific extensions.

New ABI-mandated “Special Sections”

• A new type of ELF “special section” could be created for each 
  of the proposed new types of symbol meta-information. ELF file 
  consumers such as the linker would then handle these sections 
  in a specific way, without assistance from the linker script. 
  However, this has some downsides:

  – The user may not want to put a symbol in its own section 
    just to make use of the desired functionality.

  – A special section for the symbol obscures the fact that the 
    meta-information is for a symbol, not a section.

  – If the sh_info member is used to provide an accompanying 
    value for the meta-information type, then only one value can 
    be specified per section, meaning symbols with the same type 
    might not be able to be grouped together in a section.

  – An application making use of a large number of new special 
    sections to describe symbol meta-information could pollute 
    the section header table.

3 Design

Abbreviations

metasym
  Any type of meta-information symbol
locsym
  A meta-information symbol with type SMT_LOCATION

3.1 Symbol Meta-Information Table

ELF relocatable and executable files may contain a new section 
named .symtab_meta. This section can be omitted from ELF files if 
there is no meta-information for any symbols, but if present, 
there can only be one section with this name and type.

Table 1: 
Section types, sh_type
+------------------+-------+
|      Name        | Value |
+------------------+-------+
| SHT_SYMTAB_META  |  19   |
+------------------+-------+

Table 2: 
sh_link and sh_info interpretation
+------------------+-------------------------+--------------------------------------------------------------------+
|      Name        |    sh_link              |                         sh_info                                    |
+------------------+-------------------------+--------------------------------------------------------------------+
| SHT_SYMTAB_META  | The section header      | The format version number of the symbol meta-information table     |
|                  | index of the associated | (ELFxx_SMH_VER), and the section header index of the .strtab_meta  |
|                  | symbol table.           | string table used by entries in this section (ELFxx_SMH_STR).      |
+------------------+-------------------------+--------------------------------------------------------------------+

Sub-Table a: 
Accessors for the sh_info field
---
#define ELF32_SMH_STR(i)    ((i)>>8)
#define ELF32_SMH_VER(i)    ((unsigned char)(i))
#define ELF32_SMH_INFO(s,v) (((s)<<8)+(unsigned char)(v))

#define ELF64_SMH_STR(i)    ((i)>>32)
#define ELF64_SMH_VER(i)    ((i)&0xffffffffL)
#define ELF64_SMH_INFO(s,v) (((s)<<32)+((v)&0xffffffffL))


Sub-Table b: 
.symtab_meta versions
+--------+-------------------------------------------------------------------------------+
| Value  |                                    Meaning                                    |
+--------+-------------------------------------------------------------------------------+
|   0    |                                Invalid Version                                |
+--------+-------------------------------------------------------------------------------+
|   1    |             There is no header at the beginning of .symtab_meta.              |
+--------+-------------------------------------------------------------------------------+
|   2    | A header containing the hash of .symtab is at the beginning of                |
|        | .symtab_meta.                                                                 |
+--------+-------------------------------------------------------------------------------+
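As an illustration of the sh_info encoding in Table 2, the sketch below packs a format version and the .strtab_meta section header index into sh_info using the ELF32 accessors from Sub-Table a, then unpacks them again. It is not part of the proposal: the helper names smh_pack and smh_unpack are hypothetical, and uint32_t stands in for Elf32_Word so the fragment does not depend on a system <elf.h>. It may be worth noting that the gABI also declares sh_info as a 32-bit Elf64_Word in Elf64_Shdr, so the ELF32 layout shown here is the one that fits the field in both file classes.

```c
#include <stdint.h>

/* ELF32 accessors from Sub-Table a. */
#define ELF32_SMH_STR(i)    ((i)>>8)
#define ELF32_SMH_VER(i)    ((unsigned char)(i))
#define ELF32_SMH_INFO(s,v) (((s)<<8)+(unsigned char)(v))

/* Pack the .strtab_meta section header index and the table format
   version into a 32-bit sh_info value (hypothetical helper). */
uint32_t smh_pack(uint32_t strtab_meta_shndx, unsigned version)
{
    return ELF32_SMH_INFO(strtab_meta_shndx, version);
}

/* Split a .symtab_meta sh_info value back into its two fields. */
void smh_unpack(uint32_t sh_info, uint32_t *strtab_meta_shndx,
                unsigned *version)
{
    *strtab_meta_shndx = ELF32_SMH_STR(sh_info);
    *version = ELF32_SMH_VER(sh_info);
}
```

For example, smh_pack(12, 2) yields 0xC02: version 2 in the low byte and section index 12 in the bits above it.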


Table 3: 
Special Sections
+---------------+------------------+------------+
|     Name      |      Type        | Attributes |
+---------------+------------------+------------+
| .symtab_meta  | SHT_SYMTAB_META  |    None    |
+---------------+------------------+------------+
| .strtab_meta  |   SHT_STRTAB     |    None    |
+---------------+------------------+------------+

Version 2 of the table has a short header, and a list of symbol 
meta-information entries follows.

(
typedef struct {
  unsigned char symtab_hash[20];
} Elf32_SMhdr;

typedef struct {
  unsigned char symtab_hash[20];
} Elf64_SMhdr;
)


symtab_hash
  For version >= 2, a 20-byte SHA-1 hash of the 
  entire contents of .symtab (taken once the symbol table indices 
  have been finalized) is used to verify .symtab has not been 
  modified by tools which do not recognize .symtab_meta. These 
  tools would not update the symbol index stored in the symbol 
  meta-information table entry when making changes to the 
  program, possibly corrupting the state of .symtab_meta.
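A consumer has to skip the header before reading entries, and only version 2 has one. The sketch below, assuming the 20-byte ElfNN_SMhdr above and entry sizes of 8 bytes (Elf32_SymMetaInfo) and 16 bytes (Elf64_SymMetaInfo) from section 3.2, computes where the entries start and how many there are. The function names are hypothetical, and the sketch conservatively rejects versions other than 1 and 2, since a later version could change the header. Checking symtab_hash itself would additionally need a SHA-1 implementation, which is not shown.

```c
#include <stdint.h>

#define SMHDR_SIZE 20u   /* sizeof(ElfNN_SMhdr): one 20-byte SHA-1 digest */

/* Byte offset of the first entry in .symtab_meta for a given format
   version, or -1 if the version is invalid or not understood. */
long symtab_meta_entries_offset(unsigned version)
{
    switch (version) {
    case 1:  return 0;                  /* no header */
    case 2:  return (long) SMHDR_SIZE;  /* SHA-1 hash of .symtab first */
    default: return -1;                 /* 0 is invalid; later versions unknown */
    }
}

/* Number of entries in a section of sh_size bytes, where entsize is
   8 for Elf32_SymMetaInfo and 16 for Elf64_SymMetaInfo. Returns -1 if
   the size does not describe a whole number of entries. */
long symtab_meta_entry_count(uint64_t sh_size, unsigned version,
                             uint64_t entsize)
{
    long off = symtab_meta_entries_offset(version);

    if (off < 0 || entsize == 0 || sh_size < (uint64_t) off)
        return -1;
    if ((sh_size - (uint64_t) off) % entsize != 0)
        return -1;
    return (long) ((sh_size - (uint64_t) off) / entsize);
}
```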

----
3.2 Symbol Meta-Information Table Entries
----

Symbol meta-information table entries describe the symbol that 
the meta-information applies to, the type of meta-information, 
and the associated value of the meta-information.

The format of symbol meta-information table entries is physically 
identical to ELF Rel relocation entries. The smi_info field 
encodes the symbol table index of the corresponding symbol and 
the type of meta-information in the same way that the symbol 
table index and type of a relocation are encoded in the r_info 
field of relocation entries.

Figure 1: 
Structure of a .symtab_meta entry
(
typedef struct {
  Elf32_Addr smi_info;
  Elf32_Word smi_value;
} Elf32_SymMetaInfo;

typedef struct {
  Elf64_Addr  smi_info;
  Elf64_Xword smi_value;
} Elf64_SymMetaInfo;
)

smi_info
  This field describes both the symbol table index of 
  the ELF symbol that this meta-information applies to, 
  and the type of this meta-information entry. A number of 
  generic types are pre-defined. There are also reserved ranges 
  for processor-specific and application-specific (i.e. 
  vendor-specific) types.

Figure 2: 
Accessors for the smi_info field
(
#define ELF32_SMI_SYM(i)    ((i)>>8)
#define ELF32_SMI_TYPE(i)   ((unsigned char)(i))
#define ELF32_SMI_INFO(s,t) (((s)<<8)+(unsigned char)(t))

#define ELF64_SMI_SYM(i)    ((i)>>32)
#define ELF64_SMI_TYPE(i)   ((i)&0xffffffffL)
#define ELF64_SMI_INFO(s,t) (((s)<<32)+((t)&0xffffffffL))
)

smi_value
  The interpretation depends on the associated type. 
  The value could be interpreted as a boolean, symbol table 
  index, address, string table index etc.
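To show how the Figure 2 accessors mirror the r_info encoding of relocations, the following sketch builds one entry per file class and decodes it again. It assumes uint32_t and uint64_t in place of Elf32_Addr/Elf32_Word and Elf64_Addr/Elf64_Xword (same sizes), and the constructor functions make_meta32 and make_meta64 are hypothetical helpers, not part of the proposal.

```c
#include <stdint.h>

/* Accessors from Figure 2. */
#define ELF32_SMI_SYM(i)    ((i)>>8)
#define ELF32_SMI_TYPE(i)   ((unsigned char)(i))
#define ELF32_SMI_INFO(s,t) (((s)<<8)+(unsigned char)(t))

#define ELF64_SMI_SYM(i)    ((i)>>32)
#define ELF64_SMI_TYPE(i)   ((i)&0xffffffffL)
#define ELF64_SMI_INFO(s,t) (((s)<<32)+((t)&0xffffffffL))

/* Entry layouts from Figure 1, with fixed-width stand-in types. */
typedef struct { uint32_t smi_info; uint32_t smi_value; } Elf32_SymMetaInfo;
typedef struct { uint64_t smi_info; uint64_t smi_value; } Elf64_SymMetaInfo;

/* ELF32: symbol index in the upper 24 bits, type in the low 8 bits. */
Elf32_SymMetaInfo make_meta32(uint32_t symndx, unsigned type, uint32_t value)
{
    Elf32_SymMetaInfo e = { ELF32_SMI_INFO(symndx, type), value };
    return e;
}

/* ELF64: symbol index in the upper 32 bits, type in the low 32 bits. */
Elf64_SymMetaInfo make_meta64(uint64_t symndx, uint32_t type, uint64_t value)
{
    Elf64_SymMetaInfo e = { ELF64_SMI_INFO(symndx, (uint64_t) type), value };
    return e;
}
```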


Figure 3: 
Symbol Meta-Information Types
+--------+-----------------+--------------------+
| Value  |      Type       |  Format of Value   |
+--------+-----------------+--------------------+
|   0    |    SMT_NONE     |        None        |
+--------+-----------------+--------------------+
|   1    |   SMT_RETAIN    |      Boolean       |
+--------+-----------------+--------------------+
|   2    |  SMT_LOCATION   |      Address       |
+--------+-----------------+--------------------+
|   3    |   SMT_NOINIT    |      Boolean       |
+--------+-----------------+--------------------+
|   4    | SMT_PRINTF_FMT  |      Integer       |
+--------+-----------------+--------------------+
| 0xC0   |   SMT_LOPROC    |                    |
+--------+-----------------+ Processor-specific |
| 0xDF   |   SMT_HIPROC    |                    |
+--------+-----------------+--------------------+
| 0xE0   |   SMT_LOUSER    |                    |
+--------+-----------------+ Vendor-specific    |
| 0xFF   |   SMT_HIUSER    |                    |
+--------+-----------------+--------------------+
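A tool handling these values (for instance an assembler parsing the type operand of the .sym_meta_info directive used in section 4.1, or a dump program) might keep them in a small table like the one sketched below. The enum values come from Figure 3; the function names are hypothetical, and labelling the unassigned values between SMT_PRINTF_FMT and SMT_LOPROC as reserved is an assumption of the sketch.

```c
#include <stddef.h>
#include <string.h>

/* Symbol meta-information types from Figure 3. */
enum smt_type {
    SMT_NONE       = 0,
    SMT_RETAIN     = 1,
    SMT_LOCATION   = 2,
    SMT_NOINIT     = 3,
    SMT_PRINTF_FMT = 4,
    SMT_LOPROC     = 0xC0,
    SMT_HIPROC     = 0xDF,
    SMT_LOUSER     = 0xE0,
    SMT_HIUSER     = 0xFF
};

static const struct {
    const char *name;
    unsigned    value;
} smt_names[] = {
    { "SMT_NONE",       SMT_NONE },
    { "SMT_RETAIN",     SMT_RETAIN },
    { "SMT_LOCATION",   SMT_LOCATION },
    { "SMT_NOINIT",     SMT_NOINIT },
    { "SMT_PRINTF_FMT", SMT_PRINTF_FMT },
};

#define N_SMT_NAMES (sizeof smt_names / sizeof smt_names[0])

/* Map a type name as written in a .sym_meta_info directive to its
   value; returns -1 for names this sketch does not know. */
int smt_type_from_name(const char *name)
{
    for (size_t i = 0; i < N_SMT_NAMES; i++)
        if (strcmp(smt_names[i].name, name) == 0)
            return (int) smt_names[i].value;
    return -1;
}

/* Name a type value for diagnostics or dumps. */
const char *smt_type_name(unsigned type)
{
    for (size_t i = 0; i < N_SMT_NAMES; i++)
        if (smt_names[i].value == type)
            return smt_names[i].name;
    if (type >= SMT_LOPROC && type <= SMT_HIPROC)
        return "<processor-specific>";
    if (type >= SMT_LOUSER && type <= SMT_HIUSER)
        return "<vendor-specific>";
    return "<reserved>";   /* assumption: unassigned values are reserved */
}
```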


SMT_NONE
  This indicates an invalid or incomplete entry.

SMT_RETAIN
  A value of 1 indicates the associated symbol should 
  be retained in the output executable file, even if it appears 
  unused and the linker would therefore normally garbage collect it. 
  Other values result in the type being ignored.

SMT_LOCATION
  The VMA of the associated symbol in the output 
  executable file should be set to the specified value.

SMT_NOINIT
  A value of 1 indicates the associated data symbol 
  should not be initialized by the runtime support code at 
  program startup. Other values result in the type being ignored.

SMT_PRINTF_FMT
  The value indicates a byte offset into the 
  .strtab_meta section. The section header table index of 
  .strtab_meta is extracted from the sh_info value of 
  .symtab_meta, using the ELFxx_SMH_STR accessor.
  The null-terminated string extracted from the string table is a 
  de-duplicated list of format specifiers used by calls to 
  printf-like functions, in the function whose symbol is pointed 
  to by this entry.
  For example, the following C code:
    printf ("%d / %d = %f\n", ...);
  would generate the following string in .strtab_meta:
    "%d%f".

SMT_LOPROC..SMT_HIPROC
  Values in this range are reserved for 
  processor-specific semantics.

SMT_LOUSER..SMT_HIUSER
  Values in this range are reserved for 
  vendor-specific semantics.
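The SMT_PRINTF_FMT description leaves open exactly how a specifier such as "%5.2f" or "%ld" is recorded. The sketch below assumes only '%' plus the conversion character is kept, in first-occurrence order, and that "%%" contributes nothing; with that assumption it reproduces the "%d%f" example above. The function name printf_fmt_specs is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Build the de-duplicated specifier string for one printf-style format
   string, e.g. "%d / %d = %f\n" -> "%d%f". Assumption: only the
   conversion character is recorded, so "%5.2f" and "%ld" contribute
   "%f" and "%d". Returns the length of the string written to out. */
size_t printf_fmt_specs(const char *fmt, char *out, size_t out_size)
{
    size_t len = 0;

    if (out_size == 0)
        return 0;
    out[0] = '\0';

    for (const char *p = fmt; *p != '\0'; p++) {
        if (*p != '%')
            continue;
        p++;
        if (*p == '%')
            continue;                 /* "%%" prints a literal '%' */
        /* Skip flags, field width, precision and length modifiers. */
        while (*p != '\0' && strchr("-+ #0123456789.*hlLjzt", *p) != NULL)
            p++;
        if (*p == '\0')
            break;                    /* format string ended mid-specifier */

        char spec[3] = { '%', *p, '\0' };
        if (strstr(out, spec) != NULL)
            continue;                 /* already recorded */
        if (len + 3 > out_size)
            break;                    /* no room for two chars + NUL */
        out[len++] = '%';
        out[len++] = *p;
        out[len] = '\0';
    }
    return len;
}
```

A compiler would presumably merge the results for every printf-like call in a function before emitting that function's single SMT_PRINTF_FMT entry.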

----
3.3 Symbol Meta-Information Values
----

3.3.1 Restrictions on applying symbol meta-information types to 
  symbols

Symbol meta-information entries are always tied to a symbol in 
the symbol table, so there are no special rules regarding 
different symbols with the same name; the standard symbol binding 
rules apply.

No two entries in .symtab_meta can have the same smi_info value; 
each symbol can have only one value for a given meta-information 
type.
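Because smi_info combines the symbol index and the type, this rule reduces to checking that no smi_info value repeats. A minimal sketch for ELF32 entries (uint32_t standing in for the Elf32 field types; symtab_meta_unique is a hypothetical name) sorts a copy of the keys and compares neighbours:

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct { uint32_t smi_info; uint32_t smi_value; } Elf32_SymMetaInfo;

static int cmp_u32(const void *a, const void *b)
{
    uint32_t x = *(const uint32_t *) a;
    uint32_t y = *(const uint32_t *) b;
    return (x > y) - (x < y);
}

/* Returns 1 if no two entries share an smi_info value, 0 if a
   duplicate exists, -1 if memory could not be allocated. */
int symtab_meta_unique(const Elf32_SymMetaInfo *entries, size_t n)
{
    uint32_t *keys;
    int ok = 1;

    if (n < 2)
        return 1;
    keys = malloc(n * sizeof *keys);
    if (keys == NULL)
        return -1;
    for (size_t i = 0; i < n; i++)
        keys[i] = entries[i].smi_info;
    qsort(keys, n, sizeof *keys, cmp_u32);
    for (size_t i = 1; i < n; i++) {
        if (keys[i] == keys[i - 1]) {   /* same symbol, same type */
            ok = 0;
            break;
        }
    }
    free(keys);
    return ok;
}
```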

Figure 4: 
Symbol bindings and types permitted for metasyms
+-------------------------------+---------------------------+--------------------------------------+
| Symbol Meta-Information Type  | Permitted Symbol Binding  |        Permitted Symbol Type         |
+-------------------------------+---------------------------+--------------------------------------+
|          SMT_RETAIN           |      Any < STB_LOOS       | STT_FUNC, STT_OBJECT or STT_COMMON   |
+-------------------------------+---------------------------+--------------------------------------+
|         SMT_LOCATION          |      Any < STB_LOOS       | STT_FUNC, STT_OBJECT or STT_COMMON   |
+-------------------------------+---------------------------+--------------------------------------+
|          SMT_NOINIT           |      Any < STB_LOOS       | STT_OBJECT or STT_COMMON             |
+-------------------------------+---------------------------+--------------------------------------+
|        SMT_PRINTF_FMT         |      Any < STB_LOOS       | STT_FUNC                             |
+-------------------------------+---------------------------+--------------------------------------+
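A producer or linker could enforce Figure 4 with a check like the sketch below. The STB_LOOS, STT_* values and the st_info bind/type split are the standard gABI ones, redefined locally so the fragment is self-contained; the function name smt_permitted is hypothetical, and returning -1 for processor- or vendor-specific types (whose rules the table does not cover) is a choice of the sketch.

```c
/* gABI constants and st_info accessors, redefined locally. */
#define STB_LOOS        10
#define STT_OBJECT      1
#define STT_FUNC        2
#define STT_COMMON      5
#define ELF_ST_BIND(i)  ((i) >> 4)
#define ELF_ST_TYPE(i)  ((i) & 0xf)

enum { SMT_RETAIN = 1, SMT_LOCATION = 2, SMT_NOINIT = 3, SMT_PRINTF_FMT = 4 };

/* Returns 1 if a metasym of type smt may be attached to a symbol with
   this st_info, 0 if Figure 4 forbids it, -1 if Figure 4 does not
   cover the type. */
int smt_permitted(unsigned smt, unsigned char st_info)
{
    unsigned bind = ELF_ST_BIND(st_info);
    unsigned type = ELF_ST_TYPE(st_info);

    switch (smt) {
    case SMT_RETAIN:
    case SMT_LOCATION:
        return bind < STB_LOOS &&
               (type == STT_FUNC || type == STT_OBJECT || type == STT_COMMON);
    case SMT_NOINIT:
        return bind < STB_LOOS && (type == STT_OBJECT || type == STT_COMMON);
    case SMT_PRINTF_FMT:
        return bind < STB_LOOS && type == STT_FUNC;
    default:
        return -1;
    }
}
```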

3.3.2 SMT_NOINIT use case

When a piece of data is not initialized to a constant value, but 
does not need to be zero-initialized, SMT_NOINIT indicates that 
it can be skipped by runtime startup code that would normally 
initialize it, to save time when starting the program.

Alternatively, when a piece of data is initialized to a constant 
value when the program is loaded, but should not be 
re-initialized when the processor resets, SMT_NOINIT can also be 
applied.

3.3.3 SMT_PRINTF_FMT use case

When the size of an application is a concern to the programmer, 
limiting the format specifiers supported by printf-like functions 
can reduce the code and data usage of these functions in the 
application.

By storing the required format specifiers in the symbol 
meta-information table, the linker can examine each of the 
SMT_PRINTF_FMT entries for functions that will be used in the 
final linked executable, and link in the minimal implementation 
of the printf function required to support all the format 
specifiers used by the application.
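The selection step could look roughly like the sketch below: given the SMT_PRINTF_FMT strings of all functions kept in the link, pick the smallest printf variant that supports every conversion. The variant names and the conversions each one supports are entirely hypothetical; the proposal does not say how a runtime library would provide or name its variants.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical printf variants in a runtime library, smallest first. */
static const struct {
    const char *name;        /* hypothetical implementation symbol */
    const char *supported;   /* conversion characters it handles */
} variants[] = {
    { "printf_minimal", "cdsux" },
    { "printf_nofloat", "cdiopsuxX" },
    { "printf_full",    "aAcdeEfFgGinopsuxX" },
};

/* 1 if every conversion in an SMT_PRINTF_FMT string such as "%d%f"
   is in the supported set. */
static int covers(const char *supported, const char *specs)
{
    for (const char *p = specs; *p != '\0'; p++)
        if (*p != '%' && strchr(supported, *p) == NULL)
            return 0;
    return 1;
}

/* Pick the smallest variant covering the specifier strings of all n
   retained functions; NULL if no variant covers them. */
const char *select_printf_variant(const char *const *specs, size_t n)
{
    for (size_t v = 0; v < sizeof variants / sizeof variants[0]; v++) {
        size_t i;
        for (i = 0; i < n; i++)
            if (!covers(variants[v].supported, specs[i]))
                break;
        if (i == n)
            return variants[v].name;
    }
    return NULL;
}
```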

3.3.4 Considerations for placement of SMT_LOCATION 
  meta-information symbols (locsyms)

Locsyms are intended to augment a well-defined linker script. The 
linker validates the address provided for the locsym by examining 
the permissions of the segment (p_flags) which contains the 
specified VMA. For example, the linker must ensure that a locsym 
for a read/write symbol with type STT_OBJECT is not placed in a 
segment without write (PF_W) permissions, and emit an error if 
the segment containing the address is invalid.
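The validation described above can be sketched as a search of the program headers for the PT_LOAD segment containing the requested range, followed by a permission check. PT_LOAD and the PF_* values are the gABI ones; seg_t is a hypothetical subset of Elf32_Phdr, and deriving the required flags from the symbol (e.g. PF_R | PF_W for a read/write STT_OBJECT) is left to the caller. A real linker would likely perform this check on its own internal memory layout before program headers are finalized.

```c
#include <stddef.h>
#include <stdint.h>

#define PT_LOAD 1
#define PF_X    0x1
#define PF_W    0x2
#define PF_R    0x4

/* Subset of Elf32_Phdr fields used here (hypothetical type). */
typedef struct {
    uint32_t p_type;
    uint32_t p_vaddr;
    uint32_t p_memsz;
    uint32_t p_flags;
} seg_t;

enum locsym_result { LOCSYM_OK, LOCSYM_NO_SEGMENT, LOCSYM_BAD_PERMS };

/* Check that [vma, vma + size) lies inside one PT_LOAD segment and that
   the segment has every flag in required_flags. */
enum locsym_result check_locsym(const seg_t *segs, size_t n, uint32_t vma,
                                uint32_t size, uint32_t required_flags)
{
    for (size_t i = 0; i < n; i++) {
        const seg_t *s = &segs[i];
        uint32_t off;

        if (s->p_type != PT_LOAD || vma < s->p_vaddr)
            continue;
        off = vma - s->p_vaddr;
        if (off > s->p_memsz || size > s->p_memsz - off)
            continue;               /* overflow-safe containment test */
        return (s->p_flags & required_flags) == required_flags
                   ? LOCSYM_OK : LOCSYM_BAD_PERMS;
    }
    return LOCSYM_NO_SEGMENT;
}
```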

The linker may need to place the input section of a locsym within 
an output section, within which it would not normally be placed. 
For example, consider an application with a large .text output 
section, which spans most of ROM. If a locsym corresponding to a 
piece of read-only data has an address within range of that .text 
section, and there is no way to offset the .text section within 
ROM such that the read-only data can be placed directly at the 
location, that read-only data can be placed amongst the .text 
input sections at the requested address. As long as the output 
section flags are not changed by adding the new input section, 
there should not be any problems mixing sections in this way.

3.3.5 Initialization of locsyms at program startup

Data which requires initialization at program startup (e.g. 
copying data from their LMA to VMA) has long been handled by the 
associated runtime library. When all data requiring 
initialization is within a range of addresses defined by known 
__*_start and __*_end symbols, only a fixed number of 
target-dependent initialization functions need to be run. 
However, when code and data can reside alone at disparate 
locations in memory, there must be a mechanism to initialize each 
of these as required. The procedure for initializing this data is 
not enforced by this ABI. It is expected that an entry in 
.init_array is created for a function which will run through 
entries in a table describing how to copy data or initialize 
variables as required.

Note that this functionality can be leveraged to easily allow 
functions to be executed from a memory region without persistent 
storage, e.g. RAM. When the linker sees that the segment 
containing the VMA of the function has a different LMA and VMA, a 
copy table entry is created, and the runtime startup code will 
copy the contents of this function from the LMA to VMA, in the 
same way it would with a piece of data.
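Since the ABI deliberately leaves the copy-table format open, the following is only one possible shape, assuming a hypothetical entry that holds an LMA, a VMA and a size, with a null LMA meaning "zero-fill". The same loop serves both initialized data and functions copied into RAM.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical copy-table entry emitted by the linker for one locsym. */
struct copy_entry {
    const void *lma;    /* load address; NULL means zero-fill */
    void       *vma;    /* run-time address */
    size_t      size;   /* bytes to copy or clear */
};

/* Walk the table: copy initialized data or code from its LMA to its
   VMA, and zero-fill entries that have no load image. */
void run_copy_table(const struct copy_entry *tbl, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (tbl[i].lma != NULL)
            memcpy(tbl[i].vma, tbl[i].lma, tbl[i].size);
        else
            memset(tbl[i].vma, 0, tbl[i].size);
    }
}
```

On a target, a function placed in .init_array (for example via GCC's constructor attribute) would call run_copy_table with the bounds of the linker-generated table; the symbol names marking those bounds would be up to each toolchain.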

4 Using Symbol Meta-Information

4.1 Usage example

The programmer does not need to be aware of the symbol 
meta-information mechanism itself to be able to make use of the 
different types and apply special handling to symbols. An 
attribute set in the source code will cause the compiler to emit 
an assembler directive describing the meta-information, the 
assembler then creates the .symtab_meta section, which the linker 
absorbs, performs any required actions, and then outputs a new 
.symtab_meta section with all accumulated metasyms from input 
object files.

Figure 5: 
Example
C source code:
[
uint32_t __attribute__((retain,location(0x1000)))
  core0_key = 0x1234;
]

Compiled assembly code:
[
        .global core0_key
        .type   core0_key, @object
        .sym_meta_info  core0_key, SMT_RETAIN, 1
        .sym_meta_info  core0_key, SMT_LOCATION, 0x1000
]

.symtab_meta dump from assembled object file:
[
SYMBOL META-INFORMATION TABLE:
Idx     Kind            Value       Sym idx Name
0:      SMT_RETAIN      0x1         7       core0_key
1:      SMT_LOCATION    0x1000      7       core0_key
]
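A dump program producing the layout above needs little more than the accessors and a type-name table. The sketch below formats one row for an ELF32 entry; format_meta_row is a hypothetical name, the symbol name is assumed to be looked up by the caller from .symtab and .strtab, and the column widths are read off the example dump rather than taken from any existing tool.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

typedef struct { uint32_t smi_info; uint32_t smi_value; } Elf32_SymMetaInfo;

#define ELF32_SMI_SYM(i)  ((i)>>8)
#define ELF32_SMI_TYPE(i) ((unsigned char)(i))

static const char *kind_name(unsigned type)
{
    static const char *const names[] = {
        "SMT_NONE", "SMT_RETAIN", "SMT_LOCATION", "SMT_NOINIT", "SMT_PRINTF_FMT"
    };
    /* "SMT_OTHER" is a placeholder for types beyond the generic ones. */
    return type < sizeof names / sizeof names[0] ? names[type] : "SMT_OTHER";
}

/* Format one dump row: index, kind, value, symbol index, symbol name.
   Returns the number of characters stored in buf (size must be > 0). */
size_t format_meta_row(char *buf, size_t size, size_t idx,
                       const Elf32_SymMetaInfo *e, const char *sym_name)
{
    char idx_col[24], val_col[24];

    snprintf(idx_col, sizeof idx_col, "%zu:", idx);
    snprintf(val_col, sizeof val_col, "0x%lx", (unsigned long) e->smi_value);
    snprintf(buf, size, "%-8s%-16s%-12s%-8lu%s", idx_col,
             kind_name(ELF32_SMI_TYPE(e->smi_info)), val_col,
             (unsigned long) ELF32_SMI_SYM(e->smi_info), sym_name);
    return strlen(buf);
}
```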


5 Conclusion

5.1 Symbol meta-information benefits

Ease of use
  The application of an attribute to a symbol 
  declaration in the source code is now enough to achieve what 
  previously required both source code and linker script 
  modifications. For programmers without strong knowledge of 
  linker script functionality, there is an even clearer benefit 
  as functionality which may have previously seemed overwhelming 
  to implement is now possible without leaving the source code. 
  Many toolchains supporting ELF are very powerful, and in the 
  hands of an experienced user, behavior supported by symbol 
  meta-information can already be achieved. In this case, symbol 
  meta-information will at least reduce the number of steps the 
  programmer must take to implement the desired behavior.

Record of operations
  In relocatable files, the symbol 
  meta-information table serves as a list of transformations to 
  be made later in the build process. In executable files, the 
  table shows which transformations have been made. With the 
  assistance of a dump program that understands the format of 
  .symtab_meta, a formatted dump of the table makes it
  clear which symbols have supplemental information.
  When linker script modifications are used to alter the handling 
  of certain symbols, that file has to be studied by the 
  programmer, possibly in conjunction with the source code, to 
  understand what special handling is going to be applied. The 
  standard boilerplate linker script code required for regular 
  operation is likely to further obscure which symbols have 
  supplemental information.

Clear, defined purpose
  Each symbol meta-information type has a 
  specific purpose. When putting symbols into sections with the 
  aim of having them later be treated in some special way by the 
  linker script, it may not always be clear what is being 
  achieved without examining the relationship between the section 
  and symbol at different stages of the build process.

No limitations
  A type of symbol meta-information can be 
  implemented such that its value describes an offset into the 
  string table, or the section number of a section containing 
  additional information. Therefore, since the true value is not 
  limited to the size of the value in the symbol meta-information 
  table itself, there are many possibilities for what can be 
  accomplished using the meta-information.

5.2 Symbol meta-information as an extension to the ELF gABI

As for why this functionality should be added to the generic ABI, 
and not a processor-specific or vendor-specific ABI, we see this 
functionality helping other targets and vendors solve problems 
previously requiring non-standard and inventive solutions.

Initial versions of this functionality are already implemented 
for the MSP430 target within the MSP430-GCC fork, and for TI ARM 
targets in Texas Instruments’ Clang/LLVM fork. By making this 
available in the gABI and introducing the changes to the upstream 
mainline branches, other targets and vendors can leverage the 
generic functionality immediately. The overall meta-information 
mechanism can then be extended in generic, processor-specific, or 
vendor-specific ways, as required, to further improve the 
toolchain's feature-set.

================================================================
================================================================


ELF Symbol Meta-Information Implementation Details

August 2020

This document describes the precise changes to be made to the ELF 
gABI to implement Symbol Meta-Information.



4 Object Files

====
Sections
====


-------------------------------------------


Table 1: 
Section types, sh_type
+------------------+-------+
|      Name        | Value |
+------------------+-------+
+------------------+-------+
| SHT_SYMTAB_META  |  19   |
+------------------+-------+

-------------------------------------------

SHT_SYMTAB_META
  This section contains the symbol 
  meta-information entries for the file. The section might begin 
  with a header, which contains some supplemental information.


Figure 1: 
.symtab_meta Header
(
typedef struct {
  unsigned char symtab_hash[20];
} Elf32_SMhdr;

typedef struct {
  unsigned char symtab_hash[20];
} Elf64_SMhdr;
)


symtab_hash
  For .symtab_meta format version >= 2, a 20-byte 
  SHA-1 hash of the entire contents of .symtab.

-------------------------------------------

Table 2: 
sh_link and sh_info interpretation
+------------------+-------------------------+--------------------------------------------------------------------+
|      Name        |    sh_link              |                         sh_info                                    |
+------------------+-------------------------+--------------------------------------------------------------------+
| SHT_SYMTAB_META  | The section header      | The format version number of the symbol meta-information table     |
|                  | index of the associated | (ELFxx_SMH_VER), and the section header index of the .strtab_meta  |
|                  | symbol table.           | string table used by entries in this section (ELFxx_SMH_STR).      |
+------------------+-------------------------+--------------------------------------------------------------------+


Sub-Table a: 
Accessors for the sh_info field
---
#define ELF32_SMH_STR(i)    ((i)>>8)
#define ELF32_SMH_VER(i)    ((unsigned char)(i))
#define ELF32_SMH_INFO(s,v) (((s)<<8)+(unsigned char)(v))

#define ELF64_SMH_STR(i)    ((i)>>32)
#define ELF64_SMH_VER(i)    ((i)&0xffffffffL)
#define ELF64_SMH_INFO(s,v) (((s)<<32)+((v)&0xffffffffL))


Sub-Table b: 
.symtab_meta versions
+--------+-------------------------------------------------------------------------------+
| Value  |                                    Meaning                                    |
+--------+-------------------------------------------------------------------------------+
|   0    |                                Invalid Version                                |
+--------+-------------------------------------------------------------------------------+
|   1    |             There is no header at the beginning of .symtab_meta.              |
+--------+-------------------------------------------------------------------------------+
|   2    | A header containing the hash of .symtab is at the beginning of                |
|        | .symtab_meta.                                                                 |
+--------+-------------------------------------------------------------------------------+

-------------------------------------------
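As a minimal sketch, the ELF32 accessors from Sub-Table a can pack and unpack the sh_info word of a .symtab_meta section header. The helper names smh_pack and smh_unpack are hypothetical. The sketch sticks to the 32-bit layout because sh_info is a 32-bit Elf64_Word even in Elf64_Shdr, so a 24-bit section index plus an 8-bit version fits both ELF classes; readers adapting the ELF64 variants, which shift by 32, may want to check this point.

```c
#include <assert.h>
#include <stdint.h>

/* ELF32 accessors for the sh_info field, as given in Sub-Table a.  */
#define ELF32_SMH_STR(i)    ((i)>>8)
#define ELF32_SMH_VER(i)    ((unsigned char)(i))
#define ELF32_SMH_INFO(s,v) (((s)<<8)+(unsigned char)(v))

/* Hypothetical helper: build the sh_info word of a .symtab_meta header
   from the .strtab_meta section index and the format version.  */
static uint32_t smh_pack (uint32_t strtab_shndx, unsigned version)
{
  /* The version occupies the low byte, leaving 24 bits for the index.  */
  assert (strtab_shndx < (UINT32_C (1) << 24));
  assert (version <= 0xff);
  return ELF32_SMH_INFO (strtab_shndx, version);
}

/* Hypothetical helper: split an sh_info word back into its parts.  */
static void smh_unpack (uint32_t sh_info, uint32_t *strtab_shndx,
                        unsigned *version)
{
  *strtab_shndx = ELF32_SMH_STR (sh_info);
  *version = ELF32_SMH_VER (sh_info);
}
```

For example, smh_pack (7, 2) gives 0x702: a table in format version 2 (with the .symtab hash header) whose strings live in section 7.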


====
Special Sections
====

-------------------------------------------


Table 3: 
Special Sections
+---------------+------------------+------------+
|     Name      |      Type        | Attributes |
+---------------+------------------+------------+
| .symtab_meta  | SHT_SYMTAB_META  |    None    |
+---------------+------------------+------------+
| .strtab_meta  |   SHT_STRTAB     |    None    |
+---------------+------------------+------------+

-------------------------------------------


.symtab_meta
  This section holds additional “meta-information” 
  about symbols in .symtab. The different types of 
  meta-information are described in “Symbol Meta-Information”.

.strtab_meta
  If required, this section holds strings used as a 
  value to certain types of symbol meta-information. It can be 
  omitted if no symbol meta-information types require it.


-------------

  Symbol Meta-Information

[ Note: This is a new subsection, intended to be placed at the end 
of the “Symbol Table” section, after the “Symbol Values” 
subsection. ]

ELF relocatable and executable files may contain a new section 
named .symtab_meta. This section describes additional information 
about symbols in .symtab. The section can be omitted from ELF 
files if there is no meta-information for any symbols, but if 
present, there can only be one section with this name and type.

  Symbol Meta-Information Table Entries

Following the header of .symtab_meta (present only in format 
version 2, see Sub-Table b), there is an array of symbol 
meta-information entries.


-------------------------------------------

(
typedef struct {
  Elf32_Addr smi_info;
  Elf32_Word smi_value;
} Elf32_SymMetaInfo;

typedef struct {
  Elf64_Addr  smi_info;
  Elf64_Xword smi_value;
} Elf64_SymMetaInfo;
)

-------------------------------------------

smi_info
  This field holds both the symbol table index of the ELF 
  symbol to which this meta-information entry applies and the 
  type of the entry. A number of generic types are predefined. 
  There are also reserved ranges for processor-specific and 
  application-specific (i.e. vendor-specific) types.

-------------------------------------------

(
#define ELF32_SMI_SYM(i)    ((i)>>8)
#define ELF32_SMI_TYPE(i)   ((unsigned char)(i))
#define ELF32_SMI_INFO(s,t) (((s)<<8)+(unsigned char)(t))

#define ELF64_SMI_SYM(i)    ((i)>>32)
#define ELF64_SMI_TYPE(i)   ((i)&0xffffffffL)
#define ELF64_SMI_INFO(s,t) (((s)<<32)+((t)&0xffffffffL))
)

-------------------------------------------
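The entry layout and the SMI accessors can be combined into a small constructor. This is a sketch under stated assumptions: Elf32_Addr and Elf32_Word are redefined locally (normally they come from <elf.h>), and smi_make is a hypothetical helper, not part of the proposal.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-ins for the usual <elf.h> typedefs.  */
typedef uint32_t Elf32_Addr;
typedef uint32_t Elf32_Word;

typedef struct {
  Elf32_Addr smi_info;
  Elf32_Word smi_value;
} Elf32_SymMetaInfo;

#define ELF32_SMI_SYM(i)    ((i)>>8)
#define ELF32_SMI_TYPE(i)   ((unsigned char)(i))
#define ELF32_SMI_INFO(s,t) (((s)<<8)+(unsigned char)(t))

/* Type values from Figure 5.  */
#define SMT_RETAIN    1
#define SMT_LOCATION  2

/* Hypothetical helper: fill in one entry for symbol SYMNDX.  */
static Elf32_SymMetaInfo smi_make (Elf32_Word symndx, unsigned type,
                                   Elf32_Word value)
{
  Elf32_SymMetaInfo e;

  assert (symndx < (UINT32_C (1) << 24));  /* 24 bits for the index.  */
  e.smi_info = ELF32_SMI_INFO (symndx, type);
  e.smi_value = value;
  return e;
}
```

With this helper, smi_make (12, SMT_RETAIN, 1) describes a request to keep symbol 12, and smi_make (3, SMT_LOCATION, 0x8000) asks for symbol 3 to be placed at address 0x8000; the latter entry's smi_info is 0x302.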

smi_value
  The interpretation depends on the associated type. 
  The value could be interpreted as a boolean, symbol table 
  index, address, string table index etc.

-------------------------------------------

Figure 5: 
Symbol Meta-Information Types
+--------+-----------------+--------------------+
| Value  |      Type       |  Format of Value   |
+--------+-----------------+--------------------+
|   0    |    SMT_NONE     |        None        |
+--------+-----------------+--------------------+
|   1    |   SMT_RETAIN    |      Boolean       |
+--------+-----------------+--------------------+
|   2    |  SMT_LOCATION   |      Address       |
+--------+-----------------+--------------------+
|   3    |   SMT_NOINIT    |      Boolean       |
+--------+-----------------+--------------------+
|   4    | SMT_PRINTF_FMT  |      Integer       |
+--------+-----------------+--------------------+
| 0xC0   |   SMT_LOPROC    |                    |
+--------+-----------------+ Processor-specific |
| 0xDF   |   SMT_HIPROC    |                    |
+--------+-----------------+--------------------+
| 0xE0   |   SMT_LOUSER    |                    |
+--------+-----------------+ Vendor-specific    |
| 0xFF   |   SMT_HIUSER    |                    |
+--------+-----------------+--------------------+

-------------------------------------------

SMT_NONE
  This indicates an invalid or incomplete entry.

SMT_RETAIN
  A value of 1 indicates that the associated symbol should 
  be retained in the output executable file, even if it appears 
  unused and the linker would therefore normally garbage collect 
  it. Other values result in the type being ignored.

SMT_LOCATION
  The VMA of the associated symbol in the output 
  executable file should be set to the specified value.

SMT_NOINIT
  A value of 1 indicates the associated data symbol 
  should not be initialized by the runtime support code at 
  program startup. Other values result in the type being ignored.

SMT_PRINTF_FMT
  The value is a byte offset into the .strtab_meta section. 
  The section header table index of .strtab_meta is extracted 
  from the sh_info value of .symtab_meta, using the 
  ELFxx_SMH_STR accessor.
  The null-terminated string extracted from the string table is a 
  de-duplicated list of the format specifiers used by calls to 
  printf-like functions within the function whose symbol this 
  entry refers to.
  For example, the following C code:
    printf ("%d / %d = %f\n", ...);
  would generate the following string in .strtab_meta:
    "%d%f"

SMT_LOPROC..SMT_HIPROC
  Values in this range are reserved for 
  processor-specific semantics.

SMT_LOUSER..SMT_HIUSER
  Values in this range are reserved for 
  vendor-specific semantics.
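The lookup chain described for SMT_PRINTF_FMT (sh_info of .symtab_meta, then ELFxx_SMH_STR, then a byte offset into .strtab_meta) might look like this for ELF32. struct shdr_lite is a hypothetical stand-in for Elf32_Shdr holding only the fields used, smt_printf_fmt_string is a hypothetical helper, and the bounds checks are an assumption about how a reader would validate the offset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ELF32_SMH_STR(i)    ((i)>>8)

/* Minimal stand-in for Elf32_Shdr: only the fields this sketch reads.  */
struct shdr_lite {
  uint32_t sh_info;
  uint32_t sh_offset;
  uint32_t sh_size;
};

/* Hypothetical helper: return the format-specifier string an
   SMT_PRINTF_FMT entry refers to, or NULL if the offset is invalid.
   IMAGE is the whole ELF file in memory, SHDRS its section header
   table, and SYMTAB_META the header of the .symtab_meta section.  */
static const char *smt_printf_fmt_string (const unsigned char *image,
                                          const struct shdr_lite *shdrs,
                                          const struct shdr_lite *symtab_meta,
                                          uint32_t smi_value)
{
  assert (image != NULL && shdrs != NULL && symtab_meta != NULL);

  /* sh_info of .symtab_meta carries the .strtab_meta section index.  */
  const struct shdr_lite *strtab = &shdrs[ELF32_SMH_STR (symtab_meta->sh_info)];
  const char *base = (const char *) image + strtab->sh_offset;

  if (smi_value >= strtab->sh_size)
    return NULL;
  /* The string must be NUL-terminated inside the section.  */
  if (memchr (base + smi_value, '\0', strtab->sh_size - smi_value) == NULL)
    return NULL;
  return base + smi_value;
}
```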

====
Restrictions on applying symbol meta-information types to symbols
====

Symbol meta-information entries are always tied to a symbol in 
the symbol table, so there are no special rules regarding 
different symbols with the same name; the standard symbol binding 
rules apply.

No two entries in .symtab_meta may have the same smi_info value; 
each symbol may have at most one value for a given meta-information 
type.
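A tool could check this rule by sorting the smi_info values and comparing neighbours. Because smi_info packs the symbol index together with the type, two equal values mean that one symbol has two entries of the same type. smi_info_unique is a hypothetical helper operating on ELF32 smi_info values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* qsort comparator on raw smi_info values.  */
static int cmp_u32 (const void *a, const void *b)
{
  uint32_t x = *(const uint32_t *) a;
  uint32_t y = *(const uint32_t *) b;
  return (x > y) - (x < y);
}

/* Return 1 if every smi_info value in INFOS[0..N) is distinct, 0 if
   two entries give the same symbol the same meta-information type.
   The array is sorted in place.  */
static int smi_info_unique (uint32_t *infos, size_t n)
{
  assert (infos != NULL || n == 0);
  qsort (infos, n, sizeof infos[0], cmp_u32);
  for (size_t i = 1; i < n; i++)
    if (infos[i] == infos[i - 1])
      return 0;
  return 1;
}
```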

Figure 6: 
Symbol bindings and types permitted for each symbol meta-information type
+-------------------------------+---------------------------+--------------------------------------+
| Symbol Meta-Information Type  | Permitted Symbol Binding  |        Permitted Symbol Type         |
+-------------------------------+---------------------------+--------------------------------------+
|          SMT_RETAIN           |      Any < STB_LOOS       |      STT_FUNC                        |
|                               |                           |      | STT_OBJECT                    |
|                               |                           |      | STT_COMMON                    |
+-------------------------------+---------------------------+--------------------------------------+
|         SMT_LOCATION          |      Any < STB_LOOS       |      STT_FUNC                        |
|                               |                           |      | STT_OBJECT                    |
|                               |                           |      | STT_COMMON                    |
+-------------------------------+---------------------------+--------------------------------------+
|          SMT_NOINIT           |      Any < STB_LOOS       |      STT_OBJECT                      |
|                               |                           |      | STT_COMMON                    |
+-------------------------------+---------------------------+--------------------------------------+
|        SMT_PRINTF_FMT         |      Any < STB_LOOS       |      STT_FUNC                        |
+-------------------------------+---------------------------+--------------------------------------+
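Figure 6 can be expressed as a predicate over a symbol's st_info byte. The STB_/STT_ values and ELF32_ST_* macros below are the standard ELF ones, redefined so the sketch is self-contained. smt_permitted is hypothetical, and rejecting types that Figure 6 does not list is an assumption.

```c
#include <assert.h>
#include <stdbool.h>

/* Standard ELF symbol binding and type values (as in <elf.h>).  */
#define STB_LOOS    10
#define STT_OBJECT  1
#define STT_FUNC    2
#define STT_COMMON  5
#define ELF32_ST_BIND(i)  ((i)>>4)
#define ELF32_ST_TYPE(i)  ((i)&0xf)

/* Meta-information type values from Figure 5.  */
#define SMT_RETAIN      1
#define SMT_LOCATION    2
#define SMT_NOINIT      3
#define SMT_PRINTF_FMT  4

/* Hypothetical checker: may an entry of SMT_TYPE be attached to a
   symbol whose st_info byte is ST_INFO?  Encodes Figure 6.  */
static bool smt_permitted (unsigned smt_type, unsigned char st_info)
{
  unsigned bind = ELF32_ST_BIND (st_info);
  unsigned type = ELF32_ST_TYPE (st_info);

  if (bind >= STB_LOOS)   /* Every row requires a binding < STB_LOOS.  */
    return false;

  switch (smt_type)
    {
    case SMT_RETAIN:
    case SMT_LOCATION:
      return type == STT_FUNC || type == STT_OBJECT || type == STT_COMMON;
    case SMT_NOINIT:
      return type == STT_OBJECT || type == STT_COMMON;
    case SMT_PRINTF_FMT:
      return type == STT_FUNC;
    default:
      /* Not covered by Figure 6; rejected here (assumption).  */
      return false;
    }
}
```

For instance, st_info 0x12 (STB_GLOBAL, STT_FUNC) accepts SMT_PRINTF_FMT but not SMT_NOINIT.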

[-- Attachment #2: elf-symbol-meta-information-implementation.pdf --]
[-- Type: application/pdf, Size: 212464 bytes --]

[-- Attachment #3: elf-symbol-meta-information-proposal.pdf --]
[-- Type: application/pdf, Size: 229039 bytes --]


* Re: [RFC] Proposal for new ELF extension - "Symbol meta-information"
  2020-08-31 11:58 [RFC] Proposal for new ELF extension - "Symbol meta-information" Jozef Lawrynowicz
@ 2020-08-31 12:23 ` Florian Weimer
  2020-08-31 13:14   ` Jozef Lawrynowicz
  2020-08-31 13:45   ` James Y Knight
  0 siblings, 2 replies; 9+ messages in thread
From: Florian Weimer @ 2020-08-31 12:23 UTC (permalink / raw)
  To: Jozef Lawrynowicz; +Cc: gnu-gabi

* Jozef Lawrynowicz:

> I wonder if it is appropriate for inclusion in the GNU gABI or if we
> should just add it to a processor-specific ABI. However, it is being
> used downstream for MSP430 and ARM targets, in GCC and Clang/LLVM
> respectively. So we'd like to have it standardized somewhere generic
> so that different targets and toolchains can align on it.

Is there an expectation to upstream these changes?  In the present state
there does not seem to be need for such coordination.

>     3.3.3 SMT_PRINTF_FMT use case

Can this be achieved in C++ with a library-only solution?  So that

  printf ("%s", str);

and

  printf ("%f", num);

resolve to different printf symbols externally?

Thanks,
Florian



* Re: [RFC] Proposal for new ELF extension - "Symbol meta-information"
  2020-08-31 12:23 ` Florian Weimer
@ 2020-08-31 13:14   ` Jozef Lawrynowicz
  2020-08-31 13:45   ` James Y Knight
  1 sibling, 0 replies; 9+ messages in thread
From: Jozef Lawrynowicz @ 2020-08-31 13:14 UTC (permalink / raw)
  To: Florian Weimer; +Cc: gnu-gabi

On Mon, Aug 31, 2020 at 02:23:57PM +0200, Florian Weimer wrote:
> * Jozef Lawrynowicz:
> 
> > I wonder if it is appropriate for inclusion in the GNU gABI or if we
> > should just add it to a processor-specific ABI. However, it is being
> > used downstream for MSP430 and ARM targets, in GCC and Clang/LLVM
> > respectively. So we'd like to have it standardized somewhere generic
> > so that different targets and toolchains can align on it.
> 
> Is there an expectation to upstream these changes?  In the present state
> there does not seem to be need for such coordination.

Yes, one way or another I will upstream the functionality to the GNU
toolchain, as an MSP430-specific feature if that is what ends up being
the only approved route.

TI are also looking to upstream their ARM Clang/LLVM implementation.

I posted an RFC for an initial version of the functionality to Binutils
back in February, but H.J. Lu pointed out that discussions about ELF
extensions should be had on the ELF gABI mailing list.
https://sourceware.org/pipermail/binutils/2020-February/110236.html

So I've been looking for where best to place this functionality before
finalizing the spec and implementation.

The functionality has been improved and extended since then, I would say
the initial version is generally ready for upstreaming after a bit of
additional tidying.

> 
> >     3.3.3 SMT_PRINTF_FMT use case
> 
> Can this be achieved in C++ with a library-only solution?  So that
> 
>   printf ("%s", str);
> 
> and
> 
>   printf ("%f", num);
> 
> resolve to different printf symbols externally?

When code size is a concern, we'd always avoid using C++ for a library.

The user would probably rather just manually choose the correct printf
function by defining a symbol at link time than take on the C++ burden
of having it done automatically.

Below I've included some additional clarifications on SMT_PRINTF_FMT that
came up from the discussions on the ELF gABI mailing list.

Thanks,
Jozef

> 
> Thanks,
> Florian
> 

---
On Thu, Aug 27, 2020 at 08:43:56PM +0000, Joseph Myers wrote:
> I think this is rather under-specified.  A format conversion specification
> has six parts in ISO C: the initial '%', any flags, any field width, any
> precision, any length modifier, the conversion specifier character.  
> POSIX extends this by allowing the initial '%' to be of the form '%n$'
> instead to specify the position of the argument converted, and also allows
> '*m$' as a form of width and precision, similarly.
>
> Do you intend the .strtab_meta entry to contain all those parts of the
> conversion specification, or only some of them?

The intention would be to omit any digits used by the width and
precision specifiers, and the '.' used by the precision specifier,
but otherwise store all parts of the format string, de-duplicating
parts of it as necessary.

The encoding of the format string certainly needs a clearer
specification. '%' doesn't actually need to be stored in the condensed
string, so the behavior should instead follow the examples below:

printf ("%-*.*hhd %#hx", ...);
  yields the following NUL terminated string
-*hhd#hx

printf ("%+ld % 8.8lld %-6.6lld", ...);
  yields the following NUL terminated string
+ld lld-

printf ("%*2$.*3$lld %4$*5$.*6$ld", ...);
  yields the following NUL terminated string
*$lldld

It's assumed that the number of the positional argument used with "%n$"
or "*m$" isn't important, but the presence of '$' is required as it
indicates an additional feature which might need to be supported by
printf.

There also needs to be a clarification on how to group the parts of the
format string when performing the de-duplication. For example,
"%ld %lld %lf" should *not* be condensed into "ldllf", as the linker may
want to know which length modifier is used by a given conversion format
specifier. Instead it should be condensed to "ldlldlf".

  The length modifier and format specifier are considered an atomic part
  of the format string for the purposes of de-duplication.
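The encoding rules in this message can be sketched as a small C function that reproduces the four examples above. Assumptions not stated in the message: "%%" is ignored, ordinary characters are dropped, de-duplication is done across the whole format string, and the caller supplies a large enough output buffer. condense_format and its helpers are hypothetical names.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define MAX_TOKENS 64

/* Tokens already written to the output, for de-duplication.  */
struct seen_tokens {
  char tok[MAX_TOKENS][4];   /* longest token: "hh"/"ll" + conversion */
  int n;
};

/* Append the LEN-character token TOK to OUT unless it was seen before.  */
static void emit (struct seen_tokens *s, char *out, const char *tok, size_t len)
{
  char buf[4];

  assert (len < sizeof buf);
  memcpy (buf, tok, len);
  buf[len] = '\0';
  for (int i = 0; i < s->n; i++)
    if (strcmp (s->tok[i], buf) == 0)
      return;
  assert (s->n < MAX_TOKENS);
  strcpy (s->tok[s->n++], buf);
  strcat (out, buf);
}

/* Skip an optional "n$" positional marker, recording '$' if present.  */
static const char *positional (struct seen_tokens *s, char *out, const char *p)
{
  const char *q = p;

  while (isdigit ((unsigned char) *q))
    q++;
  if (q != p && *q == '$')
    {
      emit (s, out, "$", 1);
      return q + 1;
    }
  return p;
}

/* Width or precision: keep '*' (and any "m$"), drop plain digits.  */
static const char *field (struct seen_tokens *s, char *out, const char *p)
{
  if (*p == '*')
    {
      emit (s, out, "*", 1);
      return positional (s, out, p + 1);
    }
  while (isdigit ((unsigned char) *p))
    p++;
  return p;
}

/* Hypothetical encoder: write the condensed form of FMT into OUT.  */
void condense_format (const char *fmt, char *out)
{
  struct seen_tokens seen = { .n = 0 };

  out[0] = '\0';
  for (const char *p = fmt; *p != '\0'; p++)
    {
      if (*p != '%')
        continue;                      /* Ordinary text is not recorded.  */
      p++;
      if (*p == '%')
        continue;                      /* "%%" is ignored (assumption).  */
      p = positional (&seen, out, p);
      while (*p != '\0' && strchr ("-+ #0", *p) != NULL)
        emit (&seen, out, p++, 1);     /* Flags.  */
      p = field (&seen, out, p);       /* Width.  */
      if (*p == '.')
        p = field (&seen, out, p + 1); /* Precision; '.' is dropped.  */
      /* Length modifier and conversion specifier form one atomic token.  */
      const char *start = p;
      if ((p[0] == 'h' && p[1] == 'h') || (p[0] == 'l' && p[1] == 'l'))
        p += 2;
      else if (*p != '\0' && strchr ("hljztL", *p) != NULL)
        p++;
      if (*p == '\0')
        break;                         /* Truncated specification.  */
      emit (&seen, out, start, (size_t) (p - start) + 1);
    }
}
```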


* Re: [RFC] Proposal for new ELF extension - "Symbol meta-information"
  2020-08-31 12:23 ` Florian Weimer
  2020-08-31 13:14   ` Jozef Lawrynowicz
@ 2020-08-31 13:45   ` James Y Knight
  2020-09-01 11:20     ` Florian Weimer
  1 sibling, 1 reply; 9+ messages in thread
From: James Y Knight @ 2020-08-31 13:45 UTC (permalink / raw)
  To: Florian Weimer; +Cc: Jozef Lawrynowicz, gnu-gabi

On Mon, Aug 31, 2020 at 8:24 AM Florian Weimer via Gnu-gabi <
gnu-gabi@sourceware.org> wrote:

> >     3.3.3 SMT_PRINTF_FMT use case
>
> Can this be achieved in C++ with a library-only solution?  So that
>   printf ("%s", str);
> and
>   printf ("%f", num);
> resolve to different printf symbols externally?
>

The LLVM backend optimizer already does this automatically for XCore, TCE,
and Emscripten targets, without interrogating the format string, or adding
anything to the object format.

On all three: if there are no floating-point arguments to the call, it will
translate {s,f,}printf -> i{s,f,}printf. Otherwise, on emscripten only, if
there are no 128-bit float arguments, it will translate {s,f,}printf ->
small_{s,f,}printf

MSVC (and therefore also LLVM targeting windows) uses a slightly different
scheme: the compiler emits a reference to a global "_fltused" whenever
there's any floating-point instructions in the program (related to printf
or not). Then, the undefined reference to that symbol pulls in the
floating-point support for printf/scanf in the MS libc.
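The type-based selection described for XCore, TCE and Emscripten needs no format-string parsing. The sketch below models it for printf only; the arg_kind classification and choose_printf are hypothetical, and this is not LLVM's actual implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical classification of a printf call's variadic arguments.  */
enum arg_kind { ARG_INT, ARG_PTR, ARG_DOUBLE, ARG_FP128 };

/* Pick the printf entry point: iprintf if no floating-point argument
   is passed; on Emscripten, small_printf if no 128-bit float is
   passed; otherwise plain printf.  */
static const char *choose_printf (const enum arg_kind *args, size_t n,
                                  bool emscripten)
{
  bool has_fp = false;
  bool has_fp128 = false;

  for (size_t i = 0; i < n; i++)
    {
      if (args[i] == ARG_DOUBLE || args[i] == ARG_FP128)
        has_fp = true;
      if (args[i] == ARG_FP128)
        has_fp128 = true;
    }
  if (!has_fp)
    return "iprintf";
  if (emscripten && !has_fp128)
    return "small_printf";
  return "printf";
}
```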


* Re: [RFC] Proposal for new ELF extension - "Symbol meta-information"
  2020-08-31 13:45   ` James Y Knight
@ 2020-09-01 11:20     ` Florian Weimer
  2020-09-01 12:19       ` Jozef Lawrynowicz
  0 siblings, 1 reply; 9+ messages in thread
From: Florian Weimer @ 2020-09-01 11:20 UTC (permalink / raw)
  To: James Y Knight via Gnu-gabi

* James Y. Knight via Gnu-gabi:

> On Mon, Aug 31, 2020 at 8:24 AM Florian Weimer via Gnu-gabi <
> gnu-gabi@sourceware.org> wrote:
>
>> >     3.3.3 SMT_PRINTF_FMT use case
>>
>> Can this be achieved in C++ with a library-only solution?  So that
>>   printf ("%s", str);
>> and
>>   printf ("%f", num);
>> resolve to different printf symbols externally?
>>
>
> The LLVM backend optimizer already does this automatically for XCore, TCE,
> and Emscripten targets, without interrogating the format string, or adding
> anything to the object format.
>
> On all three: if there are no floating-point arguments to the call, it will
> translate {s,f,}printf -> i{s,f,}printf. Otherwise, on emscripten only, if
> there are no 128-bit float arguments, it will translate {s,f,}printf ->
> small_{s,f,}printf

It's not what I had in mind with my C++ comment (I thought about using a
constexpr construct to parse the format strings), but it's simpler to
just look at the types.

I think we could guide this by some attribute machinery for C,
especially if it is completely type-dependent.  If the symbol choice is
determined by that, it is not necessary to maintain the symbol selection
in very different places (the library implementation *and* the linker).
This is the main thing I do not like about SMT_PRINTF_FMT: it needs very
library-specific code in the link editor.

Thanks,
Florian



* Re: [RFC] Proposal for new ELF extension - "Symbol meta-information"
  2020-09-01 11:20     ` Florian Weimer
@ 2020-09-01 12:19       ` Jozef Lawrynowicz
  2020-09-01 12:48         ` Florian Weimer
  0 siblings, 1 reply; 9+ messages in thread
From: Jozef Lawrynowicz @ 2020-09-01 12:19 UTC (permalink / raw)
  To: Florian Weimer; +Cc: James Y Knight via Gnu-gabi

On Tue, Sep 01, 2020 at 01:20:16PM +0200, Florian Weimer via Gnu-gabi wrote:
> * James Y. Knight via Gnu-gabi:
> 
> > On Mon, Aug 31, 2020 at 8:24 AM Florian Weimer via Gnu-gabi <
> > gnu-gabi@sourceware.org> wrote:
> >
> >> >     3.3.3 SMT_PRINTF_FMT use case
> >>
> >> Can this be achieved in C++ with a library-only solution?  So that
> >>   printf ("%s", str);
> >> and
> >>   printf ("%f", num);
> >> resolve to different printf symbols externally?
> >>
> >
> > The LLVM backend optimizer already does this automatically for XCore, TCE,
> > and Emscripten targets, without interrogating the format string, or adding
> > anything to the object format.
> >
> > On all three: if there are no floating-point arguments to the call, it will
> > translate {s,f,}printf -> i{s,f,}printf. Otherwise, on emscripten only, if
> > there are no 128-bit float arguments, it will translate {s,f,}printf ->
> > small_{s,f,}printf
> 
> It's not what I had in mind with my C++ comment (I thought about using a
> constexpr construct to parse the format strings), but it's simpler to
> just look at the types.
> 
> I think we could guide this by some attribute machinery for C,
> especially if it is completely type-dependent.  If the symbol choice is
> determined by that, it is not necessary to maintain the symbol selection
> in very different places (the library implementation *and* the linker).
> This is the main thing I do not like about SMT_PRINTF_FMT: it needs very
> library-specific code in the link editor.

I am not against removing SMT_PRINTF_FMT as a generic metainfo type,
since it is not necessarily the most straightforward way to achieve the desired
behavior.
TI are using it in conjunction with another vendor-specific metainfo type which
tries to work out the format-specifiers used by a printf() call which has
a non-constant string used for the format argument.

So perhaps SMT_PRINTF_FMT is only really required when used with that
other type which requires visibility of the full program, not individual
compilation units.

I can imagine how the behavior could be implemented without any special
handling from the linker, if the compiler instead maps the printf calls to the
minimum required printf implementation, and the library has something
like this:

  int printf_double (const char *fmt, ...)
  {
    return printf_generic (...);
  }

  int printf_generic (const char *fmt, ...)
  {
    if (printf_double != 0)      /* Check for compiler emitted symbol.  */
      return printf_double_1 (...);  /* Call real worker function.  */
    else if (printf_float != 0)
      return printf_float_1 (...);
    else if (printf_int != 0)
      return printf_int_1 (...);
      ....
  }

Do you have any opinions on the inclusion of the symbol meta-information
mechanism itself within the GNU gABI?

Thanks,
Jozef

> 
> Thanks,
> Florian
> 


* Re: [RFC] Proposal for new ELF extension - "Symbol meta-information"
  2020-09-01 12:19       ` Jozef Lawrynowicz
@ 2020-09-01 12:48         ` Florian Weimer
  2020-09-02 10:26           ` Jozef Lawrynowicz
  0 siblings, 1 reply; 9+ messages in thread
From: Florian Weimer @ 2020-09-01 12:48 UTC (permalink / raw)
  To: Jozef Lawrynowicz; +Cc: James Y Knight via Gnu-gabi

* Jozef Lawrynowicz:

> I can imagine how the behavior could be implemented without any special
> handling from the linker, if the compiler instead maps the printf calls to the
> minimum required printf implementation, and the library has something
> like this:

Yes, exactly my thought.  It's definitely less action at a distance.

> Do you have any opinions on the inclusion of the symbol meta-information
> mechanism itself within the GNU gABI?

In the past, we just added a parallel table to the symbol table when we
needed to extend it.  I think SHT_GNU_versym is the most widely used
example.  This has the advantage that it is so much simpler.

The main risk is processing objects with tools that don't know about the
symbol table relationship of these parallel tables.  To deal with that,
having a special section that lists the section types that absolutely
need to be understood by tools in order to process the object in various
ways (a distinction between reading, linking, and outputting might make
sense) could really be helpful.  You would get a precise error, rather
than a corrupt output file.

I'm not sure if it is possible to have meaningful symbol metadata that
can be processed by old tools in some meaningful way, tools that have no
prior knowledge of it.  It's very hard to design these things,
especially when we have only three established use cases.

Thanks,
Florian



* Re: [RFC] Proposal for new ELF extension - "Symbol meta-information"
  2020-09-01 12:48         ` Florian Weimer
@ 2020-09-02 10:26           ` Jozef Lawrynowicz
  2020-09-03 16:49             ` Jozef Lawrynowicz
  0 siblings, 1 reply; 9+ messages in thread
From: Jozef Lawrynowicz @ 2020-09-02 10:26 UTC (permalink / raw)
  To: Florian Weimer; +Cc: James Y Knight via Gnu-gabi

On Tue, Sep 01, 2020 at 02:48:04PM +0200, Florian Weimer wrote:
> * Jozef Lawrynowicz:
> 
> > I can imagine how the behavior could be implemented without any special
> > handling from the linker, if the compiler instead maps the printf calls to the
> > minimum required printf implementation, and the library has something
> > like this:
> 
> Yes, exactly my thought.  It's definitely less action at a distance.
> 
> > Do you have any opinions on the inclusion of the symbol meta-information
> > mechanism itself within the GNU gABI?
> 
> In the past, we just added a parallel table to the symbol table when we
> needed to extend it.  I think SHT_GNU_versym is the most widely used
> example.  This has the advantage that it is so much simpler.

It seems that the benefits of having a parallel symbol table outweigh
any concerns about wasted space and the large amount of symbol metainfo
entries which would not have any content.
Since entry size is fixed, if you stored the header information in the
initial NULL entry then there is the additional benefit that you could
theoretically keep .symtab_meta in sync with .symtab by
adding/removing symbols at a given index as required.
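A minimal sketch of the parallel-table idea, assuming fixed-size entries and plain uint32_t stand-ins for real symbol and meta entries: inserting or removing index i in both arrays keeps entry i of .symtab_meta describing symbol i of .symtab, while entry 0 (the null-symbol slot) is left alone for the header. struct tables, insert_sym and remove_sym are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_SYMS 16

/* Hypothetical parallel tables: META[i] describes SYMS[i], as with
   SHT_GNU_versym.  Entry 0 mirrors the null symbol and holds the header.  */
struct tables {
  uint32_t syms[MAX_SYMS];   /* stand-in for Elf32_Sym entries */
  uint32_t meta[MAX_SYMS];   /* one fixed-size meta entry per symbol */
  size_t count;
};

/* Insert a symbol and its meta entry at INDEX, shifting later entries
   of both tables by one so they stay in sync.  */
static void insert_sym (struct tables *t, size_t index,
                        uint32_t sym, uint32_t meta)
{
  assert (index >= 1 && index <= t->count && t->count < MAX_SYMS);
  memmove (&t->syms[index + 1], &t->syms[index],
           (t->count - index) * sizeof t->syms[0]);
  memmove (&t->meta[index + 1], &t->meta[index],
           (t->count - index) * sizeof t->meta[0]);
  t->syms[index] = sym;
  t->meta[index] = meta;
  t->count++;
}

/* Remove the symbol at INDEX together with its meta entry.  */
static void remove_sym (struct tables *t, size_t index)
{
  assert (index >= 1 && index < t->count);
  memmove (&t->syms[index], &t->syms[index + 1],
           (t->count - index - 1) * sizeof t->syms[0]);
  memmove (&t->meta[index], &t->meta[index + 1],
           (t->count - index - 1) * sizeof t->meta[0]);
  t->count--;
}
```

The cost of this simplicity is the one noted above: every symbol gets a meta slot, even when it carries no meta-information.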

> 
> The main risk is processing objects with tools that don't know about the
> symbol table relationship of these parallel tables.  To deal with that,
> having a special section that lists the section types that absolutely
> need to be understood by tools in order to process the object in various
> ways (a distinction between reading, linking, and outputting might make
> sense) could really be helpful.  You would get a precise error, rather
> than a corrupt output file.
> 
> I'm not sure if it is possible to have meaningful symbol metadata that
> can be processed by old tools in some meaningful way, tools that have no
> prior knowledge of it.  It's very hard to design these things,
> especially when we have only three established use cases.

Some standardization of parallel symbol tables would certainly be
forward thinking and enable future extensions which use parallel symbol
tables to be kept in sync with .symtab, even if those older tools don't
understand the extension itself.

If we also consider SMT_NOINIT removed as a generic metainfo type, based
on the feedback received from the ELF gABI discussions, then that leaves
the SMT_RETAIN and SMT_LOCATION metainfo types.

These types in particular generally operate at the section level;
at least, the linker will eventually apply the behavior to the
containing section, not to the symbol itself.

So they could be implemented using new section flags (sh_flags).
There is plenty of space for additional sh_flags (unlike symbol
flags).

- SMT_RETAIN
  The linker will only garbage collect sections, not individual
  symbols.
  A new section flag indicates that the section should be retained
  by the linker, even if it would otherwise be garbage collected.
  Whatever section contains the symbol that the "retain"
  attribute is applied to will have the "retain" sh_flags bit set.

- SMT_LOCATION
  The linker can only place sections at a specific address,
  not individual symbols.
  A new section flag indicates that the section should be placed at a
  specific VMA in the executable file by the linker.
  The address could be set in the sh_addr field, or encoded in the
  section name.
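Under the section-flag alternative, the linker-side logic stays simple. This sketch assumes two hypothetical flag bits in the OS-specific sh_flags range (SHF_MASKOS is 0x0ff00000) and, for "location", that the address is carried in sh_addr; the message does not assign flag values, and the section-name encoding it mentions is not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bits in the SHF_MASKOS range; not assigned values.  */
#define SHF_RETAIN_HYPOTHETICAL    0x00200000u
#define SHF_LOCATION_HYPOTHETICAL  0x00400000u

struct section {
  uint64_t sh_flags;
  uint64_t sh_addr;     /* requested VMA when the location flag is set */
  bool referenced;      /* reachable from a garbage-collection root */
};

/* Section garbage collection keeps S if it is reachable, or if the
   "retain" flag forces it to be kept.  */
static bool gc_keep (const struct section *s)
{
  return s->referenced || (s->sh_flags & SHF_RETAIN_HYPOTHETICAL) != 0;
}

/* Address the linker would assign: the fixed sh_addr if the "location"
   flag is set, otherwise the next free address from normal layout.  */
static uint64_t place_section (const struct section *s, uint64_t next_free)
{
  return (s->sh_flags & SHF_LOCATION_HYPOTHETICAL) ? s->sh_addr : next_free;
}
```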

Since the overall aim is to support the "retain" and "location" C/C++
attributes, perhaps this is the most appropriate way forward...

Thanks,
Jozef


* Re: [RFC] Proposal for new ELF extension - "Symbol meta-information"
  2020-09-02 10:26           ` Jozef Lawrynowicz
@ 2020-09-03 16:49             ` Jozef Lawrynowicz
  0 siblings, 0 replies; 9+ messages in thread
From: Jozef Lawrynowicz @ 2020-09-03 16:49 UTC (permalink / raw)
  To: Florian Weimer; +Cc: gnu-gabi

On Wed, Sep 02, 2020 at 11:26:23AM +0100, Jozef Lawrynowicz wrote:
> On Tue, Sep 01, 2020 at 02:48:04PM +0200, Florian Weimer wrote:
> > * Jozef Lawrynowicz:
> > 
> > > I can imagine how the behavior could be implemented without any special
> > > handling from the linker, if the compiler instead maps the printf calls to the
> > > minimum required printf implementation, and the library has something
> > > like this:
> > 
> > Yes, exactly my thought.  It's definitely less action at a distance.
> > 
> > > Do you have any opinions on the inclusion of the symbol meta-information
> > > mechanism itself within the GNU gABI?
> > 
> > In the past, we just added a parallel table to the symbol table when we
> > needed to extend it.  I think SHT_GNU_versym is the most widely used
> > example.  This has the advantage that it is so much simpler.
> 
> It seems that the benefits of having a parallel symbol table outweigh
> any concerns about wasted space and the large amount of symbol metainfo
> entries which would not have any content.
> Since entry size is fixed, if you stored the header information in the
> initial NULL entry then there is the additional benefit that you could
> theoretically keep .symtab_meta in sync with .symtab by
> adding/removing symbols at a given index as required.

Do you think the symbol meta-information functionality implemented as a
parallel symbol table, and supporting the SMT_RETAIN and SMT_LOCATION
types, would be accepted as a GNU gABI extension?

Or should we pursue the other previously discussed approach of getting
"retain" and "location" attribute support into the GNU toolchain?

I think there are advantages and disadvantages to both methods, but the
impression I get is that the approach which requires the least drastic
changes to the ABI is preferable, which leads me to believe we should
look to implement the types using new ELF section flags instead.

Thanks,
Jozef


end of thread

Thread overview: 9+ messages
2020-08-31 11:58 [RFC] Proposal for new ELF extension - "Symbol meta-information" Jozef Lawrynowicz
2020-08-31 12:23 ` Florian Weimer
2020-08-31 13:14   ` Jozef Lawrynowicz
2020-08-31 13:45   ` James Y Knight
2020-09-01 11:20     ` Florian Weimer
2020-09-01 12:19       ` Jozef Lawrynowicz
2020-09-01 12:48         ` Florian Weimer
2020-09-02 10:26           ` Jozef Lawrynowicz
2020-09-03 16:49             ` Jozef Lawrynowicz
