public inbox for gcc@gcc.gnu.org
From: Fangrui Song <maskray@gcc.gnu.org>
To: Jan Beulich <jbeulich@suse.com>
Cc: Cary Coutant <ccoutant@gmail.com>,
	binutils@sourceware.org, gcc@gcc.gnu.org
Subject: Re: CREL relocation format for ELF
Date: Fri, 29 Mar 2024 00:24:01 -0700
Message-ID: <CAN30aBF05Dsqc8PJ8=rr9xU9iWt=eNu9oRkVirxfSQLEppSYtA@mail.gmail.com>
In-Reply-To: <dcd9daf2-f2fb-4eca-94c8-8c878683b986@suse.com>

On Thu, Mar 28, 2024 at 2:23 AM Jan Beulich <jbeulich@suse.com> wrote:
>
> On 28.03.2024 08:43, Fangrui Song wrote:
> > On Fri, Mar 22, 2024 at 6:51 PM Fangrui Song <maskray@gcc.gnu.org> wrote:
> >>
> >> On Thu, Mar 14, 2024 at 5:16 PM Fangrui Song <maskray@gcc.gnu.org> wrote:
> >>>
> >>> The relocation formats REL and RELA for ELF are inefficient. In a
> >>> release build of Clang for x86-64, .rela.* sections consume a
> >>> significant portion (approximately 20.9%) of the file size.
> >>>
> >>> I propose RELLEB, a new format offering significant file size
> >>> reductions: 17.2% (x86-64), 16.5% (aarch64), and even 32.4% (riscv64)!
> >>>
> >>> Your thoughts on RELLEB are welcome!
> >>>
> >>> Detailed analysis:
> >>> https://maskray.me/blog/2024-03-09-a-compact-relocation-format-for-elf
> >>> generic ABI (ELF specification):
> >>> https://groups.google.com/g/generic-abi/c/yb0rjw56ORw
> >>> binutils feature request: https://sourceware.org/bugzilla/show_bug.cgi?id=31475
> >>> LLVM: https://discourse.llvm.org/t/rfc-relleb-a-compact-relocation-format-for-elf/77600
> >>>
> >>> Implementation primarily involves binutils changes. Any volunteers?
> >>> For GCC, a driver option like -mrelleb in my Clang prototype would be
> >>> needed. The option instructs the assembler to use RELLEB.
> >>
> >> The format was tentatively named RELLEB. As I refine the original pure
> >> LEB-based format, “RELLEB” might not be the most fitting name.
> >>
> >> I have switched to SHT_CREL/DT_CREL/.crel and updated
> >> https://maskray.me/blog/2024-03-09-a-compact-relocation-format-for-elf
> >> and
> >> https://groups.google.com/g/generic-abi/c/yb0rjw56ORw/m/eiBcYxSfAQAJ
> >>
> >> The new format is simpler and better than RELLEB even in the absence
> >> of the shifted offset technique.
> >>
> >> Dynamic relocations using CREL are even smaller than Android's packed
> >> relocations.
> >>
> >> // encodeULEB128(uint64_t, raw_ostream &os);
> >> // encodeSLEB128(int64_t, raw_ostream &os);
> >>
> >> Elf_Addr offsetMask = 8, offset = 0, addend = 0;
> >> uint32_t symidx = 0, type = 0;
> >> for (const Reloc &rel : relocs)
> >>   offsetMask |= rel.r_offset;
> >> int shift = std::countr_zero(offsetMask);
> >> encodeULEB128(relocs.size() * 4 + shift, os);
> >> for (const Reloc &rel : relocs) {
> >>   Elf_Addr deltaOffset = (rel.r_offset - offset) >> shift;
> >>   offset = rel.r_offset;
> >>   uint8_t b = deltaOffset * 8 + (symidx != rel.r_symidx) +
> >>               (type != rel.r_type ? 2 : 0) + (addend != rel.r_addend ? 4 : 0);
> >>   if (deltaOffset < 0x10) {
> >>     os << char(b);
> >>   } else {
> >>     os << char(b | 0x80);
> >>     encodeULEB128(deltaOffset >> 4, os);
> >>   }
> >>   if (b & 1) {
> >>     encodeSLEB128(static_cast<int32_t>(rel.r_symidx - symidx), os);
> >>     symidx = rel.r_symidx;
> >>   }
> >>   if (b & 2) {
> >>     encodeSLEB128(static_cast<int32_t>(rel.r_type - type), os);
> >>     type = rel.r_type;
> >>   }
> >>   if (b & 4) {
> >>     encodeSLEB128(std::make_signed_t<Elf_Addr>(rel.r_addend - addend), os);
> >>     addend = rel.r_addend;
> >>   }
> >> }
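
The two helpers named in the comments at the top of the snippet come from
LLVM. For readers without LLVM at hand, here is a minimal sketch of what
they do. It is an illustration, not the LLVM implementation: it appends to
a std::string rather than a raw_ostream, and that substitution is my
assumption for the sake of a self-contained example.

```cpp
#include <cstdint>
#include <string>

// Stand-in for encodeULEB128: 7 payload bits per byte, least significant
// group first, with bit 7 set on every byte except the last.
inline void encodeULEB128(uint64_t value, std::string &out) {
  do {
    uint8_t byte = value & 0x7f;
    value >>= 7;
    if (value != 0)
      byte |= 0x80;  // more bytes follow
    out.push_back(static_cast<char>(byte));
  } while (value != 0);
}

// Stand-in for encodeSLEB128: same layout, but encoding stops once the
// remaining bits are pure sign extension of bit 6 of the last byte.
inline void encodeSLEB128(int64_t value, std::string &out) {
  bool more;
  do {
    uint8_t byte = value & 0x7f;
    value >>= 7;  // arithmetic shift (guaranteed since C++20)
    more = !((value == 0 && (byte & 0x40) == 0) ||
             (value == -1 && (byte & 0x40) != 0));
    if (more)
      byte |= 0x80;
    out.push_back(static_cast<char>(byte));
  } while (more);
}
```

With these definitions a signed delta in [-64, 63] and an unsigned value
below 128 each take a single byte, which is the common case the format is
tuned for.
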
> >>
> >> ---
> >>
> >> While alternatives like PrefixVarInt (or a suffix-based variant) might
> >> excel when encoding larger integers, LEB128 offers advantages when
> >> most integers fit within one or two bytes, as it avoids the need for
> >> shift operations in the common one-byte representation.
> >>
> >> While we could utilize zigzag encoding (i>>31) ^ (i<<1) to convert
> >> SLEB128-encoded type/addend to use ULEB128 instead, the generated code
> >> is inferior to or on par with SLEB128 for one-byte encodings.
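
To make the zigzag alternative concrete, here is a small sketch of the
32-bit mapping written out above, together with its inverse. The function
names are mine and purely illustrative; CREL as described here does not
use this mapping.

```cpp
#include <cstdint>

// Interleave negative and non-negative values so that small magnitudes
// get small unsigned codes: 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, ...
constexpr uint32_t zigzagEncode(int32_t i) {
  // i >> 31 is all ones for negative i and zero otherwise.
  return (static_cast<uint32_t>(i) << 1) ^ static_cast<uint32_t>(i >> 31);
}

// Inverse mapping: the low bit carries the sign, the rest the magnitude.
constexpr int32_t zigzagDecode(uint32_t u) {
  return static_cast<int32_t>((u >> 1) ^ (0u - (u & 1)));
}
```

A one-byte ULEB128 holds zigzag codes 0..127, which correspond to the
signed range [-64, 63]. That is the same range a one-byte SLEB128 covers,
so the transform buys nothing for the common one-byte case.
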
> >
> >
> > We can introduce a gas option --crel, then users can specify `gcc
> > -Wa,--crel a.c` (-flto also gets -Wa, options).
> >
> > I propose that we add another gas option --implicit-addends-for-data
> > (does the name look good?) to allow non-code sections to use implicit
> > addends to save space
> > (https://sourceware.org/PR31567).
> > Using implicit addends primarily benefits debug sections such as
> > .debug_str_offsets, .debug_names, .debug_addr, .debug_line, but also
> > data sections such as .eh_frame, .data., .data.rel.ro, .init_array.
> >
> > -Wa,--implicit-addends-for-data can be used on its own (6.4% .o
> > reduction in a clang -g -g0 -gpubnames build)

> And this option will then switch from RELA to REL relocation sections, effectively in violation of most ABIs I'm aware of?

This does violate x86-64 LP64 psABI and PPC64 ELFv2. The AArch64 psABI
allows REL while the RISC-V psABI doesn't say anything about REL/RELA.

x86-64:

    The AMD64 LP64 ABI architecture uses only Elf64_Rela relocation
    entries with explicit addends. The r_addend member serves as the
    relocation addend.

    The AMD64 ILP32 ABI architecture uses only Elf32_Rela relocation
    entries in relocatable files. Executable files or shared objects may
    use either Elf32_Rela or Elf32_Rel relocation entries.

AArch64:

    A binary file may use ``REL`` or ``RELA`` relocations or a mixture
    of the two (but multiple relocations of the same place must use only
    one type).

    The initial addend for a ``REL``-type relocation is formed
    according to the following rules.

    - If the relocation relocates data (`Static Data relocations`_)
      the initial value in the place is sign-extended to 64 bits.

    - If the relocation relocates an instruction the immediate field
      of the instruction is extracted, scaled as required by the
      instruction field encoding, and sign-extended to 64 bits.

    A ``RELA`` format relocation must be used if the initial addend
    cannot be encoded in the place.

    There is no PC bias to accommodate in the relocation of a place
    containing an instruction that formulates a PC-relative address. The
    program counter reflects the address of the currently executing
    instruction.

PPC64 ELFv2:

    The 64-bit OpenPOWER Architecture uses Elf64_Rela relocation
    entries exclusively.

> Furthermore, why just data? x86 at least could benefit almost as much for code. Hence maybe better --implicit-addends=data, with an option for architectures to also permit --implicit-addends=text.

I agree that the design is not great. I am thinking about an option
that applies to all sections:
During fixup conversion to relocations, check if the relocation type
can accommodate the addend as a "data relocation type."
If any relocation within a section encounters an oversized addend,
switch from REL to RELA.
However, the feasibility of this approach needs evaluation regarding
implementation complexity.
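
A rough sketch of the per-section decision I have in mind follows. The
Fixup struct and both function names are hypothetical and do not
correspond to anything in gas. I also assume the place is read back as a
sign-extended field, as in the AArch64 wording quoted above.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical fixup record: the addend plus the width in bits of the
// field being relocated (e.g. 32 for a 4-byte data relocation).
struct Fixup {
  int64_t addend;
  unsigned fieldBits;
};

// True if `addend` survives a round trip through a sign-extended field of
// `bits` bits, i.e. it can be stored in the place as an implicit addend.
inline bool fitsInPlace(int64_t addend, unsigned bits) {
  assert(bits >= 1 && bits <= 64);
  if (bits == 64)
    return true;
  int64_t lo = -(int64_t{1} << (bits - 1));
  int64_t hi = (int64_t{1} << (bits - 1)) - 1;
  return addend >= lo && addend <= hi;
}

// A section falls back to RELA as soon as one addend cannot be encoded
// in its place; otherwise REL (implicit addends) is sufficient.
inline bool sectionNeedsRela(const std::vector<Fixup> &fixups) {
  for (const Fixup &f : fixups)
    if (!fitsInPlace(f.addend, f.fieldBits))
      return true;
  return false;
}
```

Instruction relocations would need a per-type notion of "field width"
that accounts for scaling, which is where most of the implementation
complexity would sit.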

---

I have made `clang -g -gz=zstd` experiments, building lld for both
`-O0` and `-O2`:

```
.o size    | reloc size | .debug size | .debug_addr | .c?rela?.debug_addr | options
1453265896 |  467465160 |   200379733 |       77894 |            51123648 | -g -gz=zstd
1361904480 |  345821648 |   230681356 |     1628142 |            34082432 | -g -gz=zstd -Wa,--implicit-addends-for-data
1042317288 |   56517599 |   200378501 |       77894 |             5000201 | -g -gz=zstd -Wa,--crel
1057438728 |   41336040 |   230681552 |     1628142 |             3720546 | -g -gz=zstd -Wa,--crel,--implicit-addends-for-data

 626745136 |  292634688 |   225932160 |       77920 |            47820480 | -O2 -g -gz=zstd
 564322008 |  201200656 |   254962205 |     3104850 |            31880320 | -O2 -g -gz=zstd -Wa,--implicit-addends-for-data
 363224200 |   29114818 |   225930949 |       77920 |             4513572 | -O2 -g -gz=zstd -Wa,--crel
 377970016 |   14829524 |   254962382 |     3104850 |             2118037 | -O2 -g -gz=zstd -Wa,--crel,--implicit-addends-for-data
```
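
The reduction ratios quoted in this message are plain before/after ratios
of the ".o size" column. A small sketch of the arithmetic, using the RELA
and CREL rows from the table; the function and constant names are only
for illustration:

```cpp
#include <cassert>
#include <cstdint>

// Percentage by which `after` is smaller than `before`.
constexpr double reductionPercent(uint64_t before, uint64_t after) {
  assert(before > 0);
  return 100.0 *
         (1.0 - static_cast<double>(after) / static_cast<double>(before));
}

// ".o size" values copied from the table above.
constexpr uint64_t kO0Rela = 1453265896;  // -g -gz=zstd
constexpr uint64_t kO0Crel = 1042317288;  // -g -gz=zstd -Wa,--crel
constexpr uint64_t kO2Rela = 626745136;   // -O2 -g -gz=zstd
constexpr uint64_t kO2Crel = 363224200;   // -O2 -g -gz=zstd -Wa,--crel
```

reductionPercent(kO2Rela, kO2Crel) gives the -O2 figure of about 42.0%,
and the same call on the -O0 pair comes out at roughly 28%.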

Observations:

* With or without -gz=zstd (another experiment not shown here), the .o
size reduction ratios with REL are close.
* Implicit addends make .debug* sections less compressible. If the
focus is .debug* and .rela.debug* sections, REL is a loss with
-gz=zstd.
* REL -gz=zstd is still smaller than RELA -gz=zstd, which is not
surprising as we compare uncompressed REL/RELA (larger difference) and
compressed non-zero/zero `.debug` contents (smaller difference).

A few points about CREL:

* For CREL -gz=zstd, using implicit addends increases .o file sizes,
likely because the "less compressible" factor becomes more significant
once the relocation size is negligible.
* CREL reduction ratio becomes incredible with -gz=zstd at a high
optimization level: for -O2 -g -gz=zstd, it's a 42.0% reduction in the
.o size!
* CREL with implicit addends might not be worth doing if the priority
is debug sections.


