From: Fangrui Song
Date: Fri, 29 Mar 2024 00:24:01 -0700
Subject: Re: CREL relocation format for ELF
To: Jan Beulich
Cc: Cary Coutant, binutils@sourceware.org, gcc@gcc.gnu.org

On Thu, Mar 28, 2024 at 2:23 AM Jan Beulich wrote:
>
> On 28.03.2024 08:43, Fangrui Song wrote:
> > On Fri, Mar 22, 2024 at 6:51 PM Fangrui Song wrote:
> >>
> >> On Thu, Mar 14, 2024 at 5:16 PM Fangrui Song wrote:
> >>>
> >>> The relocation formats REL and RELA for ELF are inefficient. In a
> >>> release build of Clang for x86-64, .rela.* sections consume a
> >>> significant portion (approximately 20.9%) of the file size.
> >>>
> >>> I propose RELLEB, a new format offering significant file size
> >>> reductions: 17.2% (x86-64), 16.5% (aarch64), and even 32.4% (riscv64)!
> >>>
> >>> Your thoughts on RELLEB are welcome!
> >>>
> >>> Detailed analysis:
> >>> https://maskray.me/blog/2024-03-09-a-compact-relocation-format-for-elf
> >>> generic ABI (ELF specification):
> >>> https://groups.google.com/g/generic-abi/c/yb0rjw56ORw
> >>> binutils feature request: https://sourceware.org/bugzilla/show_bug.cgi?id=31475
> >>> LLVM: https://discourse.llvm.org/t/rfc-relleb-a-compact-relocation-format-for-elf/77600
> >>>
> >>> Implementation primarily involves binutils changes. Any volunteers?
> >>> For GCC, a driver option like -mrelleb in my Clang prototype would be
> >>> needed. The option instructs the assembler to use RELLEB.
> >>
> >> The format was tentatively named RELLEB. As I refine the original pure
> >> LEB-based format, “RELLEB” might not be the most fitting name.
> >>
> >> I have switched to SHT_CREL/DT_CREL/.crel and updated
> >> https://maskray.me/blog/2024-03-09-a-compact-relocation-format-for-elf
> >> and
> >> https://groups.google.com/g/generic-abi/c/yb0rjw56ORw/m/eiBcYxSfAQAJ
> >>
> >> The new format is simpler and better than RELLEB even in the absence
> >> of the shifted offset technique.
> >>
> >> Dynamic relocations using CREL are even smaller than Android's packed
> >> relocations.
> >>
> >> // encodeULEB128(uint64_t, raw_ostream &os);
> >> // encodeSLEB128(int64_t, raw_ostream &os);
> >>
> >> Elf_Addr offsetMask = 8, offset = 0, addend = 0;
> >> uint32_t symidx = 0, type = 0;
> >> // offsetMask starts at 8, so shift (the trailing zero bits common to
> >> // all offsets) is at most 3 and fits in the low 2 bits of the header.
> >> for (const Reloc &rel : relocs)
> >>   offsetMask |= rel.r_offset;
> >> int shift = std::countr_zero(offsetMask);
> >> encodeULEB128(relocs.size() * 4 + shift, os);
> >> for (const Reloc &rel : relocs) {
> >>   Elf_Addr deltaOffset = (rel.r_offset - offset) >> shift;
> >>   offset = rel.r_offset;
> >>   // Bits 0-2 flag a changed symidx/type/addend, bits 3-6 hold the low
> >>   // 4 bits of deltaOffset, and bit 7 means the rest follows as ULEB128.
> >>   uint8_t b = deltaOffset * 8 + (symidx != rel.r_symidx) +
> >>               (type != rel.r_type ? 2 : 0) + (addend != rel.r_addend ? 4 : 0);
> >>   if (deltaOffset < 0x10) {
> >>     os << char(b);
> >>   } else {
> >>     os << char(b | 0x80);
> >>     encodeULEB128(deltaOffset >> 4, os);
> >>   }
> >>   if (b & 1) {
> >>     encodeSLEB128(static_cast<int32_t>(rel.r_symidx - symidx), os);
> >>     symidx = rel.r_symidx;
> >>   }
> >>   if (b & 2) {
> >>     encodeSLEB128(static_cast<int32_t>(rel.r_type - type), os);
> >>     type = rel.r_type;
> >>   }
> >>   if (b & 4) {
> >>     encodeSLEB128(std::make_signed_t<Elf_Addr>(rel.r_addend - addend), os);
> >>     addend = rel.r_addend;
> >>   }
> >> }
> >>
> >> ---
> >>
> >> While alternatives like PrefixVarInt (or a suffix-based variant) might
> >> excel when encoding larger integers, LEB128 offers advantages when
> >> most integers fit within one or two bytes, as it avoids the need for
> >> shift operations in the common one-byte representation.
> >>
> >> While we could utilize zigzag encoding (i>>31) ^ (i<<1) to convert
> >> SLEB128-encoded type/addend to use ULEB128 instead, the generated code
> >> is inferior to or on par with SLEB128 for one-byte encodings.
> >
> > We can introduce a gas option --crel, then users can specify `gcc
> > -Wa,--crel a.c` (-flto also gets -Wa, options).
> >
> > I propose that we add another gas option --implicit-addends-for-data
> > (does the name look good?) to allow non-code sections to use implicit
> > addends to save space
> > (https://sourceware.org/PR31567).
> > Using implicit addends primarily benefits debug sections such as
> > .debug_str_offsets, .debug_names, .debug_addr, .debug_line, but also
> > data sections such as .eh_frame, .data., .data.rel.ro, .init_array.
> >
> > -Wa,--implicit-addends-for-data can be used on its own (6.4% .o
> > reduction in a clang -g -g0 -gpubnames build)
> And this option will then switch from RELA to REL relocation sections,
> effectively in violation of most ABIs I'm aware of?

This does violate the x86-64 LP64 psABI and PPC64 ELFv2.
The AArch64 psABI allows REL, while the RISC-V psABI says nothing about REL vs. RELA.

x86-64:

The AMD64 LP64 ABI architecture uses only Elf64_Rela relocation entries with explicit addends. The r_addend member serves as the relocation addend.

The AMD64 ILP32 ABI architecture uses only Elf32_Rela relocation entries in relocatable files. Executable files or shared objects may use either Elf32_Rela or Elf32_Rel relocation entries.

AArch64:

A binary file may use ``REL`` or ``RELA`` relocations or a mixture of the two (but multiple relocations of the same place must use only one type).

The initial addend for a ``REL``-type relocation is formed according to the following rules.

- If the relocation relocates data (`Static Data relocations`_) the initial value in the place is sign-extended to 64 bits.
- If the relocation relocates an instruction the immediate field of the instruction is extracted, scaled as required by the instruction field encoding, and sign-extended to 64 bits.

A ``RELA`` format relocation must be used if the initial addend cannot be encoded in the place.

There is no PC bias to accommodate in the relocation of a place containing an instruction that formulates a PC-relative address. The program counter reflects the address of the currently executing instruction.

PPC64 ELFv2:

The 64-bit OpenPOWER Architecture uses Elf64_Rela relocation entries exclusively.

> Furthermore, why just data? x86 at least could benefit almost as much for code.
> Hence maybe better --implicit-addends=data, with an option for
> architectures to also permit --implicit-addends=text.

I agree that the design is not great.
I am thinking about an option that applies to all sections: while converting fixups to relocations, check whether the relocation type can hold the addend in place, as a data relocation type would. If any relocation within a section has an addend too large for its place, switch that section from REL to RELA.
However, the implementation complexity of this approach still needs to be evaluated.

---

I ran `clang -g -gz=zstd` experiments, building lld at both `-O0` and `-O2`:

```
   .o size | reloc size | .debug size | .debug_addr | .c?rela?.debug_addr | flags
1453265896 |  467465160 |   200379733 |       77894 |            51123648 | -g -gz=zstd
1361904480 |  345821648 |   230681356 |     1628142 |            34082432 | -g -gz=zstd -Wa,--implicit-addends-for-data
1042317288 |   56517599 |   200378501 |       77894 |             5000201 | -g -gz=zstd -Wa,--crel
1057438728 |   41336040 |   230681552 |     1628142 |             3720546 | -g -gz=zstd -Wa,--crel,--implicit-addends-for-data
 626745136 |  292634688 |   225932160 |       77920 |            47820480 | -O2 -g -gz=zstd
 564322008 |  201200656 |   254962205 |     3104850 |            31880320 | -O2 -g -gz=zstd -Wa,--implicit-addends-for-data
 363224200 |   29114818 |   225930949 |       77920 |             4513572 | -O2 -g -gz=zstd -Wa,--crel
 377970016 |   14829524 |   254962382 |     3104850 |             2118037 | -O2 -g -gz=zstd -Wa,--crel,--implicit-addends-for-data
```

Observations:

* With or without -gz=zstd (the uncompressed experiment is not shown here), the .o size reduction ratios with REL are close.
* Implicit addends make .debug* sections less compressible. If the focus is .debug* and .rela.debug* sections, REL is a net loss with -gz=zstd.
* REL with -gz=zstd is still smaller than RELA with -gz=zstd. This is not surprising: we are comparing uncompressed REL/RELA relocation sections (a larger difference) against compressed `.debug` contents with non-zero/zero addends (a smaller difference).
A few points about CREL:

* For CREL with -gz=zstd, using implicit addends increases .o file sizes, likely because the "less compressible" factor matters more once the relocation size becomes negligible.
* The CREL reduction ratio becomes remarkable with -gz=zstd at a high optimization level: for -O2 -g -gz=zstd, it's a 42.0% reduction in the .o size!
* CREL with implicit addends might not be worth doing if the priority is debug sections.