From: "H.J. Lu"
Date: Tue, 25 Apr 2017 11:02:18 -0700 (PDT)
Subject: Re: Reducing code size of Position Independent Executables (PIE) by shrinking the size of dynamic relocations section
To: Sriraman Tallam
Cc: gnu-gabi@sourceware.org, binutils, Xinliang David Li, Cary Coutant, Sterling Augustine, Paul Pluzhnikov, Ian Lance Taylor, Rahul Chaudhry, Luis Lozano, Rafael Espíndola, Peter Collingbourne, Rui Ueyama
Mailing-List: contact gnu-gabi-help@sourceware.org; run by ezmlm
X-SW-Source: 2017-q2/txt/msg00001.txt.bz2

On Tue, Apr 25, 2017 at 10:12 AM, Sriraman Tallam wrote:
> We identified a problem with PIE executables: more than 5% code size
> bloat compared to non-PIE, and we have a few proposals to reduce the
> bloat.  Please take a look and let us know what you think.
>
> * What is the problem?
>
> PIE is a security hardening feature that enables ASLR (Address Space
> Layout Randomization) and enables the executable to be loaded at a
> random virtual address upon every execution instance.  On average, a
> binary when built as PIE is larger by 5% to 9%, as measured on a suite
> of benchmarks used at Google where the average text size is ~100MB,
> when compared to the one built without PIE.  This is also independent
> of the target architecture and we found this to be true for x86_64,
> arm64 and power.
> We noticed that the primary reason for this code size bloat is the
> extra dynamic relocations that are generated in order to make the
> binary position independent.  This proposal introduces new ways to
> represent these dynamic relocations that can reduce the code size
> bloat to just a few percent.
>
> As an example, to show the bloat in code size, here is the data from
> one of our larger binaries.
>
> Without PIE, the binary's code size in bytes, as displayed by the
> 'size' command:
>
>      text      data      bss        dec
> 504663285  16242884  9130248  530036417
>
> With PIE, the binary's code size in bytes, as displayed by the
> 'size' command:
>
>      text      data      bss        dec
> 539781977  16242900  9130248  565155125
>
> The text size of the binary grew by 7% and the total size by 6.6%.
> Our experiments have shown that the binary sizes grow anywhere from 5%
> to 9% with PIE on almost all benchmarks we looked at.  Notice that
> almost all the code bloat comes from the "text" segment of the binary,
> which contains the executable code of the application and any
> read-only data.  We looked into this segment to see why this is
> happening and found that the size of the section that contains the
> dynamic relocations for a binary explodes with PIE.  For instance,
> without PIE, for the above binary the dynamic relocation section
> contains 46 entries whereas with PIE, the same section contains
> 1463325 entries.  It takes 24 bytes to store one entry, that is 3
> integer values each of size 8 bytes.  So, the dynamic relocations
> alone need an extra space of (1463325 - 46) * 24 bytes, which is about
> 35 million bytes and accounts for almost all the bloat incurred!
>
> * What are these extra dynamic relocations that are created for PIE executables?
>
> We noticed that these extra relocations for PIE binaries have a common
> pattern and are needed because it is not known until run-time where
> the binary will be loaded.  All of these extra dynamic relocations are
> of the ELF type R_X86_64_RELATIVE.  Let us show using an example what
> these relocations do.
> Let us take an example of a program that stores the address of a global:
>
> #include <stdio.h>
>
> const int a = 10;
>
> const int *b = &a;
>
> int main() {
>
>    printf ("b = %p\n", (void *) b);
>
> }
>
> First, let us look at the binary built without PIE.  Let's look at the
> data section where 'b' and 'a' are allocated.
>
> 00000000004007d0 <a>:
>   4007d0: 0a 00
>
>
> 0000000000401b10 <b>:
>   401b10: d0 07
>   401b12: 40 00 00
>
> Variable 'a' is allocated at address 0x4007d0, which matches the output
> when running the binary.  'b' is allocated at address 0x401b10 and its
> contents, in little-endian byte order, are the address of 'a'.
>
> Now, let us examine the contents of the PIE binary:
>
> 00000000000008d8 <a>:
>  8d8: 0a 00
>
> 0000000000001c50 <b>:
>     1c50: d8 08
>                     1c50: R_X86_64_RELATIVE *ABS*+0x8d8
>     1c52: 00 00
>     1c54: 00 00
>
>
> Notice there is a dynamic relocation here which tells the dynamic
> linker that this value needs to be fixed at run-time.  This is needed
> because ASLR can load this binary anywhere in the address space and
> this relocation fixes the address after it is loaded.
>
>
> * More details about R_X86_64_RELATIVE relocations
>
> This relocation takes 24 bytes and has three fields:
>
> Offset
>
> Type - here it is R_X86_64_RELATIVE
>
> Addend (what extra value needs to be added)
>
> The offset field of this relocation is the address offset from the
> start where this relocation applies.
> The type field indicates the type of the dynamic relocation, but we
> are interested in one type of dynamic relocation in particular,
> R_X86_64_RELATIVE.  This is important because in the motivating
> example that we presented above, all the extra dynamic relocations
> were of this type!
>
>
> * We have these proposals to reduce the size of the dynamic relocations section:
>

There are 3 pieces of run-time relocation information:

1. Type and symbol.  4 or 8 bytes
2. Offset.  4 or 8 bytes
3. Addend.  4 or 8 bytes

If we use REL instead of RELA, the addend can be implicit and stored
in place.  If we limit the type to relative relocations, we only need
the offset.  This is for PIC, not just for PIE.  And we can use a
special encoding scheme for the offset table, which can be placed in
DT_GNU_RELATIVE_REL with DT_GNU_RELATIVE_RELSZ.

-- 
H.J.
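
For illustration only, here is a minimal C sketch of the size argument
above for 64-bit targets: a RELA entry carries offset, info and addend
(24 bytes), a REL entry keeps the addend in place (16 bytes), and a
relative-only table needs just the offset (8 bytes).  The struct and
function names are hypothetical stand-ins, not from any existing tool
(real code would use Elf64_Rela and Elf64_Rel from <elf.h>), and the
sketch does not model any particular encoding of the offset table.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for the 64-bit ELF relocation entry layouts. */
struct rela_entry { uint64_t offset; uint64_t info; int64_t addend; };
struct rel_entry  { uint64_t offset; uint64_t info; };

_Static_assert(sizeof(struct rela_entry) == 24, "RELA entry is 3 x 8 bytes");
_Static_assert(sizeof(struct rel_entry) == 16, "REL entry is 2 x 8 bytes");

/* Bytes needed to describe n relative relocations in each form. */
uint64_t rela_bytes(uint64_t n)        { return n * sizeof(struct rela_entry); }
uint64_t rel_bytes(uint64_t n)         { return n * sizeof(struct rel_entry); }
uint64_t offset_only_bytes(uint64_t n) { return n * sizeof(uint64_t); }

/* RELA form: the word at image + offset becomes load_base + addend. */
void apply_relative_rela(unsigned char *image, uint64_t load_base,
                         const struct rela_entry *r)
{
    assert(image != NULL);
    uint64_t value = load_base + (uint64_t)r->addend;
    memcpy(image + r->offset, &value, sizeof value);
}

/* Offset-only form: the addend already sits in place (as with REL),
   so the word at image + offset is simply bumped by load_base. */
void apply_relative_offset(unsigned char *image, uint64_t load_base,
                           uint64_t offset)
{
    assert(image != NULL);
    uint64_t value;
    memcpy(&value, image + offset, sizeof value);
    value += load_base;
    memcpy(image + offset, &value, sizeof value);
}
```

With the 1463325 entries quoted above, rela_bytes gives 35119800 bytes,
rel_bytes gives 23413200 and offset_only_bytes gives 11706600, before
any further compression of the offsets themselves.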