From: Sriraman Tallam
Date: Sun, 01 Jan 2017 00:00:00 -0000
Subject: Re: Reducing code size of Position Independent Executables (PIE) by shrinking the size of dynamic relocations section
To: "H.J. Lu"
Cc: gnu-gabi@sourceware.org, binutils, Xinliang David Li, Cary Coutant, Sterling Augustine, Paul Pluzhnikov, Ian Lance Taylor, Rahul Chaudhry, Luis Lozano, Rafael Espíndola, Peter Collingbourne, Rui Ueyama
X-SW-Source: 2017-q2/txt/msg00002.txt.bz2

On Tue, Apr 25, 2017 at 11:02 AM, H.J. Lu wrote:
> On Tue, Apr 25, 2017 at 10:12 AM, Sriraman Tallam wrote:
>> We identified a problem with PIE executables: more than 5% code size
>> bloat compared to non-PIE.  We have a few proposals to reduce the
>> bloat.  Please take a look and let us know what you think.
>>
>> * What is the problem?
>>
>> PIE is a security hardening feature that enables ASLR (Address Space
>> Layout Randomization) by allowing the executable to be loaded at a
>> random virtual address on every execution.  On average, a binary
>> built as PIE is 5% to 9% larger than the same binary built without
>> PIE, as measured on a suite of benchmarks used at Google where the
>> average text size is ~100MB.  This is also independent of the target
>> architecture; we found it to be true for x86_64, arm64 and power.
>> We noticed that the primary reason for this code size bloat is the
>> extra dynamic relocations that are generated in order to make the
>> binary position independent.  This proposal introduces new ways to
>> represent these dynamic relocations that can reduce the code size
>> bloat to just a few percent.
>>
>> As an example of the bloat in code size, here is the data from one
>> of our larger binaries.
>>
>> Without PIE, the binary’s code size in bytes, as displayed by the
>> ‘size’ command, is:
>>
>>      text       data       bss        dec
>> 504663285   16242884   9130248  530036417
>>
>> With PIE, the binary’s code size in bytes, as displayed by the
>> ‘size’ command, is:
>>
>>      text       data       bss        dec
>> 539781977   16242900   9130248  565155125
>>
>> The text size of the binary grew by 7% and the total size by 6.6%.
>> Our experiments have shown that the binary sizes grow anywhere from 5%
>> to 9% with PIE on almost all benchmarks we looked at.  Notice that
>> almost all the code bloat comes from the “text” segment of the binary,
>> which contains the executable code of the application and any
>> read-only data.  We looked into this segment to see why this is
>> happening and found that the size of the section that contains the
>> dynamic relocations for a binary explodes with PIE.  For instance,
>> without PIE, the dynamic relocation section of the above binary
>> contains 46 entries, whereas with PIE the same section contains
>> 1463325 entries.  It takes 24 bytes to store one entry, that is, 3
>> integer values of 8 bytes each.  So the dynamic relocations alone
>> need an extra (1463325 - 46) * 24 bytes, which is about 35 million
>> bytes and accounts for almost all of the bloat incurred.
>>
>> * What are these extra dynamic relocations that are created for PIE executables?
>>
>> We noticed that these extra relocations for PIE binaries have a common
>> pattern and are needed because it is not known until run-time where
>> the binary will be loaded.  All of these extra dynamic relocations
>> are of the ELF type R_X86_64_RELATIVE.  Let us show with an example
>> what these relocations do.
>> Take a program that stores the address of a global:
>>
>> #include <stdio.h>
>>
>> const int a = 10;
>>
>> const int *b = &a;
>>
>> int main() {
>>
>>   printf ("b = %p\n", (void *) b);
>>
>> }
>>
>> First, let us look at the binary built without PIE.  Let’s look at the
>> data section where ‘b’ and ‘a’ are allocated.
>>
>> 00000000004007d0 <a>:
>>   4007d0: 0a 00
>>
>>
>> 0000000000401b10 <b>:
>>   401b10: d0 07
>>   401b12: 40 00 00
>>
>> Variable ‘a’ is allocated at address 0x4007d0, which matches the output
>> when running the binary.  ‘b’ is allocated at address 0x401b10 and its
>> contents, in little-endian byte order, are the address of ‘a’.
>>
>> Now, let us examine the contents of the PIE binary:
>>
>> 00000000000008d8 <a>:
>>   8d8: 0a 00
>>
>> 0000000000001c50 <b>:
>>   1c50: d8 08
>>   1c50: R_X86_64_RELATIVE *ABS*+0x8d8
>>   1c52: 00 00
>>   1c54: 00 00
>>
>>
>> Notice there is a dynamic relocation here which tells the dynamic
>> linker that this value needs to be fixed at run-time.  This is needed
>> because ASLR can load this binary anywhere in the address space and
>> this relocation fixes the address after it is loaded.
>>
>>
>> * More details about R_X86_64_RELATIVE relocations
>>
>> Each such relocation occupies 24 bytes and has three fields:
>>
>> Offset
>>
>> Type - here it is R_X86_64_RELATIVE
>>
>> Addend (what extra value needs to be added)
>>
>> The offset field of this relocation is the offset, from the start of
>> the binary, of the location to which this relocation applies.
>> The type field indicates the type of the dynamic relocation, but we
>> are interested in one type in particular, R_X86_64_RELATIVE.  This is
>> important because in the motivating example presented above, all the
>> extra dynamic relocations were of this type!
>>
>>
>> * We have these proposals to reduce the size of the dynamic relocations section:
>>
>
> There are 3 pieces of run-time relocation information:
>
> 1. Type and symbol.  4 or 8 bytes
> 2. Offset.  4 or 8 bytes
> 3. Addend.  4 or 8 bytes
>
> If we use REL instead of RELA, the addend can be implicit and stored
> in-place.  If we limit the type to relative relocation, we only need
> the offset.  This is for PIC, not just for PIE.  And we can use a
> special encoding scheme for the offset table, which can be placed in
> DT_GNU_RELATIVE_REL with DT_GNU_RELATIVE_RELSZ.

I have not done an intrusive change like this before, so I am
wondering which tools/pieces need to be modified.  Pointers on how to
go about this would be really helpful.  I can think of these:

* Linker - gold, lld, GNU ld
* Dynamic Linker
* readelf
* objdump
* ABI changes - what is involved here?

Thanks
Sri

>
> --
> H.J.
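
To make the REL-versus-RELA point above concrete, here is a minimal
sketch in C under stated assumptions.  The names rela_entry,
apply_rela_relative, apply_offset_only and reloc_table_bytes are
hypothetical and exist only for illustration; they are not glibc or
binutils interfaces.  The encoding behind the proposed
DT_GNU_RELATIVE_REL tag is not specified in this thread, so the
offset-only table is modelled as a plain array of 8-byte offsets, and
values are stored in host byte order.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical RELA-style record: three 8-byte fields, 24 bytes in
   total, mirroring the layout described in the quoted proposal. */
typedef struct {
    uint64_t offset;  /* where to patch, relative to the load base */
    uint64_t info;    /* relocation type (and symbol); unused here */
    int64_t  addend;  /* value to add to the load base */
} rela_entry;

/* RELA form: store (base + addend) at image[offset]. */
void apply_rela_relative(unsigned char *image, uint64_t base,
                         const rela_entry *rel, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        uint64_t value = base + (uint64_t)rel[i].addend;
        memcpy(image + rel[i].offset, &value, sizeof value);
    }
}

/* Offset-only form (assumed): the addend already sits at
   image[offset], so the table holds one offset per relocation. */
void apply_offset_only(unsigned char *image, uint64_t base,
                       const uint64_t *offsets, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        uint64_t value;
        memcpy(&value, image + offsets[i], sizeof value);
        value += base;
        memcpy(image + offsets[i], &value, sizeof value);
    }
}

/* Table size in bytes for n relocations of entry_size bytes each. */
uint64_t reloc_table_bytes(uint64_t n, uint64_t entry_size)
{
    return n * entry_size;
}
```

With the entry counts quoted above, reloc_table_bytes(1463325 - 46, 24)
gives 35118696 bytes for the RELA form, while
reloc_table_bytes(1463325 - 46, 8) gives 11706232 bytes for a plain
offset-only table, before any further encoding of the offsets.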