From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <binutils-return-96642-listarch-binutils=sources.redhat.com@sourceware.org>
Received: (qmail 61157 invoked by alias); 25 Apr 2017 00:22:04 -0000
Mailing-List: contact binutils-help@sourceware.org; run by ezmlm
Precedence: bulk
List-Id: <binutils.sourceware.org>
List-Subscribe: <mailto:binutils-subscribe@sourceware.org>
List-Archive: <http://sourceware.org/ml/binutils/>
List-Post: <mailto:binutils@sourceware.org>
List-Help: <mailto:binutils-help@sourceware.org>, <http://sourceware.org/ml/#faqs>
Sender: binutils-owner@sourceware.org
Received: (qmail 59849 invoked by uid 89); 25 Apr 2017 00:22:03 -0000
Authentication-Results: sourceware.org; auth=none
X-Virus-Found: No
X-Spam-SWARE-Status: =?ISO-8859-1?Q?No, score=-0.0 required=5.0 tests=BAYES_20,RCVD_IN_DNSWL_NONE,RP_MATCHES_RCVD,SPF_PASS autolearn=ham version=3.3.2 spammy=let=e2, taught, Sterling, sterling?=
X-HELO: mail-wr0-f169.google.com
Received: from mail-wr0-f169.google.com (HELO mail-wr0-f169.google.com) (209.85.128.169) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Tue, 25 Apr 2017 00:22:01 +0000
Received: by mail-wr0-f169.google.com with SMTP id z109so100309503wrb.1        for <binutils@sourceware.org>; Mon, 24 Apr 2017 17:22:03 -0700 (PDT)
X-Received: by 10.223.161.222 with SMTP id v30mr8318865wrv.132.1493079721476; Mon, 24 Apr 2017 17:22:01 -0700 (PDT)
MIME-Version: 1.0
Received: by 10.28.87.137 with HTTP; Mon, 24 Apr 2017 17:22:00 -0700 (PDT)
From: "Sriraman Tallam via binutils" <binutils@sourceware.org>
Reply-To: Sriraman Tallam <tmsriram@google.com>
Date: Tue, 25 Apr 2017 00:22:00 -0000
Message-ID: <CAAs8HmwojZTgUL1huLcPp43vb9Nz7fakO=OnY79DCmP5SBZ2ww@mail.gmail.com>
Subject: Reducing code size of Position Independent Executables (PIE) by shrinking the size of dynamic relocations section
To: Cary Coutant <ccoutant@gmail.com>, binutils <binutils@sourceware.org>, 	"H.J. Lu" <hjl.tools@gmail.com>
Cc: Sterling Augustine <saugustine@google.com>, Paul Pluzhnikov <ppluzhnikov@google.com>, 	Ian Lance Taylor <iant@google.com>, Xinliang David Li <davidxl@google.com>, 	Rahul Chaudhry <rahulchaudhry@google.com>, Luis Lozano <llozano@google.com>, 	Simon Baldwin <simonb@google.com>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
X-IsSubscribed: yes
X-SW-Source: 2017-04/txt/msg00214.txt.bz2

We have found that PIE executables are more than 5% larger than their
non-PIE counterparts, and we have a few proposals to reduce this
bloat.  Please take a look and let us know what you think.

* What is the problem?

PIE is a security hardening feature that enables ASLR (Address Space
Layout Randomization): the executable can be loaded at a different
random virtual address on every execution.

On average, a binary built as PIE is 5% to 9% larger than the same
binary built without PIE, as measured on a suite of benchmarks used at
Google where the average text size is ~100MB.  This is independent of
the target architecture: we found it to hold for x86_64, arm64 and
power.  The primary cause of this bloat is the extra dynamic
relocations that are generated in order to make the binary position
independent.  This proposal introduces new ways to represent these
dynamic relocations that can reduce the code size bloat to just a few
percent.

As an example of the bloat, here is the data from one of our larger
binaries.

Without PIE, the binary’s size in bytes, as displayed by the ‘size’
command, is:

     text      data      bss        dec
504663285  16242884  9130248  530036417

With PIE, the binary’s size in bytes, as displayed by the ‘size’
command, is:

     text      data      bss        dec
539781977  16242900  9130248  565155125

The text size of the binary grew by 7% and the total size by 6.6%.
Our experiments have shown that the binary sizes grow anywhere from 5%
to 9%  with PIE on almost all benchmarks we looked at.

Notice that almost all the bloat comes from the “text” segment of
the binary, which contains the executable code of the application and
any read-only data.  We looked into this segment to see why this is
happening and found that the section that contains the dynamic
relocations for a binary explodes in size with PIE.  For instance,
without PIE, the dynamic relocation section of the above binary
contains 46 entries, whereas with PIE the same section contains
1463325 entries.  Each entry takes 24 bytes, that is, 3 integer values
of 8 bytes each.  So the dynamic relocations alone need an extra
(1463325 - 46) * 24 bytes, which is about 35 million bytes and
accounts for almost all of the bloat incurred!
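
As a quick check of this arithmetic, the following C sketch recomputes
the space taken by the extra relocation entries from the counts quoted
in this message.  The function name is ours, for illustration only; it
is not part of any existing tool.

```c
#include <assert.h>
#include <stdint.h>

/* Size in bytes of one dynamic relocation entry: three 8-byte fields. */
enum { RELA_ENTRY_SIZE = 24 };

/* Bytes needed for the relocation entries that PIE adds
   (hypothetical helper, for illustration only). */
uint64_t extra_reloc_bytes(uint64_t entries_pie, uint64_t entries_non_pie)
{
    assert(entries_pie >= entries_non_pie);
    return (entries_pie - entries_non_pie) * RELA_ENTRY_SIZE;
}
```

With the counts above, extra_reloc_bytes(1463325, 46) gives 35118696
bytes, while the text size reported by 'size' grew by
539781977 - 504663285 = 35118692 bytes.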

* What are these extra dynamic relocations that are created for PIE executables?

We noticed that these extra relocations for PIE binaries follow a
common pattern and are needed because it is not known until run-time
where the binary will be loaded.  All of these extra dynamic
relocations are of the ELF type R_X86_64_RELATIVE.

The following program, which stores the address of a global, shows
what these relocations do:

#include <stdio.h>

const int a = 10;

const int *b = &a;

int main() {
  /* %p expects a void pointer, hence the cast. */
  printf ("b = %p\n", (void *) b);
  return 0;
}

First, let us look at the binary built without PIE, in particular at
the data section where ‘b’ and ‘a’ are allocated.

00000000004007d0 <a>:
 4007d0:       0a 00


0000000000401b10 <b>:
 401b10:       d0 07
 401b12:       40 00 00

Variable ‘a’ is allocated at address 0x4007d0, which matches the
output when running the binary.  ‘b’ is allocated at address 0x401b10
and its contents, in little-endian byte order, are the address of ‘a’.

Now, let us examine the contents of the PIE binary:

00000000000008d8 <a>:
    8d8:       0a 00

0000000000001c50 <b>:
   1c50:       d8 08
                    1c50: R_X86_64_RELATIVE *ABS*+0x8d8
   1c52:       00 00
   1c54:       00 00


Notice there is a dynamic relocation here which tells the dynamic
linker that this value needs to be fixed at run-time.  This is needed
because ASLR can load this binary anywhere in the address space and
this relocation fixes the address after it is loaded.
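
A rough sketch of the fix-up the dynamic linker performs for this
relocation, using the offset and addend from the listing above.  The
function name, the byte array standing in for the loaded image and the
load address are assumptions made for illustration; a real dynamic
linker works on the mapped segments directly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Apply one RELATIVE relocation to an in-memory copy of the image:
   the 8-byte word at 'offset' becomes load_base + addend.
   'image' stands in for the mapped binary (illustrative only). */
void apply_relative(unsigned char *image, size_t image_size,
                    uint64_t load_base, uint64_t offset, uint64_t addend)
{
    uint64_t value = load_base + addend;

    /* The 8-byte word must lie entirely inside the image. */
    assert(offset <= image_size && image_size - offset >= sizeof value);
    memcpy(image + offset, &value, sizeof value);
}
```

For 'b' above (offset 0x1c50, addend 0x8d8) and a hypothetical load
address of 0x555555554000, the word at 0x1c50 becomes 0x5555555548d8,
which is the run-time address of 'a'.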


* More details about R_X86_64_RELATIVE relocations

Each relocation entry of this kind occupies 24 bytes and has three fields:

Offset

Type - here it is R_X86_64_RELATIVE

Addend (what extra value needs to be added)

The offset field is the address offset, from the start of the loaded
binary, at which this relocation applies.  The type field indicates
the type of the dynamic relocation; we are interested in one type in
particular, R_X86_64_RELATIVE, because in the motivating example
presented above all the extra dynamic relocations were of this type.
For this type, the addend field holds the link-time address that, once
the load address is added to it, gives the value to be stored at the
offset.
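
For reference, the 24-byte entry can be written as the following C
structure.  It mirrors the Elf64_Rela layout of the ELF specification
but is declared here under our own names so that the snippet is
self-contained.  On x86_64 the type is held in the low 32 bits of the
info field, and R_X86_64_RELATIVE has the value 8.

```c
#include <assert.h>
#include <stdint.h>

/* Same layout as Elf64_Rela; the names are ours. */
struct rela_entry {
    uint64_t offset;  /* where to apply the relocation */
    uint64_t info;    /* symbol index (high 32 bits), type (low 32 bits) */
    int64_t  addend;  /* constant used to compute the stored value */
};

enum { RELOC_TYPE_RELATIVE = 8 };  /* value of R_X86_64_RELATIVE */

/* Extract the relocation type, as the ELF64_R_TYPE macro does. */
uint32_t rela_type(const struct rela_entry *r)
{
    return (uint32_t)(r->info & 0xffffffffu);
}

/* Three 8-byte fields, hence 24 bytes per entry. */
static_assert(sizeof(struct rela_entry) == 24, "unexpected entry size");
```

Only the offset differs in a useful way from one RELATIVE entry to the
next: the type is always the same and no symbol is referenced.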


* We have these proposals to reduce the size of the dynamic relocations section:


Idea A: Shrinking the size of the dynamic relocations

1. Use the offset of the relocation to store the addend

2. Create a separate section for storing R_X86_64_RELATIVE type
relocations, “.pie.dyn”.

3. Create a new relocation type, R_X86_64_RELATIVE_OFFSET that has
exactly one field, the offset.

4. The type field is not necessary as there is only one type of
relocation in this new section.

5. Store the addend at the offset, which has enough space to store 8
byte addends.

6. The dynamic linker can be taught to read the addend from this
offset rather than from the entry.

7. This new section is just a sequence of offsets which are
R_X86_64_RELATIVE relocations.  The dynamic linker needs to be taught
to apply these relocations correctly.

8. Figure out a way to either prevent this program from running, or
warn at startup, when it is run with an old dynamic linker.

9. Eliminating two-thirds of each relocation entry reduces the size of
these sections by 66%.  Additionally, this section can be compressed;
a compression factor of 10 is possible, which makes the total size of
these sections 3% to 4% of the original.

10. The decompression by the dynamic linker can be done in chunks to
keep the memory overhead fixed and small.  Notice that the
decompressed chunk’s contents can be discarded after the relocation is
applied.
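
The offset-only scheme of Idea A can be sketched as follows.  This is a
minimal illustration under our own assumptions, not a proposed
implementation: the function name and the byte array standing in for
the loaded image are hypothetical, and compression is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Apply offset-only relocations: each entry is just an offset, and the
   addend is read from the 8-byte word already stored at that offset.
   'image' stands in for the mapped binary (illustrative only). */
void apply_offset_only(unsigned char *image, size_t image_size,
                       uint64_t load_base,
                       const uint64_t *offsets, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        uint64_t word;

        /* The 8-byte word must lie entirely inside the image. */
        assert(offsets[i] <= image_size &&
               image_size - offsets[i] >= sizeof word);
        memcpy(&word, image + offsets[i], sizeof word);  /* in-place addend */
        word += load_base;
        memcpy(image + offsets[i], &word, sizeof word);
    }
}
```

In the PIE listing earlier, the word at 0x1c50 already holds 0x8d8, so
the entry for 'b' would shrink from 24 bytes to the single 8-byte
offset 0x1c50.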

Idea B: Compressing the dynamic relocation section into .zrela.dyn

1. Simpler to implement.

2. The dynamic linker needs to decompress and apply these relocations
in chunks, without significant memory overhead, which would otherwise
defeat the purpose of doing this.

3. The decompressed chunks can be discarded as and when the
relocations are applied so a fixed size buffer for decompression
should work well.

4. Figure out a way to either prevent this program from running, or
warn at startup, when it is run with an unsupported dynamic linker.

5. With gzip, a compression factor of 10 can be achieved, which makes
the total size 10% of the original.
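
To compare the two ideas side by side, here is a small C sketch that
estimates the section size under each scheme.  The helper names are
ours, and the compression factor is simply passed in as a parameter;
the factor of 10 used below is the estimate quoted above, not a
measured property of the code.

```c
#include <assert.h>
#include <stdint.h>

/* Estimated section sizes in bytes for n RELATIVE relocations.
   'ratio' is the assumed compression factor; pass 1 for no
   compression.  Hypothetical helpers, for comparison only. */

/* Existing format and Idea B: 24-byte entries. */
uint64_t size_rela(uint64_t n, uint64_t ratio)
{
    assert(ratio > 0);
    return n * 24 / ratio;
}

/* Idea A: 8-byte offset-only entries. */
uint64_t size_offset_only(uint64_t n, uint64_t ratio)
{
    assert(ratio > 0);
    return n * 8 / ratio;
}
```

For the 1463325 entries of the example binary, this gives 35119800
bytes today, 3511980 bytes for Idea B at a factor of 10, 11706600 bytes
for Idea A without compression, and 1170660 bytes for Idea A at a
factor of 10.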


Thanks to Sterling Augustine, Ian Lance Taylor, Paul Pluzhnikov, and
David Li for their many suggestions!  Please let us know what you think
and what would be suitable for implementation.

Thanks
Sri