Mailing-List: contact gnu-gabi-help@sourceware.org; run by ezmlm
From: "Rahul Chaudhry via gnu-gabi"
Reply-To: Rahul Chaudhry
Date: Tue, 12 Dec 2017 16:53:31 -0800
Subject: Re: Reducing code size of Position Independent Executables (PIE) by shrinking the size of dynamic relocations section
To: Roland McGrath
Cc: Sriraman Tallam, Florian Weimer, Rahul Chaudhry via gnu-gabi, Suprateeka R Hegde, Florian Weimer, David Edelsohn, Rafael Avila de Espindola, Binutils Development, Alan Modra, Cary Coutant, Xinliang David Li, Sterling Augustine, Paul Pluzhnikov, Ian Lance Taylor, "H.J. Lu", Luis Lozano, Peter Collingbourne, Rui Ueyama, llvm-dev@lists.llvm.org

On Mon, Dec 11, 2017 at 6:14 PM, Roland McGrath wrote:
>
> On Mon, Dec 11, 2017 at 3:50 PM Rahul Chaudhry via gnu-gabi wrote:
>>
>> A simple combination of delta-encoding and run-length-encoding is one of the
>> first schemes we experimented with (32-bit entries with 24-bit 'delta' and an
>> 8-bit 'count'). This gave really good results, but as Sri mentions, we observed
>> several cases where the relative relocations were not on consecutive offsets.
>> There were common cases where the relocations applied to alternate words, and
>> that totally wrecked the scheme (a bunch of entries with delta==16 and
>> count==1).
>
>
> For the same issue in a different context, I recently implemented a scheme
> using run-length-encoding but using a variable stride. So for a run of
> alternate words, you still get a single entry, but with stride 16 instead of
> 8. In my application, most cases of strides > 8 are a run of only 2 or 3 but
> there are a few cases of dozens or hundreds with a stride of 16. My case is a
> solution tailored to exactly one application (a kernel), so there is a closed
> sample set that's all that matters and the trade-off between simplicity of
> the analysis and compactness of the results is different than the general
> case you're addressing (my "analysis" consists of a few lines of AWK). But I
> wonder if it might be worthwhile to study the effect a variable-stride RLE
> scheme or adding the variable-stride ability into your hybrid scheme has on
> your sample applications.
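To make Roland's variable-stride suggestion concrete, here is a minimal Python sketch of variable-stride run-length encoding. It assumes the relative-relocation offsets are already sorted byte addresses and greedily groups them into (start, stride, count) runs. The function names and the `max_count=255` cap (mirroring an 8-bit count field) are hypothetical choices for illustration, not taken from either scheme described in this thread.

```python
def encode_variable_stride(offsets, max_count=255):
    """Greedily split sorted byte offsets into (start, stride, count) runs.

    Hypothetical sketch: max_count mirrors an 8-bit count field.
    """
    runs = []
    i, n = 0, len(offsets)
    while i < n:
        start = offsets[i]
        if i + 1 == n:
            # A lone trailing offset becomes a run of length 1.
            runs.append((start, 0, 1))
            break
        # The gap to the next offset fixes this run's stride.
        stride = offsets[i + 1] - start
        count = 2
        # Extend the run while the gap stays equal to the stride.
        while (i + count < n and count < max_count
               and offsets[i + count] - offsets[i + count - 1] == stride):
            count += 1
        runs.append((start, stride, count))
        i += count
    return runs


def decode_variable_stride(runs):
    """Expand (start, stride, count) runs back into byte offsets."""
    offsets = []
    for start, stride, count in runs:
        offsets.extend(start + k * stride for k in range(count))
    return offsets


# Relocations on alternate 8-byte words collapse into one run with stride 16.
alternate = list(range(0x1000, 0x1000 + 16 * 5, 16))
print(encode_variable_stride(alternate))  # [(4096, 16, 5)]
```

A greedy pass like this one turns scattered offsets into many short runs of length 1 or 2, so how much it saves depends heavily on how regular the relocation layout is.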
>
> Since we're talking about specifying a new ABI that will be serving us for
> many years to come and will be hard to change once deployed, it seems worth
> spending quite a bit of effort up front to come to the most compact scheme
> that's feasible.

I agree. Can you share more details of the encoding scheme that you found
useful (size of each entry, number of bits used for stride/count, etc.)?

I just ran some experiments with an encoding that uses 32-bit entries: 16 bits
for delta, 8 bits for stride, and 8 bits for count. Here are the numbers,
alongside those from the previous schemes for comparison:

1. Chrome browser (x86_64, built as PIE):
   605159 relocation entries (24 bytes each) in '.rela.dyn'
   594542 are R_X86_64_RELATIVE relocations (98.25%)
   14269008 bytes (13.61MB) used by these relative relocations in '.rela.dyn'
     385420 bytes (0.37MB) using delta+count encoding
     232540 bytes (0.22MB) using delta+stride+count encoding
     109256 bytes (0.10MB) using jump+bitmap encoding

2. Go net/http test binary (x86_64, 'go test -buildmode=pie -c net/http'):
   83810 relocation entries (24 bytes each) in '.rela.dyn'
   83804 are R_X86_64_RELATIVE relocations (99.99%)
   2011296 bytes (1.92MB) used by these relative relocations in '.rela.dyn'
     204476 bytes (0.20MB) using delta+count encoding
     132568 bytes (0.13MB) using delta+stride+count encoding
     43744 bytes (0.04MB) using jump+bitmap encoding

3. Vim binary in /usr/bin on my workstation (Ubuntu, x86_64):
   6680 relocation entries (24 bytes each) in '.rela.dyn'
   6272 are R_X86_64_RELATIVE relocations (93.89%)
   150528 bytes (0.14MB) used by these relative relocations in '.rela.dyn'
     14388 bytes (0.01MB) using delta+count encoding
     7000 bytes (0.01MB) using delta+stride+count encoding
     1992 bytes (0.00MB) using jump+bitmap encoding

The delta+count encoding uses 32-bit entries:
  24-bit delta: number of bytes since the last offset.
  8-bit count: number of relocations to apply (to consecutive words).

The delta+stride+count encoding uses 32-bit entries:
  16-bit delta: number of bytes since the last offset.
  8-bit stride: stride (in bytes) for applying 'count' relocations.
  8-bit count: number of relocations to apply (using 'stride').

The jump+bitmap encoding uses 64-bit entries:
  8-bit jump: number of words since the last offset.
  56-bit bitmap: bitmap of the words to apply relocations to.

While adding a 'stride' field is definitely an improvement over the simple
delta+count encoding, it doesn't compare well against the bitmap-based
encoding.

I took a look inside the encoding for the Vim binary. There are some instances
in the bitmap-based encoding like

  [0x3855555555555555 0x3855555555555555 0x3855555555555555 ...]

that encode sequences of relocations applying to alternate words. The
stride-based encoding works very well on these and turns them into the much
more compact

  [0x0ff010ff 0x0ff010ff 0x0ff010ff ...]

using stride==0x10 and count==0xff.

However, in the vast majority of cases the stride-based encoding ends up with
count <= 2, and that is what kills it in the end.

I could try something more complex with 16-bit entries, but that can only give
a 2x improvement at best, so it still won't be better than the bitmap approach.

Thanks,
Rahul
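The field layout of the two encodings can be read off the Vim hex values quoted in this message: in 0x3855555555555555 the top byte 0x38 is a jump of 56 words and the low 56 bits 0x55555555555555 mark every other word, and in 0x0ff010ff the 16-bit delta 0x0ff0 (4080) equals 255 * 16 bytes, which suggests deltas are measured from the start of the previous entry. The Python sketch below encodes sorted, 8-byte-aligned offsets that way. The bit positions are inferred from those hex values rather than stated explicitly; the starting base of 0 and the use of empty jump-only entries for gaps wider than 255 words are hypothetical choices, and all function names are illustrative.

```python
WORD = 8            # bytes per word on x86_64
BITMAP_BITS = 56    # low 56 bits of each 64-bit entry hold the bitmap
MAX_JUMP = 255      # largest value of the 8-bit jump field


def encode_jump_bitmap(offsets):
    """Encode sorted, 8-byte-aligned byte offsets as 64-bit jump+bitmap entries."""
    if any(off % WORD for off in offsets):
        raise ValueError("offsets must be 8-byte aligned")
    entries = []
    prev_base = 0  # assumption: the first jump is measured from address 0
    words = [off // WORD for off in offsets]
    i = 0
    while i < len(words):
        base = words[i]
        # Hypothetical gap handling: emit jump-only entries with an empty bitmap.
        while base - prev_base > MAX_JUMP:
            prev_base += MAX_JUMP
            entries.append(MAX_JUMP << BITMAP_BITS)
        # Set bit k for every relocation at word (base + k), for k < 56.
        bitmap = 0
        while i < len(words) and words[i] - base < BITMAP_BITS:
            bitmap |= 1 << (words[i] - base)
            i += 1
        # Jump (in words) goes in the top byte, bitmap in the low 56 bits.
        entries.append(((base - prev_base) << BITMAP_BITS) | bitmap)
        prev_base = base
    return entries


def decode_jump_bitmap(entries):
    """Expand jump+bitmap entries back into byte offsets."""
    offsets = []
    base = 0
    for entry in entries:
        base += entry >> BITMAP_BITS
        bitmap = entry & ((1 << BITMAP_BITS) - 1)
        for k in range(BITMAP_BITS):
            if (bitmap >> k) & 1:
                offsets.append((base + k) * WORD)
    return offsets


def pack_stride_entry(delta, stride, count):
    """Pack a delta+stride+count entry: 16-bit delta | 8-bit stride | 8-bit count."""
    return (delta << 16) | (stride << 8) | count


# Relocations on alternate words, starting at word 56 (byte offset 448).
alternate = list(range(448, 448 + 168 * WORD, 16))
print([hex(e) for e in encode_jump_bitmap(alternate)])
# ['0x3855555555555555', '0x3855555555555555', '0x3855555555555555']
print(f"{pack_stride_entry(4080, 16, 255):#010x}")  # 0x0ff010ff
```

Under these assumptions the 84 alternate-word relocations fit in three 8-byte entries (24 bytes) instead of 84 24-byte Elf64_Rela records (2016 bytes). A real encoder would also have to agree with the dynamic loader on the initial base and on how large gaps are represented.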