From: "H.J. Lu"
Date: Tue, 01 Jan 2019 00:00:00 -0000
Subject: Re: RFC: Update x86 psABIs to support IBT
To: Rui Ueyama
Cc: IA32 System V Application Binary Interface, "x86-64-abi@googlegroups.com", gnu-gabi@sourceware.org

On Thu, Feb 21, 2019 at 2:30 PM Rui Ueyama wrote:
>
> On Thu, Feb 21, 2019 at 11:22 AM H.J. Lu wrote:
>>
>> On Thu, Feb 21, 2019 at 11:18 AM Rui Ueyama wrote:
>> >
>> > On Wed, Feb 20, 2019 at 7:01 PM H.J. Lu wrote:
>> >>
>> >> On Wed, Feb 20, 2019 at 4:30 PM Rui Ueyama wrote:
>> >> >
>> >> > Hi H.J.Lu,
>> >> >
>> >> > I'm replying because I was wondering why the 2-PLT scheme was chosen to support Intel CET.
>> >> >
>> >> > On Tue, Feb 19, 2019 at 8:36 PM H.J. Lu wrote:
>> >> >>
>> >> >> On Tue, Jun 20, 2017 at 9:38 AM H.J. Lu wrote:
>> >> >> >
>> >> >> > On Tue, Jun 13, 2017 at 12:11 PM, H.J. Lu wrote:
>> >> >> > > To support ENDBR in Intel Control-flow Enforcement Technology (CET)
>> >> >> > > instructions:
>> >> >> > >
>> >> >> > > https://software.intel.com/sites/default/files/managed/4d/2a/control-flow-enforcement-technology-preview.pdf
>> >> >> > >
>> >> >> > > following changes to i386 psABI are required.
>> >> >> >
>> >> >> > Here is the updated extension for both i386 and x86-64 psABI to
>> >> >> > support IBT. I will post a binutils patch later.
>> >> >> >
>> >> >> > Any comments?
>> >> >> >
>> >> >> > --
>> >> >> > H.J.
>> >> >> > ---
>> >> >> > To support indirect branch tracking (IBT) in Intel Control-flow Enforcement
>> >> >> > Technology (CET) instructions:
>> >> >> >
>> >> >> > https://software.intel.com/sites/default/files/managed/4d/2a/control-flow-enforcement-technology-preview.pdf
>> >> >> >
>> >> >> > following changes to x86 psABI are required.
>> >> >> >
>> >> >> > To program properties, add
>> >> >> >
>> >> >> > #define GNU_PROPERTY_X86_FEATURE_1_AND 0xc0000002
>> >> >> >
>> >> >> > #define GNU_PROPERTY_X86_FEATURE_1_IBT (1U << 0)
>> >> >> >
>> >> >> > to indicate that all executable sections are compatible with IBT when
>> >> >> > ENDBR instruction is inserted at:
>> >> >> >
>> >> >> > a. All function entries whose addresses may be taken.
>> >> >> > b. All branch targets whose addresses have been taken.
>> >> >> >
>> >> >> > GNU_PROPERTY_X86_FEATURE_1_IBT is set on output only if it is set on
>> >> >> > all relocatable inputs, which means that the C library must be compiled
>> >> >> > with IBT-enabled compiler.
>> >> >> >
>> >> >> > The following changes are made to the Procedure Linkage Table (PLT) to
>> >> >> > enable IBT:
>> >> >> >
>> >> >> > 1. For 64-bit x86-64, PLT is changed to:
>> >> >> >
>> >> >> > PLT0: push GOT[1]
>> >> >> >       bnd jmp *GOT[2]
>> >> >> >       nop
>> >> >> > ...
>> >> >> > PLTn: endbr64
>> >> >> >       push namen_reloc_index
>> >> >> >       bnd jmp PLT0
>> >> >> >
>> >> >> > together with the second PLT section:
>> >> >> >
>> >> >> > PLTn: endbr64
>> >> >> >       bnd jmp *GOT[namen_index]
>> >> >> >       nop
>> >> >> >
>> >> >> > BND prefix is also added so that IBT-enabled PLT is compatible with MPX.
>> >> >> >
>> >> >> > 2. For 32-bit x86-64 (x32) and i386, PLT is changed to
>> >> >> >
>> >> >> > PLT0: push GOT[1]
>> >> >> >       jmp *GOT[2]
>> >> >> >       nop
>> >> >> > ...
>> >> >> > PLTn: endbr64                 # endbr32 for i386.
>> >> >> >       push namen_reloc_index
>> >> >> >       jmp PLT0
>> >> >> >
>> >> >> > together with the second PLT section:
>> >> >> >
>> >> >> > PLTn: endbr64                 # endbr32 for i386.
>> >> >> >       jmp *GOT[namen_index]
>> >> >> >       nop
>> >> >> >
>> >> >> > BND prefix isn't used since MPX isn't supported on x32 and BND registers
>> >> >> > aren't used in parameter passing on i386.
>> >> >> >
>> >> >>
>> >> >> There are 2 reasons for this 2-PLT scheme:
>> >> >>
>> >> >> 1. Provide compatibility with other tools that have a hardcoded limit of 16
>> >> >> bytes for an x86 PLT entry.
>> >> >
>> >> >
>> >> > I don't think that the 2-PLT scheme actually provides compatibility with existing tools. The new PLT uses different instructions, and the usage of the .plt section has changed as well. IIUC, foo@PLT is now resolved to its entry in the second PLT instead of the first regular PLT.
>> >> >
>> >> > I know that some existing tools even crash if we change the PLT entry size, so keeping the PLT entry size would at least keep them from crashing. But I'd think compatibility means more than that.
>> >> >
>> >>
>> >> We are doing the best we can.
>> >>
>> >> >> 2. Improve code cache locality: since most of the instructions in .plt would be
>> >> >> executed only the first time a symbol is resolved they would waste space in
>> >> >> the cache and, by having a .plt.sec, only instructions that are often executed
>> >> >> would be cached.
>> >> >
>> >> >
>> >> > This is personally a much more convincing answer than keeping the compatibility. The PLT section could be hot, and separating hot code from relatively cold code could have a performance impact. But do you know how large the impact is? I wonder if there's a measurable difference if you simply extend the PLT entry size to 32 bytes.
>> >> >
>> >>
>> >> We don't have such data.
>> >
>> >
>> > Then it could be a premature optimization. The single PLT scheme would be undeniably much simpler, so unless it is shown to not work, we probably shouldn't have split a PLT into two, no?
>> >
>>
>> Simpler to implement, yes. We designed it with performance in mind.
>> We implemented it many years ago starting from MPX. It shouldn't
>> be changed just because it is "hard" to implement.
>
>
> I can see that the 2-PLT scheme performs better in theory. That being said, I don't think I'm convinced that the design is better in practice if the expected advantage was not measured.

We went with better in theory in our design. We may not see performance
differences in practice in most cases. In some cases, the PLT section can
be quite large:

libLLVM-7.0.1.so:

  [11] .plt              PROGBITS        0000000000658020 658020 043bd0 10  AX  0   0 16
  [12] .plt.sec          PROGBITS        000000000069bbf0 69bbf0 043bc0 10  AX  0   0 16

> I don't think I'm requesting a change to the spec at least at the moment. What I'm trying to do is to understand the rationale behind a choice of the spec before implementing it in our linker, lld. Even if there's no evidence that the 2-PLT scheme performs better than the 1-PLT scheme, we might still want to implement as the spec says, considering the cost of breaking ABI compatibility. But if we take that route, we'd like to document that fact as-is.

Sure. We'd like to get as much feedback and input as we can when we propose
ABI changes. We encourage you to participate in future discussions.

-- 
H.J.
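
A note on two figures in this thread. The readelf lines for libLLVM-7.0.1.so
give a size of 0x43bd0 for .plt and 0x43bc0 for .plt.sec, each with an entry
size of 0x10, and the quoted proposal says GNU_PROPERTY_X86_FEATURE_1_IBT is
set on output only if it is set on all relocatable inputs. The C sketch below
works through both. The two #define values are copied from the proposal;
merge_feature_1 and plt_entry_count are hypothetical helper names, not
functions from binutils or lld, and representing an input that carries no
property as the value 0 is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Values copied from the proposal quoted above. */
#define GNU_PROPERTY_X86_FEATURE_1_AND 0xc0000002 /* property type, shown for context */
#define GNU_PROPERTY_X86_FEATURE_1_IBT (1U << 0)

/*
 * Hypothetical helper: combine the feature-1 words of all relocatable
 * inputs with a bitwise AND, so a bit survives only if every input set it.
 * Assumption: the caller passes 0 for an input that has no property.
 */
uint32_t merge_feature_1(const uint32_t *inputs, size_t n)
{
    if (n == 0)
        return 0; /* no inputs, so no feature can be claimed */

    uint32_t out = inputs[0];
    for (size_t i = 1; i < n; i++)
        out &= inputs[i];
    return out;
}

/*
 * Hypothetical helper: number of fixed-size entries in a section, given
 * the Size and ES columns printed by readelf.
 */
uint64_t plt_entry_count(uint64_t sh_size, uint64_t sh_entsize)
{
    assert(sh_entsize != 0);
    return sh_size / sh_entsize;
}
```

With the sizes shown above, plt_entry_count(0x43bd0, 0x10) is 17341 and
plt_entry_count(0x43bc0, 0x10) is 17340. The one extra entry in .plt is
consistent with PLT0, which has no counterpart in .plt.sec. Likewise,
merging the words {1, 0, 1} clears the IBT bit, while {1, 1, 1} keeps it.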