From mboxrd@z Thu Jan 1 00:00:00 1970
From: "H.J. Lu"
Date: Sun, 27 Feb 2022 07:06:37 -0800
Subject: Re: x86-64: new CET-enabled PLT format proposal
To: Rui Ueyama, Andi Kleen, x86-64-abi
Cc: Binutils
Content-Type: text/plain; charset="UTF-8"
List-Id: Binutils mailing list

On Sat, Feb 26, 2022 at 7:19 PM Rui Ueyama via Binutils wrote:
>
> Hello,
>
> I'd like to propose an alternative instruction sequence for the Intel
> CET-enabled PLT section. Compared to the existing one, the new scheme is
> simple, compact (32 bytes vs. 16 bytes for each PLT entry) and does not
> require a separate second PLT section (.plt.sec).
>
> Here is the proposed code sequence:
>
> PLT0:
>
>   f3 0f 1e fa        // endbr64
>   41 53              // push %r11
>   ff 35 00 00 00 00  // push GOT[1]
>   ff 25 00 00 00 00  // jmp *GOT[2]
>   0f 1f 40 00        // nop
>   0f 1f 40 00        // nop
>   0f 1f 40 00        // nop
>   66 90              // nop
>
> PLTn:
>
>   f3 0f 1e fa        // endbr64
>   41 bb 00 00 00 00  // mov $namen_reloc_index %r11d
>   ff 25 00 00 00 00  // jmp *GOT[namen_index]

All PLT calls will have an extra MOV.

> GOT[namen_index] is initialized to PLT0 for all PLT entries, so that when a
> PLT entry is called for the first time, the control is passed to PLT0 to call
> the resolver function.
>
> It uses %r11 as a scratch register. x86-64 psABI explicitly allows PLT entries
> to clobber this register (*1), and the resolve function (__dl_runtime_resolve)
> already clobbers it.
>
> (*1) x86-64 psABI p.24 footnote 17: "Note that %r11 is neither required to be
> preserved, nor is it used to pass arguments. Making this register available as
> scratch register means that code in the PLT need not spill any registers when
> computing the address to which control needs to be transferred."
>
> FYI, this is the current CET-enabled PLT:
>
> PLT0:
>
>   ff 35 00 00 00 00     // push GOT[0]
>   f2 ff 25 e3 2f 00 00  // bnd jmp *GOT[1]
>   0f 1f 00              // nop
>
> PLTn in .plt:
>
>   f3 0f 1e fa           // endbr64
>   68 00 00 00 00        // push $namen_reloc_index
>   f2 e9 e1 ff ff ff     // bnd jmpq PLT0
>   90                    // nop
>
> PLTn in .plt.sec:
>
>   f3 0f 1e fa           // endbr64
>   f2 ff 25 ad 2f 00 00  // bnd jmpq *GOT[namen_index]
>   0f 1f 44 00 00        // nop
>
> In the proposed format, PLT0 is 32 bytes long and each entry is 16 bytes. In
> the existing format, PLT0 is 16 bytes and each entry is 32 bytes. Usually, we
> have many PLT sections while we have only one header, so in practice, the
> proposed format is almost 50% smaller than the existing one.

Does it have any impact on performance? .plt.sec can be placed
in a different page from .plt.

> The proposed PLT does not use jump instructions with BND prefix, as Intel MPX
> has been deprecated.
>
> I already implemented the proposed scheme to my linker
> (https://github.com/rui314/mold) and it looks like it's working fine.
>
> Any thoughts?

I'd like to see visible performance improvements or new features in
a new PLT layout.

I cced x86-64 psABI mailing list.

--
H.J.
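
The size figures quoted above can be checked mechanically. The following is
a minimal Python sketch, not part of the proposal or of any linker: the byte
strings are copied from the quoted listings, while the names PROPOSED,
EXISTING, size, proposed_total and existing_total are made up for
illustration. It counts the bytes in each sequence and compares total PLT
size for n entries, assuming the existing format costs 16 bytes in .plt plus
16 bytes in .plt.sec per entry. It covers size only and says nothing about
the performance question raised in the reply.

```python
# Byte sequences copied from the listings quoted in this message.
PROPOSED = {
    "PLT0": "f3 0f 1e fa 41 53 ff 35 00 00 00 00 ff 25 00 00 00 00 "
            "0f 1f 40 00 0f 1f 40 00 0f 1f 40 00 66 90",
    "PLTn": "f3 0f 1e fa 41 bb 00 00 00 00 ff 25 00 00 00 00",
}
EXISTING = {
    "PLT0": "ff 35 00 00 00 00 f2 ff 25 e3 2f 00 00 0f 1f 00",
    "PLTn in .plt": "f3 0f 1e fa 68 00 00 00 00 f2 e9 e1 ff ff ff 90",
    "PLTn in .plt.sec": "f3 0f 1e fa f2 ff 25 ad 2f 00 00 0f 1f 44 00 00",
}


def size(hex_bytes: str) -> int:
    """Number of bytes in a space-separated hex listing."""
    return len(bytes.fromhex(hex_bytes))


def proposed_total(n: int) -> int:
    """Total bytes for the proposed layout: one PLT0 plus n entries."""
    return size(PROPOSED["PLT0"]) + n * size(PROPOSED["PLTn"])


def existing_total(n: int) -> int:
    """Total bytes for the existing layout (assumed: each entry occupies
    one slot in .plt and one slot in .plt.sec)."""
    per_entry = size(EXISTING["PLTn in .plt"]) + size(EXISTING["PLTn in .plt.sec"])
    return size(EXISTING["PLT0"]) + n * per_entry


if __name__ == "__main__":
    for n in (1, 10, 100, 1000):
        ratio = proposed_total(n) / existing_total(n)
        print(f"n={n}: proposed={proposed_total(n)} "
              f"existing={existing_total(n)} ratio={ratio:.3f}")
```

With a single entry the two layouts come out the same size, because the
larger PLT0 offsets the smaller entry; the ratio moves toward one half only
as the number of entries grows.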