From: Rui Ueyama
Date: Sun, 27 Feb 2022 12:18:47 +0900
Subject: x86-64: new CET-enabled PLT format proposal
To: binutils@sourceware.org
Hello,

I'd like to propose an alternative instruction sequence for the Intel
CET-enabled PLT section. Compared to the existing one, the new scheme is
simpler, more compact (16 bytes vs. 32 bytes for each PLT entry), and
does not require a separate second PLT section (.plt.sec).

Here is the proposed code sequence:

  PLT0:
    f3 0f 1e fa          // endbr64
    41 53                // push %r11
    ff 35 00 00 00 00    // push GOT[1]
    ff 25 00 00 00 00    // jmp *GOT[2]
    0f 1f 40 00          // nop
    0f 1f 40 00          // nop
    0f 1f 40 00          // nop
    66 90                // nop

  PLTn:
    f3 0f 1e fa          // endbr64
    41 bb 00 00 00 00    // mov $namen_reloc_index, %r11d
    ff 25 00 00 00 00    // jmp *GOT[namen_index]

In the proposed format, PLT0 is 32 bytes long and each PLTn entry is
16 bytes.

GOT[namen_index] is initialized to the address of PLT0 for all PLT
entries, so when a PLT entry is called for the first time, control is
passed to PLT0, which calls the resolver function. Because PLT0 is
reached through an indirect jump, it needs its own endbr64.

PLTn passes the relocation index to PLT0 in %r11, and PLT0 pushes it so
that the resolver sees the same stack layout as with the existing PLT:
the value of GOT[1] at (%rsp) and the relocation index at 8(%rsp). Using
%r11 as a scratch register is safe: the x86-64 psABI explicitly allows
PLT entries to clobber this register (*1), and the resolver function
(_dl_runtime_resolve) already clobbers it.

(*1) x86-64 psABI p.24 footnote 17: "Note that %r11 is neither required
to be preserved, nor is it used to pass arguments. Making this register
available as scratch register means that code in the PLT need not spill
any registers when computing the address to which control needs to be
transferred."

FYI, this is the current CET-enabled PLT:

  PLT0:
    ff 35 00 00 00 00       // push GOT[1]
    f2 ff 25 e3 2f 00 00    // bnd jmp *GOT[2]
    0f 1f 00                // nop

  PLTn in .plt:
    f3 0f 1e fa             // endbr64
    68 00 00 00 00          // push $namen_reloc_index
    f2 e9 e1 ff ff ff       // bnd jmpq PLT0
    90                      // nop

  PLTn in .plt.sec:
    f3 0f 1e fa             // endbr64
    f2 ff 25 ad 2f 00 00    // bnd jmpq *GOT[namen_index]
    0f 1f 44 00 00          // nop
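To make the byte counts concrete, here is a minimal Python sketch that assembles the proposed PLT0 and PLTn entries and the existing .plt/.plt.sec pair from the opcode bytes in the listings, filling in the RIP-relative displacements (target minus the address of the next instruction) that the proposed listing shows as zeros. The function names and the section addresses (.plt at 0x1000, .plt.sec at 0x2000, .got.plt at 0x3000, first function slot at GOT[3]) are hypothetical, chosen only for illustration; this is not how mold or GNU ld actually structures its code.

```python
import struct

# endbr64: every indirect-branch target under IBT must start with this.
ENDBR64 = bytes.fromhex("f30f1efa")


def rel32(target: int, next_insn: int) -> bytes:
    """Little-endian signed 32-bit displacement relative to the next instruction.

    struct.pack raises struct.error if the value does not fit in 32 bits.
    """
    return struct.pack("<i", target - next_insn)


def proposed_plt0(plt0: int, got: int) -> bytes:
    """Proposed PLT0 (32 bytes): endbr64; push %r11; push GOT[1]; jmp *GOT[2]."""
    code = ENDBR64 + bytes.fromhex("4153")             # endbr64; push %r11
    code += b"\xff\x35" + rel32(got + 8, plt0 + 12)    # push GOT[1](%rip)
    code += b"\xff\x25" + rel32(got + 16, plt0 + 18)   # jmp *GOT[2](%rip)
    code += bytes.fromhex("0f1f4000" * 3 + "6690")     # 14 bytes of nop padding
    return code


def proposed_pltn(entry: int, got_slot: int, reloc_index: int) -> bytes:
    """Proposed PLTn (16 bytes): endbr64; mov $reloc_index, %r11d; jmp *GOT[n]."""
    code = ENDBR64
    # mov $imm32, %r11d (writing %r11d zero-extends into %r11)
    code += b"\x41\xbb" + struct.pack("<I", reloc_index)
    code += b"\xff\x25" + rel32(got_slot, entry + 16)  # jmp *GOT[n](%rip)
    return code


def existing_plt_entry(entry: int, plt0: int, reloc_index: int) -> bytes:
    """Existing .plt entry (16 bytes): endbr64; push $reloc_index; bnd jmp PLT0; nop."""
    code = ENDBR64
    code += b"\x68" + struct.pack("<I", reloc_index)   # push $imm32
    code += b"\xf2\xe9" + rel32(plt0, entry + 15)      # bnd jmp PLT0 (direct)
    code += b"\x90"                                    # nop
    return code


def existing_plt_sec_entry(entry: int, got_slot: int) -> bytes:
    """Existing .plt.sec entry (16 bytes): endbr64; bnd jmp *GOT[n]; 5-byte nop."""
    code = ENDBR64
    code += b"\xf2\xff\x25" + rel32(got_slot, entry + 11)  # bnd jmp *GOT[n](%rip)
    code += bytes.fromhex("0f1f440000")                    # 5-byte nop
    return code


# Hypothetical layout, for illustration only.
PLT, PLT_SEC, GOT = 0x1000, 0x2000, 0x3000
slot = GOT + 3 * 8  # GOT[3]: first slot after the three reserved entries

print(len(proposed_plt0(PLT, GOT)))                          # 32
print(len(proposed_pltn(PLT + 32, slot, 0)))                 # 16
print(len(existing_plt_entry(PLT + 16, PLT, 0))
      + len(existing_plt_sec_entry(PLT_SEC, slot)))          # 32
print(existing_plt_entry(PLT + 16, PLT, 0)[9:15].hex(" "))   # f2 e9 e1 ff ff ff
```

With the first existing .plt entry placed right after a 16-byte PLT0, the computed `bnd jmp` displacement comes out as -31, which matches the `f2 e9 e1 ff ff ff` bytes in the listing of the current PLT.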
In the existing format, PLT0 is 16 bytes and each entry is 32 bytes (16
bytes in .plt plus 16 bytes in .plt.sec). A program usually has many PLT
entries but only one PLT header, so in practice the proposed format is
almost 50% smaller than the existing one.

The proposed PLT does not use jump instructions with the BND prefix, as
Intel MPX has been deprecated.

I have already implemented the proposed scheme in my linker
(https://github.com/rui314/mold), and it appears to be working fine.

Any thoughts?
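The "almost 50%" estimate can be checked with a little arithmetic. The Python sketch below (helper names are hypothetical) models the total PLT size as one header plus a fixed-size slot per entry, using the byte counts from this proposal, and ignores section alignment. The proposed format takes 32 + 16n bytes and the existing one 16 + 32n, so the saving is 16(n - 1) / (16 + 32n), which approaches 50% as n grows.

```python
def plt_size(num_entries: int, header: int, per_entry: int) -> int:
    """Total PLT bytes: one header plus a fixed-size slot per PLT entry."""
    return header + num_entries * per_entry


def proposed_size(n: int) -> int:
    # 32-byte PLT0 plus one 16-byte PLTn per entry.
    return plt_size(n, header=32, per_entry=16)


def existing_size(n: int) -> int:
    # 16-byte PLT0 plus a 16-byte .plt entry and a 16-byte .plt.sec entry.
    return plt_size(n, header=16, per_entry=32)


def saving(n: int) -> float:
    """Fraction of PLT bytes saved by the proposed format: 16(n-1) / (16 + 32n)."""
    return 1 - proposed_size(n) / existing_size(n)


for n in (1, 10, 100, 1000):
    print(n, proposed_size(n), existing_size(n), f"{saving(n):.1%}")
```

For example, with 100 PLT entries the proposed format needs 1,632 bytes versus 3,216, about 49.3% smaller; the two formats only break even at a single entry (48 bytes each).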