From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail.tambre.ee (mail.tambre.ee [IPv6:2a01:7e01:e001:cc::3]) by sourceware.org (Postfix) with ESMTPS id 8AFBE385842E for ; Fri, 15 Oct 2021 14:07:53 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 8AFBE385842E Received: from [IPv6:2001:7d0:8a2e:aa80:21f0:3cc4:423c:3450] (3450-423c-3cc4-21f0-aa80-8a2e-07d0-2001.dyn.estpak.ee [IPv6:2001:7d0:8a2e:aa80:21f0:3cc4:423c:3450]) (using TLSv1.3 with cipher TLS_AES_128_GCM_SHA256 (128/128 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) (Authenticated sender: raul) by mail.tambre.ee (Postfix) with ESMTPSA id 702D97F91C; Fri, 15 Oct 2021 17:07:51 +0300 (EEST) DMARC-Filter: OpenDMARC Filter v1.4.1 mail.tambre.ee 702D97F91C DKIM-Filter: OpenDKIM Filter v2.11.0 mail.tambre.ee 702D97F91C To: binutils@sourceware.org From: Raul Tambre Subject: Purpose of NULL SHN_ABS symbols for version script nodes? Message-ID: <6d347539-b9fb-a7ce-ab8a-979c8bd541ed@tambre.ee> Date: Fri, 15 Oct 2021 17:07:48 +0300 User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101 Thunderbird/78.14.0 MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="------------59846B14C6D2BF4C52AF09D8" Content-Language: en-GB X-Spam-Status: No, score=-1.1 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, KAM_INFOUSMEBIZ, KAM_LOTSOFHASH, SPF_HELO_PASS, SPF_PASS autolearn=no autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: binutils@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Binutils mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 15 Oct 2021 14:07:57 -0000 This is a multi-part message in MIME format. 
--------------59846B14C6D2BF4C52AF09D8
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit

What is the reasoning behind GNU ld and gold generating NULL SHN_ABS
symbols for version script nodes?

test.sym:

  LIBTEST_1
  {
  };

clang -shared /dev/null -o example.so -Wl,--version-script=test.sym -fuse-ld=ld
llvm-nm -D example.so:

  0000000000000000 A LIBTEST_1@@LIBTEST_1
                   w _ITM_deregisterTMCloneTable
                   w _ITM_registerTMCloneTable
                   w __cxa_finalize
                   w __gmon_start__

LLVM LLD does not.

I stumbled upon this when building a Debian package (systemd) that uses
version scripts and whose packaging tracks symbols using dpkg-gensymbols.
As a result I ended up reporting Debian bug #992796 () and developing a
patch for dpkg-gensymbols to ignore these symbols. However, the dpkg
maintainer has requested that I clarify this with binutils.

The absence of the version node symbol can be detected by dlerror() being
non-NULL after a dlsym() lookup of the node name, but I'm not sure what
this could realistically be useful for.

I contacted one of the LLD maintainers, Fangrui Song, who was of the
opinion that these symbols are useless. I have attached the thread for
reference with their permission. They noted that they'd have expected to
receive reports if this were made use of somewhere, since quite a few
large projects use LLD these days.

The generation of these symbols seems to have been added in
dbe717effbdf31236088837f4686fd5ad5e71893 along with the rest of version
script support. Currently it's done here ().
I haven't been able to locate the responsible piece of code in GNU ld.
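For illustration, the kind of filtering discussed above (ignoring version
node symbols when tracking a library's dynamic symbols) could be sketched
roughly as follows. This is only a sketch under assumptions, not the actual
dpkg-gensymbols patch: the helper names parse_nm_line, is_version_node_symbol
and filter_symbols are hypothetical, the input is assumed to look like the
llvm-nm -D output shown above, and a version node symbol is assumed to be an
absolute ("A") symbol whose name equals its own version.

```python
def parse_nm_line(line):
    """Split one `nm -D`-style line into (value, type, name).

    Undefined and weak-undefined entries have no value column,
    so their value is returned as None.
    """
    fields = line.split()
    if len(fields) == 3:
        value, sym_type, name = fields
        return value, sym_type, name
    if len(fields) == 2:
        sym_type, name = fields
        return None, sym_type, name
    raise ValueError(f"unrecognised nm line: {line!r}")


def is_version_node_symbol(sym_type, name):
    """Return True for an absolute symbol named after its own version
    node, e.g. `LIBTEST_1@@LIBTEST_1` (assumption: this is how such
    symbols show up in the nm output)."""
    if sym_type != "A" or "@@" not in name:
        return False
    base, version = name.split("@@", 1)
    return base == version


def filter_symbols(nm_output):
    """Return the symbol names from nm output, minus version node symbols."""
    kept = []
    for line in nm_output.splitlines():
        if not line.strip():
            continue  # skip blank lines
        _value, sym_type, name = parse_nm_line(line)
        if not is_version_node_symbol(sym_type, name):
            kept.append(name)
    return kept


# Sample input copied from the llvm-nm -D output quoted in this message.
SAMPLE = """\
0000000000000000 A LIBTEST_1@@LIBTEST_1
                 w _ITM_deregisterTMCloneTable
                 w _ITM_registerTMCloneTable
                 w __cxa_finalize
                 w __gmon_start__
"""

if __name__ == "__main__":
    for name in filter_symbols(SAMPLE):
        print(name)
```

With a filter along these lines, the symbol lists produced by GNU ld and
LLD for the empty version script above would compare equal.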
--------------59846B14C6D2BF4C52AF09D8 Content-Type: message/rfc822; name="Re Version script node symbols ld vs lld - Fangrui Song (i@maskray.me) - 2021-09-19 0027.eml" Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename*0="Re Version script node symbols ld vs lld - Fangrui Song (i@"; filename*1="maskray.me) - 2021-09-19 0027.eml" Return-Path: Delivered-To: raul@tambre.ee Received: from mail.tambre.ee by tambre with LMTP id z2uiEtlZRmGOQgYAI96AqQ (envelope-from ) for ; Sun, 19 Sep 2021 00:27:53 +0300 Received: from mail-pg1-f180.google.com (mail-pg1-f180.google.com [209.85.215.180]) (using TLSv1.3 with cipher TLS_AES_128_GCM_SHA256 (128/128 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mail.tambre.ee (Postfix) with ESMTPS id E68E57F8D3 for ; Sun, 19 Sep 2021 00:27:52 +0300 (EEST) DMARC-Filter: OpenDMARC Filter v1.4.1 mail.tambre.ee E68E57F8D3 Authentication-Results: mail.tambre.ee; dmarc=none (p=none dis=none) header.from=maskray.me Authentication-Results: mail.tambre.ee; spf=pass smtp.mailfrom=gmail.com DKIM-Filter: OpenDKIM Filter v2.11.0 mail.tambre.ee E68E57F8D3 Received: by mail-pg1-f180.google.com with SMTP id r2so13320347pgl.10 for ; Sat, 18 Sep 2021 14:27:52 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:from:to:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=+X93HnjxJmAcp/Ddd1FZQzB7x3m2cB5DpE66N82+C0Y=; b=kgC3V4MQhap8YH84kJ0IlGzZr9FNGRCrHJe+joNl6vwUrasHBwJ45/eFJAMme/yyd0 p/ggaU3PbHbfx7SXy3smUvcKDchJ9B/kUdZyRtAja0uwAw+QlWciGFOeXJIyoT+/xnkA AkiOKQyTiJz4hVbn6/AARr1u8lsv+ttbPZH9Nn7s5tTQ6DnKdEI4iMyuKopGQCxqyOpl /MMX1+NFKtAC3G7JIWQZO2NJ1wfonXnSj7+sO9ouEtiYTN9NCpfFJ7M8Ea02Ns7oUa/S jRh1BlFQyTALc3+PjHzJEKCFkY4m6Rra6tHF+G86d7zRbHLaA4oKlEoxfU1ZA451vd+V xVzQ== X-Gm-Message-State: AOAM531n7nDSGFnd1fSUQ7K+1pxDo1qWsOtj9738IzjgAH6Vfuiyskn+ 
T+XZq7QR5dpxEBN9gBwLofYVepNCSOw= X-Google-Smtp-Source: ABdhPJz1YUpQ4MAL59L36MqycvipFlNeCGjlE29nJmItk/sGYhroBv8pibyoCd2sEfalsjEgt2bM/g== X-Received: by 2002:a62:1a09:0:b0:435:61bd:2d09 with SMTP id a9-20020a621a09000000b0043561bd2d09mr17906643pfa.71.1632000470819; Sat, 18 Sep 2021 14:27:50 -0700 (PDT) Received: from localhost ([2601:647:6300:b760:5176:7e4d:6fb9:7f83]) by smtp.gmail.com with ESMTPSA id f127sm10035452pfa.25.2021.09.18.14.27.50 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sat, 18 Sep 2021 14:27:50 -0700 (PDT) Date: Sat, 18 Sep 2021 14:27:49 -0700 From: Fangrui Song To: Raul Tambre Subject: Re: Version script node symbols ld vs lld Message-ID: <20210918212749.rhsrf5j2rtzfgjmp@gmail.com> References: <20210829015406.odftzbmcbqhfvu2t@gmail.com> <93bb4a94-b018-2e4c-f345-bb93f8a0e6b4@tambre.ee> <0877145d-6171-9c7e-5713-560bc8bc01e5@tambre.ee> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii; format=flowed Content-Disposition: inline In-Reply-To: <0877145d-6171-9c7e-5713-560bc8bc01e5@tambre.ee> On 2021-09-18, Raul Tambre wrote: >The dpkg maintainer hasn't quite agreed with the reasoning in my bug report. > > > >I have attached an example that can be compiled using: >clang dlsym.c -o dlsym -rdynamic -ldl -Wl,--version-script=dlsym.sym -fuse-ld=ld > >dlerror() returns an undefined symbol error for LIBTEST_1@@LIBTEST_1 >if using LLD. While unlikely to ever cause any trouble, I think this >can be considered an ABI breaking difference. I'm forced to agree with >the maintainer that the current reasoning for ignoring them in >dpkg-gensymbols is invalid. >They also found some relevant code in binutils. > > > This is a .dynsym difference (SHN_ABS) between LLD and GNU linkers, but I don't think any package leverages the property said by the comment. (If they did, I think I'd already received a report from FreeBSD/Android/Chrome OS.) The comment seems incorrect to me: -u is used to force archive member extraction. 
However, archive files (unlinked) do not have SHN_ABS definitions. >I'm not too familiar with linker internals, so I don't comprehend, but >this behaviour does seem to be used for something. Would you be >willing/able to offer any further insights? > >May I also attach our email thread, once concluded, to the Debian bug >report for reference? Sure:) >#define _GNU_SOURCE > >#include >#include > >void example() >{ >} > >int main(int argc, char** argv) >{ > void* sym = dlsym(0, "LIBTEST_1"); > void* vsym = dlvsym(0, "LIBTEST_1", "LIBTEST_1"); > printf("LIBTEST_1 sym=%p vsym=%p error=%s\n", sym, vsym, dlerror()); > > sym = dlsym(0, "example"); > vsym = dlvsym(0, "example", "LIBTEST_1"); > printf("example sym=%p vsym=%p error=%s\n", sym, vsym, dlerror()); >} >LIBTEST_1 >{ >global: > example; > >local: > *; >}; --------------59846B14C6D2BF4C52AF09D8 Content-Type: message/rfc822; name="Re Version script node symbols ld vs lld - Raul Tambre (raul@tambre.ee) - 2021-08-30 1205.eml" Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename*0="Re Version script node symbols ld vs lld - Raul Tambre (rau"; filename*1="l@tambre.ee) - 2021-08-30 1205.eml" Subject: Re: Version script node symbols ld vs lld To: Fangrui Song References: <20210829015406.odftzbmcbqhfvu2t@gmail.com> From: Raul Tambre Message-ID: <93bb4a94-b018-2e4c-f345-bb93f8a0e6b4@tambre.ee> Date: Mon, 30 Aug 2021 12:05:12 +0300 User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101 Thunderbird/78.13.0 MIME-Version: 1.0 In-Reply-To: <20210829015406.odftzbmcbqhfvu2t@gmail.com> Content-Type: text/plain; charset=utf-8; format=flowed Content-Language: en-GB Content-Transfer-Encoding: 7bit Thanks for the detailed answer and all the extra information! > Thanks for the fix:) I cleaned up and posted the fix for review on the dpkg-dev mailing list: https://lists.debian.org/debian-dpkg/2021/08/msg00003.html > [This appears to be unrelated to the SHN_ABS symbols.] 
> > Technically the software can, but that won't make much sense. A new version, > if introduces new dynamic symbols, should attach them to the current version > node, rather an arbitrary old version node to confuse users. That part was rather me trying to come up with possible uses for these. But I agree, it'd be quite farfetched and there's likely no actual software doing such trickery. --------------59846B14C6D2BF4C52AF09D8 Content-Type: message/rfc822; name="Re Version script node symbols ld vs lld - Raul Tambre (raul@tambre.ee) - 2021-09-18 2035.eml" Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename*0="Re Version script node symbols ld vs lld - Raul Tambre (rau"; filename*1="l@tambre.ee) - 2021-09-18 2035.eml" Subject: Re: Version script node symbols ld vs lld From: Raul Tambre To: Fangrui Song References: <20210829015406.odftzbmcbqhfvu2t@gmail.com> <93bb4a94-b018-2e4c-f345-bb93f8a0e6b4@tambre.ee> Message-ID: <0877145d-6171-9c7e-5713-560bc8bc01e5@tambre.ee> Date: Sat, 18 Sep 2021 20:35:41 +0300 User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101 Thunderbird/78.14.0 MIME-Version: 1.0 In-Reply-To: <93bb4a94-b018-2e4c-f345-bb93f8a0e6b4@tambre.ee> Content-Type: multipart/mixed; boundary="------------0D963E9EC9B1AD6362D9788A" Content-Language: en-GB This is a multi-part message in MIME format. --------------0D963E9EC9B1AD6362D9788A Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 7bit The dpkg maintainer hasn't quite agreed with the reasoning in my bug report. I have attached an example that can be compiled using: clang dlsym.c -o dlsym -rdynamic -ldl -Wl,--version-script=dlsym.sym -fuse-ld=ld dlerror() returns an undefined symbol error for LIBTEST_1@@LIBTEST_1 if using LLD. While unlikely to ever cause any trouble, I think this can be considered an ABI breaking difference. 
I'm forced to agree with the maintainer that the current reasoning for ignoring them in dpkg-gensymbols is invalid. They also found some relevant code in binutils. I'm not too familiar with linker internals, so I don't comprehend, but this behaviour does seem to be used for something. Would you be willing/able to offer any further insights? May I also attach our email thread, once concluded, to the Debian bug report for reference? --------------0D963E9EC9B1AD6362D9788A Content-Type: text/plain; charset=UTF-8; name="dlsym.c" Content-Transfer-Encoding: base64 Content-Disposition: attachment; filename="dlsym.c" I2RlZmluZSBfR05VX1NPVVJDRQoKI2luY2x1ZGUgPGRsZmNuLmg+CiNpbmNsdWRlIDxzdGRp by5oPgoKdm9pZCBleGFtcGxlKCkKewp9CgppbnQgbWFpbihpbnQgYXJnYywgY2hhcioqIGFy Z3YpCnsKCXZvaWQqIHN5bSA9IGRsc3ltKDAsICJMSUJURVNUXzEiKTsKCXZvaWQqIHZzeW0g PSBkbHZzeW0oMCwgIkxJQlRFU1RfMSIsICJMSUJURVNUXzEiKTsKCXByaW50ZigiTElCVEVT VF8xIHN5bT0lcCB2c3ltPSVwIGVycm9yPSVzXG4iLCBzeW0sIHZzeW0sIGRsZXJyb3IoKSk7 CgoJc3ltID0gZGxzeW0oMCwgImV4YW1wbGUiKTsKCXZzeW0gPSBkbHZzeW0oMCwgImV4YW1w bGUiLCAiTElCVEVTVF8xIik7CglwcmludGYoImV4YW1wbGUgc3ltPSVwIHZzeW09JXAgZXJy b3I9JXNcbiIsIHN5bSwgdnN5bSwgZGxlcnJvcigpKTsKfQo= --------------0D963E9EC9B1AD6362D9788A Content-Type: text/plain; charset=UTF-8; name="dlsym.sym" Content-Transfer-Encoding: base64 Content-Disposition: attachment; filename="dlsym.sym" TElCVEVTVF8xCnsKZ2xvYmFsOgogIGV4YW1wbGU7Cgpsb2NhbDoKICAqOwp9Owo= --------------0D963E9EC9B1AD6362D9788A-- --------------59846B14C6D2BF4C52AF09D8 Content-Type: message/rfc822; name="Version script node symbols ld vs lld - Raul Tambre (raul@tambre.ee) - 2021-08-27 1455.eml" Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename*0="Version script node symbols ld vs lld - Raul Tambre (raul@ta"; filename*1="mbre.ee) - 2021-08-27 1455.eml" To: i@maskray.me From: Raul Tambre Subject: Version script node symbols ld vs lld Message-ID: Date: Fri, 27 Aug 2021 14:55:42 +0300 User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; 
x64; rv:78.0) Gecko/20100101 Thunderbird/78.13.0
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Language: en-GB
Content-Transfer-Encoding: 7bit

Given a version script "test.sym":

  LIBTEST_1
  {
  };

clang -shared /dev/null -o example.so -Wl,--version-script=test.sym -fuse-ld=ld
llvm-nm -D example.so:

  0000000000000000 A LIBTEST_1@@LIBTEST_1
                   w _ITM_deregisterTMCloneTable
                   w _ITM_registerTMCloneTable
                   w __cxa_finalize
                   w __gmon_start__

LLD however doesn't generate the "LIBTEST_1@@LIBTEST_1" symbol.

Debian's dpkg-gensymbols tracks such symbols. When using LLD to build
Debian packages that have symbols files and whose upstream uses version
scripts, there's typically a bunch of symbols that have gone "missing"
(i.e. an ABI break).
I eagerly filed Debian bug #992796 and prototyped a "fix" for dpkg.

However, technically functions may be added to older version nodes in
newer versions. Say, in an alternative independently-versioned
binary-compatible implementation of a library.
In such cases it would make sense and be useful to track the first
appearance of a version node.

Would it be right to rather consider this a deficiency in LLD?

I trawled the binutils source code, but didn't manage to find the
piece of code responsible for generating those, never mind the
reasoning or history.
I didn't read too thoroughly, but your symbol versioning blog post
doesn't seem to touch on this specific behaviour either.

Thanks in advance, Raul --------------59846B14C6D2BF4C52AF09D8 Content-Type: message/rfc822; name="Re Version script node symbols ld vs lld - Fangrui Song (i@maskray.me) - 2021-08-29 0454.eml" Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename*0="Re Version script node symbols ld vs lld - Fangrui Song (i@"; filename*1="maskray.me) - 2021-08-29 0454.eml" Return-Path: Delivered-To: raul@tambre.ee Received: from mail.tambre.ee by tambre with LMTP id 6X4HJMLoKmGqNwQAI96AqQ (envelope-from ) for ; Sun, 29 Aug 2021 04:54:10 +0300 Received: from mail-pl1-f180.google.com (mail-pl1-f180.google.com [209.85.214.180]) (using TLSv1.3 with cipher TLS_AES_128_GCM_SHA256 (128/128 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mail.tambre.ee (Postfix) with ESMTPS id 8552D7F87F for ; Sun, 29 Aug 2021 04:54:10 +0300 (EEST) DKIM-Filter: OpenDKIM Filter v2.11.0 mail.tambre.ee 8552D7F87F Received: by mail-pl1-f180.google.com with SMTP id m17so6445478plc.6 for ; Sat, 28 Aug 2021 18:54:10 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=Q5TIYOvM7IICBleEeI/LOf+M8aa6jhlhlIHewib7tQQ=; b=W973vHkRFNaakY7Hg2NyZsUqt9qTovDhnN2HAKKYYnW4QqW7tZ1fj6ugCSJtVFrkVQ d3a9iDdWEuu1zSzDomjx+zEfNyC0oHXyJnoLkW21sbS2x85ZU2/+WAoP8NlHbofAM8yf HqdsloVxt5FmkN1rGeayKA3e/mcL9WcRNpbNV8rwXstGvOmq9xnNKggzZ39dMS4jrS01 QkpcRGpA+O21Mi/E/h/yvLJd+1PmR//u/biueDbQvv4PpQA0ZzOvy9vkzg1cCWaOOeeq stDx7iRMGNnXUrGnxPHgHAS7oZFv/jnQbiC4bFolMUklBdLa8hd4/IHu8HkIcUiZyBOL hD1w== X-Gm-Message-State: AOAM532YSpp9TuR3nJm9EaReEEs23mZ0kceCed+1OOQ8RYAyt2GfrbEO qCdwPmuKns2SkEAMbt+Q3Y1d8u/L3ug= X-Google-Smtp-Source: ABdhPJxLoicyBlDzW9SB2ET+SYGMmi+w+kV2M8YRxPEph0xl/j+TYIBKsZ0zijS05WWnMbyvNAZW2Q== X-Received: by 2002:a17:90a:680a:: with SMTP id p10mr18884800pjj.179.1630202048342; Sat, 28 Aug 
2021 18:54:08 -0700 (PDT) Received: from localhost ([2601:647:4b01:ae80:8135:205:a04:69b9]) by smtp.gmail.com with ESMTPSA id c124sm10836078pfc.216.2021.08.28.18.54.07 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sat, 28 Aug 2021 18:54:07 -0700 (PDT) Date: Sat, 28 Aug 2021 18:54:06 -0700 From: Fangrui Song To: Raul Tambre Subject: Re: Version script node symbols ld vs lld Message-ID: <20210829015406.odftzbmcbqhfvu2t@gmail.com> References: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii; format=flowed Content-Disposition: inline In-Reply-To: On 2021-08-27, Raul Tambre wrote: >Given a version script "test.sym": > > LIBTEST_1 > { > }; > >clang -shared /dev/null -o example.so -Wl,--version-script=test.sym -fuse-ld=ld >llvm-nm -D example.so: > > 0000000000000000 A LIBTEST_1@@LIBTEST_1 > w _ITM_deregisterTMCloneTable > w _ITM_registerTMCloneTable > w __cxa_finalize > w __gmon_start__ > >LLD however doesn't generate the "LIBTEST_1@@LIBTEST_1" symbol. Right. The SHN_ABS symbol associated to a version node seems completely useless to me. Its order is pretty arbitrary (readelf -W --dyn-syms /lib/x86_64-linux-gnu/libc.so.6: GLIBC_2.3.3). >Debian's dpkg-gensymbols tracks such symbols. When building Debian >packages with symbols files and whose upstream uses version scripts >files using LLD there's typically a bunch of symbols that have gone >"missing" (i.e. an ABI break). >I eagerly filed Debian bug #992796 > and >prototyped a "fix" for dpkg . Thanks for the fix:) The SHN_ABS symbols don't convey extra information. In the output of readelf -V, .gnu.version_d lists the used versions. >However, technically functions may be added to older version nodes in >newer versions. Say, in an alternative independently-versioned >binary-compatible implementation of a library. >In such cases it would make sense and be useful to track the first >appearance of a version node. [This appears to be unrelated to the SHN_ABS symbols.] 
Technically the software can, but that won't make much sense. A new version,
if it introduces new dynamic symbols, should attach them to the current version
node rather than to an arbitrary old version node, which would confuse users.

>Would it be right to rather consider this a deficiency in LLD?

I don't think so. LLD just doesn't support the SHN_ABS symbols, which are
rather useless.

>I trawled the binutils source code, but didn't manage to find the
>piece of code responsible for generating those, nevermind the
>reasoning or history.

Wow! I tried a bit but gave up. GNU ld's version symbol code is rather messy...
gold has more compatibility problems than LLD...

>I didn't read too thoroughly, but your symbol versioning blog post
>
>doesn't seem to touch on this specific behaviour either.

Thanks! I'll mention such SHN_ABS symbols.

>Thanks in advance,
>Raul

--------------59846B14C6D2BF4C52AF09D8--