From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-ua1-x936.google.com (mail-ua1-x936.google.com [IPv6:2607:f8b0:4864:20::936]) by sourceware.org (Postfix) with ESMTPS id AD09C3944424 for ; Wed, 12 Jan 2022 20:12:37 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org AD09C3944424 Received: by mail-ua1-x936.google.com with SMTP id i5so7017483uaq.10 for ; Wed, 12 Jan 2022 12:12:37 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:message-id:date:mime-version:user-agent:subject :content-language:to:references:from:in-reply-to :content-transfer-encoding; bh=yG1ygQJgXDH8Sjhh98Rc8hkclw/HGDbGdJSQ8E64Bxg=; b=HLefx5vKXj9ySlqOwFuigIzd5a5v6+WNLqceiFItM9uoluXE7mSngzEf9WsTysrmUc 9CiKObVV4l2oPpyuvHv1ooTPLBZ1sob/y/SIkEn7izctskJGT1DsXBuwL8MtiMdfcDDY vfgFmMoCkQEWIFHkQZE9g8W1R0ysGhB+4hBg29vw5yGGQlN39CXvsVF2sNAlQ7qMFYEm Y0twKgp2RZ9nLlFAA7bHiGQmIdb5aGcgvVO+Df5dhpnpuyoKlEBWtFySTrg4AK+qcYKK lMpj7jhp53av66wDuZwPKI/ZzZUaIlFQfjsyCSodd2diVb+nYnaJTy+3sQysvEH1wulf vVqQ== X-Gm-Message-State: AOAM530r+C0XE/ot+T306KYqPxyyvShq1qt6B8AuXzX47m3csuMudfBP B4IiIabKktaACn2Sxii0HNN7lg== X-Google-Smtp-Source: ABdhPJxvaLzMnJuLRrltSYjnET9BIzD4kF/9Y0xHM74BoYer70EVlE9tnZXxZDGBXvmx7vB7WxkgCg== X-Received: by 2002:ab0:b8e:: with SMTP id c14mr918772uak.101.1642018357098; Wed, 12 Jan 2022 12:12:37 -0800 (PST) Received: from ?IPV6:2804:431:c7ca:9d3:c0ae:6d7d:7ec1:f457? ([2804:431:c7ca:9d3:c0ae:6d7d:7ec1:f457]) by smtp.gmail.com with ESMTPSA id l202sm434782vkl.40.2022.01.12.12.12.35 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Wed, 12 Jan 2022 12:12:36 -0800 (PST) Message-ID: <3d6799fe-2b9d-4780-254d-dbd0799483ae@linaro.org> Date: Wed, 12 Jan 2022 17:12:34 -0300 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.5.0 Subject: Re: [PATCH] elf/dl-deps.c: Make _dl_build_local_scope breadth first Content-Language: en-US To: Mark Hatle , Khem Raj , Libc-alpha , Carlos O'Donell References: <20211209235354.1558088-1-raj.khem@gmail.com> <018ad7e3-c020-3507-94be-ccb21c90899f@linaro.org> <62d5866a-ea76-a56a-7063-dada34b3fe66@kernel.crashing.org> From: Adhemerval Zanella In-Reply-To: <62d5866a-ea76-a56a-7063-dada34b3fe66@kernel.crashing.org> Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-11.2 required=5.0 tests=BAYES_00, BODY_8BITS, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, NICE_REPLY_A, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: libc-alpha@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Libc-alpha mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 12 Jan 2022 20:12:39 -0000 On 12/01/2022 16:08, Mark Hatle wrote: > > > On 1/11/22 1:26 PM, Adhemerval Zanella wrote: >> >> >> On 09/12/2021 20:53, Khem Raj via Libc-alpha wrote: >>> From: Mark Hatle >>> >>> According to the ELF specification: >>> >>> When resolving symbolic references, the dynamic linker examines the symbol >>> tables with a breadth-first search. >>> >>> This function was using a depth first search.  By doing so the conflict >>> resolution reported to the prelinker (when LD_TRACE_PRELINKING=1 is set) >>> was incorrect.  This caused problems when their were various circular >>> dependencies between libraries.  The problem usually manifested itself by >>> the wrong IFUNC being executed. >>> >>> Similar issue has been reported here [1] >>> >>> [BZ# 20488] >>> >>> [1] https://sourceware.org/legacy-ml/libc-alpha/2016-05/msg00034.html >>> >>> Signed-off-by: Mark Hatle >>> Signed-off-by: Khem Raj >> >> I am trying to understand why it this only an issue for LD_TRACE_PRELINKING=1, >> do we have a testcase that stress it for a default usercase? > > The underlying issue here is that resolution is happening depth first and not breadth first.  According to the ELF spec, all resolution should be breadth-first. > > > As noted in item in above, the prelinker just happens to be a way to actually show that the behavior is incorrect.  (There even appears to be a related defect with a reproducer.) > > > When taking the values from LD_TRACE_PRELINKING=1, various addresses and conflict resolutions are specified.  When you compare what is reported, vs what happens, vs what the spec says they don't align as they should. That was pretty clear from bug report, what I am trying to understand is why it seems to not being reported before in default non-prelinked usage. Also, the patch only changes the semantic to prelinked binaries which is at least troublesome: this semantic difference at symbol resolution is a potential source of issues where user see different behavior depending whether prelinked was used or not. I am also worried because making such change might potentially trigger some hard to diagnostic breakage. > > >> When you say the 'wrong IFUNC being executed' what exactly you mean here? >> Could we use a testcase based on this? > > > The prelinker (and possibly just in general), the IFUNC address used is the one from the wrong library scope.  I personally have never tried to reproduce this outside of the prelinking use-case, but based on the referenced report and the code at the time of the change, it is believed this could happen (but rarely) without a prelinked system in a very complex case with multiple libraries providing the same functions. > > The main issue (my memory is sketchy sorry this might be wrong) is that if one or more functions is an IFUNC and one or more functions is NOT, then the you can get into a situation with a conflict of the wrong function being called using the wrong mechanism. > > Definitely in the prelink case, the resolver gave the address of a regular function which was then placed into an IFUNC (or maybe it was the other way around) triggering the runtime segfault. I asked about ifunc because it should not really be dependent where ifunc is used or not, what might be happening is this issue triggers the long standing ifunc ordering issue more often. To summarize, although this change only affects prelinked binaries I think it would be better to change for default semantic as well. We can make it a tunable for the transition, like new DSO sorting algorithm with glibc.rtld.dynamic_sort, and evaluate on Fedora rawhide, and then hit the switch to make it the default. We will also need regression testcases, although not sure what kind of coverage we will need to provide (our own testsuite currently does not trigger any issue though). > > --Mark > >>> --- >>>   elf/dl-deps.c | 14 ++++++++++---- >>>   1 file changed, 10 insertions(+), 4 deletions(-) >>> >>> diff --git a/elf/dl-deps.c b/elf/dl-deps.c >>> index 237d9636c5..e15f7f83d8 100644 >>> --- a/elf/dl-deps.c >>> +++ b/elf/dl-deps.c >>> @@ -73,13 +73,19 @@ _dl_build_local_scope (struct link_map **list, struct link_map *map) >>>   { >>>     struct link_map **p = list; >>>     struct link_map **q; >>> +  struct link_map **r; >>>       *p++ = map; >>>     map->l_reserved = 1; >>> -  if (map->l_initfini) >>> -    for (q = map->l_initfini + 1; *q; ++q) >>> -      if (! (*q)->l_reserved) >>> -    p += _dl_build_local_scope (p, *q); >>> + >>> +  for (r = list; r < p; ++r) >>> +    if ((*r)->l_initfini) >>> +      for (q = (*r)->l_initfini + 1; *q; ++q) >>> +    if (! (*q)->l_reserved) >>> +      { >>> +        *p++ = *q; >>> +        (*q)->l_reserved = 1; >>> +      } >>>     return p - list; >>>   } >>>