To: Nat!
, libc-help@sourceware.org
From: Adhemerval Zanella
Subject: Re: Problem with atexit and _dl_fini
Date: Mon, 10 Jun 2019 20:27:00 -0000
References: <87blzypg5j.fsf@mid.deneb.enyo.de> <0a7c2435-43f8-8dfb-83ab-22ceff7ca51c@mulle-kybernetik.com> <9497a5c2-0dc8-18fe-6120-deb551f7ddd8@mulle-kybernetik.com>
In-Reply-To: <9497a5c2-0dc8-18fe-6120-deb551f7ddd8@mulle-kybernetik.com>

On 10/06/2019 10:07, Nat! wrote:
>
> On 10.06.19 13:48, Adhemerval Zanella wrote:
>>
>> On 09/06/2019 17:59, Nat! wrote:
>>> Another datapoint to support my claim that _dl-fini breaks atexit. This time its very easy to reproduce ;)
>>>
>>> Here 's the README.md from the Github Repo https://github.com/mulle-nat/atexit-breakage-linux
>>>
>>>
>>> ```
>>>
>>> # Shows another breakage involving `atexit` on linux
>>>
>>> Here the `atexit` callback is invoked mistakenly multiple times.
>> This 'example' does not really show the issue because ldd script issues
>> the loader multiple times, see below. You can check exactly what ldd is
>> doing by calling with sh -x.
>
> I agree it doesn't show the same issue, but it shows that something else is going very wrong. :) Or are you happy, that atexit is called multiple times ? Who's calling exit here anyway ? Check out the debugger output too (see updated README.md)

ldd is not a program but a shell script that invokes the system loader on
the target binary multiple times. What you are seeing is not atexit being
called multiple times, but a consequence of how the script runs. When you
set LD_PRELOAD *before* invoking ldd, the shell binary itself also preloads
the library.
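For reference, a preload library of this kind can be sketched roughly as follows. This is not the code from the repository: the file name, the helper invoker_name, and the choice of stderr are assumptions made for illustration only.

```c
/* preload_trace.c: hypothetical stand-in for a traced preload library.
 * Assumed build line: gcc -shared -fPIC -o libpreload-trace.so preload_trace.c */
#include <stdio.h>

/* Provided by glibc; declared by hand so no feature-test macro is needed. */
extern char *program_invocation_name;

/* Hypothetical helper: name of the process that has this library mapped. */
static const char *invoker_name(void)
{
    return program_invocation_name ? program_invocation_name : "(unknown)";
}

/* Runs once in every process that loads the library at startup. */
__attribute__((constructor))
static void trace_load(void)
{
    fprintf(stderr, "%s: load\n", invoker_name());
}

/* Runs when such a process exits normally. A forked child inherits the
 * mapping, so it runs this on exit without having run the constructor. */
__attribute__((destructor))
static void trace_unload(void)
{
    fprintf(stderr, "%s: unload\n", invoker_name());
}
```

With such a library named in LD_PRELOAD before a shell script starts, every shell process that exits would be expected to print its own 'unload' line, independent of what the script's target program does.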
I instrumented the preload library to also print the command line of the
binary that loaded it (obtained from either program_invocation_name or
/proc/self/cmdline):

$ LD_PRELOAD=./libld-preload.so ./ldd ./main
/bin/bash: load
/bin/bash: unload
/bin/bash: unload
/bin/bash: unload
        linux-vdso.so.1 (0x00007ffd445ef000)
        ./libld-preload.so (0x00007fa866ac5000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fa8664b5000)
        /lib64/ld-linux-x86-64.so.2 (0x00007fa8668a6000)
/bin/bash: unload
/bin/bash: unload

The program itself is never run: although ldd does invoke the loader, it
does so in a trace mode that only lists the dependencies and does not
execute the program or the library constructors. The first 'load' is
printed by the library when bash is first executed, and the later 'unload'
lines appear because bash forks and its children exit multiple times.

>
>
>>
>> I will try to use your instruction to run on docker to see what exactly
>> is happening in your environment.
>
> That's not necessary anymore. I managed to make it reproducible in a much simpler form just now.
>
> The ld-so-breakage project is basically a recreation of the original "docker" scenario written from scratch. I try to explain in the README , what is going on. But if there are questions hit me up (maybe as an issue ?) :
>
>     https://github.com/mulle-nat/ld-so-breakage

Thanks, this is far more useful. I now understand what is happening, and
IMHO this behaviour is required because glibc specifies that atexit/on_exit
handlers are run when a library is deregistered (as with dlclose). Using the
example from your testcase:

---
USE_A=YES ./build/main_adbc
-- install atexit_b
-- install atexit_a
-- run atexit_a
-- run atexit_b
---

The atexit handlers are called in the wrong order because they are
registered with '__cxa_atexit', which sets their internal type to 'ef_cxa'.
Since _dl_fini is registered last (after all shared libraries are loaded and
their constructors called), exit calls _dl_fini first, which in turn calls
'__cxa_finalize' (through the compiler-generated __do_global_dtors_aux).
'__cxa_finalize' then calls every 'ef_cxa' function registered for the
module passed by __do_global_dtors_aux and marks each one as 'ef_free'.
This prevents '__run_exit_handlers' from running those handlers a second
time.

So the question you might ask is: why not just use 'ef_at' for atexit
handlers, so that they do not run on __cxa_finalize and your example behaves
as you expect? The issue is that glibc does not know whether your library
will be dlopened or not. If a constructor registers an atexit handler that
refers to a function inside the shared library, and that handler is *not*
run and removed when the library is unloaded, then the sequence
dlopen -> constructor -> dlclose -> exit will try to execute code in an
invalid mapping. This is exactly what dlfcn/bug-atexit{1,2}.c test.

So the remaining question is why exactly glibc defined that atexit handlers
should be called by dlclose. I understand that __cxa_finalize / destructors
make sense so that a shared library can free its allocated resources, but I
can't really see why there is a need to extend this to 'atexit' as well.

>
>
> The "another datapoint" project shows how constructor/destructor don't pair up:
>
>     https://github.com/mulle-nat/atexit-breakage-linux
>
>
> And as a random bonus this project indicates to me that LD_PRELOAD doesn't do what its supposed to either:
>
>     https://github.com/mulle-nat/LD_PRELOAD-breakage-linux
>
>
> In total I think the state of affairs is pretty dismal. I didn't expect that much basic stuff not working on linux. With hindsight, I probably have wasted _weeks_ on these problems.
>
> I still maintain that the concept to let `atexit` callbacks not run by `exit` is broken. An `atexit` callback is not the same as an `__attribute__((destructor))__`.
>
>
> Ciao
>
>    Nat!
>
>
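The bookkeeping described in the reply ('__cxa_finalize' running a module's 'ef_cxa' entries and marking them 'ef_free' so that the exit path skips them) can be illustrated with a toy model. This is a simplified sketch and not glibc source: only the names 'ef_cxa' and 'ef_free' come from the message; the table layout, the model_* functions and count_call are hypothetical.

```c
/* Toy model of a per-module exit-handler list; NOT glibc source. */
#include <assert.h>
#include <stddef.h>

enum flavor { ef_free, ef_cxa };        /* names borrowed from the message */

struct entry {
    enum flavor flavor;
    void (*fn)(void *);
    void *arg;
    void *dso;                          /* token identifying the owning module */
};

#define MAX_ENTRIES 32
static struct entry table[MAX_ENTRIES];
static size_t used;

/* Register a handler on behalf of module `dso` (models __cxa_atexit). */
static int model_cxa_atexit(void (*fn)(void *), void *arg, void *dso)
{
    if (used == MAX_ENTRIES)
        return -1;
    table[used++] = (struct entry){ ef_cxa, fn, arg, dso };
    return 0;
}

/* Run and retire every handler owned by `dso`, newest first. */
static void model_cxa_finalize(void *dso)
{
    for (size_t i = used; i-- > 0; ) {
        if (table[i].flavor == ef_cxa && table[i].dso == dso) {
            table[i].flavor = ef_free;  /* retire so it cannot run again */
            table[i].fn(table[i].arg);
        }
    }
}

/* Run whatever is still registered, newest first (models the exit path). */
static void model_run_exit_handlers(void)
{
    while (used > 0) {
        struct entry *e = &table[--used];
        if (e->flavor == ef_cxa) {
            e->flavor = ef_free;
            e->fn(e->arg);
        }
    }
}

/* Demonstration handler: increments the int its argument points to. */
static void count_call(void *counter)
{
    assert(counter != NULL);
    ++*(int *)counter;
}
```

Registering one handler for a "library" token and one for a "main" token, then calling model_cxa_finalize on the library token followed by model_run_exit_handlers, runs each handler exactly once. The library's handler runs early, which is the reordering discussed above.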