From mboxrd@z Thu Jan 1 00:00:00 1970
Mailing-List: contact libc-help-help@sourceware.org; run by ezmlm
Subject: Re: Problem with atexit and _dl_fini
To: libc-help@sourceware.org
References: <87blzypg5j.fsf@mid.deneb.enyo.de> <0a7c2435-43f8-8dfb-83ab-22ceff7ca51c@mulle-kybernetik.com> <9497a5c2-0dc8-18fe-6120-deb551f7ddd8@mulle-kybernetik.com>
From: Nat!
Cc: Adhemerval Zanella
Date: Tue, 11 Jun 2019 18:53:00 -0000
X-SW-Source: 2019-06/txt/msg00010.txt.bz2

On 10.06.19 22:27, Adhemerval Zanella wrote:
>
> The program is not load since although ldd does call the loader, it calls
> in a trace mode that does not actually load any shared library. The first
> 'load' is issued by library when bash is first executed and later multiple
> 'unload' is due bash forks and then exits multiple times.

I can understand this.
Possibly the same is happening when I am running this in a debugger.

>
>>
>>> I will try to use your instruction to run on docker to see what exactly
>>> is happening in your environment.
>> That's not necessary anymore. I managed to make it reproducible in a
>> much simpler form just now.
>>
>> The ld-so-breakage project is basically a recreation of the original
>> "docker" scenario written from scratch. I try to explain in the README ,
>> what is going on. But if there are questions hit me up (maybe as an
>> issue ?) :
>>
>>     https://github.com/mulle-nat/ld-so-breakage
> Thanks, it is way more useful. I now I understand what is happening and
> IMHO this behaviour is a required because on glibc we set that atexit/on_exit
> handlers are ran when deregister a library (as for dlclose).
>
> Using the example in your testcase:
>
> ---
> USE_A=YES ./build/main_adbc
> -- install atexit_b
> -- install atexit_a
> -- run atexit_a
> -- run atexit_b
> ---
>
> The behaviour of atexit handlers being called in wrong order is they are
> being registered with '__cxa_atexit' which in turn sets its internal type
> as 'ef_cxa'. Since _dl_init is registered last (after all shared library
> loading and constructors calls), it will call _dl_fini which in turn will
> call '__cxa_finalize' (through __do_global_dtors_aux generated by compiler).
>
> The '__cxa_finalize' will then all 'ef_cxa' function for the module passed
> by __do_global_dtors_aux and set the function as 'ef_free'. It will then
> prevent '__run_exit_handlers' to run the handlers more than once.
>
> So the question you might ask is why not just to use 'ef_at' for atexit
> handlers, make them no to run on __cxa_finalize and thus make your example
> run as you expect? The issue is glibc does not know whether your library
> would be dlopened or not.
>
> If you set an atfork handler by a constructor that references to a function
> inside the shared library and if do *not* set to *not* be ran later you might,
> a case of dlopen -> constructor -> dlclose -> exit will try to execute and
> invalid mapping. This is exactly what dlfcn/bug-atexit{1,2}.c.
>
> So the question is why exactly glibc defined that atexit should be called
> by dlclose. I understand that __cxa_finalize / destructor make sense to
> make it possible the shared library to free allocated resources, but I
> can't really get why there a need to extend it to 'atexit' as well.
>

My pet theory is this. After I posted my example, I looked at the ELF spec (http://refspecs.linuxbase.org/elf/elf.pdf), which describes how to implement ELF destructors. It specifies using `atexit` for destructors. When it was written, the ELF spec does not seem to have considered the unloading of a shared object, and in that light everything written there makes sense. Once you want to support unloading, though, `atexit` is the wrong way to do it. But the code was already there and no one wanted to change too much. Alas, that's just a theory :)

I still think this is a case of glibc trying to be too helpful but doing more damage (breaking well-behaved code that uses `atexit`) than good (supporting programmers who unwittingly unload code that registered `atexit` handlers).

Thanks for taking the time to look into this!

Ciao
   Nat!