From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-qv1-xf31.google.com (mail-qv1-xf31.google.com [IPv6:2607:f8b0:4864:20::f31]) by sourceware.org (Postfix) with ESMTPS id 13A98385700A for ; Thu, 15 Apr 2021 17:44:19 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.3.2 sourceware.org 13A98385700A Received: by mail-qv1-xf31.google.com with SMTP id i9so11923778qvo.3 for ; Thu, 15 Apr 2021 10:44:19 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:subject:to:references:from:message-id:date :user-agent:mime-version:in-reply-to:content-language :content-transfer-encoding; bh=2usFXspUFnZocWFqvbJcMhLBIFpaI3zJidooXK/nGNc=; b=F0hytnzzNa/WUX7wmAHuYGZkrTBwcBG+KEvNGaUl8xcsy4498Gh7ZrnprywzbI4TWz cw+v/uBPVVkqQS6r1syvRFn8mZEwGg8HIfZArg9EDvszXPL3ej68xnRazdgl/MHosJza M2OuTb4nEg25xe20SO5jklXLS84s67zvd2F70WNp5VurQDT7tNXxaF6Z+v9ta5Hojn+/ CtsO2dmy0hE+FVa9OKYTtXzPsUiORhvN2Vw60+PX3dNEb7jgdqSn/oLiABtxOnYqlWpw bb3/yqyddmOWPRtTKgCSHIu8SxDgvNyV+io9fsgtprYLnZUVdVY0RsRhEZgbEZcSborO cfPg== X-Gm-Message-State: AOAM5325Nxl1TtKsru9sIikxfLx40jwHbvTSMGgOgGVF0HZ1dDMmldIx 650yModT8xliINwAFME6kdlKYg== X-Google-Smtp-Source: ABdhPJwj0UvmkEOEsfcYTSNfn0j6AuwH3LBZZ5J1QOl66adndOtiytpcZXZSFPle5llpx4TEal54Wg== X-Received: by 2002:a05:6214:d65:: with SMTP id 5mr4389904qvs.56.1618508658092; Thu, 15 Apr 2021 10:44:18 -0700 (PDT) Received: from [192.168.1.132] ([177.194.41.149]) by smtp.gmail.com with ESMTPSA id u21sm2260203qtq.11.2021.04.15.10.44.17 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Thu, 15 Apr 2021 10:44:17 -0700 (PDT) Subject: Re: [PATCH v2 05/14] elf: Fix data races in pthread_create and TLS access [BZ #19329] To: libc-alpha@sourceware.org, Szabolcs Nagy References: From: Adhemerval Zanella Message-ID: Date: Thu, 15 Apr 2021 14:44:15 -0300 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.7.1 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 7bit X-Spam-Status: No, score=-12.5 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, NICE_REPLY_A, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.2 X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on server2.sourceware.org X-BeenThere: libc-alpha@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Libc-alpha mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 15 Apr 2021 17:44:21 -0000 On 13/04/2021 05:19, Szabolcs Nagy via Libc-alpha wrote: > DTV setup at thread creation (_dl_allocate_tls_init) is changed > to take the dlopen lock, GL(dl_load_lock). Avoiding data races > here without locks would require design changes: the map that is > accessed for static TLS initialization here may be concurrently > freed by dlclose. That use after free may be solved by only > locking around static TLS setup or by ensuring dlclose does not > free modules with static TLS, however currently every link map > with TLS has to be accessed at least to see if it needs static > TLS. And even if that's solved, still a lot of atomics would be > needed to synchronize DTV related globals without a lock. So fix > both bug 19329 and bug 27111 with a lock that prevents DTV setup > running concurrently with dlopen or dlclose. It sounds reasonable and I think the extra locks should be ok performance-wise (concurrent dlopen/dlclose should not be an hotspot operation in majority of the cases). > > _dl_update_slotinfo at TLS access still does not use any locks > so CONCURRENCY NOTES are added to explain the synchronization. > The early exit from the slotinfo walk when max_modid is reached > is not strictly necessary, but does not hurt either. > > An incorrect acquire load was removed from _dl_resize_dtv: it > did not synchronize with any release store or fence and > synchronization is now handled separately at thread creation > and TLS access time. Reading the fix that add the acquire load (d8dd00805b8) it seems the idea was to use a sequentially-consistent ordering, since previously GL(dl_tls_max_dtv_idx) was not accessed using atomic operations. > > There are still a number of racy read accesses to globals that > will be changed to relaxed MO atomics in a followup patch. This > should not introduce regressions compared to existing behaviour > and avoid cluttering the main part of the fix. > > Not all TLS access related data races got fixed here: there are > additional races at lazy tlsdesc relocations see bug 27137. LGTM, thanks. There is only a misspelled word and in minor style issue in the comments below. Reviewed-by: Adhemerval Zanella > --- > elf/dl-tls.c | 63 +++++++++++++++++++++++++++++++++++++++------------- > 1 file changed, 47 insertions(+), 16 deletions(-) > > diff --git a/elf/dl-tls.c b/elf/dl-tls.c > index f8b32b3ecb..33c06782b1 100644 > --- a/elf/dl-tls.c > +++ b/elf/dl-tls.c > @@ -471,14 +471,11 @@ extern dtv_t _dl_static_dtv[]; > #endif > > static dtv_t * > -_dl_resize_dtv (dtv_t *dtv) > +_dl_resize_dtv (dtv_t *dtv, size_t max_modid) > { > /* Resize the dtv. */ > dtv_t *newp; > - /* Load GL(dl_tls_max_dtv_idx) atomically since it may be written to by > - other threads concurrently. */ > - size_t newsize > - = atomic_load_acquire (&GL(dl_tls_max_dtv_idx)) + DTV_SURPLUS; > + size_t newsize = max_modid + DTV_SURPLUS; > size_t oldsize = dtv[-1].counter; > > if (dtv == GL(dl_initial_dtv)) Ok. > @@ -524,11 +521,14 @@ _dl_allocate_tls_init (void *result) > size_t total = 0; > size_t maxgen = 0; > > + /* Protects global dynamic TLS related state. */ > + __rtld_lock_lock_recursive (GL(dl_load_lock)); > + > /* Check if the current dtv is big enough. */ > if (dtv[-1].counter < GL(dl_tls_max_dtv_idx)) > { > /* Resize the dtv. */ > - dtv = _dl_resize_dtv (dtv); > + dtv = _dl_resize_dtv (dtv, GL(dl_tls_max_dtv_idx)); > > /* Install this new dtv in the thread data structures. */ > INSTALL_DTV (result, &dtv[-1]); Ok. > @@ -596,6 +596,7 @@ _dl_allocate_tls_init (void *result) > listp = listp->next; > assert (listp != NULL); > } > + __rtld_lock_unlock_recursive (GL(dl_load_lock)); > > /* The DTV version is up-to-date now. */ > dtv[0].counter = maxgen; Ok. > @@ -730,12 +731,29 @@ _dl_update_slotinfo (unsigned long int req_modid) > > if (dtv[0].counter < listp->slotinfo[idx].gen) > { > - /* The generation counter for the slot is higher than what the > - current dtv implements. We have to update the whole dtv but > - only those entries with a generation counter <= the one for > - the entry we need. */ > + /* CONCURRENCY NOTES: > + > + Here the dtv needs to be updated to new_gen generation count. > + > + This code may be called during TLS access when GL(dl_load_lock) > + is not held. In that case the user code has to synchrnize with s/synchrnize/synchronize > + dlopen and dlclose calls of relevant modules. A module m is > + relevant if the generation of m <= new_gen and dlclose of m is > + synchronized: a memory access here happens after the dlopen and > + before the dlclose of relevant modules. The dtv entries for > + relevant modules need to be updated, other entries can be > + arbitrary. > + > + This e.g. means that the first part of the slotinfo list can be > + accessed race free, but the tail may be concurrently extended. > + Similarly relevant slotinfo entries can be read race free, but > + other entries are racy. However updating a non-relevant dtv > + entry does not affect correctness. For a relevant module m, > + max_modid >= modid of m. */ > size_t new_gen = listp->slotinfo[idx].gen; > size_t total = 0; > + size_t max_modid = atomic_load_relaxed (&GL(dl_tls_max_dtv_idx)); > + assert (max_modid >= req_modid); > > /* We have to look through the entire dtv slotinfo list. */ > listp = GL(dl_tls_dtv_slotinfo_list); Ok. > @@ -745,12 +763,14 @@ _dl_update_slotinfo (unsigned long int req_modid) > { > size_t modid = total + cnt; > > + /* Later entries are not relevant. */ > + if (modid > max_modid) > + break; > + > size_t gen = listp->slotinfo[cnt].gen; > > if (gen > new_gen) > - /* This is a slot for a generation younger than the > - one we are handling now. It might be incompletely > - set up so ignore it. */ > + /* Not relevant. */ > continue; > > /* If the entry is older than the current dtv layout we Ok. > @@ -767,7 +787,7 @@ _dl_update_slotinfo (unsigned long int req_modid) > continue; > > /* Resize the dtv. */ > - dtv = _dl_resize_dtv (dtv); > + dtv = _dl_resize_dtv (dtv, max_modid); > > assert (modid <= dtv[-1].counter); > Ok. > @@ -789,8 +809,17 @@ _dl_update_slotinfo (unsigned long int req_modid) > } > > total += listp->len; > + if (total > max_modid) > + break; > + > + /* Synchronize with _dl_add_to_slotinfo. Ideally this would Double space after period. > + be consume MO since we only need to order the accesses to > + the next node after the read of the address and on most > + hardware (other than alpha) a normal load would do that > + because of the address dependency. */ > + listp = atomic_load_acquire (&listp->next); > } > - while ((listp = listp->next) != NULL); > + while (listp != NULL); > > /* This will be the new maximum generation counter. */ > dtv[0].counter = new_gen; Ok. > @@ -982,7 +1011,7 @@ _dl_add_to_slotinfo (struct link_map *l, bool do_add) > the first slot. */ > assert (idx == 0); > > - listp = prevp->next = (struct dtv_slotinfo_list *) > + listp = (struct dtv_slotinfo_list *) > malloc (sizeof (struct dtv_slotinfo_list) > + TLS_SLOTINFO_SURPLUS * sizeof (struct dtv_slotinfo)); > if (listp == NULL) > @@ -996,6 +1025,8 @@ cannot create TLS data structures")); > listp->next = NULL; > memset (listp->slotinfo, '\0', > TLS_SLOTINFO_SURPLUS * sizeof (struct dtv_slotinfo)); > + /* Synchronize with _dl_update_slotinfo. */ > + atomic_store_release (&prevp->next, listp); > } > > /* Add the information into the slotinfo data structure. */ > Ok.