From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-oa1-x2e.google.com (mail-oa1-x2e.google.com [IPv6:2001:4860:4864:20::2e]) by sourceware.org (Postfix) with ESMTPS id 6D81A3858298 for ; Tue, 29 Aug 2023 19:02:59 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 6D81A3858298 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=linaro.org Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=linaro.org Received: by mail-oa1-x2e.google.com with SMTP id 586e51a60fabf-1c8e9d75ce1so2749082fac.3 for ; Tue, 29 Aug 2023 12:02:59 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; t=1693335778; x=1693940578; darn=sourceware.org; h=content-transfer-encoding:in-reply-to:organization:from:references :to:content-language:subject:user-agent:mime-version:date:message-id :from:to:cc:subject:date:message-id:reply-to; bh=De5RxjzHFCWJzodOHsHE87agUUUUMKN3MKa6cT0gvdE=; b=DlTT+k5I9ehKWBm3w/sz3FU1hMQpWAdN7vzgv5gqBzthBl2Ytcs+dh23HQ6du7p5/0 vGrk2zY89J/ABfSgsA6EAYEIFX+5wnUwnuemUeSwSJ2s9JNOn3r4qjGiYFxtgbdTRxvJ F5bL3t11dfPIkkceaX8j8whhn3rMPJxyNiOqwJHDDNp9P9aTE2tK/u82lhZn96+xF3Zz QeeqHeUvI4LCcBS/KV9BsqaGXewefhGgLy4UZWe3BlKBWYIlCJsG0In70rluzxIvmRDI S0paJ/yHLd8iAvXYWG7a3GZ7HNHKBxMML386WdAjR5VRH/NLkx+R/6UxlwhyqYM6SmFP 52EA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1693335778; x=1693940578; h=content-transfer-encoding:in-reply-to:organization:from:references :to:content-language:subject:user-agent:mime-version:date:message-id :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=De5RxjzHFCWJzodOHsHE87agUUUUMKN3MKa6cT0gvdE=; b=C2PjEIc/gylUZz+vZkWEKiZtq6zU/qUKDAo3YCuXeXVTBCdyVJNL/Z/XBODFYjXps6 bwAz3NqdSVnaepFqc+6ByVDz8h0dOKHBWhScOsN+sdftqVFdUCAm5THt3ud3XX/4FXEr eJaOjesK35I4ljTLHM3IOWBi3Kx8RM9Mc4JXKKScLA6jyyUYbBFPDtfeTVBZ/TqJd5xV 0DHFJq2CLUqurZW0+ZZvPihVNxgJMqxzBSDkX/A0JisFOoBFb1pFupHqBABDTbe5po8H b4GWOUFjVw8Ax/aVeY78qNP8UixHBmiJQfTeHg/90HkfnDFtnAGoODtCYJx54Eb7nq15 RDlA== X-Gm-Message-State: AOJu0YwIXJC421pnc9AxGHivPEqGQkeUb+kUwsaslBM25Up1r6/p5BT1 URnH6pMQh6zyXuytpmVKhtcTHw== X-Google-Smtp-Source: AGHT+IEbsOE+4QxKMke8jv0yy4CHjUEBdyFslUg1l89esej6INr0CWlOtLsN80wzsswAUC6Dnphy9A== X-Received: by 2002:a05:6870:e751:b0:19f:6fae:d5fc with SMTP id t17-20020a056870e75100b0019f6faed5fcmr98552oak.33.1693335778664; Tue, 29 Aug 2023 12:02:58 -0700 (PDT) Received: from ?IPV6:2804:1b3:a7c3:578c:7535:3d41:c7e8:c5ab? ([2804:1b3:a7c3:578c:7535:3d41:c7e8:c5ab]) by smtp.gmail.com with ESMTPSA id q13-20020a056870828d00b001cd14c60b35sm5033338oae.5.2023.08.29.12.02.56 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Tue, 29 Aug 2023 12:02:58 -0700 (PDT) Message-ID: Date: Tue, 29 Aug 2023 16:02:55 -0300 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:102.0) Gecko/20100101 Thunderbird/102.14.0 Subject: Re: [PATCH v3] elf: Fix slow tls access after dlopen [BZ #19924] Content-Language: en-US To: Szabolcs Nagy , libc-alpha@sourceware.org, Paul Zimmermann References: <20230106185250.2936935-1-szabolcs.nagy@arm.com> <087abdd6-0db7-102c-6fdf-8a6c968f8371@linaro.org> From: Adhemerval Zanella Netto Organization: Linaro In-Reply-To: Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit X-Spam-Status: No, score=-7.0 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,NICE_REPLY_A,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: On 29/08/23 11:09, Szabolcs Nagy wrote: > The 08/23/2023 16:06, Adhemerval Zanella Netto wrote: >> The fix sound reasonable, and I did not see any regression on some different >> architectures (aarch64, powerpc, sparc, x86). However I am not sure how >> compreensible our own regression tests are for this issue, since afaik >> there is no concurrent tests that stress concurrent dlopen module and >> TLS access. It also does fix the performance issue from bug report. >> >> Below some minor comments. > ... >>> --- a/elf/dl-reloc.c >>> +++ b/elf/dl-reloc.c >>> @@ -114,9 +114,7 @@ _dl_try_allocate_static_tls (struct link_map *map, bool optional) >>> #ifdef SHARED >>> if (__builtin_expect (THREAD_DTV()[0].counter != GL(dl_tls_generation), >>> 0)) >>> - /* Update the slot information data for at least the generation of >>> - the DSO we are allocating data for. */ >>> - (void) _dl_update_slotinfo (map->l_tls_modid); >>> + (void) _dl_update_slotinfo (map->l_tls_modid, GL(dl_tls_generation)); >> >> Shouldn't it use atomic_load_relaxed on GL(dl_tls_generation) for consistency? > > _dl_try_allocate_static_tls is called during tlsdesc reloc > processing. > > relocation processing always runs with the GL(dl_load_lock) > held and then no concurrent write to the gen count is possible. > > loads outside the lock have to use atomics to sync with the > atomic store under the lock, loads under the lock can be > normal loads. Fair enough, maybe add a small comment with this very explanation. > >>> struct link_map * >>> -_dl_update_slotinfo (unsigned long int req_modid) >>> +_dl_update_slotinfo (unsigned long int req_modid, size_t new_gen) >>> { >>> struct link_map *the_map = NULL; >>> dtv_t *dtv = THREAD_DTV (); >>> >>> - /* The global dl_tls_dtv_slotinfo array contains for each module >>> - index the generation counter current when the entry was created. >>> + /* CONCURRENCY NOTES: >>> + >>> + The global dl_tls_dtv_slotinfo array contains for each module >> >> Maybe use the correct name, dl_tls_dtv_slotinfo_list, so it is easier >> to reference on the code. > > ok. (it was this way in the orig comment, but using the full > identifier is probably better) > >> >>> + index the generation counter current when that entry was updated. >>> This array never shrinks so that all module indices which were >> >> I think it should 'that' here (it specifies the module indices and there >> is no comma). > > what do you mean? [...] so that all module indices which were [...]. But I just realized this is not from patch, so fell free to ignore it. > >> >>> - valid at some time can be used to access it. Before the first >>> - use of a new module index in this function the array was extended >>> - appropriately. Access also does not have to be guarded against >>> - modifications of the array. It is assumed that pointer-size >>> - values can be read atomically even in SMP environments. It is >>> - possible that other threads at the same time dynamically load >>> - code and therefore add to the slotinfo list. This is a problem >>> - since we must not pick up any information about incomplete work. >>> - The solution to this is to ignore all dtv slots which were >>> - created after the one we are currently interested. We know that >>> - dynamic loading for this module is completed and this is the last >>> - load operation we know finished. */ >>> - unsigned long int idx = req_modid; >>> + valid at some time can be used to access it. Concurrent loading >>> + and unloading of modules can update slotinfo entries or extend >>> + the array. The updates happen under the GL(dl_load_tls_lock) and >>> + finish with the release store of the generation counter to >>> + GL(dl_tls_generation) which is synchronized with the load of >>> + new_gen in the caller. So updates up to new_gen are synchronized >>> + but updates for later generations may not be. >>> + >>> + Here we update the thread dtv from old_gen (== dtv[0].counter) to >>> + new_gen generation. For this each dtv[i] entry is either set to >> >> Maybe a comma after 'For this'. > > ok > >>> + an unallocated state (set), or left unmodified (nop). Where (set) >>> + may resize the dtv first if modid i >= dtv[-1].counter. The rules >>> + for the decision between (set) and (nop) are >>> + >>> + (1) If slotinfo entry i is concurrently updated then either (set) >>> + or (nop) is valid: TLS access cannot use dtv[i] unless it is >>> + synchronized with a generation > new_gen. >>> + >>> + Otherwise if the generation of slotinfo entry i is gen and the >> >> Maybe a comma after 'Otherwise'. > > ok