Date: Fri, 6 Jan 2023 16:36:53 +0000
From: Szabolcs Nagy
To: Paul Zimmermann
Cc: nsz@port70.net, libc-alpha@sourceware.org
Subject: Re: Fix slow tls access after dlopen

The 01/06/2023 11:06, Paul Zimmermann via Libc-alpha wrote:
> Hi Szabolcs,
>
> please could you repost [1] as a distinct patch as requested by Carlos?
>
> This issue affects the SageMath software tool, with a major slowdown:
>
> https://trac.sagemath.org/ticket/34850
>
> (see comment 97 which provides a short C program to reproduce the issue).

i see, thanks.

>
> Best regards,
> Paul
>
> [1] https://patchwork.ozlabs.org/project/glibc/patch/b116855de71098ef7dd2875dd3237f8f3ecc12c2.1618301209.git.szabolcs.nagy@arm.com/

yeah i resent that patch (only updated the commit message):

https://patchwork.ozlabs.org/project/glibc/patch/20221019111439.63501-1-szabolcs.nagy@arm.com/

but it requires more work (to document the new sync mechanism).
and requires actual review.

and i think there is a better way but there are non-trivial
trade-offs so reviewing the design would help:

in tls access slow path we currently update the thread dtv up to
the generation of the accessed module, but always do the slow path
if the dtv is older than the global generation.

the plan is to update the dtv to the global generation (latest
dlopen/dlclose). this means we don't keep hitting the slow path
after a dlopen, but we do hit it once after every dlopen/dlclose
(even when an independent library is loaded, not related to the
current tls access; this is ok if dlopen/dlclose of libs with tls
is rare).

dtv update does two things:

1. resize the dtv array (of the current thread) up to
   max modid + some,
2. go through the global dtv slotinfo list and if the generation
   of entry i is newer than the dtv generation then free/reset
   dtv[i]. (note that dtv[i] is not allocated here, it's either
   reset or left alone.)

the original patch in bugzilla and my patch try to do the dtv
update with lock-free atomics, which means the global generation
count has to be accessed with acquire-load and release-store and a
lot of racy loads need to use relaxed mo atomics to avoid data
races. this works except out-of-thin-air values may break the code.
OOTA value is only a theoretical issue with relaxed mo access i
think: here it can cause ub down the line, and that ub can cause an
arbitrary value to be stored concurrently, justifying the OOTA
value in the first place. this is a bit ugly and the use of atomics
makes the code harder to read, but otherwise i don't see a problem
with this approach.

another approach is to take GL(dl_load_tls_lock) in tls access slow
path and then the sync is much simpler (no nasty relaxed atomics).
the drawback is that malloc/free is used under the lock, and these
may be interposed such that they use GD tls, which makes lock
ordering deadlock more likely (note such an issue already exists in
dlopen and dlclose, and tls access already uses the same lock and
malloc/free in tls_get_addr_tail, however now the malloc/free would
be under the lock so deadlock can happen between two threads only
doing tls access, no dlopen/dlclose is needed). another problem is
lock contention when there are many tls accesses concurrently with
a dlopen/dlclose that bumps the generation count, so all those
accesses have to wait for each other and for dlopen/dlclose on the
same lock.

so using a lock is not ideal but if we do bigger changes it may
work or we may get away with simpler code:

- ideally an independent dlopen should not slow tls access down.
  i think this can be done by updating dtv[i] in all threads at
  dlclose time and leaving dtv resize for the tls access slow path
  (moving dtv resize to dlopen/dlclose too might work but the sync
  with tls access is non-trivial then). then generation count is
  not needed at all. dtv[i] is valid if the dtv array is large
  enough. first access still needs to allocate. dlclose will be
  slower and has to sync with thread creation and exit and with dtv
  resize at tls access. this is a reasonable trade-off i think.
  requires a dtv resize change to avoid the malloc lock ordering
  problem (not use realloc or use a per-thread lock to sync with
  dlclose) and target specific asm change (at least for tlsdesc)
  and removal of all the slotinfo generation count code.

- tls_get_addr_tail should not use the tls lock: dlopen can just
  set up map->l_tls_offset and have it const during the lifetime
  of the module. (i'm not sure why tls offset is set up lazily, i
  don't think it buys us anything but i might have missed something
  around dl namespaces).

- _dl_update_slotinfo is called outside of tls access unnecessarily
  (in dlopen and reloc processing), i think these should be removed.
  what might make sense instead is to update the dtv once in rtld
  when the process is still single threaded and all modules are
  relocated but ctors are not yet run (this can ensure that libs
  loaded at start time have reliable and as-safe GD tls access:
  they will use static tls without any locks or malloc on first
  access).

- completely avoiding locks and malloc in GD tls access would
  require upfront allocation of tls for (and sync with) all threads.
  this is likely too much memory use (and dlopen sync overhead), but
  even with lazy tls allocation we can e.g. guarantee as-safe,
  no-malloc, no-failure GD tls access when tls was already accessed
  in that thread (so an independent dlopen cannot force a dtv
  update, only the first access may require lock/malloc).
  not sure how useful this is but then we can make dlsym report
  allocation failure for tls syms so users can do an initial tls
  access in relevant threads to ensure fast and reliable behaviour
  later (even in signal handlers).

if we want a fix for this release i can repost my previous patch
(with lock-free atomics that is likely backportable) but at some
point a larger rewrite would be better i think.
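
to make the "update the dtv to the global generation" plan concrete,
here is a minimal single-threaded sketch in C. it is only a toy model:
all names (model_dl_change, dtv_update, model_tls_get_addr, struct slot,
struct dtv, DTV_SURPLUS, slow_path_hits) are hypothetical and are not
glibc's, and the locking/atomics questions discussed above are
deliberately left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* hypothetical toy model, not glibc code: no threads, locks or atomics */
enum { MAX_MOD = 16, DTV_SURPLUS = 4 };

struct slot { size_t gen; };   /* generation of the last dlopen/dlclose of modid i */

struct dtv {
  size_t gen;                  /* generation this dtv was last synced to */
  size_t len;                  /* number of entries in ent */
  void **ent;                  /* ent[i]: tls block of module i, or NULL */
};

size_t global_gen;             /* bumped by every modelled dlopen/dlclose */
size_t max_modid;
struct slot slots[MAX_MOD];
unsigned slow_path_hits;       /* counts how often the slow path ran */

/* model of a dlopen/dlclose of a module with tls: bump the global
   generation and stamp the module's slot with it */
void model_dl_change(size_t modid)
{
  assert(modid < MAX_MOD);
  slots[modid].gen = ++global_gen;
  if (modid > max_modid)
    max_modid = modid;
}

/* dtv update: 1. resize, 2. reset entries whose slot is newer than the
   dtv, then sync the dtv to the global generation */
int dtv_update(struct dtv *d)
{
  if (d->len <= max_modid) {
    size_t n = max_modid + 1 + DTV_SURPLUS;
    void **p = realloc(d->ent, n * sizeof *p);
    if (p == NULL)
      return -1;
    for (size_t i = d->len; i < n; i++)
      p[i] = NULL;
    d->ent = p;
    d->len = n;
  }
  for (size_t i = 0; i <= max_modid; i++)
    if (slots[i].gen > d->gen) {
      free(d->ent[i]);         /* reset only, nothing is allocated here */
      d->ent[i] = NULL;
    }
  d->gen = global_gen;
  return 0;
}

/* tls access: fast path when the dtv is current, otherwise one update;
   the first access to a module allocates its block */
void *model_tls_get_addr(struct dtv *d, size_t modid, size_t size)
{
  assert(modid < MAX_MOD);
  if (d->gen != global_gen || modid >= d->len) {
    slow_path_hits++;
    if (dtv_update(d) != 0)
      return NULL;
  }
  if (d->ent[modid] == NULL)
    d->ent[modid] = calloc(1, size);
  return d->ent[modid];
}
```

in this model a thread that starts with an empty dtv (struct dtv d = {0};)
takes the slow path once after each model_dl_change call and then stays on
the fast path, and the block of an unrelated module survives the update.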