From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-oa1-x34.google.com (mail-oa1-x34.google.com [IPv6:2001:4860:4864:20::34]) by sourceware.org (Postfix) with ESMTPS id BE4663858D1E for ; Fri, 29 Apr 2022 17:13:22 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org BE4663858D1E Received: by mail-oa1-x34.google.com with SMTP id 586e51a60fabf-e93bbb54f9so8722780fac.12 for ; Fri, 29 Apr 2022 10:13:22 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:message-id:date:mime-version:user-agent:subject :content-language:to:cc:references:from:in-reply-to :content-transfer-encoding; bh=FSZi7n5dl2nivfffwEE0S41oq+hUag4IgvighzA7Tvg=; b=bScfmM2///7Ui1FyM1uQ518i7nn9kRdr1ecf1hNEFPwT9TlJ4YyywprNbYnGVam6fm nJvGez4tsbBHkHpDozR0gmcEN0EEMIA8uDJLWwdnf1vkjz8qtaW+gnebMQsi/TACf+d7 4H2QDop+Ivqe6IaN/RNrk7gHkm3e5VxoeiG3Y3qi2I4eoLAia4rzg43AJ638iJxjtvVr k0WyAB8ycEVUfLbS8T6KK0S5eW0tB6Sy07MggKbEgsjJhnKLVzyFIlCpjfVvUxQj/XYP 5VKagNN3mUmQHnLPeZ5dwloBLUFAlrOu2fsnV0nCLQXhOwrWuq6BYJNMOHY4Uzag0a0b mrTw== X-Gm-Message-State: AOAM530b9WbogxTPEzx9vAMG0BF0fQi4onrPjM1dkX1Dna9crbPTs12h LIc8CQo/pKHZCrTRXou/UFiriw== X-Google-Smtp-Source: ABdhPJzIaTvHhg0NloPwfInUH7Etu4zrO03WOZFp1zQKzeQVprSKaIqiqrF27/eZhBrrSj4dPA9loA== X-Received: by 2002:a05:6870:f149:b0:dd:f3b0:986d with SMTP id l9-20020a056870f14900b000ddf3b0986dmr1806172oac.148.1651252401069; Fri, 29 Apr 2022 10:13:21 -0700 (PDT) Received: from ?IPV6:2804:431:c7cb:726:ddb4:c7ad:9b4b:dfd6? ([2804:431:c7cb:726:ddb4:c7ad:9b4b:dfd6]) by smtp.gmail.com with ESMTPSA id q34-20020a056808202200b0032545864b89sm1398785oiw.14.2022.04.29.10.13.19 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Fri, 29 Apr 2022 10:13:20 -0700 (PDT) Message-ID: <970c59ff-f004-22ad-5132-c3137e63c87c@linaro.org> Date: Fri, 29 Apr 2022 14:13:17 -0300 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.8.1 Subject: Re: [PATCH v2 3/4] Move libio lock single-thread optimization to generic libc-lock (BZ #27842) Content-Language: en-US To: Wilco Dijkstra Cc: 'GNU C Library' , Florian Weimer References: From: Adhemerval Zanella In-Reply-To: Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit X-Spam-Status: No, score=-7.1 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, NICE_REPLY_A, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: libc-alpha@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Libc-alpha mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 29 Apr 2022 17:13:25 -0000 On 29/04/2022 10:04, Wilco Dijkstra wrote: > Hi Adhemerval, > > /* Lock the recursive named lock variable. */ > #define __libc_lock_lock_recursive(NAME) \ > do { \ > void *self = THREAD_SELF; \ > if ((NAME).owner != self) \ > { \ > if (SINGLE_THREAD_P) \ > (NAME).lock = 1; \ > else \ > lll_lock ((NAME).lock, LLL_PRIVATE); \ > (NAME).owner = self; \ > } \ > ++(NAME).cnt; \ > } while (0) > > > The issue is that recursive locks are very expensive. Even with the single thread > optimization there are still 5 memory accesses just to take a free lock! While I'm in > favour of improving locks like above, I think removing the single threaded optimizations > from libio will cause major performance regressions in getc, putc, feof etc. > > Basically doing single threaded optimizations at the highest level will always be faster > than doing it at the lock level. That doesn't mean optimizing general locks isn't useful, > however all it does is reduce the need to add single threaded optimizations rather than > make them completely unnecessary. The main problem is _IO_enable_locks is a clunky interface because it requires flockfile to set _flags2 outside a lock region leading a possible racy issue (BZ #27842). Moving to lock itself it will pretty much: lock: if (THREAD_SELF->owner != self) THREAD_SELF->lock = 1 unlock: if (THREAD_SELF.cnt == 0) { THREAD_SELF->owner = NULL THREAD_SELF.lock = 0; } Which is still a regression, but at least we don't have a race condition on flockfile anymore. > >> Or if we are bounded to keep the current practice to check for single-thread and >> skip locks internally. It would be good to consolidate all the internal lock >> usage and have the single-thread lock optimizations on all locks, not only on >> pthread_mutex_lock. > > Yes, we should have single threaded optimization on all locks, including __libc_lock_lock. > Recursive locks can be optimized both for single threaded and multi-threaded case. > I'm wondering whether we could use the lock variable itself to store useful data (since most > ISAs will allow 64 bits). Then all you need to check is the .lock field and only if it is already > taken do the extra checks for ownership and incrementing recursion counter. > I think ideally we would like to model all internal locks to a futex-like so we can use the congestion optimization as described by Jens Gustedt paper [1] which would allows us to move the counter and the lock to same word. I don't think we can improve recursive locks without a 64-bit futex operation. Then assuming lll_lock handle the counter, we might further optimize the lock as (without any single-thread optimization): typedef union { uint32_t lck; pid_t owner; } __libc_lock_recursive_t; static inline void __libc_lock_lock_recursive (__libc_lock_recursive_t *l) { pid_t tid = THREAD_SELF->pid; if (l->l.owner != tid) { lll_lock (l->l.lck, LLL_PRIVATE); l->l.owner = tid; } } static inline void __libc_lock_unlock_recursive (__libc_lock_recursive_t *l) { /* not sure if this yield any gain here... */ l->l.owner = 0; lll_unlock (l->l.lck); } And lll_lock and lll_unlock will handle the counter congestion. I think the single-thread optimization could be done on lll_lock/lll_unlock as well. [1] https://hal.inria.fr/hal-01236734/document