From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <adhemerval.zanella@linaro.org>
Received: from mail-oa1-x34.google.com (mail-oa1-x34.google.com
 [IPv6:2001:4860:4864:20::34])
 by sourceware.org (Postfix) with ESMTPS id BE4663858D1E
 for <libc-alpha@sourceware.org>; Fri, 29 Apr 2022 17:13:22 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org BE4663858D1E
Received: by mail-oa1-x34.google.com with SMTP id
 586e51a60fabf-e93bbb54f9so8722780fac.12
 for <libc-alpha@sourceware.org>; Fri, 29 Apr 2022 10:13:22 -0700 (PDT)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20210112;
 h=x-gm-message-state:message-id:date:mime-version:user-agent:subject
 :content-language:to:cc:references:from:in-reply-to
 :content-transfer-encoding;
 bh=FSZi7n5dl2nivfffwEE0S41oq+hUag4IgvighzA7Tvg=;
 b=bScfmM2///7Ui1FyM1uQ518i7nn9kRdr1ecf1hNEFPwT9TlJ4YyywprNbYnGVam6fm
 nJvGez4tsbBHkHpDozR0gmcEN0EEMIA8uDJLWwdnf1vkjz8qtaW+gnebMQsi/TACf+d7
 4H2QDop+Ivqe6IaN/RNrk7gHkm3e5VxoeiG3Y3qi2I4eoLAia4rzg43AJ638iJxjtvVr
 k0WyAB8ycEVUfLbS8T6KK0S5eW0tB6Sy07MggKbEgsjJhnKLVzyFIlCpjfVvUxQj/XYP
 5VKagNN3mUmQHnLPeZ5dwloBLUFAlrOu2fsnV0nCLQXhOwrWuq6BYJNMOHY4Uzag0a0b
 mrTw==
X-Gm-Message-State: AOAM530b9WbogxTPEzx9vAMG0BF0fQi4onrPjM1dkX1Dna9crbPTs12h
 LIc8CQo/pKHZCrTRXou/UFiriw==
X-Google-Smtp-Source: ABdhPJzIaTvHhg0NloPwfInUH7Etu4zrO03WOZFp1zQKzeQVprSKaIqiqrF27/eZhBrrSj4dPA9loA==
X-Received: by 2002:a05:6870:f149:b0:dd:f3b0:986d with SMTP id
 l9-20020a056870f14900b000ddf3b0986dmr1806172oac.148.1651252401069; 
 Fri, 29 Apr 2022 10:13:21 -0700 (PDT)
Received: from ?IPV6:2804:431:c7cb:726:ddb4:c7ad:9b4b:dfd6?
 ([2804:431:c7cb:726:ddb4:c7ad:9b4b:dfd6])
 by smtp.gmail.com with ESMTPSA id
 q34-20020a056808202200b0032545864b89sm1398785oiw.14.2022.04.29.10.13.19
 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128);
 Fri, 29 Apr 2022 10:13:20 -0700 (PDT)
Message-ID: <970c59ff-f004-22ad-5132-c3137e63c87c@linaro.org>
Date: Fri, 29 Apr 2022 14:13:17 -0300
MIME-Version: 1.0
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101
 Thunderbird/91.8.1
Subject: Re: [PATCH v2 3/4] Move libio lock single-thread optimization to
 generic libc-lock (BZ #27842)
Content-Language: en-US
To: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
Cc: 'GNU C Library' <libc-alpha@sourceware.org>,
 Florian Weimer <fweimer@redhat.com>
References: <DB6PR0801MB18791EAA59DF3BB2D70791CB83FC9@DB6PR0801MB1879.eurprd08.prod.outlook.com>
From: Adhemerval Zanella <adhemerval.zanella@linaro.org>
In-Reply-To: <DB6PR0801MB18791EAA59DF3BB2D70791CB83FC9@DB6PR0801MB1879.eurprd08.prod.outlook.com>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
X-Spam-Status: No, score=-7.1 required=5.0 tests=BAYES_00, DKIM_SIGNED,
 DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, NICE_REPLY_A, RCVD_IN_DNSWL_NONE,
 SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.4
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on
 server2.sourceware.org
X-BeenThere: libc-alpha@sourceware.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Libc-alpha mailing list <libc-alpha.sourceware.org>
List-Unsubscribe: <https://sourceware.org/mailman/options/libc-alpha>,
 <mailto:libc-alpha-request@sourceware.org?subject=unsubscribe>
List-Archive: <https://sourceware.org/pipermail/libc-alpha/>
List-Post: <mailto:libc-alpha@sourceware.org>
List-Help: <mailto:libc-alpha-request@sourceware.org?subject=help>
List-Subscribe: <https://sourceware.org/mailman/listinfo/libc-alpha>,
 <mailto:libc-alpha-request@sourceware.org?subject=subscribe>
X-List-Received-Date: Fri, 29 Apr 2022 17:13:25 -0000


On 29/04/2022 10:04, Wilco Dijkstra wrote:
> Hi Adhemerval,
> 
> /* Lock the recursive named lock variable.  */
> #define __libc_lock_lock_recursive(NAME) \
>   do {                                                                        \
>     void *self = THREAD_SELF;                                                 \
>     if ((NAME).owner != self)                                                 \
>       {									      \
> 	if (SINGLE_THREAD_P)                                                  \
> 	  (NAME).lock = 1;						      \
>         else								      \
>            lll_lock ((NAME).lock, LLL_PRIVATE);                               \
>         (NAME).owner = self;                                                  \
>       }                                                                       \
>     ++(NAME).cnt;                                                             \
>   } while (0)
> 
> 
> The issue is that recursive locks are very expensive. Even with the single thread
> optimization there are still 5 memory accesses just to take a free lock! While I'm in
> favour of improving locks like above, I think removing the single threaded optimizations
> from libio will cause major performance regressions in getc, putc, feof etc.
> 
> Basically doing single threaded optimizations at the highest level will always be faster
> than doing it at the lock level. That doesn't mean optimizing general locks isn't useful,
> however all it does is reduce the need to add single threaded optimizations rather than
> make them completely unnecessary.

The main problem is _IO_enable_locks is a clunky interface because it requires
flockfile to set _flags2 outside a lock region leading a possible racy issue 
(BZ #27842).  Moving to lock itself it will pretty much:


lock:
  if (THREAD_SELF->owner != self)
    THREAD_SELF->lock = 1

unlock:

  if (THREAD_SELF.cnt == 0)
    {
      THREAD_SELF->owner = NULL
      THREAD_SELF.lock = 0;
    }

Which is still a regression, but at least we don't have a race condition on
flockfile anymore.

> 
>> Or if we are bounded to keep the current practice to check for single-thread and
>> skip locks internally.  It would be good to consolidate all the internal lock
>> usage and have the single-thread lock optimizations on all locks, not only on
>> pthread_mutex_lock.
> 
> Yes, we should have single threaded optimization on all locks, including __libc_lock_lock.
> Recursive locks can be optimized both for single threaded and multi-threaded case.
> I'm wondering whether we could use the lock variable itself to store useful data (since most
> ISAs will allow 64 bits). Then all you need to check is the .lock field and only if it is already
> taken do the extra checks for ownership and incrementing recursion counter.
> 

I think ideally we would like to model all internal locks to a futex-like
so we can use the congestion optimization as described by Jens Gustedt
paper [1] which would allows us to move the counter and the lock to
same word.  I don't think we can improve recursive locks without a 64-bit
futex operation. 

Then assuming lll_lock handle the counter, we might further optimize
the lock as (without any single-thread optimization):

  typedef union
  {
    uint32_t lck; 
    pid_t owner;
  } __libc_lock_recursive_t;

  static inline void
  __libc_lock_lock_recursive (__libc_lock_recursive_t *l)
  {
    pid_t tid = THREAD_SELF->pid;
    if (l->l.owner != tid)
      {
        lll_lock (l->l.lck, LLL_PRIVATE);
	l->l.owner = tid;
      }
  }

  static inline void
  __libc_lock_unlock_recursive (__libc_lock_recursive_t *l)
  {
  /* not sure if this yield any gain here... */
    l->l.owner = 0;
    lll_unlock (l->l.lck);
  }

And lll_lock and lll_unlock will handle the counter congestion.  I think
the single-thread optimization could be done on lll_lock/lll_unlock as well.

[1] https://hal.inria.fr/hal-01236734/document