public inbox for libc-alpha@sourceware.org
 help / color / mirror / Atom feed
* [PATCH v2 3/4] Move libio lock single-thread optimization to generic libc-lock (BZ #27842)
@ 2022-04-29 13:04 Wilco Dijkstra
  2022-04-29 17:13 ` Adhemerval Zanella
  0 siblings, 1 reply; 10+ messages in thread
From: Wilco Dijkstra @ 2022-04-29 13:04 UTC (permalink / raw)
  To: Adhemerval Zanella; +Cc: 'GNU C Library', Florian Weimer

Hi Adhemerval,

/* Lock the recursive named lock variable.  */
#define __libc_lock_lock_recursive(NAME) \
  do {                                                                        \
    void *self = THREAD_SELF;                                                 \
    if ((NAME).owner != self)                                                 \
      {									      \
	if (SINGLE_THREAD_P)                                                  \
	  (NAME).lock = 1;						      \
        else								      \
           lll_lock ((NAME).lock, LLL_PRIVATE);                               \
        (NAME).owner = self;                                                  \
      }                                                                       \
    ++(NAME).cnt;                                                             \
  } while (0)


The issue is that recursive locks are very expensive. Even with the single thread
optimization there are still 5 memory accesses just to take a free lock! While I'm in
favour of improving locks like above, I think removing the single threaded optimizations
from libio will cause major performance regressions in getc, putc, feof etc.

Basically doing single threaded optimizations at the highest level will always be faster
than doing it at the lock level. That doesn't mean optimizing general locks isn't useful,
however all it does is reduce the need to add single threaded optimizations rather than
make them completely unnecessary.
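As a rough illustration of the difference (a hypothetical simplification, not the actual libio code; `single_thread_p` stands in for glibc's internal SINGLE_THREAD_P macro, or the public __libc_single_threaded variable from glibc 2.32 onwards):

```c
#include <stdio.h>

/* Stand-in for glibc's internal SINGLE_THREAD_P check.  */
static int single_thread_p = 1;

/* Single-thread check at the highest level: one predictable branch
   skips the whole lock/unlock sequence and all its memory accesses.
   A check buried inside the lock itself would still pay for the call
   and the lock bookkeeping around it.  */
static int
fast_getc (FILE *fp)
{
  if (single_thread_p)
    return getc_unlocked (fp);
  flockfile (fp);
  int c = getc_unlocked (fp);
  funlockfile (fp);
  return c;
}
```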

> Or whether we are bound to keep the current practice of checking for single-thread
> and skipping locks internally.  It would be good to consolidate all the internal
> lock usage and have the single-thread lock optimization on all locks, not only on
> pthread_mutex_lock.

Yes, we should have single threaded optimization on all locks, including __libc_lock_lock.
Recursive locks can be optimized both for single threaded and multi-threaded case.
I'm wondering whether we could use the lock variable itself to store useful data (since most
ISAs will allow 64 bits). Then all you need to check is the .lock field and only if it is already
taken do the extra checks for ownership and incrementing recursion counter.
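One way to read this idea, as a hedged sketch (the 64-bit layout, the C11 atomics, and the spinning contended path are all illustrative stand-ins, not glibc code):

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical layout: owner TID in the low 32 bits of the lock word,
   recursion count in the high 32 bits; 0 means the lock is free.  */
typedef struct { _Atomic uint64_t word; } recursive_lock_t;

static inline uint64_t
pack (uint32_t owner, uint32_t cnt)
{
  return (uint64_t) owner | ((uint64_t) cnt << 32);
}

static void
recursive_lock_acquire (recursive_lock_t *l, uint32_t tid)
{
  uint64_t expected = 0;
  /* Fast path: a free lock is taken by a single CAS on the lock word.  */
  if (atomic_compare_exchange_strong (&l->word, &expected, pack (tid, 1)))
    return;
  /* Only when the lock is already taken do we inspect owner/count.  */
  if ((uint32_t) expected == tid)
    {
      /* Recursive re-acquire: bump the count in the high half.  */
      atomic_fetch_add (&l->word, pack (0, 1));
      return;
    }
  /* Contended: spin for this sketch; real code would futex-wait.  */
  do
    expected = 0;
  while (!atomic_compare_exchange_weak (&l->word, &expected, pack (tid, 1)));
}

static void
recursive_lock_release (recursive_lock_t *l, uint32_t tid)
{
  (void) tid;  /* only the owner may call this */
  uint64_t w = atomic_load (&l->word);
  if ((w >> 32) > 1)
    atomic_fetch_sub (&l->word, pack (0, 1));  /* drop one recursion level */
  else
    atomic_store (&l->word, 0);                /* last unlock frees the lock */
}
```

The point of the layout is that the common uncontended acquire touches a single word instead of separate lock, owner, and cnt fields.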

Cheers,
Wilco

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v2 3/4] Move libio lock single-thread optimization to generic libc-lock (BZ #27842)
  2022-04-29 13:04 [PATCH v2 3/4] Move libio lock single-thread optimization to generic libc-lock (BZ #27842) Wilco Dijkstra
@ 2022-04-29 17:13 ` Adhemerval Zanella
  2022-05-16 16:17   ` Wilco Dijkstra
  0 siblings, 1 reply; 10+ messages in thread
From: Adhemerval Zanella @ 2022-04-29 17:13 UTC (permalink / raw)
  To: Wilco Dijkstra; +Cc: 'GNU C Library', Florian Weimer



On 29/04/2022 10:04, Wilco Dijkstra wrote:
> Hi Adhemerval,
> 
> /* Lock the recursive named lock variable.  */
> #define __libc_lock_lock_recursive(NAME) \
>   do {                                                                        \
>     void *self = THREAD_SELF;                                                 \
>     if ((NAME).owner != self)                                                 \
>       {									      \
> 	if (SINGLE_THREAD_P)                                                  \
> 	  (NAME).lock = 1;						      \
>         else								      \
>            lll_lock ((NAME).lock, LLL_PRIVATE);                               \
>         (NAME).owner = self;                                                  \
>       }                                                                       \
>     ++(NAME).cnt;                                                             \
>   } while (0)
> 
> 
> The issue is that recursive locks are very expensive. Even with the single thread
> optimization there are still 5 memory accesses just to take a free lock! While I'm in
> favour of improving locks like above, I think removing the single threaded optimizations
> from libio will cause major performance regressions in getc, putc, feof etc.
> 
> Basically doing single threaded optimizations at the highest level will always be faster
> than doing it at the lock level. That doesn't mean optimizing general locks isn't useful,
> however all it does is reduce the need to add single threaded optimizations rather than
> make them completely unnecessary.

The main problem is that _IO_enable_locks is a clunky interface: it requires
flockfile to set _flags2 outside a lock region, leading to a possible race
(BZ #27842).  Moving the check into the lock itself gives pretty much:


lock:
  if (lock->owner != self)
    lock->lock = 1;

unlock:

  if (--lock->cnt == 0)
    {
      lock->owner = NULL;
      lock->lock = 0;
    }

Which is still a regression, but at least we don't have a race condition on
flockfile anymore.

> 
>> Or whether we are bound to keep the current practice of checking for single-thread
>> and skipping locks internally.  It would be good to consolidate all the internal
>> lock usage and have the single-thread lock optimization on all locks, not only on
>> pthread_mutex_lock.
> 
> Yes, we should have single threaded optimization on all locks, including __libc_lock_lock.
> Recursive locks can be optimized both for single threaded and multi-threaded case.
> I'm wondering whether we could use the lock variable itself to store useful data (since most
> ISAs will allow 64 bits). Then all you need to check is the .lock field and only if it is already
> taken do the extra checks for ownership and incrementing recursion counter.
> 

I think ideally we would like to model all internal locks on a futex-like
construct so we can use the congestion optimization described in Jens
Gustedt's paper [1], which would allow us to move the counter and the lock
into the same word.  I don't think we can improve recursive locks without
a 64-bit futex operation.

Then, assuming lll_lock handles the counter, we might further optimize
the lock as follows (without any single-thread optimization):

  typedef union
  {
    struct
    {
      uint32_t lck;   /* lock state + congestion counter, handled by lll_lock  */
      pid_t owner;
    } l;
    uint64_t word;
  } __libc_lock_recursive_t;

  static inline void
  __libc_lock_lock_recursive (__libc_lock_recursive_t *l)
  {
    pid_t tid = THREAD_SELF->pid;
    if (l->l.owner != tid)
      {
        lll_lock (l->l.lck, LLL_PRIVATE);
        l->l.owner = tid;
      }
  }

  static inline void
  __libc_lock_unlock_recursive (__libc_lock_recursive_t *l)
  {
    /* Not sure if clearing the owner first yields any gain here...  */
    l->l.owner = 0;
    lll_unlock (l->l.lck, LLL_PRIVATE);
  }

And lll_lock and lll_unlock will handle the counter congestion.  I think
the single-thread optimization could be done on lll_lock/lll_unlock as well.

[1] https://hal.inria.fr/hal-01236734/document
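A minimal sketch of the congestion idea from [1] (hypothetical names and layout; the futex calls are elided as comments, and a real implementation would block in them instead of spinning):

```c
#include <stdatomic.h>
#include <stdint.h>

/* Gustedt-style lock word: bit 0 = locked, upper bits = number of
   waiters ("congestion").  Lock state and waiter count live in one
   32-bit word, which is what makes it futex-compatible.  */
typedef struct { _Atomic uint32_t word; } lll_t;

static int
lll_trylock_sketch (lll_t *l)
{
  uint32_t v = atomic_load_explicit (&l->word, memory_order_relaxed);
  /* Try to set the locked bit while leaving the waiter count alone.  */
  while (!(v & 1))
    if (atomic_compare_exchange_weak (&l->word, &v, v | 1))
      return 0;
  return 1;  /* already locked */
}

static void
lll_lock_sketch (lll_t *l)
{
  while (lll_trylock_sketch (l) != 0)
    {
      /* Register as a waiter, then block.  In glibc this would be a
         futex_wait on &l->word; the sketch just records the congestion.  */
      atomic_fetch_add (&l->word, 2);
      /* futex_wait (&l->word, ...);  -- elided */
      atomic_fetch_sub (&l->word, 2);
    }
}

static void
lll_unlock_sketch (lll_t *l)
{
  /* Clearing the locked bit also tells us, from the same word, whether
     anyone is waiting -- only then would a futex_wake be needed.  */
  uint32_t prev = atomic_fetch_and (&l->word, ~(uint32_t) 1);
  if (prev > 1)
    ;  /* futex_wake (&l->word, 1);  -- elided */
}
```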

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v2 3/4] Move libio lock single-thread optimization to generic libc-lock (BZ #27842)
  2022-04-29 17:13 ` Adhemerval Zanella
@ 2022-05-16 16:17   ` Wilco Dijkstra
  2022-05-16 16:23     ` Florian Weimer
  0 siblings, 1 reply; 10+ messages in thread
From: Wilco Dijkstra @ 2022-05-16 16:17 UTC (permalink / raw)
  To: Adhemerval Zanella; +Cc: 'GNU C Library', Florian Weimer

Hi Adhemerval,

> The main problem is that _IO_enable_locks is a clunky interface: it requires
> flockfile to set _flags2 outside a lock region, leading to a possible race
> (BZ #27842).  Moving the check into the lock itself gives pretty much:

It should be fine if we use a boolean instead of a flag. IIRC the IO structure was
externally exposed in Emacs, but if that is no longer the case then we could
change the structure safely.

A quick check of rand() shows the following results for various locks and optimizations
(relative performance compared to unlocked case):

100% - rand() without locking
317% - standard lock (= current code)
108% - add SINGLE_THREAD_P optimization inside rand 
112% - single threaded optimization in _libc_lock_lock
373% - standard recursive lock
129% - recursive lock with single thread optimization

Locks are expensive and adding a single-thread optimization is important!
It looks like recursive locks remain expensive (~20% slower) compared to the
specialized single-thread optimization, but normal locks might be fast enough in most cases.

> I think ideally we would like to model all internal locks on a futex-like
> construct so we can use the congestion optimization described in Jens
> Gustedt's paper [1], which would allow us to move the counter and the lock
> into the same word.  I don't think we can improve recursive locks without
> a 64-bit futex operation.

How much lock recursion exists in GLIBC? An 8-bit counter is likely sufficient...
With a 64-bit lock you could combine owner and count into a single value -
this may not help the multi-threaded case much, but could reduce the overhead
of the single-thread case compared to non-recursive locks.

Cheers,
Wilco

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v2 3/4] Move libio lock single-thread optimization to generic libc-lock (BZ #27842)
  2022-05-16 16:17   ` Wilco Dijkstra
@ 2022-05-16 16:23     ` Florian Weimer
  0 siblings, 0 replies; 10+ messages in thread
From: Florian Weimer @ 2022-05-16 16:23 UTC (permalink / raw)
  To: Wilco Dijkstra; +Cc: Adhemerval Zanella, 'GNU C Library'

* Wilco Dijkstra:

> Hi Adhemerval,
>
>> The main problem is that _IO_enable_locks is a clunky interface: it requires
>> flockfile to set _flags2 outside a lock region, leading to a possible race
>> (BZ #27842).  Moving the check into the lock itself gives pretty much:
>
> It should be fine if we use a boolean instead of a flag. IIRC the IO
> structure was externally exposed in Emacs, but if that is no longer
> the case then we could change the structure safely.

gnulib has code that directly accesses _flags (not _flags2).  gnulib is
statically linked, which makes for a long transition time.

>> I think ideally we would like to model all internal locks on a futex-like
>> construct so we can use the congestion optimization described in Jens
>> Gustedt's paper [1], which would allow us to move the counter and the lock
>> into the same word.  I don't think we can improve recursive locks without
>> a 64-bit futex operation.
>
> How much lock recursion exists in GLIBC? An 8-bit counter is likely
> sufficient...

One of the loader locks is recursive, and dlopen calls can be nested
basically arbitrarily deeply in ELF constructors.

Thanks,
Florian


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v2 3/4] Move libio lock single-thread optimization to generic libc-lock (BZ #27842)
  2022-04-28 16:56         ` Florian Weimer
@ 2022-04-28 17:14           ` Adhemerval Zanella
  0 siblings, 0 replies; 10+ messages in thread
From: Adhemerval Zanella @ 2022-04-28 17:14 UTC (permalink / raw)
  To: Florian Weimer; +Cc: Adhemerval Zanella via Libc-alpha



On 28/04/2022 13:56, Florian Weimer wrote:
> * Adhemerval Zanella:
> 
>>>>>  	(NAME).owner = NULL;						      \
>>>>> -	lll_unlock ((NAME).lock, LLL_PRIVATE);				      \
>>>>> +        if (!SINGLE_THREAD_P)                                                 \
>>>>> +	  lll_unlock ((NAME).lock, LLL_PRIVATE);			      \
>>>>>        }									      \
>>>>>    } while (0)
>>>>>  #else
>>>>
>>>> I don't think this is correct if threads are created in the lock region.
>>>
>>> I was not sure about this one, and I think the main issue is in fact that
>>> we can't use the single-thread optimization on unlock.  Maybe a better
>>> option would be to use a different scheme as proposed by
>>> https://hal.inria.fr/hal-01236734/document, where we can embed the lock and
>>> cnt in only one variable (as lll_lock already does).
>>
>> Doesn't 99f841c441feeaa9a3d97fd91bb3d6ec8073c982 have this issue for pthread_mutex_lock?
> 
> No, that optimization follows our documented guidance, namely:
> 
> | Most applications should perform the same actions whether or not
> | @code{__libc_single_threaded} is true, except with less
> | synchronization.  If this rule is followed, a process that
> | subsequently becomes multi-threaded is already in a consistent state.
> 

So I wonder if

/* Lock the recursive named lock variable.  */
#define __libc_lock_lock_recursive(NAME) \
  do {                                                                        \
    void *self = THREAD_SELF;                                                 \
    if ((NAME).owner != self)                                                 \
      {									      \
	if (SINGLE_THREAD_P)                                                  \
	  (NAME).lock = 1;						      \
        else								      \
           lll_lock ((NAME).lock, LLL_PRIVATE);                               \
        (NAME).owner = self;                                                  \
      }                                                                       \
    ++(NAME).cnt;                                                             \
  } while (0)

/* Unlock the recursive named lock variable.  */
/* We do no error checking here.  */
#define __libc_lock_unlock_recursive(NAME) \
  do {                                                                        \
    if (--(NAME).cnt == 0)                                                    \
      {                                                                       \
        (NAME).owner = NULL;                                                  \
        if (SINGLE_THREAD_P)						      \
          (NAME).lock = 0;  						      \
        else                                                                  \
          lll_unlock ((NAME).lock, LLL_PRIVATE);                              \
      }                                                                       \
  } while (0)

Or whether we are bound to keep the current practice of checking for single-thread
and skipping locks internally.  It would be good to consolidate all the internal
lock usage and have the single-thread lock optimization on all locks, not only on
pthread_mutex_lock.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v2 3/4] Move libio lock single-thread optimization to generic libc-lock (BZ #27842)
  2022-04-28 16:39       ` Adhemerval Zanella
@ 2022-04-28 16:56         ` Florian Weimer
  2022-04-28 17:14           ` Adhemerval Zanella
  0 siblings, 1 reply; 10+ messages in thread
From: Florian Weimer @ 2022-04-28 16:56 UTC (permalink / raw)
  To: Adhemerval Zanella; +Cc: Adhemerval Zanella via Libc-alpha

* Adhemerval Zanella:

>>>>  	(NAME).owner = NULL;						      \
>>>> -	lll_unlock ((NAME).lock, LLL_PRIVATE);				      \
>>>> +        if (!SINGLE_THREAD_P)                                                 \
>>>> +	  lll_unlock ((NAME).lock, LLL_PRIVATE);			      \
>>>>        }									      \
>>>>    } while (0)
>>>>  #else
>>>
>>> I don't think this is correct if threads are created in the lock region.
>> 
>> I was not sure about this one, and I think the main issue is in fact that
>> we can't use the single-thread optimization on unlock.  Maybe a better
>> option would be to use a different scheme as proposed by
>> https://hal.inria.fr/hal-01236734/document, where we can embed the lock and
>> cnt in only one variable (as lll_lock already does).
>
> Doesn't 99f841c441feeaa9a3d97fd91bb3d6ec8073c982 have this issue for pthread_mutex_lock?

No, that optimization follows our documented guidance, namely:

| Most applications should perform the same actions whether or not
| @code{__libc_single_threaded} is true, except with less
| synchronization.  If this rule is followed, a process that
| subsequently becomes multi-threaded is already in a consistent state.

Thanks,
Florian


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v2 3/4] Move libio lock single-thread optimization to generic libc-lock (BZ #27842)
  2022-04-27 16:32     ` Adhemerval Zanella
@ 2022-04-28 16:39       ` Adhemerval Zanella
  2022-04-28 16:56         ` Florian Weimer
  0 siblings, 1 reply; 10+ messages in thread
From: Adhemerval Zanella @ 2022-04-28 16:39 UTC (permalink / raw)
  To: Florian Weimer, Adhemerval Zanella via Libc-alpha



On 27/04/2022 13:32, Adhemerval Zanella wrote:
> 
> 
> On 27/04/2022 10:30, Florian Weimer wrote:
>> * Adhemerval Zanella via Libc-alpha:
>>
>>> diff --git a/sysdeps/nptl/libc-lock.h b/sysdeps/nptl/libc-lock.h
>>> index 6c2d6acfd1..abd84e71b4 100644
>>> --- a/sysdeps/nptl/libc-lock.h
>>> +++ b/sysdeps/nptl/libc-lock.h
>>> @@ -86,7 +86,7 @@ typedef struct { int lock; int cnt; void *owner; } __libc_lock_recursive_t;
>>>  # define __libc_lock_lock_recursive(NAME) \
>>>    do {									      \
>>>      void *self = THREAD_SELF;						      \
>>> -    if ((NAME).owner != self)						      \
>>> +    if (!SINGLE_THREAD_P && (NAME).owner != self)						      \
>>>        {									      \
>>>  	lll_lock ((NAME).lock, LLL_PRIVATE);				      \
>>>  	(NAME).owner = self;						      \
>>> @@ -104,7 +104,7 @@ typedef struct { int lock; int cnt; void *owner; } __libc_lock_recursive_t;
>>>    ({									      \
>>>      int result = 0;							      \
>>>      void *self = THREAD_SELF;						      \
>>> -    if ((NAME).owner != self)						      \
>>> +    if (!SINGLE_THREAD_P && (NAME).owner != self)						      \
>>>        {									      \
>>>  	if (lll_trylock ((NAME).lock) == 0)				      \
>>>  	  {								      \
>>> @@ -131,7 +131,8 @@ typedef struct { int lock; int cnt; void *owner; } __libc_lock_recursive_t;
>>>      if (--(NAME).cnt == 0)						      \
>>>        {									      \
>>>  	(NAME).owner = NULL;						      \
>>> -	lll_unlock ((NAME).lock, LLL_PRIVATE);				      \
>>> +        if (!SINGLE_THREAD_P)                                                 \
>>> +	  lll_unlock ((NAME).lock, LLL_PRIVATE);			      \
>>>        }									      \
>>>    } while (0)
>>>  #else
>>
>> I don't think this is correct if threads are created in the lock region.
> 
> I was not sure about this one, and I think the main issue is in fact that
> we can't use the single-thread optimization on unlock.  Maybe a better
> option would be to use a different scheme as proposed by
> https://hal.inria.fr/hal-01236734/document, where we can embed the lock and
> cnt in only one variable (as lll_lock already does).

Doesn't 99f841c441feeaa9a3d97fd91bb3d6ec8073c982 have this issue for pthread_mutex_lock?

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v2 3/4] Move libio lock single-thread optimization to generic libc-lock (BZ #27842)
  2022-04-27 13:30   ` Florian Weimer
@ 2022-04-27 16:32     ` Adhemerval Zanella
  2022-04-28 16:39       ` Adhemerval Zanella
  0 siblings, 1 reply; 10+ messages in thread
From: Adhemerval Zanella @ 2022-04-27 16:32 UTC (permalink / raw)
  To: Florian Weimer, Adhemerval Zanella via Libc-alpha



On 27/04/2022 10:30, Florian Weimer wrote:
> * Adhemerval Zanella via Libc-alpha:
> 
>> diff --git a/sysdeps/nptl/libc-lock.h b/sysdeps/nptl/libc-lock.h
>> index 6c2d6acfd1..abd84e71b4 100644
>> --- a/sysdeps/nptl/libc-lock.h
>> +++ b/sysdeps/nptl/libc-lock.h
>> @@ -86,7 +86,7 @@ typedef struct { int lock; int cnt; void *owner; } __libc_lock_recursive_t;
>>  # define __libc_lock_lock_recursive(NAME) \
>>    do {									      \
>>      void *self = THREAD_SELF;						      \
>> -    if ((NAME).owner != self)						      \
>> +    if (!SINGLE_THREAD_P && (NAME).owner != self)						      \
>>        {									      \
>>  	lll_lock ((NAME).lock, LLL_PRIVATE);				      \
>>  	(NAME).owner = self;						      \
>> @@ -104,7 +104,7 @@ typedef struct { int lock; int cnt; void *owner; } __libc_lock_recursive_t;
>>    ({									      \
>>      int result = 0;							      \
>>      void *self = THREAD_SELF;						      \
>> -    if ((NAME).owner != self)						      \
>> +    if (!SINGLE_THREAD_P && (NAME).owner != self)						      \
>>        {									      \
>>  	if (lll_trylock ((NAME).lock) == 0)				      \
>>  	  {								      \
>> @@ -131,7 +131,8 @@ typedef struct { int lock; int cnt; void *owner; } __libc_lock_recursive_t;
>>      if (--(NAME).cnt == 0)						      \
>>        {									      \
>>  	(NAME).owner = NULL;						      \
>> -	lll_unlock ((NAME).lock, LLL_PRIVATE);				      \
>> +        if (!SINGLE_THREAD_P)                                                 \
>> +	  lll_unlock ((NAME).lock, LLL_PRIVATE);			      \
>>        }									      \
>>    } while (0)
>>  #else
> 
> I don't think this is correct if threads are created in the lock region.

I was not sure about this one, and I think the main issue is in fact that
we can't use the single-thread optimization on unlock.  Maybe a better
option would be to use a different scheme as proposed by
https://hal.inria.fr/hal-01236734/document, where we can embed the lock and
cnt in only one variable (as lll_lock already does).

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v2 3/4] Move libio lock single-thread optimization to generic libc-lock (BZ #27842)
  2022-04-26 19:15 ` [PATCH v2 3/4] Move libio lock single-thread optimization to generic libc-lock (BZ #27842) Adhemerval Zanella
@ 2022-04-27 13:30   ` Florian Weimer
  2022-04-27 16:32     ` Adhemerval Zanella
  0 siblings, 1 reply; 10+ messages in thread
From: Florian Weimer @ 2022-04-27 13:30 UTC (permalink / raw)
  To: Adhemerval Zanella via Libc-alpha

* Adhemerval Zanella via Libc-alpha:

> diff --git a/sysdeps/nptl/libc-lock.h b/sysdeps/nptl/libc-lock.h
> index 6c2d6acfd1..abd84e71b4 100644
> --- a/sysdeps/nptl/libc-lock.h
> +++ b/sysdeps/nptl/libc-lock.h
> @@ -86,7 +86,7 @@ typedef struct { int lock; int cnt; void *owner; } __libc_lock_recursive_t;
>  # define __libc_lock_lock_recursive(NAME) \
>    do {									      \
>      void *self = THREAD_SELF;						      \
> -    if ((NAME).owner != self)						      \
> +    if (!SINGLE_THREAD_P && (NAME).owner != self)						      \
>        {									      \
>  	lll_lock ((NAME).lock, LLL_PRIVATE);				      \
>  	(NAME).owner = self;						      \
> @@ -104,7 +104,7 @@ typedef struct { int lock; int cnt; void *owner; } __libc_lock_recursive_t;
>    ({									      \
>      int result = 0;							      \
>      void *self = THREAD_SELF;						      \
> -    if ((NAME).owner != self)						      \
> +    if (!SINGLE_THREAD_P && (NAME).owner != self)						      \
>        {									      \
>  	if (lll_trylock ((NAME).lock) == 0)				      \
>  	  {								      \
> @@ -131,7 +131,8 @@ typedef struct { int lock; int cnt; void *owner; } __libc_lock_recursive_t;
>      if (--(NAME).cnt == 0)						      \
>        {									      \
>  	(NAME).owner = NULL;						      \
> -	lll_unlock ((NAME).lock, LLL_PRIVATE);				      \
> +        if (!SINGLE_THREAD_P)                                                 \
> +	  lll_unlock ((NAME).lock, LLL_PRIVATE);			      \
>        }									      \
>    } while (0)
>  #else

I don't think this is correct if threads are created in the lock region.

Thanks,
Florian


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH v2 3/4] Move libio lock single-thread optimization to generic libc-lock (BZ #27842)
  2022-04-26 19:15 [PATCH v2 0/4] Move libio lock single-thread optimization to generic libc-lock Adhemerval Zanella
@ 2022-04-26 19:15 ` Adhemerval Zanella
  2022-04-27 13:30   ` Florian Weimer
  0 siblings, 1 reply; 10+ messages in thread
From: Adhemerval Zanella @ 2022-04-26 19:15 UTC (permalink / raw)
  To: libc-alpha, Florian Weimer

This patch moves the single-thread stdio optimization (d2e04918833d9)
to the libc-lock.  With generic support there is no need to add
per-function code to handle the single-thread case, and it allows
removing _IO_enable_locks (since once the process goes multithreaded
the locks will always be used).

It also handles the memory streams requirement (de895ddcd7fc), since
SINGLE_THREAD_P already carries the information whether the process
is multithreaded (so there is no need to disable the optimization
just because such streams are listed in _IO_list_all).

Finally, it also removes the read-modify-write operation that flockfile
performed on _flags2 outside a lock region (BZ #27842).

Checked on x86_64-linux-gnu, i686-linux-gnu, and aarch64-linux-gnu.
---
 libio/Versions           |  3 ---
 libio/feof.c             |  2 --
 libio/ferror.c           |  2 --
 libio/fputc.c            |  2 --
 libio/genops.c           | 28 ----------------------------
 libio/getc.c             |  2 --
 libio/getchar.c          |  4 +---
 libio/iofopncook.c       |  2 --
 libio/ioungetc.c         |  2 --
 libio/libio.h            |  4 ----
 libio/memstream.c        |  3 ---
 libio/putc.c             |  2 --
 libio/wmemstream.c       |  3 ---
 nptl/pthread_create.c    |  3 ---
 stdio-common/flockfile.c |  1 -
 sysdeps/mach/libc-lock.h |  9 +++++----
 sysdeps/nptl/libc-lock.h |  7 ++++---
 17 files changed, 10 insertions(+), 69 deletions(-)

diff --git a/libio/Versions b/libio/Versions
index b91a7bc914..b528fae670 100644
--- a/libio/Versions
+++ b/libio/Versions
@@ -159,9 +159,6 @@ libc {
     # Used by NPTL and librt
     __libc_fatal;
 
-    # Used by NPTL
-    _IO_enable_locks;
-
     __fseeko64;
     __ftello64;
   }
diff --git a/libio/feof.c b/libio/feof.c
index 7f79b2795e..41e7beeca7 100644
--- a/libio/feof.c
+++ b/libio/feof.c
@@ -32,8 +32,6 @@ _IO_feof (FILE *fp)
 {
   int result;
   CHECK_FILE (fp, EOF);
-  if (!_IO_need_lock (fp))
-    return _IO_feof_unlocked (fp);
   _IO_flockfile (fp);
   result = _IO_feof_unlocked (fp);
   _IO_funlockfile (fp);
diff --git a/libio/ferror.c b/libio/ferror.c
index d2489c54d3..a60efc3f0f 100644
--- a/libio/ferror.c
+++ b/libio/ferror.c
@@ -32,8 +32,6 @@ _IO_ferror (FILE *fp)
 {
   int result;
   CHECK_FILE (fp, EOF);
-  if (!_IO_need_lock (fp))
-    return _IO_ferror_unlocked (fp);
   _IO_flockfile (fp);
   result = _IO_ferror_unlocked (fp);
   _IO_funlockfile (fp);
diff --git a/libio/fputc.c b/libio/fputc.c
index d29a9d0aee..c3d66500d1 100644
--- a/libio/fputc.c
+++ b/libio/fputc.c
@@ -32,8 +32,6 @@ fputc (int c, FILE *fp)
 {
   int result;
   CHECK_FILE (fp, EOF);
-  if (!_IO_need_lock (fp))
-    return _IO_putc_unlocked (c, fp);
   _IO_acquire_lock (fp);
   result = _IO_putc_unlocked (c, fp);
   _IO_release_lock (fp);
diff --git a/libio/genops.c b/libio/genops.c
index ccfbcd149d..eca50bacbc 100644
--- a/libio/genops.c
+++ b/libio/genops.c
@@ -488,39 +488,11 @@ _IO_init (FILE *fp, int flags)
   _IO_init_internal (fp, flags);
 }
 
-static int stdio_needs_locking;
-
-/* In a single-threaded process most stdio locks can be omitted.  After
-   _IO_enable_locks is called, locks are not optimized away any more.
-   It must be first called while the process is still single-threaded.
-
-   This lock optimization can be disabled on a per-file basis by setting
-   _IO_FLAGS2_NEED_LOCK, because a file can have user-defined callbacks
-   or can be locked with flockfile and then a thread may be created
-   between a lock and unlock, so omitting the lock is not valid.
-
-   Here we have to make sure that the flag is set on all existing files
-   and files created later.  */
-void
-_IO_enable_locks (void)
-{
-  _IO_ITER i;
-
-  if (stdio_needs_locking)
-    return;
-  stdio_needs_locking = 1;
-  for (i = _IO_iter_begin (); i != _IO_iter_end (); i = _IO_iter_next (i))
-    _IO_iter_file (i)->_flags2 |= _IO_FLAGS2_NEED_LOCK;
-}
-libc_hidden_def (_IO_enable_locks)
-
 void
 _IO_old_init (FILE *fp, int flags)
 {
   fp->_flags = _IO_MAGIC|flags;
   fp->_flags2 = 0;
-  if (stdio_needs_locking)
-    fp->_flags2 |= _IO_FLAGS2_NEED_LOCK;
   fp->_IO_buf_base = NULL;
   fp->_IO_buf_end = NULL;
   fp->_IO_read_base = NULL;
diff --git a/libio/getc.c b/libio/getc.c
index 8c79cf702f..18647aa3e1 100644
--- a/libio/getc.c
+++ b/libio/getc.c
@@ -34,8 +34,6 @@ _IO_getc (FILE *fp)
 {
   int result;
   CHECK_FILE (fp, EOF);
-  if (!_IO_need_lock (fp))
-    return _IO_getc_unlocked (fp);
   _IO_acquire_lock (fp);
   result = _IO_getc_unlocked (fp);
   _IO_release_lock (fp);
diff --git a/libio/getchar.c b/libio/getchar.c
index e0f4f471b3..4ab1e7b065 100644
--- a/libio/getchar.c
+++ b/libio/getchar.c
@@ -33,10 +33,8 @@ int
 getchar (void)
 {
   int result;
-  if (!_IO_need_lock (stdin))
-    return _IO_getc_unlocked (stdin);
   _IO_acquire_lock (stdin);
   result = _IO_getc_unlocked (stdin);
   _IO_release_lock (stdin);
   return result;
-}
\ No newline at end of file
+}
diff --git a/libio/iofopncook.c b/libio/iofopncook.c
index 16bcc7f4a3..5ecca41717 100644
--- a/libio/iofopncook.c
+++ b/libio/iofopncook.c
@@ -162,8 +162,6 @@ _IO_cookie_init (struct _IO_cookie_file *cfile, int read_write,
   _IO_mask_flags (&cfile->__fp.file, read_write,
 		  _IO_NO_READS+_IO_NO_WRITES+_IO_IS_APPENDING);
 
-  cfile->__fp.file._flags2 |= _IO_FLAGS2_NEED_LOCK;
-
   /* We use a negative number different from -1 for _fileno to mark that
      this special stream is not associated with a real file, but still has
      to be treated as such.  */
diff --git a/libio/ioungetc.c b/libio/ioungetc.c
index 92d5572e21..547953dd16 100644
--- a/libio/ioungetc.c
+++ b/libio/ioungetc.c
@@ -33,8 +33,6 @@ ungetc (int c, FILE *fp)
   CHECK_FILE (fp, EOF);
   if (c == EOF)
     return EOF;
-  if (!_IO_need_lock (fp))
-    return _IO_sputbackc (fp, (unsigned char) c);
   _IO_acquire_lock (fp);
   result = _IO_sputbackc (fp, (unsigned char) c);
   _IO_release_lock (fp);
diff --git a/libio/libio.h b/libio/libio.h
index 42ff6143af..e1e8ffdc54 100644
--- a/libio/libio.h
+++ b/libio/libio.h
@@ -88,7 +88,6 @@ typedef struct
 #define _IO_FLAGS2_USER_WBUF 8
 #define _IO_FLAGS2_NOCLOSE 32
 #define _IO_FLAGS2_CLOEXEC 64
-#define _IO_FLAGS2_NEED_LOCK 128
 
 /* _IO_pos_BAD is an off64_t value indicating error, unknown, or EOF.  */
 #define _IO_pos_BAD ((off64_t) -1)
@@ -212,9 +211,6 @@ extern int _IO_ftrylockfile (FILE *) __THROW;
 #define _IO_cleanup_region_end(_Doit) /**/
 #endif
 
-#define _IO_need_lock(_fp) \
-  (((_fp)->_flags2 & _IO_FLAGS2_NEED_LOCK) != 0)
-
 extern int _IO_vfscanf (FILE * __restrict, const char * __restrict,
 			__gnuc_va_list, int *__restrict);
 extern __ssize_t _IO_padn (FILE *, int, __ssize_t);
diff --git a/libio/memstream.c b/libio/memstream.c
index 7ab61876ba..15022b72ee 100644
--- a/libio/memstream.c
+++ b/libio/memstream.c
@@ -92,9 +92,6 @@ __open_memstream (char **bufloc, size_t *sizeloc)
   new_f->fp.bufloc = bufloc;
   new_f->fp.sizeloc = sizeloc;
 
-  /* Disable single thread optimization.  BZ 21735.  */
-  new_f->fp._sf._sbf._f._flags2 |= _IO_FLAGS2_NEED_LOCK;
-
   return (FILE *) &new_f->fp._sf._sbf;
 }
 libc_hidden_def (__open_memstream)
diff --git a/libio/putc.c b/libio/putc.c
index 44e30649c9..f4cc2024b6 100644
--- a/libio/putc.c
+++ b/libio/putc.c
@@ -25,8 +25,6 @@ _IO_putc (int c, FILE *fp)
 {
   int result;
   CHECK_FILE (fp, EOF);
-  if (!_IO_need_lock (fp))
-    return _IO_putc_unlocked (c, fp);
   _IO_acquire_lock (fp);
   result = _IO_putc_unlocked (c, fp);
   _IO_release_lock (fp);
diff --git a/libio/wmemstream.c b/libio/wmemstream.c
index 9366ef4aad..abaf421069 100644
--- a/libio/wmemstream.c
+++ b/libio/wmemstream.c
@@ -94,9 +94,6 @@ open_wmemstream (wchar_t **bufloc, size_t *sizeloc)
   new_f->fp.bufloc = bufloc;
   new_f->fp.sizeloc = sizeloc;
 
-  /* Disable single thread optimization.  BZ 21735.  */
-  new_f->fp._sf._sbf._f._flags2 |= _IO_FLAGS2_NEED_LOCK;
-
   return (FILE *) &new_f->fp._sf._sbf;
 }
 
diff --git a/nptl/pthread_create.c b/nptl/pthread_create.c
index e7a099acb7..4f45ea36bc 100644
--- a/nptl/pthread_create.c
+++ b/nptl/pthread_create.c
@@ -740,9 +740,6 @@ __pthread_create_2_1 (pthread_t *newthread, const pthread_attr_t *attr,
         collect_default_sched (pd);
     }
 
-  if (__glibc_unlikely (__nptl_nthreads == 1))
-    _IO_enable_locks ();
-
   /* Pass the descriptor to the caller.  */
   *newthread = (pthread_t) pd;
 
diff --git a/stdio-common/flockfile.c b/stdio-common/flockfile.c
index a5decb450f..49f72c69ab 100644
--- a/stdio-common/flockfile.c
+++ b/stdio-common/flockfile.c
@@ -22,7 +22,6 @@
 void
 __flockfile (FILE *stream)
 {
-  stream->_flags2 |= _IO_FLAGS2_NEED_LOCK;
   _IO_lock_lock (*stream->_lock);
 }
 weak_alias (__flockfile, flockfile);
diff --git a/sysdeps/mach/libc-lock.h b/sysdeps/mach/libc-lock.h
index 225eb67f5a..ee38948d1e 100644
--- a/sysdeps/mach/libc-lock.h
+++ b/sysdeps/mach/libc-lock.h
@@ -106,9 +106,9 @@ typedef struct __libc_lock_recursive_opaque__ __libc_lock_recursive_t;
      __libc_lock_recursive_t *const __lock = &(NAME);   \
      void *__self = __libc_lock_owner_self ();   \
      int __r = 0;   \
-     if (__self == __lock->owner)   \
+     if (!SINGLE_THREAD_P && __self == __lock->owner)   \
        ++__lock->cnt;   \
-     else if ((__r = lll_trylock (__lock->lock)) == 0)   \
+     else if (!SINGLE_THREAD_P && (__r = lll_trylock (__lock->lock)) == 0)   \
        __lock->owner = __self, __lock->cnt = 1;   \
      __r;   \
    })
@@ -117,7 +117,7 @@ typedef struct __libc_lock_recursive_opaque__ __libc_lock_recursive_t;
   ({   \
      __libc_lock_recursive_t *const __lock = &(NAME);   \
      void *__self = __libc_lock_owner_self ();   \
-     if (__self != __lock->owner)   \
+     if (!SINGLE_THREAD_P && __self != __lock->owner)   \
        {   \
          lll_lock (__lock->lock, 0);   \
          __lock->owner = __self;   \
@@ -132,7 +132,8 @@ typedef struct __libc_lock_recursive_opaque__ __libc_lock_recursive_t;
      if (--__lock->cnt == 0)   \
        {   \
          __lock->owner = 0;   \
-         lll_unlock (__lock->lock, 0);   \
+         if (!SINGLE_THREAD_P) \
+           lll_unlock (__lock->lock, 0);   \
        }   \
    })
 
diff --git a/sysdeps/nptl/libc-lock.h b/sysdeps/nptl/libc-lock.h
index 6c2d6acfd1..abd84e71b4 100644
--- a/sysdeps/nptl/libc-lock.h
+++ b/sysdeps/nptl/libc-lock.h
@@ -86,7 +86,7 @@ typedef struct { int lock; int cnt; void *owner; } __libc_lock_recursive_t;
 # define __libc_lock_lock_recursive(NAME) \
   do {									      \
     void *self = THREAD_SELF;						      \
-    if ((NAME).owner != self)						      \
+    if (!SINGLE_THREAD_P && (NAME).owner != self)						      \
       {									      \
 	lll_lock ((NAME).lock, LLL_PRIVATE);				      \
 	(NAME).owner = self;						      \
@@ -104,7 +104,7 @@ typedef struct { int lock; int cnt; void *owner; } __libc_lock_recursive_t;
   ({									      \
     int result = 0;							      \
     void *self = THREAD_SELF;						      \
-    if ((NAME).owner != self)						      \
+    if (!SINGLE_THREAD_P && (NAME).owner != self)						      \
       {									      \
 	if (lll_trylock ((NAME).lock) == 0)				      \
 	  {								      \
@@ -131,7 +131,8 @@ typedef struct { int lock; int cnt; void *owner; } __libc_lock_recursive_t;
     if (--(NAME).cnt == 0)						      \
       {									      \
 	(NAME).owner = NULL;						      \
-	lll_unlock ((NAME).lock, LLL_PRIVATE);				      \
+        if (!SINGLE_THREAD_P)                                                 \
+	  lll_unlock ((NAME).lock, LLL_PRIVATE);			      \
       }									      \
   } while (0)
 #else
-- 
2.34.1

