From: Adhemerval Zanella
Date: Mon, 15 Nov 2021 16:24:39 -0300
Subject: Re: [PATCH 1/3] nptl: Extract <bits/atomic_wide_counter.h> from pthread_cond_common.c
To: Florian Weimer, libc-alpha@sourceware.org
Cc: Jakub Jelinek, gcc-patches@gcc.gnu.org, Jason Merrill

On 03/11/2021 13:27, Florian Weimer via Libc-alpha wrote:
> And make it an installed header.  This addresses a few aliasing
> violations (which do not seem to result in miscompilation due to
> the use of atomics), and also enables use of wide counters in other
> parts of the library.
>
> The debug output in nptl/tst-cond22 has been adjusted to print
> the 32-bit values instead because it avoids a big-endian/little-endian
> difference.

LGTM, thanks.
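Since the point of making the header installed is to let other parts of the library reuse these counters, here is roughly what a new in-tree user could look like (my own illustration, not from the patch: the ticket names are made up, and it only builds inside the glibc tree because <atomic_wide_counter.h> stays internal; only bits/atomic_wide_counter.h gets installed):

#include <atomic_wide_counter.h>
#include <stdint.h>

/* Hypothetical new user: a 62-bit monotonic ticket counter.  Per the
   header comment, increments must stay below 2**31.  */
static __atomic_wide_counter ticket_counter;

static uint64_t
next_ticket (void)
{
  /* Uses 64-bit atomics where available, and the 32-bit split-counter
     fallback from misc/atomic_wide_counter.c otherwise.  */
  return __atomic_wide_counter_fetch_add_acquire (&ticket_counter, 1);
}

static uint64_t
current_ticket (void)
{
  return __atomic_wide_counter_load_relaxed (&ticket_counter);
}
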
Reviewed-by: Adhemerval Zanella > --- > bits/atomic_wide_counter.h | 35 ++++ > include/atomic_wide_counter.h | 89 +++++++++++ > include/bits/atomic_wide_counter.h | 1 + > misc/Makefile | 3 +- > misc/atomic_wide_counter.c | 127 +++++++++++++++ > nptl/Makefile | 13 +- > nptl/pthread_cond_common.c | 204 ++++-------------------- > nptl/tst-cond22.c | 14 +- > sysdeps/nptl/bits/thread-shared-types.h | 22 +-- > 9 files changed, 310 insertions(+), 198 deletions(-) > create mode 100644 bits/atomic_wide_counter.h > create mode 100644 include/atomic_wide_counter.h > create mode 100644 include/bits/atomic_wide_counter.h > create mode 100644 misc/atomic_wide_counter.c > > diff --git a/bits/atomic_wide_counter.h b/bits/atomic_wide_counter.h > new file mode 100644 > index 0000000000..0687eb554e > --- /dev/null > +++ b/bits/atomic_wide_counter.h > @@ -0,0 +1,35 @@ > +/* Monotonically increasing wide counters (at least 62 bits). > + Copyright (C) 2016-2021 Free Software Foundation, Inc. > + This file is part of the GNU C Library. > + > + The GNU C Library is free software; you can redistribute it and/or > + modify it under the terms of the GNU Lesser General Public > + License as published by the Free Software Foundation; either > + version 2.1 of the License, or (at your option) any later version. > + > + The GNU C Library is distributed in the hope that it will be useful, > + but WITHOUT ANY WARRANTY; without even the implied warranty of > + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > + Lesser General Public License for more details. > + > + You should have received a copy of the GNU Lesser General Public > + License along with the GNU C Library; if not, see > + . */ > + > +#ifndef _BITS_ATOMIC_WIDE_COUNTER_H > +#define _BITS_ATOMIC_WIDE_COUNTER_H > + > +/* Counter that is monotonically increasing (by less than 2**31 per > + increment), with a single writer, and an arbitrary number of > + readers. */ > +typedef union > +{ > + __extension__ unsigned long long int __value64; > + struct > + { > + unsigned int __low; > + unsigned int __high; > + } __value32; > +} __atomic_wide_counter; > + > +#endif /* _BITS_ATOMIC_WIDE_COUNTER_H */ Ok, it would be included in multiple places so we can't tie to a specific header. > diff --git a/include/atomic_wide_counter.h b/include/atomic_wide_counter.h > new file mode 100644 > index 0000000000..31f009d5e6 > --- /dev/null > +++ b/include/atomic_wide_counter.h > @@ -0,0 +1,89 @@ > +/* Monotonically increasing wide counters (at least 62 bits). > + Copyright (C) 2016-2021 Free Software Foundation, Inc. > + This file is part of the GNU C Library. > + > + The GNU C Library is free software; you can redistribute it and/or > + modify it under the terms of the GNU Lesser General Public > + License as published by the Free Software Foundation; either > + version 2.1 of the License, or (at your option) any later version. > + > + The GNU C Library is distributed in the hope that it will be useful, > + but WITHOUT ANY WARRANTY; without even the implied warranty of > + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > + Lesser General Public License for more details. > + > + You should have received a copy of the GNU Lesser General Public > + License along with the GNU C Library; if not, see > + . 
*/ > + > +#ifndef _ATOMIC_WIDE_COUNTER_H > +#define _ATOMIC_WIDE_COUNTER_H > + > +#include > +#include > + > +#if __HAVE_64B_ATOMICS > + > +static inline uint64_t > +__atomic_wide_counter_load_relaxed (__atomic_wide_counter *c) > +{ > + return atomic_load_relaxed (&c->__value64); > +} > + > +static inline uint64_t > +__atomic_wide_counter_fetch_add_relaxed (__atomic_wide_counter *c, > + unsigned int val) > +{ > + return atomic_fetch_add_relaxed (&c->__value64, val); > +} > + > +static inline uint64_t > +__atomic_wide_counter_fetch_add_acquire (__atomic_wide_counter *c, > + unsigned int val) > +{ > + return atomic_fetch_add_acquire (&c->__value64, val); > +} > + > +static inline void > +__atomic_wide_counter_add_relaxed (__atomic_wide_counter *c, > + unsigned int val) > +{ > + atomic_store_relaxed (&c->__value64, > + atomic_load_relaxed (&c->__value64) + val); > +} > + > +static uint64_t __attribute__ ((unused)) > +__atomic_wide_counter_fetch_xor_release (__atomic_wide_counter *c, > + unsigned int val) > +{ > + return atomic_fetch_xor_release (&c->__value64, val); > +} > + > +#else /* !__HAVE_64B_ATOMICS */ > + > +uint64_t __atomic_wide_counter_load_relaxed (__atomic_wide_counter *c) > + attribute_hidden; > + > +uint64_t __atomic_wide_counter_fetch_add_relaxed (__atomic_wide_counter *c, > + unsigned int op) > + attribute_hidden; > + > +static inline uint64_t > +__atomic_wide_counter_fetch_add_acquire (__atomic_wide_counter *c, > + unsigned int val) > +{ > + uint64_t r = __atomic_wide_counter_fetch_add_relaxed (c, val); > + atomic_thread_fence_acquire (); > + return r; > +} > + > +static inline void > +__atomic_wide_counter_add_relaxed (__atomic_wide_counter *c, > + unsigned int val) > +{ > + __atomic_wide_counter_fetch_add_relaxed (c, val); > +} > + > +#endif /* !__HAVE_64B_ATOMICS */ > + > +#endif /* _ATOMIC_WIDE_COUNTER_H */ Ok. > diff --git a/include/bits/atomic_wide_counter.h b/include/bits/atomic_wide_counter.h > new file mode 100644 > index 0000000000..8fb09a5291 > --- /dev/null > +++ b/include/bits/atomic_wide_counter.h > @@ -0,0 +1 @@ > +#include_next Ok. > diff --git a/misc/Makefile b/misc/Makefile > index 1083ba3bfc..3b66cb9f6a 100644 > --- a/misc/Makefile > +++ b/misc/Makefile > @@ -73,7 +73,8 @@ routines := brk sbrk sstk ioctl \ > fgetxattr flistxattr fremovexattr fsetxattr getxattr \ > listxattr lgetxattr llistxattr lremovexattr lsetxattr \ > removexattr setxattr getauxval ifunc-impl-list makedev \ > - allocate_once fd_to_filename single_threaded unwind-link > + allocate_once fd_to_filename single_threaded unwind-link \ > + atomic_wide_counter > > generated += tst-error1.mtrace tst-error1-mem.out \ > tst-allocate_once.mtrace tst-allocate_once-mem.out Ok. > diff --git a/misc/atomic_wide_counter.c b/misc/atomic_wide_counter.c > new file mode 100644 > index 0000000000..56d8981925 > --- /dev/null > +++ b/misc/atomic_wide_counter.c > @@ -0,0 +1,127 @@ > +/* Monotonically increasing wide counters (at least 62 bits). > + Copyright (C) 2016-2021 Free Software Foundation, Inc. > + This file is part of the GNU C Library. > + > + The GNU C Library is free software; you can redistribute it and/or > + modify it under the terms of the GNU Lesser General Public > + License as published by the Free Software Foundation; either > + version 2.1 of the License, or (at your option) any later version. > + > + The GNU C Library is distributed in the hope that it will be useful, > + but WITHOUT ANY WARRANTY; without even the implied warranty of > + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. 
See the GNU > + Lesser General Public License for more details. > + > + You should have received a copy of the GNU Lesser General Public > + License along with the GNU C Library; if not, see > + . */ > + > +#include > + > +#if !__HAVE_64B_ATOMICS > + > +/* Values we add or xor are less than or equal to 1<<31, so we only > + have to make overflow-and-addition atomic wrt. to concurrent load > + operations and xor operations. To do that, we split each counter > + into two 32b values of which we reserve the MSB of each to > + represent an overflow from the lower-order half to the higher-order > + half. > + > + In the common case, the state is (higher-order / lower-order half, and . is > + basically concatenation of the bits): > + 0.h / 0.l = h.l > + > + When we add a value of x that overflows (i.e., 0.l + x == 1.L), we run the > + following steps S1-S4 (the values these represent are on the right-hand > + side): > + S1: 0.h / 1.L == (h+1).L > + S2: 1.(h+1) / 1.L == (h+1).L > + S3: 1.(h+1) / 0.L == (h+1).L > + S4: 0.(h+1) / 0.L == (h+1).L > + If the LSB of the higher-order half is set, readers will ignore the > + overflow bit in the lower-order half. > + > + To get an atomic snapshot in load operations, we exploit that the > + higher-order half is monotonically increasing; if we load a value V from > + it, then read the lower-order half, and then read the higher-order half > + again and see the same value V, we know that both halves have existed in > + the sequence of values the full counter had. This is similar to the > + validated reads in the time-based STMs in GCC's libitm (e.g., > + method_ml_wt). > + > + One benefit of this scheme is that this makes load operations > + obstruction-free because unlike if we would just lock the counter, readers > + can almost always interpret a snapshot of each halves. Readers can be > + forced to read a new snapshot when the read is concurrent with an overflow. > + However, overflows will happen infrequently, so load operations are > + practically lock-free. */ > + > +uint64_t > +__atomic_wide_counter_fetch_add_relaxed (__atomic_wide_counter *c, > + unsigned int op) > +{ > + /* S1. Note that this is an atomic read-modify-write so it extends the > + release sequence of release MO store at S3. */ > + unsigned int l = atomic_fetch_add_relaxed (&c->__value32.__low, op); > + unsigned int h = atomic_load_relaxed (&c->__value32.__high); > + uint64_t result = ((uint64_t) h << 31) | l; > + l += op; > + if ((l >> 31) > 0) > + { > + /* Overflow. Need to increment higher-order half. Note that all > + add operations are ordered in happens-before. */ > + h++; > + /* S2. Release MO to synchronize with the loads of the higher-order half > + in the load operation. See __condvar_load_64_relaxed. */ > + atomic_store_release (&c->__value32.__high, > + h | ((unsigned int) 1 << 31)); > + l ^= (unsigned int) 1 << 31; > + /* S3. See __condvar_load_64_relaxed. */ > + atomic_store_release (&c->__value32.__low, l); > + /* S4. Likewise. */ > + atomic_store_release (&c->__value32.__high, h); > + } > + return result; > +} > + > +uint64_t > +__atomic_wide_counter_load_relaxed (__atomic_wide_counter *c) > +{ > + unsigned int h, l, h2; > + do > + { > + /* This load and the second one below to the same location read from the > + stores in the overflow handling of the add operation or the > + initializing stores (which is a simple special case because > + initialization always completely happens before further use). 
> + Because no two stores to the higher-order half write the same value, > + the loop ensures that if we continue to use the snapshot, this load > + and the second one read from the same store operation. All candidate > + store operations have release MO. > + If we read from S2 in the first load, then we will see the value of > + S1 on the next load (because we synchronize with S2), or a value > + later in modification order. We correctly ignore the lower-half's > + overflow bit in this case. If we read from S4, then we will see the > + value of S3 in the next load (or a later value), which does not have > + the overflow bit set anymore. > + */ > + h = atomic_load_acquire (&c->__value32.__high); > + /* This will read from the release sequence of S3 (i.e, either the S3 > + store or the read-modify-writes at S1 following S3 in modification > + order). Thus, the read synchronizes with S3, and the following load > + of the higher-order half will read from the matching S2 (or a later > + value). > + Thus, if we read a lower-half value here that already overflowed and > + belongs to an increased higher-order half value, we will see the > + latter and h and h2 will not be equal. */ > + l = atomic_load_acquire (&c->__value32.__low); > + /* See above. */ > + h2 = atomic_load_relaxed (&c->__value32.__high); > + } > + while (h != h2); > + if (((l >> 31) > 0) && ((h >> 31) > 0)) > + l ^= (unsigned int) 1 << 31; > + return ((uint64_t) (h & ~((unsigned int) 1 << 31)) << 31) + l; > +} > + > +#endif /* !__HAVE_64B_ATOMICS */ Ok, it is basically moving out the code from pthread_cond one. > diff --git a/nptl/Makefile b/nptl/Makefile > index ff4d590f11..6310aecaad 100644 > --- a/nptl/Makefile > +++ b/nptl/Makefile > @@ -22,8 +22,14 @@ subdir := nptl > > include ../Makeconfig > > -headers := pthread.h semaphore.h bits/semaphore.h \ > - bits/struct_mutex.h bits/struct_rwlock.h > +headers := \ > + bits/atomic_wide_counter.h \ > + bits/semaphore.h \ > + bits/struct_mutex.h \ > + bits/struct_rwlock.h \ > + pthread.h \ > + semaphore.h \ > + # headers > > extra-libs := libpthread > extra-libs-others := $(extra-libs) Ok. > @@ -270,7 +276,7 @@ tests = tst-attr2 tst-attr3 tst-default-attr \ > tst-mutexpi1 tst-mutexpi2 tst-mutexpi3 tst-mutexpi4 \ > tst-mutexpi5 tst-mutexpi5a tst-mutexpi6 tst-mutexpi7 tst-mutexpi7a \ > tst-mutexpi9 tst-mutexpi10 \ > - tst-cond22 tst-cond26 \ > + tst-cond26 \ > tst-robustpi1 tst-robustpi2 tst-robustpi3 tst-robustpi4 tst-robustpi5 \ > tst-robustpi6 tst-robustpi7 tst-robustpi9 \ > tst-rwlock2 tst-rwlock2a tst-rwlock2b tst-rwlock3 \ > @@ -319,6 +325,7 @@ tests-internal := tst-robustpi8 tst-rwlock19 tst-rwlock20 \ > tst-barrier5 tst-signal7 tst-mutex8 tst-mutex8-static \ > tst-mutexpi8 tst-mutexpi8-static \ > tst-setgetname \ > + tst-cond22 \ > > xtests = tst-setuid1 tst-setuid1-static tst-setuid2 \ > tst-mutexpp1 tst-mutexpp6 tst-mutexpp10 tst-setgroups \ Ok. > diff --git a/nptl/pthread_cond_common.c b/nptl/pthread_cond_common.c > index c35b9ef03a..fb56f93b6e 100644 > --- a/nptl/pthread_cond_common.c > +++ b/nptl/pthread_cond_common.c > @@ -17,79 +17,52 @@ > . */ > > #include > +#include > #include > #include > > -/* We need 3 least-significant bits on __wrefs for something else. */ > +/* We need 3 least-significant bits on __wrefs for something else. > + This also matches __atomic_wide_counter requirements: The highest > + value we add is __PTHREAD_COND_MAX_GROUP_SIZE << 2 to __g1_start > + (the two extra bits are for the lock in the two LSBs of > + __g1_start). 
*/ > #define __PTHREAD_COND_MAX_GROUP_SIZE ((unsigned) 1 << 29) > > -#if __HAVE_64B_ATOMICS == 1 > - > -static uint64_t __attribute__ ((unused)) > +static inline uint64_t > __condvar_load_wseq_relaxed (pthread_cond_t *cond) > { > - return atomic_load_relaxed (&cond->__data.__wseq); > + return __atomic_wide_counter_load_relaxed (&cond->__data.__wseq); > } > > -static uint64_t __attribute__ ((unused)) > +static inline uint64_t > __condvar_fetch_add_wseq_acquire (pthread_cond_t *cond, unsigned int val) > { > - return atomic_fetch_add_acquire (&cond->__data.__wseq, val); > + return __atomic_wide_counter_fetch_add_acquire (&cond->__data.__wseq, val); > } > > -static uint64_t __attribute__ ((unused)) > -__condvar_fetch_xor_wseq_release (pthread_cond_t *cond, unsigned int val) > +static inline uint64_t > +__condvar_load_g1_start_relaxed (pthread_cond_t *cond) > { > - return atomic_fetch_xor_release (&cond->__data.__wseq, val); > + return __atomic_wide_counter_load_relaxed (&cond->__data.__g1_start); > } > > -static uint64_t __attribute__ ((unused)) > -__condvar_load_g1_start_relaxed (pthread_cond_t *cond) > +static inline void > +__condvar_add_g1_start_relaxed (pthread_cond_t *cond, unsigned int val) > { > - return atomic_load_relaxed (&cond->__data.__g1_start); > + __atomic_wide_counter_add_relaxed (&cond->__data.__g1_start, val); > } > > -static void __attribute__ ((unused)) > -__condvar_add_g1_start_relaxed (pthread_cond_t *cond, unsigned int val) > +#if __HAVE_64B_ATOMICS == 1 > + > +static inline uint64_t > +__condvar_fetch_xor_wseq_release (pthread_cond_t *cond, unsigned int val) > { > - atomic_store_relaxed (&cond->__data.__g1_start, > - atomic_load_relaxed (&cond->__data.__g1_start) + val); > + return atomic_fetch_xor_release (&cond->__data.__wseq.__value64, val); > } > Ok, although the reorganization of the placemente makes review a bit confusing. > -#else > - > -/* We use two 64b counters: __wseq and __g1_start. They are monotonically > - increasing and single-writer-multiple-readers counters, so we can implement > - load, fetch-and-add, and fetch-and-xor operations even when we just have > - 32b atomics. Values we add or xor are less than or equal to 1<<31 (*), > - so we only have to make overflow-and-addition atomic wrt. to concurrent > - load operations and xor operations. To do that, we split each counter into > - two 32b values of which we reserve the MSB of each to represent an > - overflow from the lower-order half to the higher-order half. > - > - In the common case, the state is (higher-order / lower-order half, and . is > - basically concatenation of the bits): > - 0.h / 0.l = h.l > - > - When we add a value of x that overflows (i.e., 0.l + x == 1.L), we run the > - following steps S1-S4 (the values these represent are on the right-hand > - side): > - S1: 0.h / 1.L == (h+1).L > - S2: 1.(h+1) / 1.L == (h+1).L > - S3: 1.(h+1) / 0.L == (h+1).L > - S4: 0.(h+1) / 0.L == (h+1).L > - If the LSB of the higher-order half is set, readers will ignore the > - overflow bit in the lower-order half. > - > - To get an atomic snapshot in load operations, we exploit that the > - higher-order half is monotonically increasing; if we load a value V from > - it, then read the lower-order half, and then read the higher-order half > - again and see the same value V, we know that both halves have existed in > - the sequence of values the full counter had. This is similar to the > - validated reads in the time-based STMs in GCC's libitm (e.g., > - method_ml_wt). 
> - > - The xor operation needs to be an atomic read-modify-write. The write > +#else /* !__HAVE_64B_ATOMICS */ > + > +/* The xor operation needs to be an atomic read-modify-write. The write > itself is not an issue as it affects just the lower-order half but not bits > used in the add operation. To make the full fetch-and-xor atomic, we > exploit that concurrently, the value can increase by at most 1<<31 (*): The > @@ -97,117 +70,18 @@ __condvar_add_g1_start_relaxed (pthread_cond_t *cond, unsigned int val) > than __PTHREAD_COND_MAX_GROUP_SIZE waiters can enter concurrently and thus > increment __wseq. Therefore, if the xor operation observes a value of > __wseq, then the value it applies the modification to later on can be > - derived (see below). > - > - One benefit of this scheme is that this makes load operations > - obstruction-free because unlike if we would just lock the counter, readers > - can almost always interpret a snapshot of each halves. Readers can be > - forced to read a new snapshot when the read is concurrent with an overflow. > - However, overflows will happen infrequently, so load operations are > - practically lock-free. > - > - (*) The highest value we add is __PTHREAD_COND_MAX_GROUP_SIZE << 2 to > - __g1_start (the two extra bits are for the lock in the two LSBs of > - __g1_start). */ > - > -typedef struct > -{ > - unsigned int low; > - unsigned int high; > -} _condvar_lohi; > - > -static uint64_t > -__condvar_fetch_add_64_relaxed (_condvar_lohi *lh, unsigned int op) > -{ > - /* S1. Note that this is an atomic read-modify-write so it extends the > - release sequence of release MO store at S3. */ > - unsigned int l = atomic_fetch_add_relaxed (&lh->low, op); > - unsigned int h = atomic_load_relaxed (&lh->high); > - uint64_t result = ((uint64_t) h << 31) | l; > - l += op; > - if ((l >> 31) > 0) > - { > - /* Overflow. Need to increment higher-order half. Note that all > - add operations are ordered in happens-before. */ > - h++; > - /* S2. Release MO to synchronize with the loads of the higher-order half > - in the load operation. See __condvar_load_64_relaxed. */ > - atomic_store_release (&lh->high, h | ((unsigned int) 1 << 31)); > - l ^= (unsigned int) 1 << 31; > - /* S3. See __condvar_load_64_relaxed. */ > - atomic_store_release (&lh->low, l); > - /* S4. Likewise. */ > - atomic_store_release (&lh->high, h); > - } > - return result; > -} > - > -static uint64_t > -__condvar_load_64_relaxed (_condvar_lohi *lh) > -{ > - unsigned int h, l, h2; > - do > - { > - /* This load and the second one below to the same location read from the > - stores in the overflow handling of the add operation or the > - initializing stores (which is a simple special case because > - initialization always completely happens before further use). > - Because no two stores to the higher-order half write the same value, > - the loop ensures that if we continue to use the snapshot, this load > - and the second one read from the same store operation. All candidate > - store operations have release MO. > - If we read from S2 in the first load, then we will see the value of > - S1 on the next load (because we synchronize with S2), or a value > - later in modification order. We correctly ignore the lower-half's > - overflow bit in this case. If we read from S4, then we will see the > - value of S3 in the next load (or a later value), which does not have > - the overflow bit set anymore. 
> - */ > - h = atomic_load_acquire (&lh->high); > - /* This will read from the release sequence of S3 (i.e, either the S3 > - store or the read-modify-writes at S1 following S3 in modification > - order). Thus, the read synchronizes with S3, and the following load > - of the higher-order half will read from the matching S2 (or a later > - value). > - Thus, if we read a lower-half value here that already overflowed and > - belongs to an increased higher-order half value, we will see the > - latter and h and h2 will not be equal. */ > - l = atomic_load_acquire (&lh->low); > - /* See above. */ > - h2 = atomic_load_relaxed (&lh->high); > - } > - while (h != h2); > - if (((l >> 31) > 0) && ((h >> 31) > 0)) > - l ^= (unsigned int) 1 << 31; > - return ((uint64_t) (h & ~((unsigned int) 1 << 31)) << 31) + l; > -} > - > -static uint64_t __attribute__ ((unused)) > -__condvar_load_wseq_relaxed (pthread_cond_t *cond) > -{ > - return __condvar_load_64_relaxed ((_condvar_lohi *) &cond->__data.__wseq32); > -} > - > -static uint64_t __attribute__ ((unused)) > -__condvar_fetch_add_wseq_acquire (pthread_cond_t *cond, unsigned int val) > -{ > - uint64_t r = __condvar_fetch_add_64_relaxed > - ((_condvar_lohi *) &cond->__data.__wseq32, val); > - atomic_thread_fence_acquire (); > - return r; > -} > + derived. */ > Ok. > static uint64_t __attribute__ ((unused)) > __condvar_fetch_xor_wseq_release (pthread_cond_t *cond, unsigned int val) > { > - _condvar_lohi *lh = (_condvar_lohi *) &cond->__data.__wseq32; > /* First, get the current value. See __condvar_load_64_relaxed. */ > unsigned int h, l, h2; > do > { > - h = atomic_load_acquire (&lh->high); > - l = atomic_load_acquire (&lh->low); > - h2 = atomic_load_relaxed (&lh->high); > + h = atomic_load_acquire (&cond->__data.__wseq.__value32.__high); > + l = atomic_load_acquire (&cond->__data.__wseq.__value32.__low); > + h2 = atomic_load_relaxed (&cond->__data.__wseq.__value32.__high); > } > while (h != h2); > if (((l >> 31) > 0) && ((h >> 31) == 0)) Ok. > @@ -219,8 +93,9 @@ __condvar_fetch_xor_wseq_release (pthread_cond_t *cond, unsigned int val) > earlier in modification order than the following fetch-xor. > This uses release MO to make the full operation have release semantics > (all other operations access the lower-order half). */ > - unsigned int l2 = atomic_fetch_xor_release (&lh->low, val) > - & ~((unsigned int) 1 << 31); > + unsigned int l2 > + = (atomic_fetch_xor_release (&cond->__data.__wseq.__value32.__low, val) > + & ~((unsigned int) 1 << 31)); > if (l2 < l) > /* The lower-order half overflowed in the meantime. This happened exactly > once due to the limit on concurrent waiters (see above). */ > @@ -228,22 +103,7 @@ __condvar_fetch_xor_wseq_release (pthread_cond_t *cond, unsigned int val) > return ((uint64_t) h << 31) + l2; > } Ok. > > -static uint64_t __attribute__ ((unused)) > -__condvar_load_g1_start_relaxed (pthread_cond_t *cond) > -{ > - return __condvar_load_64_relaxed > - ((_condvar_lohi *) &cond->__data.__g1_start32); > -} > - > -static void __attribute__ ((unused)) > -__condvar_add_g1_start_relaxed (pthread_cond_t *cond, unsigned int val) > -{ > - ignore_value (__condvar_fetch_add_64_relaxed > - ((_condvar_lohi *) &cond->__data.__g1_start32, val)); > -} > - > -#endif /* !__HAVE_64B_ATOMICS */ > - > +#endif /* !__HAVE_64B_ATOMICS */ > > /* The lock that signalers use. See pthread_cond_wait_common for uses. > The lock is our normal three-state lock: not acquired (0) / acquired (1) / Ok. 
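By the way, since the 32-bit fallback is the subtle part of this move, here is the tiny standalone model I used to convince myself while reviewing, in case it helps others.  It uses C11 <stdatomic.h> instead of the internal atomic macros, and all names plus the test driver are mine, not from the patch; it only mirrors the S1-S4 overflow steps and the validated double read of the higher-order half:

#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>

/* Toy counterpart of __atomic_wide_counter for !__HAVE_64B_ATOMICS:
   the MSB of LOW is the pending-overflow flag, the MSB of HIGH tells
   readers to ignore it.  */
struct toy_wide_counter
{
  _Atomic unsigned int low;
  _Atomic unsigned int high;
};

/* Mirrors __atomic_wide_counter_fetch_add_relaxed (steps S1-S4).  */
static uint64_t
toy_fetch_add (struct toy_wide_counter *c, unsigned int op)
{
  unsigned int l = atomic_fetch_add_explicit (&c->low, op,
                                              memory_order_relaxed); /* S1 */
  unsigned int h = atomic_load_explicit (&c->high, memory_order_relaxed);
  uint64_t result = ((uint64_t) h << 31) | l;
  l += op;
  if ((l >> 31) > 0)
    {
      /* Overflow of the lower half: fold it into the higher half.  */
      h++;
      atomic_store_explicit (&c->high, h | (1u << 31),
                             memory_order_release);                  /* S2 */
      l ^= 1u << 31;
      atomic_store_explicit (&c->low, l, memory_order_release);      /* S3 */
      atomic_store_explicit (&c->high, h, memory_order_release);     /* S4 */
    }
  return result;
}

/* Mirrors __atomic_wide_counter_load_relaxed: retry until both reads of
   the higher half see the same value, then the pair is a valid snapshot.  */
static uint64_t
toy_load (struct toy_wide_counter *c)
{
  unsigned int h, l, h2;
  do
    {
      h = atomic_load_explicit (&c->high, memory_order_acquire);
      l = atomic_load_explicit (&c->low, memory_order_acquire);
      h2 = atomic_load_explicit (&c->high, memory_order_relaxed);
    }
  while (h != h2);
  if ((l >> 31) > 0 && (h >> 31) > 0)
    l ^= 1u << 31;
  return ((uint64_t) (h & ~(1u << 31)) << 31) + l;
}

int
main (void)
{
  static struct toy_wide_counter c;   /* Zero-initialized.  */
  /* Step over the 31-bit boundary of the lower half a couple of times.  */
  for (int i = 0; i < 5; i++)
    toy_fetch_add (&c, 1u << 30);
  printf ("counter = %llu\n", (unsigned long long) toy_load (&c));
  return 0;
}

It prints counter = 5368709120 (5 * 2**30), i.e. the lower half overflowed twice and the snapshot load still reconstructs the full value.
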
> diff --git a/nptl/tst-cond22.c b/nptl/tst-cond22.c > index 64f19ea0a5..1336e9c79d 100644 > --- a/nptl/tst-cond22.c > +++ b/nptl/tst-cond22.c > @@ -106,8 +106,11 @@ do_test (void) > status = 1; > } > > - printf ("cond = { %llu, %llu, %u/%u/%u, %u/%u/%u, %u, %u }\n", > - c.__data.__wseq, c.__data.__g1_start, > + printf ("cond = { 0x%x:%x, 0x%x:%x, %u/%u/%u, %u/%u/%u, %u, %u }\n", > + c.__data.__wseq.__value32.__high, > + c.__data.__wseq.__value32.__low, > + c.__data.__g1_start.__value32.__high, > + c.__data.__g1_start.__value32.__low, > c.__data.__g_signals[0], c.__data.__g_refs[0], c.__data.__g_size[0], > c.__data.__g_signals[1], c.__data.__g_refs[1], c.__data.__g_size[1], > c.__data.__g1_orig_size, c.__data.__wrefs); > @@ -149,8 +152,11 @@ do_test (void) > status = 1; > } > > - printf ("cond = { %llu, %llu, %u/%u/%u, %u/%u/%u, %u, %u }\n", > - c.__data.__wseq, c.__data.__g1_start, > + printf ("cond = { 0x%x:%x, 0x%x:%x, %u/%u/%u, %u/%u/%u, %u, %u }\n", > + c.__data.__wseq.__value32.__high, > + c.__data.__wseq.__value32.__low, > + c.__data.__g1_start.__value32.__high, > + c.__data.__g1_start.__value32.__low, > c.__data.__g_signals[0], c.__data.__g_refs[0], c.__data.__g_size[0], > c.__data.__g_signals[1], c.__data.__g_refs[1], c.__data.__g_size[1], > c.__data.__g1_orig_size, c.__data.__wrefs); Ok. > diff --git a/sysdeps/nptl/bits/thread-shared-types.h b/sysdeps/nptl/bits/thread-shared-types.h > index 44bf1e358d..b82a79a43e 100644 > --- a/sysdeps/nptl/bits/thread-shared-types.h > +++ b/sysdeps/nptl/bits/thread-shared-types.h > @@ -43,6 +43,8 @@ > > #include > > +#include > + > > /* Common definition of pthread_mutex_t. */ > > @@ -91,24 +93,8 @@ typedef struct __pthread_internal_slist > > struct __pthread_cond_s > { > - __extension__ union > - { > - __extension__ unsigned long long int __wseq; > - struct > - { > - unsigned int __low; > - unsigned int __high; > - } __wseq32; > - }; > - __extension__ union > - { > - __extension__ unsigned long long int __g1_start; > - struct > - { > - unsigned int __low; > - unsigned int __high; > - } __g1_start32; > - }; > + __atomic_wide_counter __wseq; > + __atomic_wide_counter __g1_start; > unsigned int __g_refs[2] __LOCK_ALIGNMENT; > unsigned int __g_size[2]; > unsigned int __g1_orig_size; > Ok.
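
On the thread-shared-types.h hunk, I also double-checked locally that swapping the anonymous unions for __atomic_wide_counter cannot change the layout.  Something along these lines (my own throwaway check, not part of the patch; wide_counter_mirror just copies the new union's shape so the comparison works outside glibc-internal headers):

#include <pthread.h>

/* Local mirror of __atomic_wide_counter, only for the layout checks.  */
typedef union
{
  unsigned long long int value64;
  struct
  {
    unsigned int low;
    unsigned int high;
  } value32;
} wide_counter_mirror;

/* Same size and alignment as the anonymous __wseq/__wseq32 and
   __g1_start/__g1_start32 unions it replaces...  */
_Static_assert (sizeof (wide_counter_mirror) == sizeof (unsigned long long int),
                "no padding added");
_Static_assert (_Alignof (wide_counter_mirror) == _Alignof (unsigned long long int),
                "alignment is still that of the 64-bit member");

/* ... so struct __pthread_cond_s still fits in the ABI-fixed size.  */
_Static_assert (sizeof (pthread_cond_t) == __SIZEOF_PTHREAD_COND_T,
                "pthread_cond_t size unchanged");

int
main (void)
{
  return 0;
}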