From: "Richard Earnshaw (lists)" <Richard.Earnshaw@arm.com>
To: Lipeng Zhu <lipeng.zhu@intel.com>, jakub@redhat.com
Cc: fortran@gcc.gnu.org, gcc-patches@gcc.gnu.org,
hongjiu.lu@intel.com, pan.deng@intel.com, rep.dot.nop@gmail.com,
tianyou.li@intel.com, tkoenig@netcologne.de,
wangyang.guo@intel.com
Subject: Re: [PATCH v7] libgfortran: Replace mutex with rwlock
Date: Thu, 14 Dec 2023 15:50:16 +0000 [thread overview]
Message-ID: <523c6fbf-344c-48b8-8508-7e2acb088606@arm.com> (raw)
In-Reply-To: <20231209153944.3746165-1-lipeng.zhu@intel.com>
On 09/12/2023 15:39, Lipeng Zhu wrote:
> This patch try to introduce the rwlock and split the read/write to
> unit_root tree and unit_cache with rwlock instead of the mutex to
> increase CPU efficiency. In the get_gfc_unit function, the percentage
> to step into the insert_unit function is around 30%, in most instances,
> we can get the unit in the phase of reading the unit_cache or unit_root
> tree. So split the read/write phase by rwlock would be an approach to
> make it more parallel.
>
> BTW, the IPC metrics can gain around 9x in our test
> server with 220 cores. The benchmark we used is
> https://github.com/rwesson/NEAT
>
> libgcc/ChangeLog:
>
> * gthr-posix.h (__GTHREAD_RWLOCK_INIT): New macro.
> (__gthrw): New function.
> (__gthread_rwlock_rdlock): New function.
> (__gthread_rwlock_tryrdlock): New function.
> (__gthread_rwlock_wrlock): New function.
> (__gthread_rwlock_trywrlock): New function.
> (__gthread_rwlock_unlock): New function.
>
> libgfortran/ChangeLog:
>
> * io/async.c (DEBUG_LINE): New macro.
> * io/async.h (RWLOCK_DEBUG_ADD): New macro.
> (CHECK_RDLOCK): New macro.
> (CHECK_WRLOCK): New macro.
> (TAIL_RWLOCK_DEBUG_QUEUE): New macro.
> (IN_RWLOCK_DEBUG_QUEUE): New macro.
> (RDLOCK): New macro.
> (WRLOCK): New macro.
> (RWUNLOCK): New macro.
> (RD_TO_WRLOCK): New macro.
> (INTERN_RDLOCK): New macro.
> (INTERN_WRLOCK): New macro.
> (INTERN_RWUNLOCK): New macro.
> * io/io.h (struct gfc_unit): Change UNIT_LOCK to UNIT_RWLOCK in
> a comment.
> (unit_lock): Remove including associated internal_proto.
> (unit_rwlock): New declarations including associated internal_proto.
> (dec_waiting_unlocked): Use WRLOCK and RWUNLOCK on unit_rwlock
> instead of __gthread_mutex_lock and __gthread_mutex_unlock on
> unit_lock.
> * io/transfer.c (st_read_done_worker): Use WRLOCK and RWUNLOCK on
> unit_rwlock instead of LOCK and UNLOCK on unit_lock.
> (st_write_done_worker): Likewise.
> * io/unit.c: Change UNIT_LOCK to UNIT_RWLOCK in 'IO locking rules'
> comment. Use unit_rwlock variable instead of unit_lock variable.
> (get_gfc_unit_from_unit_root): New function.
> (get_gfc_unit): Use RDLOCK, WRLOCK and RWUNLOCK on unit_rwlock
> instead of LOCK and UNLOCK on unit_lock.
> (close_unit_1): Use WRLOCK and RWUNLOCK on unit_rwlock instead of
> LOCK and UNLOCK on unit_lock.
> (close_units): Likewise.
> (newunit_alloc): Use RWUNLOCK on unit_rwlock instead of UNLOCK on
> unit_lock.
> * io/unix.c (find_file): Use RDLOCK and RWUNLOCK on unit_rwlock
> instead of LOCK and UNLOCK on unit_lock.
> (flush_all_units): Use WRLOCK and RWUNLOCK on unit_rwlock instead
> of LOCK and UNLOCK on unit_lock.
>
It looks like this has broken builds on arm-none-eabi when using newlib:
In file included from /work/rearnsha/gnusrc/nightly/gcc-cross/master/libgfortran
/runtime/error.c:27:
/work/rearnsha/gnusrc/nightly/gcc-cross/master/libgfortran/io/io.h: In function
‘dec_waiting_unlocked’:
/work/rearnsha/gnusrc/nightly/gcc-cross/master/libgfortran/io/io.h:1023:3: error
: implicit declaration of function ‘WRLOCK’ [-Wimplicit-function-declaration]
1023 | WRLOCK (&unit_rwlock);
| ^~~~~~
/work/rearnsha/gnusrc/nightly/gcc-cross/master/libgfortran/io/io.h:1025:3: error
: implicit declaration of function ‘RWUNLOCK’ [-Wimplicit-function-declaration]
1025 | RWUNLOCK (&unit_rwlock);
| ^~~~~~~~
R.
> ---
> v1 -> v2:
> Limit the pthread_rwlock usage in libgcc only when __cplusplus isn't defined.
>
> v2 -> v3:
> Rebase the patch with trunk branch.
>
> v3 -> v4:
> Update the comments.
>
> v4 -> v5:
> Fix typos and code formatter.
>
> v5 -> v6:
> Add unit tests.
>
> v6 -> v7:
> Update ChangeLog and code formatter.
>
> Reviewed-by: Hongjiu Lu <hongjiu.lu@intel.com>
> Reviewed-by: Bernhard Reutner-Fischer <rep.dot.nop@gmail.com>
> Reviewed-by: Thomas Koenig <tkoenig@netcologne.de>
> Reviewed-by: Jakub Jelinek <jakub@redhat.com>
> Signed-off-by: Lipeng Zhu <lipeng.zhu@intel.com>
> ---
> libgcc/gthr-posix.h | 60 +++++++
> libgfortran/io/async.c | 4 +
> libgfortran/io/async.h | 151 ++++++++++++++++++
> libgfortran/io/io.h | 15 +-
> libgfortran/io/transfer.c | 8 +-
> libgfortran/io/unit.c | 117 +++++++++-----
> libgfortran/io/unix.c | 16 +-
> .../testsuite/libgomp.fortran/rwlock_1.f90 | 33 ++++
> .../testsuite/libgomp.fortran/rwlock_2.f90 | 22 +++
> .../testsuite/libgomp.fortran/rwlock_3.f90 | 18 +++
> 10 files changed, 386 insertions(+), 58 deletions(-)
> create mode 100644 libgomp/testsuite/libgomp.fortran/rwlock_1.f90
> create mode 100644 libgomp/testsuite/libgomp.fortran/rwlock_2.f90
> create mode 100644 libgomp/testsuite/libgomp.fortran/rwlock_3.f90
>
> diff --git a/libgcc/gthr-posix.h b/libgcc/gthr-posix.h
> index aebcfdd9f4c..73283082997 100644
> --- a/libgcc/gthr-posix.h
> +++ b/libgcc/gthr-posix.h
> @@ -48,6 +48,9 @@ typedef pthread_t __gthread_t;
> typedef pthread_key_t __gthread_key_t;
> typedef pthread_once_t __gthread_once_t;
> typedef pthread_mutex_t __gthread_mutex_t;
> +#ifndef __cplusplus
> +typedef pthread_rwlock_t __gthread_rwlock_t;
> +#endif
> typedef pthread_mutex_t __gthread_recursive_mutex_t;
> typedef pthread_cond_t __gthread_cond_t;
> typedef struct timespec __gthread_time_t;
> @@ -58,6 +61,9 @@ typedef struct timespec __gthread_time_t;
>
> #define __GTHREAD_MUTEX_INIT PTHREAD_MUTEX_INITIALIZER
> #define __GTHREAD_MUTEX_INIT_FUNCTION __gthread_mutex_init_function
> +#ifndef __cplusplus
> +#define __GTHREAD_RWLOCK_INIT PTHREAD_RWLOCK_INITIALIZER
> +#endif
> #define __GTHREAD_ONCE_INIT PTHREAD_ONCE_INIT
> #if defined(PTHREAD_RECURSIVE_MUTEX_INITIALIZER)
> #define __GTHREAD_RECURSIVE_MUTEX_INIT PTHREAD_RECURSIVE_MUTEX_INITIALIZER
> @@ -135,6 +141,13 @@ __gthrw(pthread_mutexattr_init)
> __gthrw(pthread_mutexattr_settype)
> __gthrw(pthread_mutexattr_destroy)
>
> +#ifndef __cplusplus
> +__gthrw(pthread_rwlock_rdlock)
> +__gthrw(pthread_rwlock_tryrdlock)
> +__gthrw(pthread_rwlock_wrlock)
> +__gthrw(pthread_rwlock_trywrlock)
> +__gthrw(pthread_rwlock_unlock)
> +#endif
>
> #if defined(_LIBOBJC) || defined(_LIBOBJC_WEAK)
> /* Objective-C. */
> @@ -885,6 +898,53 @@ __gthread_cond_destroy (__gthread_cond_t* __cond)
> return __gthrw_(pthread_cond_destroy) (__cond);
> }
>
> +#ifndef __cplusplus
> +static inline int
> +__gthread_rwlock_rdlock (__gthread_rwlock_t *__rwlock)
> +{
> + if (__gthread_active_p ())
> + return __gthrw_(pthread_rwlock_rdlock) (__rwlock);
> + else
> + return 0;
> +}
> +
> +static inline int
> +__gthread_rwlock_tryrdlock (__gthread_rwlock_t *__rwlock)
> +{
> + if (__gthread_active_p ())
> + return __gthrw_(pthread_rwlock_tryrdlock) (__rwlock);
> + else
> + return 0;
> +}
> +
> +static inline int
> +__gthread_rwlock_wrlock (__gthread_rwlock_t *__rwlock)
> +{
> + if (__gthread_active_p ())
> + return __gthrw_(pthread_rwlock_wrlock) (__rwlock);
> + else
> + return 0;
> +}
> +
> +static inline int
> +__gthread_rwlock_trywrlock (__gthread_rwlock_t *__rwlock)
> +{
> + if (__gthread_active_p ())
> + return __gthrw_(pthread_rwlock_trywrlock) (__rwlock);
> + else
> + return 0;
> +}
> +
> +static inline int
> +__gthread_rwlock_unlock (__gthread_rwlock_t *__rwlock)
> +{
> + if (__gthread_active_p ())
> + return __gthrw_(pthread_rwlock_unlock) (__rwlock);
> + else
> + return 0;
> +}
> +#endif
> +
> #endif /* _LIBOBJC */
>
> #endif /* ! GCC_GTHR_POSIX_H */
> diff --git a/libgfortran/io/async.c b/libgfortran/io/async.c
> index 8fa1f0d4ce0..91bf397105d 100644
> --- a/libgfortran/io/async.c
> +++ b/libgfortran/io/async.c
> @@ -42,6 +42,10 @@ DEBUG_LINE (__thread const char *aio_prefix = MPREFIX);
>
> DEBUG_LINE (__gthread_mutex_t debug_queue_lock = __GTHREAD_MUTEX_INIT;)
> DEBUG_LINE (aio_lock_debug *aio_debug_head = NULL;)
> +#ifdef __GTHREAD_RWLOCK_INIT
> +DEBUG_LINE (aio_rwlock_debug *aio_rwlock_debug_head = NULL;)
> +DEBUG_LINE (__gthread_rwlock_t debug_queue_rwlock = __GTHREAD_RWLOCK_INIT;)
> +#endif
>
> /* Current unit for asynchronous I/O. Needed for error reporting. */
>
> diff --git a/libgfortran/io/async.h b/libgfortran/io/async.h
> index ad226c8e856..f112f6870bb 100644
> --- a/libgfortran/io/async.h
> +++ b/libgfortran/io/async.h
> @@ -210,6 +210,128 @@
> DEBUG_PRINTF ("%s" DEBUG_RED "ACQ:" DEBUG_NORM " %-30s %78p\n", aio_prefix, #mutex, mutex); \
> } while (0)
>
> +#ifdef __GTHREAD_RWLOCK_INIT
> +#define RWLOCK_DEBUG_ADD(rwlock) do { \
> + aio_rwlock_debug *n; \
> + n = xmalloc (sizeof (aio_rwlock_debug)); \
> + n->prev = TAIL_RWLOCK_DEBUG_QUEUE; \
> + if (n->prev) \
> + n->prev->next = n; \
> + n->next = NULL; \
> + n->line = __LINE__; \
> + n->func = __FUNCTION__; \
> + n->rw = rwlock; \
> + if (!aio_rwlock_debug_head) { \
> + aio_rwlock_debug_head = n; \
> + } \
> + } while (0)
> +
> +#define CHECK_RDLOCK(rwlock, status) do { \
> + aio_rwlock_debug *curr; \
> + INTERN_WRLOCK (&debug_queue_rwlock); \
> + if (__gthread_rwlock_tryrdlock (rwlock)) { \
> + if ((curr = IN_RWLOCK_DEBUG_QUEUE (rwlock))) { \
> + sprintf (status, DEBUG_RED "%s():%d" DEBUG_NORM, curr->func, curr->line); \
> + } else \
> + sprintf (status, DEBUG_RED "unknown" DEBUG_NORM); \
> + } \
> + else { \
> + __gthread_rwlock_unlock (rwlock); \
> + sprintf (status, DEBUG_GREEN "rwunlocked" DEBUG_NORM); \
> + } \
> + INTERN_RWUNLOCK (&debug_queue_rwlock); \
> + }while (0)
> +
> +#define CHECK_WRLOCK(rwlock, status) do { \
> + aio_rwlock_debug *curr; \
> + INTERN_WRLOCK (&debug_queue_rwlock); \
> + if (__gthread_rwlock_trywrlock (rwlock)) { \
> + if ((curr = IN_RWLOCK_DEBUG_QUEUE (rwlock))) { \
> + sprintf (status, DEBUG_RED "%s():%d" DEBUG_NORM, curr->func, curr->line); \
> + } else \
> + sprintf (status, DEBUG_RED "unknown" DEBUG_NORM); \
> + } \
> + else { \
> + __gthread_rwlock_unlock (rwlock); \
> + sprintf (status, DEBUG_GREEN "rwunlocked" DEBUG_NORM); \
> + } \
> + INTERN_RWUNLOCK (&debug_queue_rwlock); \
> + }while (0)
> +
> +#define TAIL_RWLOCK_DEBUG_QUEUE ({ \
> + aio_rwlock_debug *curr = aio_rwlock_debug_head; \
> + while (curr && curr->next) { \
> + curr = curr->next; \
> + } \
> + curr; \
> + })
> +
> +#define IN_RWLOCK_DEBUG_QUEUE(rwlock) ({ \
> + __label__ end; \
> + aio_rwlock_debug *curr = aio_rwlock_debug_head; \
> + while (curr) { \
> + if (curr->rw == rwlock) { \
> + goto end; \
> + } \
> + curr = curr->next; \
> + } \
> + end:; \
> + curr; \
> + })
> +
> +#define RDLOCK(rwlock) do { \
> + char status[200]; \
> + CHECK_RDLOCK (rwlock, status); \
> + DEBUG_PRINTF ("%s%-42s prev: %-35s %20s():%-5d %18p\n", aio_prefix, \
> + DEBUG_RED "RDLOCK: " DEBUG_NORM #rwlock, status, __FUNCTION__, __LINE__, (void *) rwlock); \
> + INTERN_RDLOCK (rwlock); \
> + INTERN_WRLOCK (&debug_queue_rwlock); \
> + RWLOCK_DEBUG_ADD (rwlock); \
> + INTERN_RWUNLOCK (&debug_queue_rwlock); \
> + DEBUG_PRINTF ("%s" DEBUG_RED "ACQ:" DEBUG_NORM " %-30s %78p\n", aio_prefix, #rwlock, rwlock); \
> + } while (0)
> +
> +#define WRLOCK(rwlock) do { \
> + char status[200]; \
> + CHECK_WRLOCK (rwlock, status); \
> + DEBUG_PRINTF ("%s%-42s prev: %-35s %20s():%-5d %18p\n", aio_prefix, \
> + DEBUG_RED "WRLOCK: " DEBUG_NORM #rwlock, status, __FUNCTION__, __LINE__, (void *) rwlock); \
> + INTERN_WRLOCK (rwlock); \
> + INTERN_WRLOCK (&debug_queue_rwlock); \
> + RWLOCK_DEBUG_ADD (rwlock); \
> + INTERN_RWUNLOCK (&debug_queue_rwlock); \
> + DEBUG_PRINTF ("%s" DEBUG_RED "ACQ:" DEBUG_NORM " %-30s %78p\n", aio_prefix, #rwlock, rwlock); \
> + } while (0)
> +
> +#define RWUNLOCK(rwlock) do { \
> + aio_rwlock_debug *curr; \
> + DEBUG_PRINTF ("%s%-75s %20s():%-5d %18p\n", aio_prefix, DEBUG_GREEN "RWUNLOCK: " DEBUG_NORM #rwlock, \
> + __FUNCTION__, __LINE__, (void *) rwlock); \
> + INTERN_WRLOCK (&debug_queue_rwlock); \
> + curr = IN_RWLOCK_DEBUG_QUEUE (rwlock); \
> + if (curr) \
> + { \
> + if (curr->prev) \
> + curr->prev->next = curr->next; \
> + if (curr->next) { \
> + curr->next->prev = curr->prev; \
> + if (curr == aio_rwlock_debug_head) \
> + aio_rwlock_debug_head = curr->next; \
> + } else { \
> + if (curr == aio_rwlock_debug_head) \
> + aio_rwlock_debug_head = NULL; \
> + } \
> + free (curr); \
> + } \
> + INTERN_RWUNLOCK (&debug_queue_rwlock); \
> + INTERN_RWUNLOCK (rwlock); \
> + } while (0)
> +
> +#define RD_TO_WRLOCK(rwlock) \
> + RWUNLOCK (rwlock); \
> + WRLOCK (rwlock);
> +#endif
> +
> #define DEBUG_LINE(...) __VA_ARGS__
>
> #else
> @@ -221,12 +343,31 @@
> #define LOCK(mutex) INTERN_LOCK (mutex)
> #define UNLOCK(mutex) INTERN_UNLOCK (mutex)
> #define TRYLOCK(mutex) (__gthread_mutex_trylock (mutex))
> +#ifdef __GTHREAD_RWLOCK_INIT
> +#define RDLOCK(rwlock) INTERN_RDLOCK (rwlock)
> +#define WRLOCK(rwlock) INTERN_WRLOCK (rwlock)
> +#define RWUNLOCK(rwlock) INTERN_RWUNLOCK (rwlock)
> +#define RD_TO_WRLOCK(rwlock) \
> + RWUNLOCK (rwlock); \
> + WRLOCK (rwlock);
> +#endif
> +#endif
> +
> +#ifndef __GTHREAD_RWLOCK_INIT
> +#define RDLOCK(rwlock) LOCK (rwlock)
> +#define WRLOCK(rwlock) LOCK (rwlock)
> +#define RWUNLOCK(rwlock) UNLOCK (rwlock)
> +#define RD_TO_WRLOCK(rwlock) do {} while (0)
> #endif
>
> #define INTERN_LOCK(mutex) T_ERROR (__gthread_mutex_lock, mutex);
>
> #define INTERN_UNLOCK(mutex) T_ERROR (__gthread_mutex_unlock, mutex);
>
> +#define INTERN_RDLOCK(rwlock) T_ERROR (__gthread_rwlock_rdlock, rwlock)
> +#define INTERN_WRLOCK(rwlock) T_ERROR (__gthread_rwlock_wrlock, rwlock)
> +#define INTERN_RWUNLOCK(rwlock) T_ERROR (__gthread_rwlock_unlock, rwlock)
> +
> #if ASYNC_IO
>
> /* au->lock has to be held when calling this macro. */
> @@ -288,8 +429,18 @@ DEBUG_LINE (typedef struct aio_lock_debug{
> struct aio_lock_debug *prev;
> } aio_lock_debug;)
>
> +DEBUG_LINE (typedef struct aio_rwlock_debug{
> + __gthread_rwlock_t *rw;
> + int line;
> + const char *func;
> + struct aio_rwlock_debug *next;
> + struct aio_rwlock_debug *prev;
> +} aio_rwlock_debug;)
> +
> DEBUG_LINE (extern aio_lock_debug *aio_debug_head;)
> DEBUG_LINE (extern __gthread_mutex_t debug_queue_lock;)
> +DEBUG_LINE (extern aio_rwlock_debug *aio_rwlock_debug_head;)
> +DEBUG_LINE (extern __gthread_rwlock_t debug_queue_rwlock;)
>
> /* Thread - local storage of the current unit we are looking at. Needed for
> error reporting. */
> diff --git a/libgfortran/io/io.h b/libgfortran/io/io.h
> index ecdf1dd3f05..15daa0995b1 100644
> --- a/libgfortran/io/io.h
> +++ b/libgfortran/io/io.h
> @@ -690,7 +690,7 @@ typedef struct gfc_unit
> from the UNIT_ROOT tree, but doesn't free it and the
> last of the waiting threads will do that.
> This must be either atomically increased/decreased, or
> - always guarded by UNIT_LOCK. */
> + always guarded by UNIT_RWLOCK. */
> int waiting;
> /* Flag set by close_unit if the unit as been closed.
> Must be manipulated under unit's lock. */
> @@ -769,8 +769,13 @@ internal_proto(default_recl);
> extern gfc_unit *unit_root;
> internal_proto(unit_root);
>
> -extern __gthread_mutex_t unit_lock;
> -internal_proto(unit_lock);
> +#ifdef __GTHREAD_RWLOCK_INIT
> +extern __gthread_rwlock_t unit_rwlock;
> +internal_proto(unit_rwlock);
> +#else
> +extern __gthread_mutex_t unit_rwlock;
> +internal_proto(unit_rwlock);
> +#endif
>
> extern int close_unit (gfc_unit *);
> internal_proto(close_unit);
> @@ -1015,9 +1020,9 @@ dec_waiting_unlocked (gfc_unit *u)
> #ifdef HAVE_ATOMIC_FETCH_ADD
> (void) __atomic_fetch_add (&u->waiting, -1, __ATOMIC_RELAXED);
> #else
> - __gthread_mutex_lock (&unit_lock);
> + WRLOCK (&unit_rwlock);
> u->waiting--;
> - __gthread_mutex_unlock (&unit_lock);
> + RWUNLOCK (&unit_rwlock);
> #endif
> }
>
> diff --git a/libgfortran/io/transfer.c b/libgfortran/io/transfer.c
> index 500db90c828..00d516adcb0 100644
> --- a/libgfortran/io/transfer.c
> +++ b/libgfortran/io/transfer.c
> @@ -4538,9 +4538,9 @@ st_read_done_worker (st_parameter_dt *dtp, bool unlock)
> if (free_newunit)
> {
> /* Avoid inverse lock issues by placing after unlock_unit. */
> - LOCK (&unit_lock);
> + WRLOCK (&unit_rwlock);
> newunit_free (dtp->common.unit);
> - UNLOCK (&unit_lock);
> + RWUNLOCK (&unit_rwlock);
> }
> }
>
> @@ -4634,9 +4634,9 @@ st_write_done_worker (st_parameter_dt *dtp, bool unlock)
> if (free_newunit)
> {
> /* Avoid inverse lock issues by placing after unlock_unit. */
> - LOCK (&unit_lock);
> + WRLOCK (&unit_rwlock);
> newunit_free (dtp->common.unit);
> - UNLOCK (&unit_lock);
> + RWUNLOCK (&unit_rwlock);
> }
> }
>
> diff --git a/libgfortran/io/unit.c b/libgfortran/io/unit.c
> index 36d025949c2..0c8c35e464e 100644
> --- a/libgfortran/io/unit.c
> +++ b/libgfortran/io/unit.c
> @@ -33,34 +33,36 @@ see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
>
>
> /* IO locking rules:
> - UNIT_LOCK is a master lock, protecting UNIT_ROOT tree and UNIT_CACHE.
> + UNIT_RWLOCK is a master rw lock, protecting UNIT_ROOT tree and UNIT_CACHE.
> + Using an rwlock improves efficiency by allowing us to separate readers
> + and writers of both UNIT_ROOT and UNIT_CACHE.
> Concurrent use of different units should be supported, so
> each unit has its own lock, LOCK.
> Open should be atomic with its reopening of units and list_read.c
> in several places needs find_unit another unit while holding stdin
> - unit's lock, so it must be possible to acquire UNIT_LOCK while holding
> + unit's lock, so it must be possible to acquire UNIT_RWLOCK while holding
> some unit's lock. Therefore to avoid deadlocks, it is forbidden
> - to acquire unit's private locks while holding UNIT_LOCK, except
> + to acquire unit's private locks while holding UNIT_RWLOCK, except
> for freshly created units (where no other thread can get at their
> address yet) or when using just trylock rather than lock operation.
> In addition to unit's private lock each unit has a WAITERS counter
> and CLOSED flag. WAITERS counter must be either only
> atomically incremented/decremented in all places (if atomic builtins
> - are supported), or protected by UNIT_LOCK in all places (otherwise).
> + are supported), or protected by UNIT_RWLOCK in all places (otherwise).
> CLOSED flag must be always protected by unit's LOCK.
> - After finding a unit in UNIT_CACHE or UNIT_ROOT with UNIT_LOCK held,
> + After finding a unit in UNIT_CACHE or UNIT_ROOT with UNIT_RWLOCK held,
> WAITERS must be incremented to avoid concurrent close from freeing
> - the unit between unlocking UNIT_LOCK and acquiring unit's LOCK.
> - Unit freeing is always done under UNIT_LOCK. If close_unit sees any
> + the unit between unlocking UNIT_RWLOCK and acquiring unit's LOCK.
> + Unit freeing is always done under UNIT_RWLOCK. If close_unit sees any
> WAITERS, it doesn't free the unit but instead sets the CLOSED flag
> and the thread that decrements WAITERS to zero while CLOSED flag is
> - set is responsible for freeing it (while holding UNIT_LOCK).
> + set is responsible for freeing it (while holding UNIT_RWLOCK).
> flush_all_units operation is iterating over the unit tree with
> - increasing UNIT_NUMBER while holding UNIT_LOCK and attempting to
> + increasing UNIT_NUMBER while holding UNIT_RWLOCK and attempting to
> flush each unit (and therefore needs the unit's LOCK held as well).
> To avoid deadlocks, it just trylocks the LOCK and if unsuccessful,
> - remembers the current unit's UNIT_NUMBER, unlocks UNIT_LOCK, acquires
> - unit's LOCK and after flushing reacquires UNIT_LOCK and restarts with
> + remembers the current unit's UNIT_NUMBER, unlocks UNIT_RWLOCK, acquires
> + unit's LOCK and after flushing reacquires UNIT_RWLOCK and restarts with
> the smallest UNIT_NUMBER above the last one flushed.
>
> If find_unit/find_or_create_unit/find_file/get_unit routines return
> @@ -101,10 +103,14 @@ gfc_offset max_offset;
> gfc_offset default_recl;
>
> gfc_unit *unit_root;
> +#ifdef __GTHREAD_RWLOCK_INIT
> +__gthread_rwlock_t unit_rwlock = __GTHREAD_RWLOCK_INIT;
> +#else
> #ifdef __GTHREAD_MUTEX_INIT
> -__gthread_mutex_t unit_lock = __GTHREAD_MUTEX_INIT;
> +__gthread_mutex_t unit_rwlock = __GTHREAD_MUTEX_INIT;
> #else
> -__gthread_mutex_t unit_lock;
> +__gthread_mutex_t unit_rwlock;
> +#endif
> #endif
>
> /* We use these filenames for error reporting. */
> @@ -317,6 +323,28 @@ delete_unit (gfc_unit *old)
> unit_root = delete_treap (old, unit_root);
> }
>
> +/* get_gfc_unit_from_root()-- Given an integer, return a pointer
> + to the unit structure. Returns NULL if the unit does not exist,
> + otherwise returns a locked unit. */
> +
> +static inline gfc_unit *
> +get_gfc_unit_from_unit_root (int n)
> +{
> + gfc_unit *p;
> + int c = 0;
> + p = unit_root;
> + while (p != NULL)
> + {
> + c = compare (n, p->unit_number);
> + if (c < 0)
> + p = p->left;
> + if (c > 0)
> + p = p->right;
> + if (c == 0)
> + break;
> + }
> + return p;
> +}
>
> /* get_gfc_unit()-- Given an integer, return a pointer to the unit
> structure. Returns NULL if the unit does not exist,
> @@ -329,7 +357,7 @@ get_gfc_unit (int n, int do_create)
> int c, created = 0;
>
> NOTE ("Unit n=%d, do_create = %d", n, do_create);
> - LOCK (&unit_lock);
> + RDLOCK (&unit_rwlock);
>
> retry:
> for (c = 0; c < CACHE_SIZE; c++)
> @@ -339,18 +367,25 @@ retry:
> goto found;
> }
>
> - p = unit_root;
> - while (p != NULL)
> - {
> - c = compare (n, p->unit_number);
> - if (c < 0)
> - p = p->left;
> - if (c > 0)
> - p = p->right;
> - if (c == 0)
> - break;
> - }
> -
> + p = get_gfc_unit_from_unit_root (n);
> +
> + /* We did not find a unit in the cache nor in the unit list,
> + create a new (locked) unit and insert into the unit list and
> + cache. Manipulating either or both the unit list and the unit
> + cache requires to hold a write-lock [for obvious reasons]:
> + By separating the read/write lock, we will greatly reduce
> + the contention on the read part, while the write part is
> + unlikely once the unit hits the cache. */
> + RD_TO_WRLOCK (&unit_rwlock);
> +
> + /* In the case of high concurrency, when multiple threads want
> + to find or create the same unit, the unit number may not
> + exist in cache nor in the unit list during read phase, then
> + threads will acquire the write-lock to insert the same unit
> + number to unit list. To avoid duplicate insert, we need to
> + find unit list once again to ensure that the unit number
> + not exist. */
> + p = get_gfc_unit_from_unit_root (n);
> if (p == NULL && do_create)
> {
> p = insert_unit (n);
> @@ -368,8 +403,8 @@ retry:
> if (created)
> {
> /* Newly created units have their lock held already
> - from insert_unit. Just unlock UNIT_LOCK and return. */
> - UNLOCK (&unit_lock);
> + from insert_unit. Just unlock UNIT_RWLOCK and return. */
> + RWUNLOCK (&unit_rwlock);
> return p;
> }
>
> @@ -380,7 +415,7 @@ found:
> if (! TRYLOCK (&p->lock))
> {
> /* assert (p->closed == 0); */
> - UNLOCK (&unit_lock);
> + RWUNLOCK (&unit_rwlock);
> return p;
> }
>
> @@ -388,14 +423,14 @@ found:
> }
>
>
> - UNLOCK (&unit_lock);
> + RWUNLOCK (&unit_rwlock);
>
> if (p != NULL && (p->child_dtio == 0))
> {
> LOCK (&p->lock);
> if (p->closed)
> {
> - LOCK (&unit_lock);
> + WRLOCK (&unit_rwlock);
> UNLOCK (&p->lock);
> if (predec_waiting_locked (p) == 0)
> destroy_unit_mutex (p);
> @@ -594,8 +629,8 @@ init_units (void)
> #endif
> #endif
>
> -#ifndef __GTHREAD_MUTEX_INIT
> - __GTHREAD_MUTEX_INIT_FUNCTION (&unit_lock);
> +#if (!defined(__GTHREAD_RWLOCK_INIT) && !defined(__GTHREAD_MUTEX_INIT))
> + __GTHREAD_MUTEX_INIT_FUNCTION (&unit_rwlock);
> #endif
>
> if (sizeof (max_offset) == 8)
> @@ -732,7 +767,7 @@ close_unit_1 (gfc_unit *u, int locked)
>
> u->closed = 1;
> if (!locked)
> - LOCK (&unit_lock);
> + WRLOCK (&unit_rwlock);
>
> for (i = 0; i < CACHE_SIZE; i++)
> if (unit_cache[i] == u)
> @@ -759,7 +794,7 @@ close_unit_1 (gfc_unit *u, int locked)
> destroy_unit_mutex (u);
>
> if (!locked)
> - UNLOCK (&unit_lock);
> + RWUNLOCK (&unit_rwlock);
>
> return rc;
> }
> @@ -796,10 +831,10 @@ close_unit (gfc_unit *u)
> void
> close_units (void)
> {
> - LOCK (&unit_lock);
> + WRLOCK (&unit_rwlock);
> while (unit_root != NULL)
> close_unit_1 (unit_root, 1);
> - UNLOCK (&unit_lock);
> + RWUNLOCK (&unit_rwlock);
>
> free (newunits);
>
> @@ -906,7 +941,7 @@ finish_last_advance_record (gfc_unit *u)
> int
> newunit_alloc (void)
> {
> - LOCK (&unit_lock);
> + WRLOCK (&unit_rwlock);
> if (!newunits)
> {
> newunits = xcalloc (16, 1);
> @@ -920,7 +955,7 @@ newunit_alloc (void)
> {
> newunits[ii] = true;
> newunit_lwi = ii + 1;
> - UNLOCK (&unit_lock);
> + RWUNLOCK (&unit_rwlock);
> return -ii + NEWUNIT_START;
> }
> }
> @@ -933,12 +968,12 @@ newunit_alloc (void)
> memset (newunits + old_size, 0, old_size);
> newunits[old_size] = true;
> newunit_lwi = old_size + 1;
> - UNLOCK (&unit_lock);
> + RWUNLOCK (&unit_rwlock);
> return -old_size + NEWUNIT_START;
> }
>
>
> -/* Free a previously allocated newunit= unit number. unit_lock must
> +/* Free a previously allocated newunit= unit number. unit_rwlock must
> be held when calling. */
>
> void
> diff --git a/libgfortran/io/unix.c b/libgfortran/io/unix.c
> index d466df979df..dcae051744d 100644
> --- a/libgfortran/io/unix.c
> +++ b/libgfortran/io/unix.c
> @@ -1773,7 +1773,7 @@ find_file (const char *file, gfc_charlen_type file_len)
> id = id_from_path (path);
> #endif
>
> - LOCK (&unit_lock);
> + RDLOCK (&unit_rwlock);
> retry:
> u = find_file0 (unit_root, FIND_FILE0_ARGS);
> if (u != NULL)
> @@ -1782,19 +1782,19 @@ retry:
> if (! __gthread_mutex_trylock (&u->lock))
> {
> /* assert (u->closed == 0); */
> - UNLOCK (&unit_lock);
> + RWUNLOCK (&unit_rwlock);
> goto done;
> }
>
> inc_waiting_locked (u);
> }
> - UNLOCK (&unit_lock);
> + RWUNLOCK (&unit_rwlock);
> if (u != NULL)
> {
> LOCK (&u->lock);
> if (u->closed)
> {
> - LOCK (&unit_lock);
> + RDLOCK (&unit_rwlock);
> UNLOCK (&u->lock);
> if (predec_waiting_locked (u) == 0)
> free (u);
> @@ -1838,13 +1838,13 @@ flush_all_units (void)
> gfc_unit *u;
> int min_unit = 0;
>
> - LOCK (&unit_lock);
> + WRLOCK (&unit_rwlock);
> do
> {
> u = flush_all_units_1 (unit_root, min_unit);
> if (u != NULL)
> inc_waiting_locked (u);
> - UNLOCK (&unit_lock);
> + RWUNLOCK (&unit_rwlock);
> if (u == NULL)
> return;
>
> @@ -1855,13 +1855,13 @@ flush_all_units (void)
> if (u->closed == 0)
> {
> sflush (u->s);
> - LOCK (&unit_lock);
> + WRLOCK (&unit_rwlock);
> UNLOCK (&u->lock);
> (void) predec_waiting_locked (u);
> }
> else
> {
> - LOCK (&unit_lock);
> + WRLOCK (&unit_rwlock);
> UNLOCK (&u->lock);
> if (predec_waiting_locked (u) == 0)
> free (u);
> diff --git a/libgomp/testsuite/libgomp.fortran/rwlock_1.f90 b/libgomp/testsuite/libgomp.fortran/rwlock_1.f90
> new file mode 100644
> index 00000000000..f90ecbeb00f
> --- /dev/null
> +++ b/libgomp/testsuite/libgomp.fortran/rwlock_1.f90
> @@ -0,0 +1,33 @@
> +! { dg-do run }
> +! Multiple threads call open/write/read/close in concurrency with different unit number,
> +! threads can acquire read lock concurrently, to find unit from cache or unit list very frequently,
> +! if not found, threads will acquire the write lock exclusively to insert unit to cache and unit list.
> +! This test case is used to stress both the read and write lock when access unit cache and list.
> +program main
> + use omp_lib
> + implicit none
> + integer:: unit_number, v1, v2, i
> + character(11) :: file_name
> + character(3) :: async = "no"
> + !$omp parallel private (unit_number, v1, v2, file_name, async, i)
> + do i = 0, 100
> + unit_number = 10 + omp_get_thread_num ()
> + write (file_name, "(I3, A)") unit_number, "_tst.dat"
> + file_name = adjustl(file_name)
> + open (unit_number, file=file_name, asynchronous="yes")
> + ! call inquire with file parameter to test find_file in unix.c
> + inquire (file=file_name, asynchronous=async)
> + if (async /= "YES") stop 1
> + write (unit_number, *, asynchronous="yes") unit_number
> + write (unit_number, *, asynchronous="yes") unit_number + 1
> + close(unit_number)
> +
> + open (unit_number, file = file_name, asynchronous="yes")
> + read (unit_number, *, asynchronous="yes") v1
> + read (unit_number, *, asynchronous="yes") v2
> + wait (unit_number)
> + if ((v1 /= unit_number) .or. (v2 /= unit_number + 1)) stop 2
> + close(unit_number, status="delete")
> + end do
> + !$omp end parallel
> +end program
> diff --git a/libgomp/testsuite/libgomp.fortran/rwlock_2.f90 b/libgomp/testsuite/libgomp.fortran/rwlock_2.f90
> new file mode 100644
> index 00000000000..08c80d14cfb
> --- /dev/null
> +++ b/libgomp/testsuite/libgomp.fortran/rwlock_2.f90
> @@ -0,0 +1,22 @@
> +! { dg-do run }
> +! Insert a unit into cache at the beginning, then start multiple
> +! threads to access the same unit concurrency, unit will be found in unit cache during the read lock phase.
> +! This test case is used to test the read lock when access unit cache and list.
> +program main
> + use omp_lib
> + implicit none
> + integer:: thread_id, total_threads, i, j
> + total_threads = omp_get_max_threads ()
> + open (10, file='tst.dat', asynchronous="yes")
> + !$omp parallel private (thread_id, i, j)
> + do i = 1, 100
> + thread_id = omp_get_thread_num ()
> + do j = 1, 100
> + write (10, *, asynchronous="yes") thread_id, i
> + end do
> + end do
> + !$omp end parallel
> + ! call inquire with file parameter to test find_file in unix.c
> + call flush ()
> + close (10, status="delete")
> +end program
> diff --git a/libgomp/testsuite/libgomp.fortran/rwlock_3.f90 b/libgomp/testsuite/libgomp.fortran/rwlock_3.f90
> new file mode 100644
> index 00000000000..1906fcd7a0b
> --- /dev/null
> +++ b/libgomp/testsuite/libgomp.fortran/rwlock_3.f90
> @@ -0,0 +1,18 @@
> +! { dg-do run }
> +! Find or create the same unit number in concurrency,
> +! at beginning, threads cannot find the unit in cache or unit list,
> +! then threads will acquire the write lock to insert unit.
> +! This test case is used to ensure that no duplicate unit number will be
> +! inserted into cache nor unit list when same unit was accessed in concurrency.
> +program main
> + use omp_lib
> + implicit none
> + integer:: i
> + !$omp parallel private (i)
> + do i = 1, 100
> + open (10, file='tst.dat', asynchronous="yes")
> + ! Delete the unit number from cache and unit list to stress write lock.
> + close (10, status="delete")
> + end do
> + !$omp end parallel
> +end program
next prev parent reply other threads:[~2023-12-14 15:50 UTC|newest]
Thread overview: 35+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-05-09 2:32 [PATCH v4] " Zhu, Lipeng
2023-05-16 7:08 ` Zhu, Lipeng
2023-05-23 2:53 ` Zhu, Lipeng
2023-05-24 19:18 ` Thomas Koenig
2023-08-18 3:06 ` Zhu, Lipeng
2023-09-14 8:33 ` Zhu, Lipeng
2023-10-23 1:21 ` Zhu, Lipeng
2023-10-23 5:52 ` Thomas Koenig
2023-10-23 23:59 ` Zhu, Lipeng
2023-11-01 10:14 ` Zhu, Lipeng
2023-11-02 9:58 ` Bernhard Reutner-Fischer
2023-11-23 9:36 ` Zhu, Lipeng
2023-12-07 5:18 ` Zhu, Lipeng
2023-08-18 3:18 ` [PATCH v6] " Zhu, Lipeng
2023-12-08 10:19 ` Jakub Jelinek
2023-12-09 15:13 ` Zhu, Lipeng
2023-12-09 15:39 ` [PATCH v7] " Lipeng Zhu
2023-12-09 15:23 ` Jakub Jelinek
2023-12-10 3:25 ` Zhu, Lipeng
2023-12-11 17:45 ` H.J. Lu
2023-12-12 2:05 ` Zhu, Lipeng
2023-12-13 20:52 ` Thomas Schwinge
2023-12-14 2:28 ` Zhu, Lipeng
2023-12-14 12:29 ` Thomas Schwinge
2023-12-14 12:39 ` Jakub Jelinek
2023-12-15 5:43 ` Zhu, Lipeng
2023-12-21 11:42 ` Thomas Schwinge
2023-12-22 6:48 ` Lipeng Zhu
2024-01-03 9:14 ` Lipeng Zhu
2024-01-17 13:25 ` Lipeng Zhu
2023-12-14 15:50 ` Richard Earnshaw (lists) [this message]
2023-12-15 11:31 ` Lipeng Zhu
2023-12-15 19:23 ` Richard Earnshaw
2024-01-02 11:57 ` Vaseeharan Vinayagamoorthy
2024-01-03 1:02 ` Lipeng Zhu
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=523c6fbf-344c-48b8-8508-7e2acb088606@arm.com \
--to=richard.earnshaw@arm.com \
--cc=fortran@gcc.gnu.org \
--cc=gcc-patches@gcc.gnu.org \
--cc=hongjiu.lu@intel.com \
--cc=jakub@redhat.com \
--cc=lipeng.zhu@intel.com \
--cc=pan.deng@intel.com \
--cc=rep.dot.nop@gmail.com \
--cc=tianyou.li@intel.com \
--cc=tkoenig@netcologne.de \
--cc=wangyang.guo@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).