From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by sourceware.org (Postfix) with ESMTP id 3FCFB3858D20; Thu, 14 Dec 2023 15:50:19 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 3FCFB3858D20 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=arm.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=arm.com ARC-Filter: OpenARC Filter v1.0.0 sourceware.org 3FCFB3858D20 Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=217.140.110.172 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1702569023; cv=none; b=A8hlTqoOlE5z7qL+5OMm/2QYkPHpxpW7AHVOpyQ20GJYZZv+QWqsDa+xNtWB/wQ9iwaqfCA+/21y7QhjE26sfxlNAhqPTcevCxarZjT9+L0J6hUQ4ARcffBxCJ/KWx6f7Q2XBNGTKlm97NNM/UXBlBiwHEMIf8P+onh6sZtRXOI= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1702569023; c=relaxed/simple; bh=9g8M0J6s7on44ryXCctkRJULeJzGRB1RndNZdjC05fQ=; h=Message-ID:Date:MIME-Version:Subject:To:From; b=MoJmQZH6HkGxaN2+N2PrFG625O2fDIHryQ5xkplSaaZCJJ3YkkJD/QWK5vVuPSifChSZkLjurVp0rlZAgZKy65Z+Z30P4QXEMHSKeMSVRtoYpOqz9lCoh9czdYL+tupmLgDba1vo/xP6qHwfJ4cyClxxG2M1jKaGMurgQdKdsQw= ARC-Authentication-Results: i=1; server2.sourceware.org Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 633D8C15; Thu, 14 Dec 2023 07:51:04 -0800 (PST) Received: from [10.2.78.54] (e120077-lin.cambridge.arm.com [10.2.78.54]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 657863F5A1; Thu, 14 Dec 2023 07:50:17 -0800 (PST) Message-ID: <523c6fbf-344c-48b8-8508-7e2acb088606@arm.com> Date: Thu, 14 Dec 2023 15:50:16 +0000 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: [PATCH v7] libgfortran: Replace mutex with rwlock Content-Language: en-GB To: Lipeng Zhu , jakub@redhat.com Cc: fortran@gcc.gnu.org, gcc-patches@gcc.gnu.org, hongjiu.lu@intel.com, pan.deng@intel.com, rep.dot.nop@gmail.com, tianyou.li@intel.com, tkoenig@netcologne.de, wangyang.guo@intel.com References: <20231209153944.3746165-1-lipeng.zhu@intel.com> From: "Richard Earnshaw (lists)" In-Reply-To: <20231209153944.3746165-1-lipeng.zhu@intel.com> Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-3494.9 required=5.0 tests=BAYES_00,GIT_PATCH_0,KAM_DMARC_NONE,KAM_DMARC_STATUS,KAM_LAZY_DOMAIN_SECURITY,SPAM_BODY,SPF_HELO_NONE,SPF_NONE,TXREP,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: On 09/12/2023 15:39, Lipeng Zhu wrote: > This patch try to introduce the rwlock and split the read/write to > unit_root tree and unit_cache with rwlock instead of the mutex to > increase CPU efficiency. In the get_gfc_unit function, the percentage > to step into the insert_unit function is around 30%, in most instances, > we can get the unit in the phase of reading the unit_cache or unit_root > tree. So split the read/write phase by rwlock would be an approach to > make it more parallel. > > BTW, the IPC metrics can gain around 9x in our test > server with 220 cores. The benchmark we used is > https://github.com/rwesson/NEAT > > libgcc/ChangeLog: > > * gthr-posix.h (__GTHREAD_RWLOCK_INIT): New macro. > (__gthrw): New function. > (__gthread_rwlock_rdlock): New function. > (__gthread_rwlock_tryrdlock): New function. > (__gthread_rwlock_wrlock): New function. > (__gthread_rwlock_trywrlock): New function. > (__gthread_rwlock_unlock): New function. > > libgfortran/ChangeLog: > > * io/async.c (DEBUG_LINE): New macro. > * io/async.h (RWLOCK_DEBUG_ADD): New macro. > (CHECK_RDLOCK): New macro. > (CHECK_WRLOCK): New macro. > (TAIL_RWLOCK_DEBUG_QUEUE): New macro. > (IN_RWLOCK_DEBUG_QUEUE): New macro. > (RDLOCK): New macro. > (WRLOCK): New macro. > (RWUNLOCK): New macro. > (RD_TO_WRLOCK): New macro. > (INTERN_RDLOCK): New macro. > (INTERN_WRLOCK): New macro. > (INTERN_RWUNLOCK): New macro. > * io/io.h (struct gfc_unit): Change UNIT_LOCK to UNIT_RWLOCK in > a comment. > (unit_lock): Remove including associated internal_proto. > (unit_rwlock): New declarations including associated internal_proto. > (dec_waiting_unlocked): Use WRLOCK and RWUNLOCK on unit_rwlock > instead of __gthread_mutex_lock and __gthread_mutex_unlock on > unit_lock. > * io/transfer.c (st_read_done_worker): Use WRLOCK and RWUNLOCK on > unit_rwlock instead of LOCK and UNLOCK on unit_lock. > (st_write_done_worker): Likewise. > * io/unit.c: Change UNIT_LOCK to UNIT_RWLOCK in 'IO locking rules' > comment. Use unit_rwlock variable instead of unit_lock variable. > (get_gfc_unit_from_unit_root): New function. > (get_gfc_unit): Use RDLOCK, WRLOCK and RWUNLOCK on unit_rwlock > instead of LOCK and UNLOCK on unit_lock. > (close_unit_1): Use WRLOCK and RWUNLOCK on unit_rwlock instead of > LOCK and UNLOCK on unit_lock. > (close_units): Likewise. > (newunit_alloc): Use RWUNLOCK on unit_rwlock instead of UNLOCK on > unit_lock. > * io/unix.c (find_file): Use RDLOCK and RWUNLOCK on unit_rwlock > instead of LOCK and UNLOCK on unit_lock. > (flush_all_units): Use WRLOCK and RWUNLOCK on unit_rwlock instead > of LOCK and UNLOCK on unit_lock. > It looks like this has broken builds on arm-none-eabi when using newlib: In file included from /work/rearnsha/gnusrc/nightly/gcc-cross/master/libgfortran /runtime/error.c:27: /work/rearnsha/gnusrc/nightly/gcc-cross/master/libgfortran/io/io.h: In function ‘dec_waiting_unlocked’: /work/rearnsha/gnusrc/nightly/gcc-cross/master/libgfortran/io/io.h:1023:3: error : implicit declaration of function ‘WRLOCK’ [-Wimplicit-function-declaration] 1023 | WRLOCK (&unit_rwlock); | ^~~~~~ /work/rearnsha/gnusrc/nightly/gcc-cross/master/libgfortran/io/io.h:1025:3: error : implicit declaration of function ‘RWUNLOCK’ [-Wimplicit-function-declaration] 1025 | RWUNLOCK (&unit_rwlock); | ^~~~~~~~ R. > --- > v1 -> v2: > Limit the pthread_rwlock usage in libgcc only when __cplusplus isn't defined. > > v2 -> v3: > Rebase the patch with trunk branch. > > v3 -> v4: > Update the comments. > > v4 -> v5: > Fix typos and code formatter. > > v5 -> v6: > Add unit tests. > > v6 -> v7: > Update ChangeLog and code formatter. > > Reviewed-by: Hongjiu Lu > Reviewed-by: Bernhard Reutner-Fischer > Reviewed-by: Thomas Koenig > Reviewed-by: Jakub Jelinek > Signed-off-by: Lipeng Zhu > --- > libgcc/gthr-posix.h | 60 +++++++ > libgfortran/io/async.c | 4 + > libgfortran/io/async.h | 151 ++++++++++++++++++ > libgfortran/io/io.h | 15 +- > libgfortran/io/transfer.c | 8 +- > libgfortran/io/unit.c | 117 +++++++++----- > libgfortran/io/unix.c | 16 +- > .../testsuite/libgomp.fortran/rwlock_1.f90 | 33 ++++ > .../testsuite/libgomp.fortran/rwlock_2.f90 | 22 +++ > .../testsuite/libgomp.fortran/rwlock_3.f90 | 18 +++ > 10 files changed, 386 insertions(+), 58 deletions(-) > create mode 100644 libgomp/testsuite/libgomp.fortran/rwlock_1.f90 > create mode 100644 libgomp/testsuite/libgomp.fortran/rwlock_2.f90 > create mode 100644 libgomp/testsuite/libgomp.fortran/rwlock_3.f90 > > diff --git a/libgcc/gthr-posix.h b/libgcc/gthr-posix.h > index aebcfdd9f4c..73283082997 100644 > --- a/libgcc/gthr-posix.h > +++ b/libgcc/gthr-posix.h > @@ -48,6 +48,9 @@ typedef pthread_t __gthread_t; > typedef pthread_key_t __gthread_key_t; > typedef pthread_once_t __gthread_once_t; > typedef pthread_mutex_t __gthread_mutex_t; > +#ifndef __cplusplus > +typedef pthread_rwlock_t __gthread_rwlock_t; > +#endif > typedef pthread_mutex_t __gthread_recursive_mutex_t; > typedef pthread_cond_t __gthread_cond_t; > typedef struct timespec __gthread_time_t; > @@ -58,6 +61,9 @@ typedef struct timespec __gthread_time_t; > > #define __GTHREAD_MUTEX_INIT PTHREAD_MUTEX_INITIALIZER > #define __GTHREAD_MUTEX_INIT_FUNCTION __gthread_mutex_init_function > +#ifndef __cplusplus > +#define __GTHREAD_RWLOCK_INIT PTHREAD_RWLOCK_INITIALIZER > +#endif > #define __GTHREAD_ONCE_INIT PTHREAD_ONCE_INIT > #if defined(PTHREAD_RECURSIVE_MUTEX_INITIALIZER) > #define __GTHREAD_RECURSIVE_MUTEX_INIT PTHREAD_RECURSIVE_MUTEX_INITIALIZER > @@ -135,6 +141,13 @@ __gthrw(pthread_mutexattr_init) > __gthrw(pthread_mutexattr_settype) > __gthrw(pthread_mutexattr_destroy) > > +#ifndef __cplusplus > +__gthrw(pthread_rwlock_rdlock) > +__gthrw(pthread_rwlock_tryrdlock) > +__gthrw(pthread_rwlock_wrlock) > +__gthrw(pthread_rwlock_trywrlock) > +__gthrw(pthread_rwlock_unlock) > +#endif > > #if defined(_LIBOBJC) || defined(_LIBOBJC_WEAK) > /* Objective-C. */ > @@ -885,6 +898,53 @@ __gthread_cond_destroy (__gthread_cond_t* __cond) > return __gthrw_(pthread_cond_destroy) (__cond); > } > > +#ifndef __cplusplus > +static inline int > +__gthread_rwlock_rdlock (__gthread_rwlock_t *__rwlock) > +{ > + if (__gthread_active_p ()) > + return __gthrw_(pthread_rwlock_rdlock) (__rwlock); > + else > + return 0; > +} > + > +static inline int > +__gthread_rwlock_tryrdlock (__gthread_rwlock_t *__rwlock) > +{ > + if (__gthread_active_p ()) > + return __gthrw_(pthread_rwlock_tryrdlock) (__rwlock); > + else > + return 0; > +} > + > +static inline int > +__gthread_rwlock_wrlock (__gthread_rwlock_t *__rwlock) > +{ > + if (__gthread_active_p ()) > + return __gthrw_(pthread_rwlock_wrlock) (__rwlock); > + else > + return 0; > +} > + > +static inline int > +__gthread_rwlock_trywrlock (__gthread_rwlock_t *__rwlock) > +{ > + if (__gthread_active_p ()) > + return __gthrw_(pthread_rwlock_trywrlock) (__rwlock); > + else > + return 0; > +} > + > +static inline int > +__gthread_rwlock_unlock (__gthread_rwlock_t *__rwlock) > +{ > + if (__gthread_active_p ()) > + return __gthrw_(pthread_rwlock_unlock) (__rwlock); > + else > + return 0; > +} > +#endif > + > #endif /* _LIBOBJC */ > > #endif /* ! GCC_GTHR_POSIX_H */ > diff --git a/libgfortran/io/async.c b/libgfortran/io/async.c > index 8fa1f0d4ce0..91bf397105d 100644 > --- a/libgfortran/io/async.c > +++ b/libgfortran/io/async.c > @@ -42,6 +42,10 @@ DEBUG_LINE (__thread const char *aio_prefix = MPREFIX); > > DEBUG_LINE (__gthread_mutex_t debug_queue_lock = __GTHREAD_MUTEX_INIT;) > DEBUG_LINE (aio_lock_debug *aio_debug_head = NULL;) > +#ifdef __GTHREAD_RWLOCK_INIT > +DEBUG_LINE (aio_rwlock_debug *aio_rwlock_debug_head = NULL;) > +DEBUG_LINE (__gthread_rwlock_t debug_queue_rwlock = __GTHREAD_RWLOCK_INIT;) > +#endif > > /* Current unit for asynchronous I/O. Needed for error reporting. */ > > diff --git a/libgfortran/io/async.h b/libgfortran/io/async.h > index ad226c8e856..f112f6870bb 100644 > --- a/libgfortran/io/async.h > +++ b/libgfortran/io/async.h > @@ -210,6 +210,128 @@ > DEBUG_PRINTF ("%s" DEBUG_RED "ACQ:" DEBUG_NORM " %-30s %78p\n", aio_prefix, #mutex, mutex); \ > } while (0) > > +#ifdef __GTHREAD_RWLOCK_INIT > +#define RWLOCK_DEBUG_ADD(rwlock) do { \ > + aio_rwlock_debug *n; \ > + n = xmalloc (sizeof (aio_rwlock_debug)); \ > + n->prev = TAIL_RWLOCK_DEBUG_QUEUE; \ > + if (n->prev) \ > + n->prev->next = n; \ > + n->next = NULL; \ > + n->line = __LINE__; \ > + n->func = __FUNCTION__; \ > + n->rw = rwlock; \ > + if (!aio_rwlock_debug_head) { \ > + aio_rwlock_debug_head = n; \ > + } \ > + } while (0) > + > +#define CHECK_RDLOCK(rwlock, status) do { \ > + aio_rwlock_debug *curr; \ > + INTERN_WRLOCK (&debug_queue_rwlock); \ > + if (__gthread_rwlock_tryrdlock (rwlock)) { \ > + if ((curr = IN_RWLOCK_DEBUG_QUEUE (rwlock))) { \ > + sprintf (status, DEBUG_RED "%s():%d" DEBUG_NORM, curr->func, curr->line); \ > + } else \ > + sprintf (status, DEBUG_RED "unknown" DEBUG_NORM); \ > + } \ > + else { \ > + __gthread_rwlock_unlock (rwlock); \ > + sprintf (status, DEBUG_GREEN "rwunlocked" DEBUG_NORM); \ > + } \ > + INTERN_RWUNLOCK (&debug_queue_rwlock); \ > + }while (0) > + > +#define CHECK_WRLOCK(rwlock, status) do { \ > + aio_rwlock_debug *curr; \ > + INTERN_WRLOCK (&debug_queue_rwlock); \ > + if (__gthread_rwlock_trywrlock (rwlock)) { \ > + if ((curr = IN_RWLOCK_DEBUG_QUEUE (rwlock))) { \ > + sprintf (status, DEBUG_RED "%s():%d" DEBUG_NORM, curr->func, curr->line); \ > + } else \ > + sprintf (status, DEBUG_RED "unknown" DEBUG_NORM); \ > + } \ > + else { \ > + __gthread_rwlock_unlock (rwlock); \ > + sprintf (status, DEBUG_GREEN "rwunlocked" DEBUG_NORM); \ > + } \ > + INTERN_RWUNLOCK (&debug_queue_rwlock); \ > + }while (0) > + > +#define TAIL_RWLOCK_DEBUG_QUEUE ({ \ > + aio_rwlock_debug *curr = aio_rwlock_debug_head; \ > + while (curr && curr->next) { \ > + curr = curr->next; \ > + } \ > + curr; \ > + }) > + > +#define IN_RWLOCK_DEBUG_QUEUE(rwlock) ({ \ > + __label__ end; \ > + aio_rwlock_debug *curr = aio_rwlock_debug_head; \ > + while (curr) { \ > + if (curr->rw == rwlock) { \ > + goto end; \ > + } \ > + curr = curr->next; \ > + } \ > + end:; \ > + curr; \ > + }) > + > +#define RDLOCK(rwlock) do { \ > + char status[200]; \ > + CHECK_RDLOCK (rwlock, status); \ > + DEBUG_PRINTF ("%s%-42s prev: %-35s %20s():%-5d %18p\n", aio_prefix, \ > + DEBUG_RED "RDLOCK: " DEBUG_NORM #rwlock, status, __FUNCTION__, __LINE__, (void *) rwlock); \ > + INTERN_RDLOCK (rwlock); \ > + INTERN_WRLOCK (&debug_queue_rwlock); \ > + RWLOCK_DEBUG_ADD (rwlock); \ > + INTERN_RWUNLOCK (&debug_queue_rwlock); \ > + DEBUG_PRINTF ("%s" DEBUG_RED "ACQ:" DEBUG_NORM " %-30s %78p\n", aio_prefix, #rwlock, rwlock); \ > + } while (0) > + > +#define WRLOCK(rwlock) do { \ > + char status[200]; \ > + CHECK_WRLOCK (rwlock, status); \ > + DEBUG_PRINTF ("%s%-42s prev: %-35s %20s():%-5d %18p\n", aio_prefix, \ > + DEBUG_RED "WRLOCK: " DEBUG_NORM #rwlock, status, __FUNCTION__, __LINE__, (void *) rwlock); \ > + INTERN_WRLOCK (rwlock); \ > + INTERN_WRLOCK (&debug_queue_rwlock); \ > + RWLOCK_DEBUG_ADD (rwlock); \ > + INTERN_RWUNLOCK (&debug_queue_rwlock); \ > + DEBUG_PRINTF ("%s" DEBUG_RED "ACQ:" DEBUG_NORM " %-30s %78p\n", aio_prefix, #rwlock, rwlock); \ > + } while (0) > + > +#define RWUNLOCK(rwlock) do { \ > + aio_rwlock_debug *curr; \ > + DEBUG_PRINTF ("%s%-75s %20s():%-5d %18p\n", aio_prefix, DEBUG_GREEN "RWUNLOCK: " DEBUG_NORM #rwlock, \ > + __FUNCTION__, __LINE__, (void *) rwlock); \ > + INTERN_WRLOCK (&debug_queue_rwlock); \ > + curr = IN_RWLOCK_DEBUG_QUEUE (rwlock); \ > + if (curr) \ > + { \ > + if (curr->prev) \ > + curr->prev->next = curr->next; \ > + if (curr->next) { \ > + curr->next->prev = curr->prev; \ > + if (curr == aio_rwlock_debug_head) \ > + aio_rwlock_debug_head = curr->next; \ > + } else { \ > + if (curr == aio_rwlock_debug_head) \ > + aio_rwlock_debug_head = NULL; \ > + } \ > + free (curr); \ > + } \ > + INTERN_RWUNLOCK (&debug_queue_rwlock); \ > + INTERN_RWUNLOCK (rwlock); \ > + } while (0) > + > +#define RD_TO_WRLOCK(rwlock) \ > + RWUNLOCK (rwlock); \ > + WRLOCK (rwlock); > +#endif > + > #define DEBUG_LINE(...) __VA_ARGS__ > > #else > @@ -221,12 +343,31 @@ > #define LOCK(mutex) INTERN_LOCK (mutex) > #define UNLOCK(mutex) INTERN_UNLOCK (mutex) > #define TRYLOCK(mutex) (__gthread_mutex_trylock (mutex)) > +#ifdef __GTHREAD_RWLOCK_INIT > +#define RDLOCK(rwlock) INTERN_RDLOCK (rwlock) > +#define WRLOCK(rwlock) INTERN_WRLOCK (rwlock) > +#define RWUNLOCK(rwlock) INTERN_RWUNLOCK (rwlock) > +#define RD_TO_WRLOCK(rwlock) \ > + RWUNLOCK (rwlock); \ > + WRLOCK (rwlock); > +#endif > +#endif > + > +#ifndef __GTHREAD_RWLOCK_INIT > +#define RDLOCK(rwlock) LOCK (rwlock) > +#define WRLOCK(rwlock) LOCK (rwlock) > +#define RWUNLOCK(rwlock) UNLOCK (rwlock) > +#define RD_TO_WRLOCK(rwlock) do {} while (0) > #endif > > #define INTERN_LOCK(mutex) T_ERROR (__gthread_mutex_lock, mutex); > > #define INTERN_UNLOCK(mutex) T_ERROR (__gthread_mutex_unlock, mutex); > > +#define INTERN_RDLOCK(rwlock) T_ERROR (__gthread_rwlock_rdlock, rwlock) > +#define INTERN_WRLOCK(rwlock) T_ERROR (__gthread_rwlock_wrlock, rwlock) > +#define INTERN_RWUNLOCK(rwlock) T_ERROR (__gthread_rwlock_unlock, rwlock) > + > #if ASYNC_IO > > /* au->lock has to be held when calling this macro. */ > @@ -288,8 +429,18 @@ DEBUG_LINE (typedef struct aio_lock_debug{ > struct aio_lock_debug *prev; > } aio_lock_debug;) > > +DEBUG_LINE (typedef struct aio_rwlock_debug{ > + __gthread_rwlock_t *rw; > + int line; > + const char *func; > + struct aio_rwlock_debug *next; > + struct aio_rwlock_debug *prev; > +} aio_rwlock_debug;) > + > DEBUG_LINE (extern aio_lock_debug *aio_debug_head;) > DEBUG_LINE (extern __gthread_mutex_t debug_queue_lock;) > +DEBUG_LINE (extern aio_rwlock_debug *aio_rwlock_debug_head;) > +DEBUG_LINE (extern __gthread_rwlock_t debug_queue_rwlock;) > > /* Thread - local storage of the current unit we are looking at. Needed for > error reporting. */ > diff --git a/libgfortran/io/io.h b/libgfortran/io/io.h > index ecdf1dd3f05..15daa0995b1 100644 > --- a/libgfortran/io/io.h > +++ b/libgfortran/io/io.h > @@ -690,7 +690,7 @@ typedef struct gfc_unit > from the UNIT_ROOT tree, but doesn't free it and the > last of the waiting threads will do that. > This must be either atomically increased/decreased, or > - always guarded by UNIT_LOCK. */ > + always guarded by UNIT_RWLOCK. */ > int waiting; > /* Flag set by close_unit if the unit as been closed. > Must be manipulated under unit's lock. */ > @@ -769,8 +769,13 @@ internal_proto(default_recl); > extern gfc_unit *unit_root; > internal_proto(unit_root); > > -extern __gthread_mutex_t unit_lock; > -internal_proto(unit_lock); > +#ifdef __GTHREAD_RWLOCK_INIT > +extern __gthread_rwlock_t unit_rwlock; > +internal_proto(unit_rwlock); > +#else > +extern __gthread_mutex_t unit_rwlock; > +internal_proto(unit_rwlock); > +#endif > > extern int close_unit (gfc_unit *); > internal_proto(close_unit); > @@ -1015,9 +1020,9 @@ dec_waiting_unlocked (gfc_unit *u) > #ifdef HAVE_ATOMIC_FETCH_ADD > (void) __atomic_fetch_add (&u->waiting, -1, __ATOMIC_RELAXED); > #else > - __gthread_mutex_lock (&unit_lock); > + WRLOCK (&unit_rwlock); > u->waiting--; > - __gthread_mutex_unlock (&unit_lock); > + RWUNLOCK (&unit_rwlock); > #endif > } > > diff --git a/libgfortran/io/transfer.c b/libgfortran/io/transfer.c > index 500db90c828..00d516adcb0 100644 > --- a/libgfortran/io/transfer.c > +++ b/libgfortran/io/transfer.c > @@ -4538,9 +4538,9 @@ st_read_done_worker (st_parameter_dt *dtp, bool unlock) > if (free_newunit) > { > /* Avoid inverse lock issues by placing after unlock_unit. */ > - LOCK (&unit_lock); > + WRLOCK (&unit_rwlock); > newunit_free (dtp->common.unit); > - UNLOCK (&unit_lock); > + RWUNLOCK (&unit_rwlock); > } > } > > @@ -4634,9 +4634,9 @@ st_write_done_worker (st_parameter_dt *dtp, bool unlock) > if (free_newunit) > { > /* Avoid inverse lock issues by placing after unlock_unit. */ > - LOCK (&unit_lock); > + WRLOCK (&unit_rwlock); > newunit_free (dtp->common.unit); > - UNLOCK (&unit_lock); > + RWUNLOCK (&unit_rwlock); > } > } > > diff --git a/libgfortran/io/unit.c b/libgfortran/io/unit.c > index 36d025949c2..0c8c35e464e 100644 > --- a/libgfortran/io/unit.c > +++ b/libgfortran/io/unit.c > @@ -33,34 +33,36 @@ see the files COPYING3 and COPYING.RUNTIME respectively. If not, see > > > /* IO locking rules: > - UNIT_LOCK is a master lock, protecting UNIT_ROOT tree and UNIT_CACHE. > + UNIT_RWLOCK is a master rw lock, protecting UNIT_ROOT tree and UNIT_CACHE. > + Using an rwlock improves efficiency by allowing us to separate readers > + and writers of both UNIT_ROOT and UNIT_CACHE. > Concurrent use of different units should be supported, so > each unit has its own lock, LOCK. > Open should be atomic with its reopening of units and list_read.c > in several places needs find_unit another unit while holding stdin > - unit's lock, so it must be possible to acquire UNIT_LOCK while holding > + unit's lock, so it must be possible to acquire UNIT_RWLOCK while holding > some unit's lock. Therefore to avoid deadlocks, it is forbidden > - to acquire unit's private locks while holding UNIT_LOCK, except > + to acquire unit's private locks while holding UNIT_RWLOCK, except > for freshly created units (where no other thread can get at their > address yet) or when using just trylock rather than lock operation. > In addition to unit's private lock each unit has a WAITERS counter > and CLOSED flag. WAITERS counter must be either only > atomically incremented/decremented in all places (if atomic builtins > - are supported), or protected by UNIT_LOCK in all places (otherwise). > + are supported), or protected by UNIT_RWLOCK in all places (otherwise). > CLOSED flag must be always protected by unit's LOCK. > - After finding a unit in UNIT_CACHE or UNIT_ROOT with UNIT_LOCK held, > + After finding a unit in UNIT_CACHE or UNIT_ROOT with UNIT_RWLOCK held, > WAITERS must be incremented to avoid concurrent close from freeing > - the unit between unlocking UNIT_LOCK and acquiring unit's LOCK. > - Unit freeing is always done under UNIT_LOCK. If close_unit sees any > + the unit between unlocking UNIT_RWLOCK and acquiring unit's LOCK. > + Unit freeing is always done under UNIT_RWLOCK. If close_unit sees any > WAITERS, it doesn't free the unit but instead sets the CLOSED flag > and the thread that decrements WAITERS to zero while CLOSED flag is > - set is responsible for freeing it (while holding UNIT_LOCK). > + set is responsible for freeing it (while holding UNIT_RWLOCK). > flush_all_units operation is iterating over the unit tree with > - increasing UNIT_NUMBER while holding UNIT_LOCK and attempting to > + increasing UNIT_NUMBER while holding UNIT_RWLOCK and attempting to > flush each unit (and therefore needs the unit's LOCK held as well). > To avoid deadlocks, it just trylocks the LOCK and if unsuccessful, > - remembers the current unit's UNIT_NUMBER, unlocks UNIT_LOCK, acquires > - unit's LOCK and after flushing reacquires UNIT_LOCK and restarts with > + remembers the current unit's UNIT_NUMBER, unlocks UNIT_RWLOCK, acquires > + unit's LOCK and after flushing reacquires UNIT_RWLOCK and restarts with > the smallest UNIT_NUMBER above the last one flushed. > > If find_unit/find_or_create_unit/find_file/get_unit routines return > @@ -101,10 +103,14 @@ gfc_offset max_offset; > gfc_offset default_recl; > > gfc_unit *unit_root; > +#ifdef __GTHREAD_RWLOCK_INIT > +__gthread_rwlock_t unit_rwlock = __GTHREAD_RWLOCK_INIT; > +#else > #ifdef __GTHREAD_MUTEX_INIT > -__gthread_mutex_t unit_lock = __GTHREAD_MUTEX_INIT; > +__gthread_mutex_t unit_rwlock = __GTHREAD_MUTEX_INIT; > #else > -__gthread_mutex_t unit_lock; > +__gthread_mutex_t unit_rwlock; > +#endif > #endif > > /* We use these filenames for error reporting. */ > @@ -317,6 +323,28 @@ delete_unit (gfc_unit *old) > unit_root = delete_treap (old, unit_root); > } > > +/* get_gfc_unit_from_root()-- Given an integer, return a pointer > + to the unit structure. Returns NULL if the unit does not exist, > + otherwise returns a locked unit. */ > + > +static inline gfc_unit * > +get_gfc_unit_from_unit_root (int n) > +{ > + gfc_unit *p; > + int c = 0; > + p = unit_root; > + while (p != NULL) > + { > + c = compare (n, p->unit_number); > + if (c < 0) > + p = p->left; > + if (c > 0) > + p = p->right; > + if (c == 0) > + break; > + } > + return p; > +} > > /* get_gfc_unit()-- Given an integer, return a pointer to the unit > structure. Returns NULL if the unit does not exist, > @@ -329,7 +357,7 @@ get_gfc_unit (int n, int do_create) > int c, created = 0; > > NOTE ("Unit n=%d, do_create = %d", n, do_create); > - LOCK (&unit_lock); > + RDLOCK (&unit_rwlock); > > retry: > for (c = 0; c < CACHE_SIZE; c++) > @@ -339,18 +367,25 @@ retry: > goto found; > } > > - p = unit_root; > - while (p != NULL) > - { > - c = compare (n, p->unit_number); > - if (c < 0) > - p = p->left; > - if (c > 0) > - p = p->right; > - if (c == 0) > - break; > - } > - > + p = get_gfc_unit_from_unit_root (n); > + > + /* We did not find a unit in the cache nor in the unit list, > + create a new (locked) unit and insert into the unit list and > + cache. Manipulating either or both the unit list and the unit > + cache requires to hold a write-lock [for obvious reasons]: > + By separating the read/write lock, we will greatly reduce > + the contention on the read part, while the write part is > + unlikely once the unit hits the cache. */ > + RD_TO_WRLOCK (&unit_rwlock); > + > + /* In the case of high concurrency, when multiple threads want > + to find or create the same unit, the unit number may not > + exist in cache nor in the unit list during read phase, then > + threads will acquire the write-lock to insert the same unit > + number to unit list. To avoid duplicate insert, we need to > + find unit list once again to ensure that the unit number > + not exist. */ > + p = get_gfc_unit_from_unit_root (n); > if (p == NULL && do_create) > { > p = insert_unit (n); > @@ -368,8 +403,8 @@ retry: > if (created) > { > /* Newly created units have their lock held already > - from insert_unit. Just unlock UNIT_LOCK and return. */ > - UNLOCK (&unit_lock); > + from insert_unit. Just unlock UNIT_RWLOCK and return. */ > + RWUNLOCK (&unit_rwlock); > return p; > } > > @@ -380,7 +415,7 @@ found: > if (! TRYLOCK (&p->lock)) > { > /* assert (p->closed == 0); */ > - UNLOCK (&unit_lock); > + RWUNLOCK (&unit_rwlock); > return p; > } > > @@ -388,14 +423,14 @@ found: > } > > > - UNLOCK (&unit_lock); > + RWUNLOCK (&unit_rwlock); > > if (p != NULL && (p->child_dtio == 0)) > { > LOCK (&p->lock); > if (p->closed) > { > - LOCK (&unit_lock); > + WRLOCK (&unit_rwlock); > UNLOCK (&p->lock); > if (predec_waiting_locked (p) == 0) > destroy_unit_mutex (p); > @@ -594,8 +629,8 @@ init_units (void) > #endif > #endif > > -#ifndef __GTHREAD_MUTEX_INIT > - __GTHREAD_MUTEX_INIT_FUNCTION (&unit_lock); > +#if (!defined(__GTHREAD_RWLOCK_INIT) && !defined(__GTHREAD_MUTEX_INIT)) > + __GTHREAD_MUTEX_INIT_FUNCTION (&unit_rwlock); > #endif > > if (sizeof (max_offset) == 8) > @@ -732,7 +767,7 @@ close_unit_1 (gfc_unit *u, int locked) > > u->closed = 1; > if (!locked) > - LOCK (&unit_lock); > + WRLOCK (&unit_rwlock); > > for (i = 0; i < CACHE_SIZE; i++) > if (unit_cache[i] == u) > @@ -759,7 +794,7 @@ close_unit_1 (gfc_unit *u, int locked) > destroy_unit_mutex (u); > > if (!locked) > - UNLOCK (&unit_lock); > + RWUNLOCK (&unit_rwlock); > > return rc; > } > @@ -796,10 +831,10 @@ close_unit (gfc_unit *u) > void > close_units (void) > { > - LOCK (&unit_lock); > + WRLOCK (&unit_rwlock); > while (unit_root != NULL) > close_unit_1 (unit_root, 1); > - UNLOCK (&unit_lock); > + RWUNLOCK (&unit_rwlock); > > free (newunits); > > @@ -906,7 +941,7 @@ finish_last_advance_record (gfc_unit *u) > int > newunit_alloc (void) > { > - LOCK (&unit_lock); > + WRLOCK (&unit_rwlock); > if (!newunits) > { > newunits = xcalloc (16, 1); > @@ -920,7 +955,7 @@ newunit_alloc (void) > { > newunits[ii] = true; > newunit_lwi = ii + 1; > - UNLOCK (&unit_lock); > + RWUNLOCK (&unit_rwlock); > return -ii + NEWUNIT_START; > } > } > @@ -933,12 +968,12 @@ newunit_alloc (void) > memset (newunits + old_size, 0, old_size); > newunits[old_size] = true; > newunit_lwi = old_size + 1; > - UNLOCK (&unit_lock); > + RWUNLOCK (&unit_rwlock); > return -old_size + NEWUNIT_START; > } > > > -/* Free a previously allocated newunit= unit number. unit_lock must > +/* Free a previously allocated newunit= unit number. unit_rwlock must > be held when calling. */ > > void > diff --git a/libgfortran/io/unix.c b/libgfortran/io/unix.c > index d466df979df..dcae051744d 100644 > --- a/libgfortran/io/unix.c > +++ b/libgfortran/io/unix.c > @@ -1773,7 +1773,7 @@ find_file (const char *file, gfc_charlen_type file_len) > id = id_from_path (path); > #endif > > - LOCK (&unit_lock); > + RDLOCK (&unit_rwlock); > retry: > u = find_file0 (unit_root, FIND_FILE0_ARGS); > if (u != NULL) > @@ -1782,19 +1782,19 @@ retry: > if (! __gthread_mutex_trylock (&u->lock)) > { > /* assert (u->closed == 0); */ > - UNLOCK (&unit_lock); > + RWUNLOCK (&unit_rwlock); > goto done; > } > > inc_waiting_locked (u); > } > - UNLOCK (&unit_lock); > + RWUNLOCK (&unit_rwlock); > if (u != NULL) > { > LOCK (&u->lock); > if (u->closed) > { > - LOCK (&unit_lock); > + RDLOCK (&unit_rwlock); > UNLOCK (&u->lock); > if (predec_waiting_locked (u) == 0) > free (u); > @@ -1838,13 +1838,13 @@ flush_all_units (void) > gfc_unit *u; > int min_unit = 0; > > - LOCK (&unit_lock); > + WRLOCK (&unit_rwlock); > do > { > u = flush_all_units_1 (unit_root, min_unit); > if (u != NULL) > inc_waiting_locked (u); > - UNLOCK (&unit_lock); > + RWUNLOCK (&unit_rwlock); > if (u == NULL) > return; > > @@ -1855,13 +1855,13 @@ flush_all_units (void) > if (u->closed == 0) > { > sflush (u->s); > - LOCK (&unit_lock); > + WRLOCK (&unit_rwlock); > UNLOCK (&u->lock); > (void) predec_waiting_locked (u); > } > else > { > - LOCK (&unit_lock); > + WRLOCK (&unit_rwlock); > UNLOCK (&u->lock); > if (predec_waiting_locked (u) == 0) > free (u); > diff --git a/libgomp/testsuite/libgomp.fortran/rwlock_1.f90 b/libgomp/testsuite/libgomp.fortran/rwlock_1.f90 > new file mode 100644 > index 00000000000..f90ecbeb00f > --- /dev/null > +++ b/libgomp/testsuite/libgomp.fortran/rwlock_1.f90 > @@ -0,0 +1,33 @@ > +! { dg-do run } > +! Multiple threads call open/write/read/close in concurrency with different unit number, > +! threads can acquire read lock concurrently, to find unit from cache or unit list very frequently, > +! if not found, threads will acquire the write lock exclusively to insert unit to cache and unit list. > +! This test case is used to stress both the read and write lock when access unit cache and list. > +program main > + use omp_lib > + implicit none > + integer:: unit_number, v1, v2, i > + character(11) :: file_name > + character(3) :: async = "no" > + !$omp parallel private (unit_number, v1, v2, file_name, async, i) > + do i = 0, 100 > + unit_number = 10 + omp_get_thread_num () > + write (file_name, "(I3, A)") unit_number, "_tst.dat" > + file_name = adjustl(file_name) > + open (unit_number, file=file_name, asynchronous="yes") > + ! call inquire with file parameter to test find_file in unix.c > + inquire (file=file_name, asynchronous=async) > + if (async /= "YES") stop 1 > + write (unit_number, *, asynchronous="yes") unit_number > + write (unit_number, *, asynchronous="yes") unit_number + 1 > + close(unit_number) > + > + open (unit_number, file = file_name, asynchronous="yes") > + read (unit_number, *, asynchronous="yes") v1 > + read (unit_number, *, asynchronous="yes") v2 > + wait (unit_number) > + if ((v1 /= unit_number) .or. (v2 /= unit_number + 1)) stop 2 > + close(unit_number, status="delete") > + end do > + !$omp end parallel > +end program > diff --git a/libgomp/testsuite/libgomp.fortran/rwlock_2.f90 b/libgomp/testsuite/libgomp.fortran/rwlock_2.f90 > new file mode 100644 > index 00000000000..08c80d14cfb > --- /dev/null > +++ b/libgomp/testsuite/libgomp.fortran/rwlock_2.f90 > @@ -0,0 +1,22 @@ > +! { dg-do run } > +! Insert a unit into cache at the beginning, then start multiple > +! threads to access the same unit concurrency, unit will be found in unit cache during the read lock phase. > +! This test case is used to test the read lock when access unit cache and list. > +program main > + use omp_lib > + implicit none > + integer:: thread_id, total_threads, i, j > + total_threads = omp_get_max_threads () > + open (10, file='tst.dat', asynchronous="yes") > + !$omp parallel private (thread_id, i, j) > + do i = 1, 100 > + thread_id = omp_get_thread_num () > + do j = 1, 100 > + write (10, *, asynchronous="yes") thread_id, i > + end do > + end do > + !$omp end parallel > + ! call inquire with file parameter to test find_file in unix.c > + call flush () > + close (10, status="delete") > +end program > diff --git a/libgomp/testsuite/libgomp.fortran/rwlock_3.f90 b/libgomp/testsuite/libgomp.fortran/rwlock_3.f90 > new file mode 100644 > index 00000000000..1906fcd7a0b > --- /dev/null > +++ b/libgomp/testsuite/libgomp.fortran/rwlock_3.f90 > @@ -0,0 +1,18 @@ > +! { dg-do run } > +! Find or create the same unit number in concurrency, > +! at beginning, threads cannot find the unit in cache or unit list, > +! then threads will acquire the write lock to insert unit. > +! This test case is used to ensure that no duplicate unit number will be > +! inserted into cache nor unit list when same unit was accessed in concurrency. > +program main > + use omp_lib > + implicit none > + integer:: i > + !$omp parallel private (i) > + do i = 1, 100 > + open (10, file='tst.dat', asynchronous="yes") > + ! Delete the unit number from cache and unit list to stress write lock. > + close (10, status="delete") > + end do > + !$omp end parallel > +end program