public inbox for fortran@gcc.gnu.org
 help / color / mirror / Atom feed
From: "Richard Earnshaw (lists)" <Richard.Earnshaw@arm.com>
To: Lipeng Zhu <lipeng.zhu@intel.com>, jakub@redhat.com
Cc: fortran@gcc.gnu.org, gcc-patches@gcc.gnu.org,
	hongjiu.lu@intel.com, pan.deng@intel.com, rep.dot.nop@gmail.com,
	tianyou.li@intel.com, tkoenig@netcologne.de,
	wangyang.guo@intel.com
Subject: Re: [PATCH v7] libgfortran: Replace mutex with rwlock
Date: Thu, 14 Dec 2023 15:50:16 +0000	[thread overview]
Message-ID: <523c6fbf-344c-48b8-8508-7e2acb088606@arm.com> (raw)
In-Reply-To: <20231209153944.3746165-1-lipeng.zhu@intel.com>

On 09/12/2023 15:39, Lipeng Zhu wrote:
> This patch try to introduce the rwlock and split the read/write to
> unit_root tree and unit_cache with rwlock instead of the mutex to
> increase CPU efficiency. In the get_gfc_unit function, the percentage
> to step into the insert_unit function is around 30%, in most instances,
> we can get the unit in the phase of reading the unit_cache or unit_root
> tree. So split the read/write phase by rwlock would be an approach to
> make it more parallel.
> 
> BTW, the IPC metrics can gain around 9x in our test
> server with 220 cores. The benchmark we used is
> https://github.com/rwesson/NEAT
> 
> libgcc/ChangeLog:
> 
> 	* gthr-posix.h (__GTHREAD_RWLOCK_INIT): New macro.
> 	(__gthrw): New function.
> 	(__gthread_rwlock_rdlock): New function.
> 	(__gthread_rwlock_tryrdlock): New function.
> 	(__gthread_rwlock_wrlock): New function.
> 	(__gthread_rwlock_trywrlock): New function.
> 	(__gthread_rwlock_unlock): New function.
> 
> libgfortran/ChangeLog:
> 
> 	* io/async.c (DEBUG_LINE): New macro.
> 	* io/async.h (RWLOCK_DEBUG_ADD): New macro.
> 	(CHECK_RDLOCK): New macro.
> 	(CHECK_WRLOCK): New macro.
> 	(TAIL_RWLOCK_DEBUG_QUEUE): New macro.
> 	(IN_RWLOCK_DEBUG_QUEUE): New macro.
> 	(RDLOCK): New macro.
> 	(WRLOCK): New macro.
> 	(RWUNLOCK): New macro.
> 	(RD_TO_WRLOCK): New macro.
> 	(INTERN_RDLOCK): New macro.
> 	(INTERN_WRLOCK): New macro.
> 	(INTERN_RWUNLOCK): New macro.
> 	* io/io.h (struct gfc_unit): Change UNIT_LOCK to UNIT_RWLOCK in
> 	a comment.
> 	(unit_lock): Remove including associated internal_proto.
> 	(unit_rwlock): New declarations including associated internal_proto.
> 	(dec_waiting_unlocked): Use WRLOCK and RWUNLOCK on unit_rwlock
> 	instead of __gthread_mutex_lock and __gthread_mutex_unlock on
> 	unit_lock.
> 	* io/transfer.c (st_read_done_worker): Use WRLOCK and RWUNLOCK on
> 	unit_rwlock instead of LOCK and UNLOCK on unit_lock.
> 	(st_write_done_worker): Likewise.
> 	* io/unit.c: Change UNIT_LOCK to UNIT_RWLOCK in 'IO locking rules'
> 	comment. Use unit_rwlock variable instead of unit_lock variable.
> 	(get_gfc_unit_from_unit_root): New function.
> 	(get_gfc_unit): Use RDLOCK, WRLOCK and RWUNLOCK on unit_rwlock
> 	instead of LOCK and UNLOCK on unit_lock.
> 	(close_unit_1): Use WRLOCK and RWUNLOCK on unit_rwlock instead of
> 	LOCK and UNLOCK on unit_lock.
> 	(close_units): Likewise.
> 	(newunit_alloc): Use RWUNLOCK on unit_rwlock instead of UNLOCK on
> 	unit_lock.
> 	* io/unix.c (find_file): Use RDLOCK and RWUNLOCK on unit_rwlock
> 	instead of LOCK and UNLOCK on unit_lock.
> 	(flush_all_units): Use WRLOCK and RWUNLOCK on unit_rwlock instead
> 	of LOCK and UNLOCK on unit_lock.
> 

It looks like this has broken builds on arm-none-eabi when using newlib:

In file included from /work/rearnsha/gnusrc/nightly/gcc-cross/master/libgfortran
/runtime/error.c:27:
/work/rearnsha/gnusrc/nightly/gcc-cross/master/libgfortran/io/io.h: In function 
‘dec_waiting_unlocked’:
/work/rearnsha/gnusrc/nightly/gcc-cross/master/libgfortran/io/io.h:1023:3: error
: implicit declaration of function ‘WRLOCK’ [-Wimplicit-function-declaration]
 1023 |   WRLOCK (&unit_rwlock);
      |   ^~~~~~
/work/rearnsha/gnusrc/nightly/gcc-cross/master/libgfortran/io/io.h:1025:3: error
: implicit declaration of function ‘RWUNLOCK’ [-Wimplicit-function-declaration]
 1025 |   RWUNLOCK (&unit_rwlock);
      |   ^~~~~~~~


R.

> ---
> v1 -> v2:
> Limit the pthread_rwlock usage in libgcc only when __cplusplus isn't defined.
> 
> v2 -> v3:
> Rebase the patch with trunk branch.
> 
> v3 -> v4:
> Update the comments.
> 
> v4 -> v5:
> Fix typos and code formatter.
> 
> v5 -> v6:
> Add unit tests.
> 
> v6 -> v7:
> Update ChangeLog and code formatter.
> 
> Reviewed-by: Hongjiu Lu <hongjiu.lu@intel.com>
> Reviewed-by: Bernhard Reutner-Fischer <rep.dot.nop@gmail.com>
> Reviewed-by: Thomas Koenig <tkoenig@netcologne.de>
> Reviewed-by: Jakub Jelinek <jakub@redhat.com>
> Signed-off-by: Lipeng Zhu <lipeng.zhu@intel.com>
> ---
>  libgcc/gthr-posix.h                           |  60 +++++++
>  libgfortran/io/async.c                        |   4 +
>  libgfortran/io/async.h                        | 151 ++++++++++++++++++
>  libgfortran/io/io.h                           |  15 +-
>  libgfortran/io/transfer.c                     |   8 +-
>  libgfortran/io/unit.c                         | 117 +++++++++-----
>  libgfortran/io/unix.c                         |  16 +-
>  .../testsuite/libgomp.fortran/rwlock_1.f90    |  33 ++++
>  .../testsuite/libgomp.fortran/rwlock_2.f90    |  22 +++
>  .../testsuite/libgomp.fortran/rwlock_3.f90    |  18 +++
>  10 files changed, 386 insertions(+), 58 deletions(-)
>  create mode 100644 libgomp/testsuite/libgomp.fortran/rwlock_1.f90
>  create mode 100644 libgomp/testsuite/libgomp.fortran/rwlock_2.f90
>  create mode 100644 libgomp/testsuite/libgomp.fortran/rwlock_3.f90
> 
> diff --git a/libgcc/gthr-posix.h b/libgcc/gthr-posix.h
> index aebcfdd9f4c..73283082997 100644
> --- a/libgcc/gthr-posix.h
> +++ b/libgcc/gthr-posix.h
> @@ -48,6 +48,9 @@ typedef pthread_t __gthread_t;
>  typedef pthread_key_t __gthread_key_t;
>  typedef pthread_once_t __gthread_once_t;
>  typedef pthread_mutex_t __gthread_mutex_t;
> +#ifndef __cplusplus
> +typedef pthread_rwlock_t __gthread_rwlock_t;
> +#endif
>  typedef pthread_mutex_t __gthread_recursive_mutex_t;
>  typedef pthread_cond_t __gthread_cond_t;
>  typedef struct timespec __gthread_time_t;
> @@ -58,6 +61,9 @@ typedef struct timespec __gthread_time_t;
>  
>  #define __GTHREAD_MUTEX_INIT PTHREAD_MUTEX_INITIALIZER
>  #define __GTHREAD_MUTEX_INIT_FUNCTION __gthread_mutex_init_function
> +#ifndef __cplusplus
> +#define __GTHREAD_RWLOCK_INIT PTHREAD_RWLOCK_INITIALIZER
> +#endif
>  #define __GTHREAD_ONCE_INIT PTHREAD_ONCE_INIT
>  #if defined(PTHREAD_RECURSIVE_MUTEX_INITIALIZER)
>  #define __GTHREAD_RECURSIVE_MUTEX_INIT PTHREAD_RECURSIVE_MUTEX_INITIALIZER
> @@ -135,6 +141,13 @@ __gthrw(pthread_mutexattr_init)
>  __gthrw(pthread_mutexattr_settype)
>  __gthrw(pthread_mutexattr_destroy)
>  
> +#ifndef __cplusplus
> +__gthrw(pthread_rwlock_rdlock)
> +__gthrw(pthread_rwlock_tryrdlock)
> +__gthrw(pthread_rwlock_wrlock)
> +__gthrw(pthread_rwlock_trywrlock)
> +__gthrw(pthread_rwlock_unlock)
> +#endif
>  
>  #if defined(_LIBOBJC) || defined(_LIBOBJC_WEAK)
>  /* Objective-C.  */
> @@ -885,6 +898,53 @@ __gthread_cond_destroy (__gthread_cond_t* __cond)
>    return __gthrw_(pthread_cond_destroy) (__cond);
>  }
>  
> +#ifndef __cplusplus
> +static inline int
> +__gthread_rwlock_rdlock (__gthread_rwlock_t *__rwlock)
> +{
> +  if (__gthread_active_p ())
> +    return __gthrw_(pthread_rwlock_rdlock) (__rwlock);
> +  else
> +    return 0;
> +}
> +
> +static inline int
> +__gthread_rwlock_tryrdlock (__gthread_rwlock_t *__rwlock)
> +{
> +  if (__gthread_active_p ())
> +    return __gthrw_(pthread_rwlock_tryrdlock) (__rwlock);
> +  else
> +    return 0;
> +}
> +
> +static inline int
> +__gthread_rwlock_wrlock (__gthread_rwlock_t *__rwlock)
> +{
> +  if (__gthread_active_p ())
> +    return __gthrw_(pthread_rwlock_wrlock) (__rwlock);
> +  else
> +    return 0;
> +}
> +
> +static inline int
> +__gthread_rwlock_trywrlock (__gthread_rwlock_t *__rwlock)
> +{
> +  if (__gthread_active_p ())
> +    return __gthrw_(pthread_rwlock_trywrlock) (__rwlock);
> +  else
> +    return 0;
> +}
> +
> +static inline int
> +__gthread_rwlock_unlock (__gthread_rwlock_t *__rwlock)
> +{
> +  if (__gthread_active_p ())
> +    return __gthrw_(pthread_rwlock_unlock) (__rwlock);
> +  else
> +    return 0;
> +}
> +#endif
> +
>  #endif /* _LIBOBJC */
>  
>  #endif /* ! GCC_GTHR_POSIX_H */
> diff --git a/libgfortran/io/async.c b/libgfortran/io/async.c
> index 8fa1f0d4ce0..91bf397105d 100644
> --- a/libgfortran/io/async.c
> +++ b/libgfortran/io/async.c
> @@ -42,6 +42,10 @@ DEBUG_LINE (__thread const char *aio_prefix = MPREFIX);
>  
>  DEBUG_LINE (__gthread_mutex_t debug_queue_lock = __GTHREAD_MUTEX_INIT;)
>  DEBUG_LINE (aio_lock_debug *aio_debug_head = NULL;)
> +#ifdef __GTHREAD_RWLOCK_INIT
> +DEBUG_LINE (aio_rwlock_debug *aio_rwlock_debug_head = NULL;)
> +DEBUG_LINE (__gthread_rwlock_t debug_queue_rwlock = __GTHREAD_RWLOCK_INIT;)
> +#endif
>  
>  /* Current unit for asynchronous I/O.  Needed for error reporting.  */
>  
> diff --git a/libgfortran/io/async.h b/libgfortran/io/async.h
> index ad226c8e856..f112f6870bb 100644
> --- a/libgfortran/io/async.h
> +++ b/libgfortran/io/async.h
> @@ -210,6 +210,128 @@
>      DEBUG_PRINTF ("%s" DEBUG_RED "ACQ:" DEBUG_NORM " %-30s %78p\n", aio_prefix, #mutex, mutex); \
>    } while (0)
>  
> +#ifdef __GTHREAD_RWLOCK_INIT
> +#define RWLOCK_DEBUG_ADD(rwlock) do {		\
> +    aio_rwlock_debug *n;				\
> +    n = xmalloc (sizeof (aio_rwlock_debug));	\
> +    n->prev = TAIL_RWLOCK_DEBUG_QUEUE;			\
> +    if (n->prev)				\
> +      n->prev->next = n;			\
> +    n->next = NULL;				\
> +    n->line = __LINE__;				\
> +    n->func = __FUNCTION__;			\
> +    n->rw = rwlock;				\
> +    if (!aio_rwlock_debug_head) {			\
> +      aio_rwlock_debug_head = n;			\
> +    }						\
> +  } while (0)
> +
> +#define CHECK_RDLOCK(rwlock, status) do {					\
> +    aio_rwlock_debug *curr;						\
> +    INTERN_WRLOCK (&debug_queue_rwlock);					\
> +    if (__gthread_rwlock_tryrdlock (rwlock)) {				\
> +      if ((curr = IN_RWLOCK_DEBUG_QUEUE (rwlock))) {				\
> +	sprintf (status, DEBUG_RED "%s():%d" DEBUG_NORM, curr->func, curr->line); \
> +      } else								\
> +	sprintf (status, DEBUG_RED "unknown" DEBUG_NORM);			\
> +    }									\
> +    else {								\
> +      __gthread_rwlock_unlock (rwlock);					\
> +      sprintf (status, DEBUG_GREEN "rwunlocked" DEBUG_NORM);			\
> +    }									\
> +    INTERN_RWUNLOCK (&debug_queue_rwlock);					\
> +  }while (0)
> +
> +#define CHECK_WRLOCK(rwlock, status) do {					\
> +    aio_rwlock_debug *curr;						\
> +    INTERN_WRLOCK (&debug_queue_rwlock);					\
> +    if (__gthread_rwlock_trywrlock (rwlock)) {				\
> +      if ((curr = IN_RWLOCK_DEBUG_QUEUE (rwlock))) {				\
> +	sprintf (status, DEBUG_RED "%s():%d" DEBUG_NORM, curr->func, curr->line); \
> +      } else								\
> +	sprintf (status, DEBUG_RED "unknown" DEBUG_NORM);			\
> +    }									\
> +    else {								\
> +      __gthread_rwlock_unlock (rwlock);					\
> +      sprintf (status, DEBUG_GREEN "rwunlocked" DEBUG_NORM);			\
> +    }									\
> +    INTERN_RWUNLOCK (&debug_queue_rwlock);					\
> +  }while (0)
> +
> +#define TAIL_RWLOCK_DEBUG_QUEUE ({			\
> +      aio_rwlock_debug *curr = aio_rwlock_debug_head;	\
> +      while (curr && curr->next) {		\
> +	curr = curr->next;			\
> +      }						\
> +      curr;					\
> +    })
> +
> +#define IN_RWLOCK_DEBUG_QUEUE(rwlock) ({		\
> +      __label__ end;				\
> +      aio_rwlock_debug *curr = aio_rwlock_debug_head;	\
> +      while (curr) {				\
> +	if (curr->rw == rwlock) {			\
> +	  goto end;				\
> +	}					\
> +	curr = curr->next;			\
> +      }						\
> +    end:;					\
> +      curr;					\
> +    })
> +
> +#define RDLOCK(rwlock) do {						\
> +    char status[200];							\
> +    CHECK_RDLOCK (rwlock, status);						\
> +    DEBUG_PRINTF ("%s%-42s prev: %-35s %20s():%-5d %18p\n", aio_prefix,	\
> +		 DEBUG_RED "RDLOCK: " DEBUG_NORM #rwlock, status, __FUNCTION__, __LINE__, (void *) rwlock); \
> +    INTERN_RDLOCK (rwlock);							\
> +    INTERN_WRLOCK (&debug_queue_rwlock);					\
> +    RWLOCK_DEBUG_ADD (rwlock);						\
> +    INTERN_RWUNLOCK (&debug_queue_rwlock);					\
> +    DEBUG_PRINTF ("%s" DEBUG_RED "ACQ:" DEBUG_NORM " %-30s %78p\n", aio_prefix, #rwlock, rwlock); \
> +  } while (0)
> +
> +#define WRLOCK(rwlock) do {						\
> +    char status[200];							\
> +    CHECK_WRLOCK (rwlock, status);						\
> +    DEBUG_PRINTF ("%s%-42s prev: %-35s %20s():%-5d %18p\n", aio_prefix,	\
> +		 DEBUG_RED "WRLOCK: " DEBUG_NORM #rwlock, status, __FUNCTION__, __LINE__, (void *) rwlock); \
> +    INTERN_WRLOCK (rwlock);							\
> +    INTERN_WRLOCK (&debug_queue_rwlock);					\
> +    RWLOCK_DEBUG_ADD (rwlock);						\
> +    INTERN_RWUNLOCK (&debug_queue_rwlock);					\
> +    DEBUG_PRINTF ("%s" DEBUG_RED "ACQ:" DEBUG_NORM " %-30s %78p\n", aio_prefix, #rwlock, rwlock); \
> +  } while (0)
> +
> +#define RWUNLOCK(rwlock) do {						\
> +    aio_rwlock_debug *curr;						\
> +    DEBUG_PRINTF ("%s%-75s %20s():%-5d %18p\n", aio_prefix, DEBUG_GREEN "RWUNLOCK: " DEBUG_NORM #rwlock, \
> +		 __FUNCTION__, __LINE__, (void *) rwlock);		\
> +    INTERN_WRLOCK (&debug_queue_rwlock);					\
> +    curr = IN_RWLOCK_DEBUG_QUEUE (rwlock);					\
> +    if (curr)								\
> +      {									\
> +	if (curr->prev)							\
> +	  curr->prev->next = curr->next;				\
> +	if (curr->next) {						\
> +	  curr->next->prev = curr->prev;				\
> +	  if (curr == aio_rwlock_debug_head)					\
> +	    aio_rwlock_debug_head = curr->next;				\
> +	} else {							\
> +	  if (curr == aio_rwlock_debug_head)					\
> +	    aio_rwlock_debug_head = NULL;					\
> +	}								\
> +	free (curr);							\
> +      }									\
> +    INTERN_RWUNLOCK (&debug_queue_rwlock);					\
> +    INTERN_RWUNLOCK (rwlock);						\
> +  } while (0)
> +
> +#define RD_TO_WRLOCK(rwlock)	\
> +  RWUNLOCK (rwlock);	\
> +  WRLOCK (rwlock);
> +#endif
> +
>  #define DEBUG_LINE(...) __VA_ARGS__
>  
>  #else
> @@ -221,12 +343,31 @@
>  #define LOCK(mutex) INTERN_LOCK (mutex)
>  #define UNLOCK(mutex) INTERN_UNLOCK (mutex)
>  #define TRYLOCK(mutex) (__gthread_mutex_trylock (mutex))
> +#ifdef __GTHREAD_RWLOCK_INIT
> +#define RDLOCK(rwlock) INTERN_RDLOCK (rwlock)
> +#define WRLOCK(rwlock) INTERN_WRLOCK (rwlock)
> +#define RWUNLOCK(rwlock) INTERN_RWUNLOCK (rwlock)
> +#define RD_TO_WRLOCK(rwlock)	\
> +  RWUNLOCK (rwlock);	\
> +  WRLOCK (rwlock);
> +#endif
> +#endif
> +
> +#ifndef __GTHREAD_RWLOCK_INIT
> +#define RDLOCK(rwlock) LOCK (rwlock)
> +#define WRLOCK(rwlock) LOCK (rwlock)
> +#define RWUNLOCK(rwlock) UNLOCK (rwlock)
> +#define RD_TO_WRLOCK(rwlock) do {} while (0)
>  #endif
>  
>  #define INTERN_LOCK(mutex) T_ERROR (__gthread_mutex_lock, mutex);
>  
>  #define INTERN_UNLOCK(mutex) T_ERROR (__gthread_mutex_unlock, mutex);
>  
> +#define INTERN_RDLOCK(rwlock) T_ERROR (__gthread_rwlock_rdlock, rwlock)
> +#define INTERN_WRLOCK(rwlock) T_ERROR (__gthread_rwlock_wrlock, rwlock)
> +#define INTERN_RWUNLOCK(rwlock) T_ERROR (__gthread_rwlock_unlock, rwlock)
> +
>  #if ASYNC_IO
>  
>  /* au->lock has to be held when calling this macro.  */
> @@ -288,8 +429,18 @@ DEBUG_LINE (typedef struct aio_lock_debug{
>    struct aio_lock_debug *prev;
>  } aio_lock_debug;)
>  
> +DEBUG_LINE (typedef struct aio_rwlock_debug{
> +  __gthread_rwlock_t *rw;
> +  int line;
> +  const char *func;
> +  struct aio_rwlock_debug *next;
> +  struct aio_rwlock_debug *prev;
> +} aio_rwlock_debug;)
> +
>  DEBUG_LINE (extern aio_lock_debug *aio_debug_head;)
>  DEBUG_LINE (extern __gthread_mutex_t debug_queue_lock;)
> +DEBUG_LINE (extern aio_rwlock_debug *aio_rwlock_debug_head;)
> +DEBUG_LINE (extern __gthread_rwlock_t debug_queue_rwlock;)
>  
>  /* Thread - local storage of the current unit we are looking at. Needed for
>     error reporting.  */
> diff --git a/libgfortran/io/io.h b/libgfortran/io/io.h
> index ecdf1dd3f05..15daa0995b1 100644
> --- a/libgfortran/io/io.h
> +++ b/libgfortran/io/io.h
> @@ -690,7 +690,7 @@ typedef struct gfc_unit
>       from the UNIT_ROOT tree, but doesn't free it and the
>       last of the waiting threads will do that.
>       This must be either atomically increased/decreased, or
> -     always guarded by UNIT_LOCK.  */
> +     always guarded by UNIT_RWLOCK.  */
>    int waiting;
>    /* Flag set by close_unit if the unit as been closed.
>       Must be manipulated under unit's lock.  */
> @@ -769,8 +769,13 @@ internal_proto(default_recl);
>  extern gfc_unit *unit_root;
>  internal_proto(unit_root);
>  
> -extern __gthread_mutex_t unit_lock;
> -internal_proto(unit_lock);
> +#ifdef __GTHREAD_RWLOCK_INIT
> +extern __gthread_rwlock_t unit_rwlock;
> +internal_proto(unit_rwlock);
> +#else
> +extern __gthread_mutex_t unit_rwlock;
> +internal_proto(unit_rwlock);
> +#endif
>  
>  extern int close_unit (gfc_unit *);
>  internal_proto(close_unit);
> @@ -1015,9 +1020,9 @@ dec_waiting_unlocked (gfc_unit *u)
>  #ifdef HAVE_ATOMIC_FETCH_ADD
>    (void) __atomic_fetch_add (&u->waiting, -1, __ATOMIC_RELAXED);
>  #else
> -  __gthread_mutex_lock (&unit_lock);
> +  WRLOCK (&unit_rwlock);
>    u->waiting--;
> -  __gthread_mutex_unlock (&unit_lock);
> +  RWUNLOCK (&unit_rwlock);
>  #endif
>  }
>  
> diff --git a/libgfortran/io/transfer.c b/libgfortran/io/transfer.c
> index 500db90c828..00d516adcb0 100644
> --- a/libgfortran/io/transfer.c
> +++ b/libgfortran/io/transfer.c
> @@ -4538,9 +4538,9 @@ st_read_done_worker (st_parameter_dt *dtp, bool unlock)
>     if (free_newunit)
>       {
>         /* Avoid inverse lock issues by placing after unlock_unit.  */
> -       LOCK (&unit_lock);
> +       WRLOCK (&unit_rwlock);
>         newunit_free (dtp->common.unit);
> -       UNLOCK (&unit_lock);
> +       RWUNLOCK (&unit_rwlock);
>       }
>  }
>  
> @@ -4634,9 +4634,9 @@ st_write_done_worker (st_parameter_dt *dtp, bool unlock)
>     if (free_newunit)
>       {
>         /* Avoid inverse lock issues by placing after unlock_unit.  */
> -       LOCK (&unit_lock);
> +       WRLOCK (&unit_rwlock);
>         newunit_free (dtp->common.unit);
> -       UNLOCK (&unit_lock);
> +       RWUNLOCK (&unit_rwlock);
>       }
>  }
>  
> diff --git a/libgfortran/io/unit.c b/libgfortran/io/unit.c
> index 36d025949c2..0c8c35e464e 100644
> --- a/libgfortran/io/unit.c
> +++ b/libgfortran/io/unit.c
> @@ -33,34 +33,36 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
>  
>  
>  /* IO locking rules:
> -   UNIT_LOCK is a master lock, protecting UNIT_ROOT tree and UNIT_CACHE.
> +   UNIT_RWLOCK is a master rw lock, protecting UNIT_ROOT tree and UNIT_CACHE.
> +   Using an rwlock improves efficiency by allowing us to separate readers
> +   and writers of both UNIT_ROOT and UNIT_CACHE.
>     Concurrent use of different units should be supported, so
>     each unit has its own lock, LOCK.
>     Open should be atomic with its reopening of units and list_read.c
>     in several places needs find_unit another unit while holding stdin
> -   unit's lock, so it must be possible to acquire UNIT_LOCK while holding
> +   unit's lock, so it must be possible to acquire UNIT_RWLOCK while holding
>     some unit's lock.  Therefore to avoid deadlocks, it is forbidden
> -   to acquire unit's private locks while holding UNIT_LOCK, except
> +   to acquire unit's private locks while holding UNIT_RWLOCK, except
>     for freshly created units (where no other thread can get at their
>     address yet) or when using just trylock rather than lock operation.
>     In addition to unit's private lock each unit has a WAITERS counter
>     and CLOSED flag.  WAITERS counter must be either only
>     atomically incremented/decremented in all places (if atomic builtins
> -   are supported), or protected by UNIT_LOCK in all places (otherwise).
> +   are supported), or protected by UNIT_RWLOCK in all places (otherwise).
>     CLOSED flag must be always protected by unit's LOCK.
> -   After finding a unit in UNIT_CACHE or UNIT_ROOT with UNIT_LOCK held,
> +   After finding a unit in UNIT_CACHE or UNIT_ROOT with UNIT_RWLOCK held,
>     WAITERS must be incremented to avoid concurrent close from freeing
> -   the unit between unlocking UNIT_LOCK and acquiring unit's LOCK.
> -   Unit freeing is always done under UNIT_LOCK.  If close_unit sees any
> +   the unit between unlocking UNIT_RWLOCK and acquiring unit's LOCK.
> +   Unit freeing is always done under UNIT_RWLOCK.  If close_unit sees any
>     WAITERS, it doesn't free the unit but instead sets the CLOSED flag
>     and the thread that decrements WAITERS to zero while CLOSED flag is
> -   set is responsible for freeing it (while holding UNIT_LOCK).
> +   set is responsible for freeing it (while holding UNIT_RWLOCK).
>     flush_all_units operation is iterating over the unit tree with
> -   increasing UNIT_NUMBER while holding UNIT_LOCK and attempting to
> +   increasing UNIT_NUMBER while holding UNIT_RWLOCK and attempting to
>     flush each unit (and therefore needs the unit's LOCK held as well).
>     To avoid deadlocks, it just trylocks the LOCK and if unsuccessful,
> -   remembers the current unit's UNIT_NUMBER, unlocks UNIT_LOCK, acquires
> -   unit's LOCK and after flushing reacquires UNIT_LOCK and restarts with
> +   remembers the current unit's UNIT_NUMBER, unlocks UNIT_RWLOCK, acquires
> +   unit's LOCK and after flushing reacquires UNIT_RWLOCK and restarts with
>     the smallest UNIT_NUMBER above the last one flushed.
>  
>     If find_unit/find_or_create_unit/find_file/get_unit routines return
> @@ -101,10 +103,14 @@ gfc_offset max_offset;
>  gfc_offset default_recl;
>  
>  gfc_unit *unit_root;
> +#ifdef __GTHREAD_RWLOCK_INIT
> +__gthread_rwlock_t unit_rwlock = __GTHREAD_RWLOCK_INIT;
> +#else
>  #ifdef __GTHREAD_MUTEX_INIT
> -__gthread_mutex_t unit_lock = __GTHREAD_MUTEX_INIT;
> +__gthread_mutex_t unit_rwlock = __GTHREAD_MUTEX_INIT;
>  #else
> -__gthread_mutex_t unit_lock;
> +__gthread_mutex_t unit_rwlock;
> +#endif
>  #endif
>  
>  /* We use these filenames for error reporting.  */
> @@ -317,6 +323,28 @@ delete_unit (gfc_unit *old)
>    unit_root = delete_treap (old, unit_root);
>  }
>  
> +/* get_gfc_unit_from_root()-- Given an integer, return a pointer
> +   to the unit structure. Returns NULL if the unit does not exist,
> +   otherwise returns a locked unit. */
> +
> +static inline gfc_unit *
> +get_gfc_unit_from_unit_root (int n)
> +{
> +  gfc_unit *p;
> +  int c = 0;
> +  p = unit_root;
> +  while (p != NULL)
> +    {
> +      c = compare (n, p->unit_number);
> +      if (c < 0)
> +        p = p->left;
> +      if (c > 0)
> +        p = p->right;
> +      if (c == 0)
> +        break;
> +    }
> +  return p;
> +}
>  
>  /* get_gfc_unit()-- Given an integer, return a pointer to the unit
>     structure.  Returns NULL if the unit does not exist,
> @@ -329,7 +357,7 @@ get_gfc_unit (int n, int do_create)
>    int c, created = 0;
>  
>    NOTE ("Unit n=%d, do_create = %d", n, do_create);
> -  LOCK (&unit_lock);
> +  RDLOCK (&unit_rwlock);
>  
>  retry:
>    for (c = 0; c < CACHE_SIZE; c++)
> @@ -339,18 +367,25 @@ retry:
>  	goto found;
>        }
>  
> -  p = unit_root;
> -  while (p != NULL)
> -    {
> -      c = compare (n, p->unit_number);
> -      if (c < 0)
> -	p = p->left;
> -      if (c > 0)
> -	p = p->right;
> -      if (c == 0)
> -	break;
> -    }
> -
> +  p = get_gfc_unit_from_unit_root (n);
> +
> +  /* We did not find a unit in the cache nor in the unit list,
> +    create a new (locked) unit and insert into the unit list and
> +    cache. Manipulating either or both the unit list and the unit
> +    cache requires to hold a write-lock [for obvious reasons]:
> +    By separating the read/write lock, we will greatly reduce
> +    the contention on the read part, while the write part is
> +    unlikely once the unit hits the cache. */
> +  RD_TO_WRLOCK (&unit_rwlock);
> +
> +  /* In the case of high concurrency, when multiple threads want
> +    to find or create the same unit, the unit number may not
> +    exist in cache nor in the unit list during read phase, then
> +    threads will acquire the write-lock to insert the same unit
> +    number to unit list. To avoid duplicate insert, we need to
> +    find unit list once again to ensure that the unit number
> +    not exist. */
> +  p = get_gfc_unit_from_unit_root (n);
>    if (p == NULL && do_create)
>      {
>        p = insert_unit (n);
> @@ -368,8 +403,8 @@ retry:
>    if (created)
>      {
>        /* Newly created units have their lock held already
> -	 from insert_unit.  Just unlock UNIT_LOCK and return.  */
> -      UNLOCK (&unit_lock);
> +	 from insert_unit. Just unlock UNIT_RWLOCK and return. */
> +      RWUNLOCK (&unit_rwlock);
>        return p;
>      }
>  
> @@ -380,7 +415,7 @@ found:
>        if (! TRYLOCK (&p->lock))
>  	{
>  	  /* assert (p->closed == 0); */
> -	  UNLOCK (&unit_lock);
> +	  RWUNLOCK (&unit_rwlock);
>  	  return p;
>  	}
>  
> @@ -388,14 +423,14 @@ found:
>      }
>  
>  
> -  UNLOCK (&unit_lock);
> +  RWUNLOCK (&unit_rwlock);
>  
>    if (p != NULL && (p->child_dtio == 0))
>      {
>        LOCK (&p->lock);
>        if (p->closed)
>  	{
> -	  LOCK (&unit_lock);
> +	  WRLOCK (&unit_rwlock);
>  	  UNLOCK (&p->lock);
>  	  if (predec_waiting_locked (p) == 0)
>  	    destroy_unit_mutex (p);
> @@ -594,8 +629,8 @@ init_units (void)
>  #endif
>  #endif
>  
> -#ifndef __GTHREAD_MUTEX_INIT
> -  __GTHREAD_MUTEX_INIT_FUNCTION (&unit_lock);
> +#if (!defined(__GTHREAD_RWLOCK_INIT) && !defined(__GTHREAD_MUTEX_INIT))
> +  __GTHREAD_MUTEX_INIT_FUNCTION (&unit_rwlock);
>  #endif
>  
>    if (sizeof (max_offset) == 8)
> @@ -732,7 +767,7 @@ close_unit_1 (gfc_unit *u, int locked)
>  
>    u->closed = 1;
>    if (!locked)
> -    LOCK (&unit_lock);
> +    WRLOCK (&unit_rwlock);
>  
>    for (i = 0; i < CACHE_SIZE; i++)
>      if (unit_cache[i] == u)
> @@ -759,7 +794,7 @@ close_unit_1 (gfc_unit *u, int locked)
>      destroy_unit_mutex (u);
>  
>    if (!locked)
> -    UNLOCK (&unit_lock);
> +    RWUNLOCK (&unit_rwlock);
>  
>    return rc;
>  }
> @@ -796,10 +831,10 @@ close_unit (gfc_unit *u)
>  void
>  close_units (void)
>  {
> -  LOCK (&unit_lock);
> +  WRLOCK (&unit_rwlock);
>    while (unit_root != NULL)
>      close_unit_1 (unit_root, 1);
> -  UNLOCK (&unit_lock);
> +  RWUNLOCK (&unit_rwlock);
>  
>    free (newunits);
>  
> @@ -906,7 +941,7 @@ finish_last_advance_record (gfc_unit *u)
>  int
>  newunit_alloc (void)
>  {
> -  LOCK (&unit_lock);
> +  WRLOCK (&unit_rwlock);
>    if (!newunits)
>      {
>        newunits = xcalloc (16, 1);
> @@ -920,7 +955,7 @@ newunit_alloc (void)
>          {
>            newunits[ii] = true;
>            newunit_lwi = ii + 1;
> -	  UNLOCK (&unit_lock);
> +	  RWUNLOCK (&unit_rwlock);
>            return -ii + NEWUNIT_START;
>          }
>      }
> @@ -933,12 +968,12 @@ newunit_alloc (void)
>    memset (newunits + old_size, 0, old_size);
>    newunits[old_size] = true;
>    newunit_lwi = old_size + 1;
> -    UNLOCK (&unit_lock);
> +    RWUNLOCK (&unit_rwlock);
>    return -old_size + NEWUNIT_START;
>  }
>  
>  
> -/* Free a previously allocated newunit= unit number.  unit_lock must
> +/* Free a previously allocated newunit= unit number.  unit_rwlock must
>     be held when calling.  */
>  
>  void
> diff --git a/libgfortran/io/unix.c b/libgfortran/io/unix.c
> index d466df979df..dcae051744d 100644
> --- a/libgfortran/io/unix.c
> +++ b/libgfortran/io/unix.c
> @@ -1773,7 +1773,7 @@ find_file (const char *file, gfc_charlen_type file_len)
>    id = id_from_path (path);
>  #endif
>  
> -  LOCK (&unit_lock);
> +  RDLOCK (&unit_rwlock);
>  retry:
>    u = find_file0 (unit_root, FIND_FILE0_ARGS);
>    if (u != NULL)
> @@ -1782,19 +1782,19 @@ retry:
>        if (! __gthread_mutex_trylock (&u->lock))
>  	{
>  	  /* assert (u->closed == 0); */
> -	  UNLOCK (&unit_lock);
> +	  RWUNLOCK (&unit_rwlock);
>  	  goto done;
>  	}
>  
>        inc_waiting_locked (u);
>      }
> -  UNLOCK (&unit_lock);
> +  RWUNLOCK (&unit_rwlock);
>    if (u != NULL)
>      {
>        LOCK (&u->lock);
>        if (u->closed)
>  	{
> -	  LOCK (&unit_lock);
> +	  RDLOCK (&unit_rwlock);
>  	  UNLOCK (&u->lock);
>  	  if (predec_waiting_locked (u) == 0)
>  	    free (u);
> @@ -1838,13 +1838,13 @@ flush_all_units (void)
>    gfc_unit *u;
>    int min_unit = 0;
>  
> -  LOCK (&unit_lock);
> +  WRLOCK (&unit_rwlock);
>    do
>      {
>        u = flush_all_units_1 (unit_root, min_unit);
>        if (u != NULL)
>  	inc_waiting_locked (u);
> -      UNLOCK (&unit_lock);
> +      RWUNLOCK (&unit_rwlock);
>        if (u == NULL)
>  	return;
>  
> @@ -1855,13 +1855,13 @@ flush_all_units (void)
>        if (u->closed == 0)
>  	{
>  	  sflush (u->s);
> -	  LOCK (&unit_lock);
> +	  WRLOCK (&unit_rwlock);
>  	  UNLOCK (&u->lock);
>  	  (void) predec_waiting_locked (u);
>  	}
>        else
>  	{
> -	  LOCK (&unit_lock);
> +	  WRLOCK (&unit_rwlock);
>  	  UNLOCK (&u->lock);
>  	  if (predec_waiting_locked (u) == 0)
>  	    free (u);
> diff --git a/libgomp/testsuite/libgomp.fortran/rwlock_1.f90 b/libgomp/testsuite/libgomp.fortran/rwlock_1.f90
> new file mode 100644
> index 00000000000..f90ecbeb00f
> --- /dev/null
> +++ b/libgomp/testsuite/libgomp.fortran/rwlock_1.f90
> @@ -0,0 +1,33 @@
> +! { dg-do run }
> +! Multiple threads call open/write/read/close in concurrency with different unit number,
> +! threads can acquire read lock concurrently, to find unit from cache or unit list very frequently,
> +! if not found, threads will acquire the write lock exclusively to insert unit to cache and unit list.
> +! This test case is used to stress both the read and write lock when access unit cache and list.
> +program main
> +  use omp_lib
> +  implicit none
> +  integer:: unit_number, v1, v2, i
> +  character(11) :: file_name
> +  character(3) :: async = "no"
> +  !$omp parallel private (unit_number, v1, v2, file_name, async, i)
> +    do i = 0, 100
> +      unit_number = 10 + omp_get_thread_num ()
> +      write (file_name, "(I3, A)") unit_number, "_tst.dat"
> +      file_name = adjustl(file_name)
> +      open (unit_number, file=file_name, asynchronous="yes")
> +      ! call inquire with file parameter to test find_file in unix.c
> +      inquire (file=file_name, asynchronous=async)
> +      if (async /= "YES") stop 1
> +      write (unit_number, *, asynchronous="yes") unit_number
> +      write (unit_number, *, asynchronous="yes") unit_number + 1
> +      close(unit_number)
> +
> +      open (unit_number, file = file_name, asynchronous="yes")
> +      read (unit_number, *, asynchronous="yes") v1
> +      read (unit_number, *, asynchronous="yes") v2
> +      wait (unit_number)
> +      if ((v1 /= unit_number) .or. (v2 /= unit_number + 1)) stop 2
> +      close(unit_number, status="delete")
> +    end do
> +  !$omp end parallel
> +end program
> diff --git a/libgomp/testsuite/libgomp.fortran/rwlock_2.f90 b/libgomp/testsuite/libgomp.fortran/rwlock_2.f90
> new file mode 100644
> index 00000000000..08c80d14cfb
> --- /dev/null
> +++ b/libgomp/testsuite/libgomp.fortran/rwlock_2.f90
> @@ -0,0 +1,22 @@
> +! { dg-do run }
> +! Insert a unit into cache at the beginning, then start multiple
> +! threads to access the same unit concurrency, unit will be found in unit cache during the read lock phase.
> +! This test case is used to test the read lock when access unit cache and list.
> +program main
> +  use omp_lib
> +  implicit none
> +  integer:: thread_id, total_threads, i, j
> +  total_threads = omp_get_max_threads ()
> +  open (10, file='tst.dat', asynchronous="yes")
> +  !$omp parallel private (thread_id, i, j)
> +    do i = 1, 100
> +      thread_id = omp_get_thread_num ()
> +      do j = 1, 100
> +        write (10, *, asynchronous="yes") thread_id, i
> +      end do
> +    end do
> +  !$omp end parallel
> +  ! call inquire with file parameter to test find_file in unix.c
> +  call flush ()
> +  close (10, status="delete")
> +end program
> diff --git a/libgomp/testsuite/libgomp.fortran/rwlock_3.f90 b/libgomp/testsuite/libgomp.fortran/rwlock_3.f90
> new file mode 100644
> index 00000000000..1906fcd7a0b
> --- /dev/null
> +++ b/libgomp/testsuite/libgomp.fortran/rwlock_3.f90
> @@ -0,0 +1,18 @@
> +! { dg-do run }
> +! Find or create the same unit number in concurrency,
> +! at beginning, threads cannot find the unit in cache or unit list,
> +! then threads will acquire the write lock to insert unit.
> +! This test case is used to ensure that no duplicate unit number will be
> +! inserted into cache nor unit list when same unit was accessed in concurrency.
> +program main
> +  use omp_lib
> +  implicit none
> +  integer:: i
> +  !$omp parallel private (i)
> +    do i = 1, 100
> +      open (10, file='tst.dat', asynchronous="yes")
> +      ! Delete the unit number from cache and unit list to stress write lock.
> +      close (10, status="delete")
> +    end do
> +  !$omp end parallel
> +end program


  parent reply	other threads:[~2023-12-14 15:50 UTC|newest]

Thread overview: 35+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-05-09  2:32 [PATCH v4] " Zhu, Lipeng
2023-05-16  7:08 ` Zhu, Lipeng
2023-05-23  2:53   ` Zhu, Lipeng
2023-05-24 19:18     ` Thomas Koenig
2023-08-18  3:06       ` Zhu, Lipeng
2023-09-14  8:33         ` Zhu, Lipeng
2023-10-23  1:21           ` Zhu, Lipeng
2023-10-23  5:52             ` Thomas Koenig
2023-10-23 23:59               ` Zhu, Lipeng
2023-11-01 10:14                 ` Zhu, Lipeng
2023-11-02  9:58                   ` Bernhard Reutner-Fischer
2023-11-23  9:36                     ` Zhu, Lipeng
2023-12-07  5:18                       ` Zhu, Lipeng
2023-08-18  3:18       ` [PATCH v6] " Zhu, Lipeng
2023-12-08 10:19         ` Jakub Jelinek
2023-12-09 15:13           ` Zhu, Lipeng
2023-12-09 15:39             ` [PATCH v7] " Lipeng Zhu
2023-12-09 15:23               ` Jakub Jelinek
2023-12-10  3:25                 ` Zhu, Lipeng
2023-12-11 17:45                   ` H.J. Lu
2023-12-12  2:05                     ` Zhu, Lipeng
2023-12-13 20:52                       ` Thomas Schwinge
2023-12-14  2:28                         ` Zhu, Lipeng
2023-12-14 12:29                           ` Thomas Schwinge
2023-12-14 12:39                             ` Jakub Jelinek
2023-12-15  5:43                               ` Zhu, Lipeng
2023-12-21 11:42                         ` Thomas Schwinge
2023-12-22  6:48                           ` Lipeng Zhu
2024-01-03  9:14                           ` Lipeng Zhu
2024-01-17 13:25                             ` Lipeng Zhu
2023-12-14 15:50               ` Richard Earnshaw (lists) [this message]
2023-12-15 11:31                 ` Lipeng Zhu
2023-12-15 19:23                   ` Richard Earnshaw
2024-01-02 11:57                     ` Vaseeharan Vinayagamoorthy
2024-01-03  1:02                       ` Lipeng Zhu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=523c6fbf-344c-48b8-8508-7e2acb088606@arm.com \
    --to=richard.earnshaw@arm.com \
    --cc=fortran@gcc.gnu.org \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=hongjiu.lu@intel.com \
    --cc=jakub@redhat.com \
    --cc=lipeng.zhu@intel.com \
    --cc=pan.deng@intel.com \
    --cc=rep.dot.nop@gmail.com \
    --cc=tianyou.li@intel.com \
    --cc=tkoenig@netcologne.de \
    --cc=wangyang.guo@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).