* Re: [PATCH v3] libgfortran: Replace mutex with rwlock
@ 2023-04-20 13:13 Zhu, Lipeng
2023-04-24 19:45 ` Bernhard Reutner-Fischer
0 siblings, 1 reply; 8+ messages in thread
From: Zhu, Lipeng @ 2023-04-20 13:13 UTC (permalink / raw)
To: Bernhard Reutner-Fischer, tkoenig, fortran, gcc-patches
Cc: hongjiu.lu, tianyou.li, pan.deng, wangyang.guo
Hi Bernhard,
Thanks for your questions and suggestions.
The rwlock could allow multiple threads to have concurrent read-only
access to the cache/unit list, only a single writer is allowed.
Write lock will not be acquired until all read lock are released.
And I didn't change the mutex scope when refactor the code, only make a
more fine-grained distinction for the read/write cache/unit list.
I complete the comment according to your template, I will insert the
comment in the source code in next version patch with other refinement
by your suggestions.
"
Now we did not get a unit in cache and unit list, so we need to create a
new unit, and update it to cache and unit list.
Prior to update the cache and list, we need to release all read locks,
and then immediately to acquire write lock, thus ensure the exclusive
update to the cache and unit list.
Either way, we will manipulate the cache and/or the unit list so we must
take a write lock now.
We don't take the write bit in *addition* to the read lock because:
1. It will needlessly complicate releasing the respective lock;
2. By separate the read/write lock, it will greatly reduce the
contention at the read part, while write part is not always necessary or
most unlikely once the unit hit in cache;
3. We try to balance the implementation complexity and the performance
gains that fit into current cases we observed.
"
Best Regards,
Zhu, Lipeng
On 1/1/1970 8:00 AM, Bernhard Reutner-Fischer wrote:
> On 19 April 2023 09:06:28 CEST, Lipeng Zhu via Fortran <fortran@gcc.gnu.org> wrote:
>> This patch try to introduce the rwlock and split the read/write to
>> unit_root tree and unit_cache with rwlock instead of the mutex to
>> increase CPU efficiency. In the get_gfc_unit function, the percentage
>> to step into the insert_unit function is around 30%, in most instances,
>> we can get the unit in the phase of reading the unit_cache or unit_root
>> tree. So split the read/write phase by rwlock would be an approach to
>> make it more parallel.
>>
>> BTW, the IPC metrics can gain around 9x in our test server with 220
>> cores. The benchmark we used is https://github.com/rwesson/NEAT
>>
>
>> +#define RD_TO_WRLOCK(rwlock) \
>> + RWUNLOCK (rwlock);\
>> + WRLOCK (rwlock);
>> +#endif
>> +
>
>
>> diff --git a/libgfortran/io/unit.c b/libgfortran/io/unit.c index
>> 82664dc5f98..4312c5f36de 100644
>> --- a/libgfortran/io/unit.c
>> +++ b/libgfortran/io/unit.c
>
>> @@ -329,7 +335,7 @@ get_gfc_unit (int n, int do_create)
>> int c, created = 0;
>>
>> NOTE ("Unit n=%d, do_create = %d", n, do_create);
>> - LOCK (&unit_lock);
>> + RDLOCK (&unit_rwlock);
>>
>> retry:
>> for (c = 0; c < CACHE_SIZE; c++)
>> @@ -350,6 +356,7 @@ retry:
>> if (c == 0)
>> break;
>> }
>> + RD_TO_WRLOCK (&unit_rwlock);
>
> So I'm trying to convince myself why it's safe to unlock and only then take the write lock.
>
> Can you please elaborate/confirm why that's ok?
>
> I wouldn't mind a comment like
> We can release the unit and cache read lock now. We might have to allocate a (locked) unit, below in
> do_create.
> Either way, we will manipulate the cache and/or the unit list so we have to take a write lock now.
>
> We don't take the write bit in *addition* to the read lock because..
>
> (that needlessly complicates releasing the respective locks / it triggers too much contention when we..
> / ...?)
>
> thanks,
>
>>
>> if (p == NULL && do_create)
>> {
>> @@ -368,8 +375,8 @@ retry:
>> if (created)
>> {
>> /* Newly created units have their lock held already
>> - from insert_unit. Just unlock UNIT_LOCK and return. */
>> - UNLOCK (&unit_lock);
>> + from insert_unit. Just unlock UNIT_RWLOCK and return. */
>> + RWUNLOCK (&unit_rwlock);
>> return p;
>> }
>>
>> @@ -380,7 +387,7 @@ found:
>> if (! TRYLOCK (&p->lock))
>> {
>> /* assert (p->closed == 0); */
>> - UNLOCK (&unit_lock);
>> + RWUNLOCK (&unit_rwlock);
>> return p;
>> }
>>
>> @@ -388,14 +395,14 @@ found:
>> }
>>
>>
>> - UNLOCK (&unit_lock);
>> + RWUNLOCK (&unit_rwlock);
>>
>> if (p != NULL && (p->child_dtio == 0))
>> {
>> LOCK (&p->lock);
>> if (p->closed)
>> {
>> - LOCK (&unit_lock);
>> + WRLOCK (&unit_rwlock);
>> UNLOCK (&p->lock);
>> if (predec_waiting_locked (p) == 0)
>> destroy_unit_mutex (p);
>> @@ -593,8 +600,8 @@ init_units (void)
>> #endif
>> #endif
>>
>> -#ifndef __GTHREAD_MUTEX_INIT
>> - __GTHREAD_MUTEX_INIT_FUNCTION (&unit_lock);
>> +#if (!defined(__GTHREAD_RWLOCK_INIT) &&
>> +!defined(__GTHREAD_MUTEX_INIT))
>> + __GTHREAD_MUTEX_INIT_FUNCTION (&unit_rwlock);
>> #endif
>>
>> if (sizeof (max_offset) == 8)
>
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH v3] libgfortran: Replace mutex with rwlock 2023-04-20 13:13 [PATCH v3] libgfortran: Replace mutex with rwlock Zhu, Lipeng @ 2023-04-24 19:45 ` Bernhard Reutner-Fischer 0 siblings, 0 replies; 8+ messages in thread From: Bernhard Reutner-Fischer @ 2023-04-24 19:45 UTC (permalink / raw) To: Zhu, Lipeng Cc: rep.dot.nop, tkoenig, fortran, gcc-patches, hongjiu.lu, tianyou.li, pan.deng, wangyang.guo Hi! [please do not top-post] On Thu, 20 Apr 2023 21:13:08 +0800 "Zhu, Lipeng" <lipeng.zhu@intel.com> wrote: > Hi Bernhard, > > Thanks for your questions and suggestions. > The rwlock could allow multiple threads to have concurrent read-only > access to the cache/unit list, only a single writer is allowed. right. > Write lock will not be acquired until all read lock are released. So i must have confused rwlock with something else, something that allows self to hold a read-lock and upgrade that to a write-lock, purposely starving all successive incoming readers. I.e. just toggle your RD_TO_WRLOCK impl, here, atomically. This proved to be benefical in some situations in the past. Doesn't seem to work with your rwlock, does it > And I didn't change the mutex scope when refactor the code, only make a > more fine-grained distinction for the read/write cache/unit list. Yes of course, i can see you did that. > I complete the comment according to your template, I will insert the > comment in the source code in next version patch with other refinement > by your suggestions. > " > Now we did not get a unit in cache and unit list, so we need to create a > new unit, and update it to cache and unit list. s/Now/By now/ or s/Now w/W/ and s/get/find/ " We did not find a unit in the cache nor in the unit list, create a new (locked) unit and insert into the unit list and cache. Manipulating either or both the unit list and the unit cache requires to hold a write-lock [for obvious reasons]" Superfluous when talking about pthread_rwlock_wrlock since that implies that even the process acquiring the wrlock has to first release it's very own rdlock. > Prior to update the cache and list, we need to release all read locks, > and then immediately to acquire write lock, thus ensure the exclusive > update to the cache and unit list. > Either way, we will manipulate the cache and/or the unit list so we must > take a write lock now. > We don't take the write bit in *addition* to the read lock because: > 1. It will needlessly complicate releasing the respective lock; Under pthread_rwlock_wrlock it will deadlock, so that's wrong? Drop that point then? If not, what's your reasoning / observation? Under my lock, you hold the R, additionally take the W and then immediately release the R because you yourself won't read, just write. But mine's not the pthread_rwlock you talk about, admittedly. > 2. By separate the read/write lock, it will greatly reduce the > contention at the read part, while write part is not always necessary or > most unlikely once the unit hit in cache; We know that. > 3. We try to balance the implementation complexity and the performance > gains that fit into current cases we observed. .. by just using a pthread_rwlock. And that's the top point iff you keep it at that. That's a fair step, sure. BTW, did you look at the RELEASE semantics, respectively the note that one day (and now is that very day), we might improve on the release semantics? Can of course be incremental AFAIC > " If folks agree on this first step then you have my OK with a catchy malloc and the discussion recorded here on the list. A second step would be RELEASE. And, depending on the underlying capabilities of available locks, further tweaks, obviously. PS: and, please, don't top-post thanks, > > Best Regards, > Zhu, Lipeng > > On 1/1/1970 8:00 AM, Bernhard Reutner-Fischer wrote: > > On 19 April 2023 09:06:28 CEST, Lipeng Zhu via Fortran <fortran@gcc.gnu.org> wrote: > >> This patch try to introduce the rwlock and split the read/write to > >> unit_root tree and unit_cache with rwlock instead of the mutex to > >> increase CPU efficiency. In the get_gfc_unit function, the percentage > >> to step into the insert_unit function is around 30%, in most instances, > >> we can get the unit in the phase of reading the unit_cache or unit_root > >> tree. So split the read/write phase by rwlock would be an approach to > >> make it more parallel. > >> > >> BTW, the IPC metrics can gain around 9x in our test server with 220 > >> cores. The benchmark we used is https://github.com/rwesson/NEAT > >> > > > >> +#define RD_TO_WRLOCK(rwlock) \ > >> + RWUNLOCK (rwlock);\ > >> + WRLOCK (rwlock); > >> +#endif > >> + > > > > > >> diff --git a/libgfortran/io/unit.c b/libgfortran/io/unit.c index > >> 82664dc5f98..4312c5f36de 100644 > >> --- a/libgfortran/io/unit.c > >> +++ b/libgfortran/io/unit.c > > > >> @@ -329,7 +335,7 @@ get_gfc_unit (int n, int do_create) > >> int c, created = 0; > >> > >> NOTE ("Unit n=%d, do_create = %d", n, do_create); > >> - LOCK (&unit_lock); > >> + RDLOCK (&unit_rwlock); > >> > >> retry: > >> for (c = 0; c < CACHE_SIZE; c++) > >> @@ -350,6 +356,7 @@ retry: > >> if (c == 0) > >> break; > >> } > >> + RD_TO_WRLOCK (&unit_rwlock); > > > > So I'm trying to convince myself why it's safe to unlock and only then take the write lock. > > > > Can you please elaborate/confirm why that's ok? > > > > I wouldn't mind a comment like > > We can release the unit and cache read lock now. We might have to allocate a (locked) unit, below in > > do_create. > > Either way, we will manipulate the cache and/or the unit list so we have to take a write lock now. > > > > We don't take the write bit in *addition* to the read lock because.. > > > > (that needlessly complicates releasing the respective locks / it triggers too much contention when we.. > > / ...?) > > > > thanks, > > > >> > >> if (p == NULL && do_create) > >> { > >> @@ -368,8 +375,8 @@ retry: > >> if (created) > >> { > >> /* Newly created units have their lock held already > >> - from insert_unit. Just unlock UNIT_LOCK and return. */ > >> - UNLOCK (&unit_lock); > >> + from insert_unit. Just unlock UNIT_RWLOCK and return. */ > >> + RWUNLOCK (&unit_rwlock); > >> return p; > >> } > >> > >> @@ -380,7 +387,7 @@ found: > >> if (! TRYLOCK (&p->lock)) > >> { > >> /* assert (p->closed == 0); */ > >> - UNLOCK (&unit_lock); > >> + RWUNLOCK (&unit_rwlock); > >> return p; > >> } > >> > >> @@ -388,14 +395,14 @@ found: > >> } > >> > >> > >> - UNLOCK (&unit_lock); > >> + RWUNLOCK (&unit_rwlock); > >> > >> if (p != NULL && (p->child_dtio == 0)) > >> { > >> LOCK (&p->lock); > >> if (p->closed) > >> { > >> - LOCK (&unit_lock); > >> + WRLOCK (&unit_rwlock); > >> UNLOCK (&p->lock); > >> if (predec_waiting_locked (p) == 0) > >> destroy_unit_mutex (p); > >> @@ -593,8 +600,8 @@ init_units (void) > >> #endif > >> #endif > >> > >> -#ifndef __GTHREAD_MUTEX_INIT > >> - __GTHREAD_MUTEX_INIT_FUNCTION (&unit_lock); > >> +#if (!defined(__GTHREAD_RWLOCK_INIT) && > >> +!defined(__GTHREAD_MUTEX_INIT)) > >> + __GTHREAD_MUTEX_INIT_FUNCTION (&unit_rwlock); > >> #endif > >> > >> if (sizeof (max_offset) == 8) > > ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH v3] libgfortran: Replace mutex with rwlock @ 2023-05-08 9:31 Zhu, Lipeng 2023-05-08 10:04 ` Bernhard Reutner-Fischer 0 siblings, 1 reply; 8+ messages in thread From: Zhu, Lipeng @ 2023-05-08 9:31 UTC (permalink / raw) To: Bernhard Reutner-Fischer Cc: tkoenig, fortran, gcc-patches, Lu, Hongjiu, Li, Tianyou, Deng, Pan, Guo, Wangyang On 1/1/1970 8:00 AM, Bernhard Reutner-Fischer wrote: > Hi! > > [please do not top-post] > Sure, thanks for the reminder. > On Thu, 20 Apr 2023 21:13:08 +0800 > "Zhu, Lipeng" <lipeng.zhu@intel.com> wrote: > >> Hi Bernhard, >> >> Thanks for your questions and suggestions. >> The rwlock could allow multiple threads to have concurrent read-only >> access to the cache/unit list, only a single writer is allowed. > > right. > >> Write lock will not be acquired until all read lock are released. > > So i must have confused rwlock with something else, something that allows self to hold a read-lock and > upgrade that to a write-lock, purposely starving all successive incoming readers. I.e. just toggle your > RD_TO_WRLOCK impl, here, atomically. This proved to be benefical in some situations in the past. > Doesn't seem to work with your rwlock, does it > Pthread API doesn't support the upgrade rdlock to wrlock directly, the calling thread may deadlock if at the time the call is made it holds the read-write lock (whether a read or write lock). https://linux.die.net/man/3/pthread_rwlock_wrlock. That is why we implement RD_TO_WRLOCK like this: release read lock firstly and then acquire write lock. >> And I didn't change the mutex scope when refactor the code, only make >> a more fine-grained distinction for the read/write cache/unit list. > > Yes of course, i can see you did that. > >> I complete the comment according to your template, I will insert the >> comment in the source code in next version patch with other refinement >> by your suggestions. >> " >> Now we did not get a unit in cache and unit list, so we need to create >> a new unit, and update it to cache and unit list. > > s/Now/By now/ or s/Now w/W/ and s/get/find/ " > We did not find a unit in the cache nor in the unit list, create a new > (locked) unit and insert into the unit list and cache. > Manipulating either or both the unit list and the unit cache requires to hold a write-lock [for obvious > reasons]" > > Superfluous when talking about pthread_rwlock_wrlock since that implies that even the process > acquiring the wrlock has to first release it's very own rdlock. > Thanks for the correction, I will update the comments in next v4 patch. >> Prior to update the cache and list, we need to release all read locks, >> and then immediately to acquire write lock, thus ensure the exclusive >> update to the cache and unit list. >> Either way, we will manipulate the cache and/or the unit list so we >> must take a write lock now. >> We don't take the write bit in *addition* to the read lock because: >> 1. It will needlessly complicate releasing the respective lock; > > Under pthread_rwlock_wrlock it will deadlock, so that's wrong? > Drop that point then? If not, what's your reasoning / observation? > > Under my lock, you hold the R, additionally take the W and then immediately release the R because you > yourself won't read, just write. > But mine's not the pthread_rwlock you talk about, admittedly. > Yes, pthread_rwlock doesn't support upgrade rdlock to wrlock directly, so we can’t hold rdlock while trying to acquire wrlock. I will drop the first point and update the comment accordingly. >> 2. By separate the read/write lock, it will greatly reduce the >> contention at the read part, while write part is not always necessary >> or most unlikely once the unit hit in cache; > > We know that. > >> 3. We try to balance the implementation complexity and the performance >> gains that fit into current cases we observed. > > .. by just using a pthread_rwlock. And that's the top point iff you keep it at that. That's a fair step, sure. > BTW, did you look at the RELEASE semantics, respectively the note that one day (and now is that very > day), we might improve on the release semantics? Can of course be incremental AFAIC > Not quite understand your question about looking at the RELEASE semantics. Can you be more specific? >> " > > If folks agree on this first step then you have my OK with a catchy malloc and the discussion recorded > here on the list. A second step would be RELEASE. > And, depending on the underlying capabilities of available locks, further tweaks, obviously. > > PS: and, please, don't top-post > > thanks, > >> >> Best Regards, >> Zhu, Lipeng >> >> On 1/1/1970 8:00 AM, Bernhard Reutner-Fischer wrote: >>> On 19 April 2023 09:06:28 CEST, Lipeng Zhu via Fortran <fortran@gcc.gnu.org> wrote: >>>> This patch try to introduce the rwlock and split the read/write to >>>> unit_root tree and unit_cache with rwlock instead of the mutex to >>>> increase CPU efficiency. In the get_gfc_unit function, the >>>> percentage to step into the insert_unit function is around 30%, in >>>> most instances, we can get the unit in the phase of reading the >>>> unit_cache or unit_root tree. So split the read/write phase by >>>> rwlock would be an approach to make it more parallel. >>>> >>>> BTW, the IPC metrics can gain around 9x in our test server with 220 >>>> cores. The benchmark we used is https://github.com/rwesson/NEAT >>>> >>> >>>> +#define RD_TO_WRLOCK(rwlock) \ >>>> + RWUNLOCK (rwlock);\ >>>> + WRLOCK (rwlock); >>>> +#endif >>>> + >>> >>> >>>> diff --git a/libgfortran/io/unit.c b/libgfortran/io/unit.c index >>>> 82664dc5f98..4312c5f36de 100644 >>>> --- a/libgfortran/io/unit.c >>>> +++ b/libgfortran/io/unit.c >>> >>>> @@ -329,7 +335,7 @@ get_gfc_unit (int n, int do_create) >>>> int c, created = 0; >>>> >>>> NOTE ("Unit n=%d, do_create = %d", n, do_create); >>>> - LOCK (&unit_lock); >>>> + RDLOCK (&unit_rwlock); >>>> >>>> retry: >>>> for (c = 0; c < CACHE_SIZE; c++) @@ -350,6 +356,7 @@ retry: >>>> if (c == 0) >>>> break; >>>> } >>>> + RD_TO_WRLOCK (&unit_rwlock); >>> >>> So I'm trying to convince myself why it's safe to unlock and only then take the write lock. >>> >>> Can you please elaborate/confirm why that's ok? >>> >>> I wouldn't mind a comment like >>> We can release the unit and cache read lock now. We might have to allocate a (locked) unit, below in >>> do_create. >>> Either way, we will manipulate the cache and/or the unit list so we have to take a write lock now. >>> >>> We don't take the write bit in *addition* to the read lock because.. >>> >>> (that needlessly complicates releasing the respective locks / it triggers too much contention when > we.. >>> / ...?) >>> >>> thanks, >>> >>>> >>>> if (p == NULL && do_create) >>>> { >>>> @@ -368,8 +375,8 @@ retry: >>>> if (created) >>>> { >>>> /* Newly created units have their lock held already >>>> - from insert_unit. Just unlock UNIT_LOCK and return. */ >>>> - UNLOCK (&unit_lock); >>>> + from insert_unit. Just unlock UNIT_RWLOCK and return. */ >>>> + RWUNLOCK (&unit_rwlock); >>>> return p; >>>> } >>>> >>>> @@ -380,7 +387,7 @@ found: >>>> if (! TRYLOCK (&p->lock)) >>>> { >>>> /* assert (p->closed == 0); */ >>>> - UNLOCK (&unit_lock); >>>> + RWUNLOCK (&unit_rwlock); >>>> return p; >>>> } >>>> >>>> @@ -388,14 +395,14 @@ found: >>>> } >>>> >>>> >>>> - UNLOCK (&unit_lock); >>>> + RWUNLOCK (&unit_rwlock); >>>> >>>> if (p != NULL && (p->child_dtio == 0)) >>>> { >>>> LOCK (&p->lock); >>>> if (p->closed) >>>> { >>>> - LOCK (&unit_lock); >>>> + WRLOCK (&unit_rwlock); >>>> UNLOCK (&p->lock); >>>> if (predec_waiting_locked (p) == 0) >>>> destroy_unit_mutex (p); >>>> @@ -593,8 +600,8 @@ init_units (void) >>>> #endif >>>> #endif >>>> >>>> -#ifndef __GTHREAD_MUTEX_INIT >>>> - __GTHREAD_MUTEX_INIT_FUNCTION (&unit_lock); >>>> +#if (!defined(__GTHREAD_RWLOCK_INIT) && >>>> +!defined(__GTHREAD_MUTEX_INIT)) >>>> + __GTHREAD_MUTEX_INIT_FUNCTION (&unit_rwlock); >>>> #endif >>>> >>>> if (sizeof (max_offset) == 8) >>> > ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH v3] libgfortran: Replace mutex with rwlock 2023-05-08 9:31 Zhu, Lipeng @ 2023-05-08 10:04 ` Bernhard Reutner-Fischer 0 siblings, 0 replies; 8+ messages in thread From: Bernhard Reutner-Fischer @ 2023-05-08 10:04 UTC (permalink / raw) To: Zhu, Lipeng Cc: tkoenig, fortran, gcc-patches, Lu, Hongjiu, Li, Tianyou, Deng, Pan, Guo, Wangyang On Mon, 8 May 2023 17:31:27 +0800 "Zhu, Lipeng" <lipeng.zhu@intel.com> wrote: > > BTW, did you look at the RELEASE semantics, respectively the note that one day (and now is that very > > day), we might improve on the release semantics? Can of course be incremental AFAIC > > > Not quite understand your question about looking at the RELEASE > semantics. Can you be more specific? libgfortran/io/io.h: could be further optimized by making this be an __ATOMIC_RELEASE, But let's defer that to a follow-up improvement independently of the rwlock. thanks, ^ permalink raw reply [flat|nested] 8+ messages in thread
[parent not found: <20221230001607.2232962-1-lipeng.zhu () intel ! com>]
* [PATCH v3] libgfortran: Replace mutex with rwlock [not found] <20221230001607.2232962-1-lipeng.zhu () intel ! com> @ 2023-04-19 7:06 ` Lipeng Zhu 2023-04-19 12:51 ` Bernhard Reutner-Fischer 2023-04-19 21:49 ` Bernhard Reutner-Fischer 0 siblings, 2 replies; 8+ messages in thread From: Lipeng Zhu @ 2023-04-19 7:06 UTC (permalink / raw) To: tkoenig, fortran, gcc-patches Cc: hongjiu.lu, tianyou.li, pan.deng, wangyang.guo, Lipeng Zhu This patch try to introduce the rwlock and split the read/write to unit_root tree and unit_cache with rwlock instead of the mutex to increase CPU efficiency. In the get_gfc_unit function, the percentage to step into the insert_unit function is around 30%, in most instances, we can get the unit in the phase of reading the unit_cache or unit_root tree. So split the read/write phase by rwlock would be an approach to make it more parallel. BTW, the IPC metrics can gain around 9x in our test server with 220 cores. The benchmark we used is https://github.com/rwesson/NEAT libgcc/ChangeLog: * gthr-posix.h (__GTHREAD_RWLOCK_INIT): New macro (__gthrw): New function (__gthread_rwlock_rdlock): New function (__gthread_rwlock_tryrdlock): New function (__gthread_rwlock_wrlock): New function (__gthread_rwlock_trywrlock): New function (__gthread_rwlock_unlock): New function libgfortran/ChangeLog: * io/async.c (DEBUG_LINE): New * io/async.h (RWLOCK_DEBUG_ADD): New macro (CHECK_RDLOCK): New macro (CHECK_WRLOCK): New macro (TAIL_RWLOCK_DEBUG_QUEUE): New macro (IN_RWLOCK_DEBUG_QUEUE): New macro (RDLOCK): New macro (WRLOCK): New macro (RWUNLOCK): New macro (RD_TO_WRLOCK): New macro (INTERN_RDLOCK): New macro (INTERN_WRLOCK): New macro (INTERN_RWUNLOCK): New macro * io/io.h (internal_proto): Define unit_rwlock * io/transfer.c (st_read_done_worker): Relace unit_lock with unit_rwlock (st_write_done_worker): Relace unit_lock with unit_rwlock * io/unit.c (get_gfc_unit): Relace unit_lock with unit_rwlock (if): Relace unit_lock with unit_rwlock (close_unit_1): Relace unit_lock with unit_rwlock (close_units): Relace unit_lock with unit_rwlock (newunit_alloc): Relace unit_lock with unit_rwlock * io/unix.c (flush_all_units): Relace unit_lock with unit_rwlock --- v1 -> v2: Limit the pthread_rwlock usage in libgcc only when __cplusplus isn't defined. v2 -> v3: Rebase the patch with trunk branch. Signed-off-by: Lipeng Zhu <lipeng.zhu@intel.com> --- libgcc/gthr-posix.h | 60 +++++++++++++++ libgfortran/io/async.c | 4 + libgfortran/io/async.h | 151 ++++++++++++++++++++++++++++++++++++++ libgfortran/io/io.h | 15 ++-- libgfortran/io/transfer.c | 8 +- libgfortran/io/unit.c | 65 ++++++++-------- libgfortran/io/unix.c | 16 ++-- 7 files changed, 273 insertions(+), 46 deletions(-) diff --git a/libgcc/gthr-posix.h b/libgcc/gthr-posix.h index aebcfdd9f4c..73283082997 100644 --- a/libgcc/gthr-posix.h +++ b/libgcc/gthr-posix.h @@ -48,6 +48,9 @@ typedef pthread_t __gthread_t; typedef pthread_key_t __gthread_key_t; typedef pthread_once_t __gthread_once_t; typedef pthread_mutex_t __gthread_mutex_t; +#ifndef __cplusplus +typedef pthread_rwlock_t __gthread_rwlock_t; +#endif typedef pthread_mutex_t __gthread_recursive_mutex_t; typedef pthread_cond_t __gthread_cond_t; typedef struct timespec __gthread_time_t; @@ -58,6 +61,9 @@ typedef struct timespec __gthread_time_t; #define __GTHREAD_MUTEX_INIT PTHREAD_MUTEX_INITIALIZER #define __GTHREAD_MUTEX_INIT_FUNCTION __gthread_mutex_init_function +#ifndef __cplusplus +#define __GTHREAD_RWLOCK_INIT PTHREAD_RWLOCK_INITIALIZER +#endif #define __GTHREAD_ONCE_INIT PTHREAD_ONCE_INIT #if defined(PTHREAD_RECURSIVE_MUTEX_INITIALIZER) #define __GTHREAD_RECURSIVE_MUTEX_INIT PTHREAD_RECURSIVE_MUTEX_INITIALIZER @@ -135,6 +141,13 @@ __gthrw(pthread_mutexattr_init) __gthrw(pthread_mutexattr_settype) __gthrw(pthread_mutexattr_destroy) +#ifndef __cplusplus +__gthrw(pthread_rwlock_rdlock) +__gthrw(pthread_rwlock_tryrdlock) +__gthrw(pthread_rwlock_wrlock) +__gthrw(pthread_rwlock_trywrlock) +__gthrw(pthread_rwlock_unlock) +#endif #if defined(_LIBOBJC) || defined(_LIBOBJC_WEAK) /* Objective-C. */ @@ -885,6 +898,53 @@ __gthread_cond_destroy (__gthread_cond_t* __cond) return __gthrw_(pthread_cond_destroy) (__cond); } +#ifndef __cplusplus +static inline int +__gthread_rwlock_rdlock (__gthread_rwlock_t *__rwlock) +{ + if (__gthread_active_p ()) + return __gthrw_(pthread_rwlock_rdlock) (__rwlock); + else + return 0; +} + +static inline int +__gthread_rwlock_tryrdlock (__gthread_rwlock_t *__rwlock) +{ + if (__gthread_active_p ()) + return __gthrw_(pthread_rwlock_tryrdlock) (__rwlock); + else + return 0; +} + +static inline int +__gthread_rwlock_wrlock (__gthread_rwlock_t *__rwlock) +{ + if (__gthread_active_p ()) + return __gthrw_(pthread_rwlock_wrlock) (__rwlock); + else + return 0; +} + +static inline int +__gthread_rwlock_trywrlock (__gthread_rwlock_t *__rwlock) +{ + if (__gthread_active_p ()) + return __gthrw_(pthread_rwlock_trywrlock) (__rwlock); + else + return 0; +} + +static inline int +__gthread_rwlock_unlock (__gthread_rwlock_t *__rwlock) +{ + if (__gthread_active_p ()) + return __gthrw_(pthread_rwlock_unlock) (__rwlock); + else + return 0; +} +#endif + #endif /* _LIBOBJC */ #endif /* ! GCC_GTHR_POSIX_H */ diff --git a/libgfortran/io/async.c b/libgfortran/io/async.c index 81d1d8175aa..a97272ce5e4 100644 --- a/libgfortran/io/async.c +++ b/libgfortran/io/async.c @@ -42,6 +42,10 @@ DEBUG_LINE (__thread const char *aio_prefix = MPREFIX); DEBUG_LINE (__gthread_mutex_t debug_queue_lock = __GTHREAD_MUTEX_INIT;) DEBUG_LINE (aio_lock_debug *aio_debug_head = NULL;) +#ifdef __GTHREAD_RWLOCK_INIT +DEBUG_LINE (aio_rwlock_debug *aio_rwlock_debug_head = NULL;) +DEBUG_LINE (__gthread_rwlock_t debug_queue_rwlock = __GTHREAD_RWLOCK_INIT;) +#endif /* Current unit for asynchronous I/O. Needed for error reporting. */ diff --git a/libgfortran/io/async.h b/libgfortran/io/async.h index ad226c8e856..f3a6247f3bb 100644 --- a/libgfortran/io/async.h +++ b/libgfortran/io/async.h @@ -210,6 +210,128 @@ DEBUG_PRINTF ("%s" DEBUG_RED "ACQ:" DEBUG_NORM " %-30s %78p\n", aio_prefix, #mutex, mutex); \ } while (0) +#ifdef __GTHREAD_RWLOCK_INIT +#define RWLOCK_DEBUG_ADD(rwlock) do { \ + aio_rwlock_debug *n; \ + n = malloc (sizeof(aio_rwlock_debug)); \ + n->prev = TAIL_RWLOCK_DEBUG_QUEUE; \ + if (n->prev) \ + n->prev->next = n; \ + n->next = NULL; \ + n->line = __LINE__; \ + n->func = __FUNCTION__; \ + n->rw = rwlock; \ + if (!aio_rwlock_debug_head) { \ + aio_rwlock_debug_head = n; \ + } \ + } while (0) + +#define CHECK_RDLOCK(rwlock, status) do { \ + aio_rwlock_debug *curr; \ + INTERN_WRLOCK (&debug_queue_rwlock); \ + if (__gthread_rwlock_tryrdlock (rwlock)) { \ + if ((curr = IN_RWLOCK_DEBUG_QUEUE (rwlock))) { \ + sprintf (status, DEBUG_RED "%s():%d" DEBUG_NORM, curr->func, curr->line); \ + } else \ + sprintf (status, DEBUG_RED "unknown" DEBUG_NORM); \ + } \ + else { \ + __gthread_rwlock_unlock (rwlock); \ + sprintf (status, DEBUG_GREEN "rwunlocked" DEBUG_NORM); \ + } \ + INTERN_RWUNLOCK (&debug_queue_rwlock); \ + }while (0) + +#define CHECK_WRLOCK(rwlock, status) do { \ + aio_rwlock_debug *curr; \ + INTERN_WRLOCK (&debug_queue_rwlock); \ + if (__gthread_rwlock_trywrlock (rwlock)) { \ + if ((curr = IN_RWLOCK_DEBUG_QUEUE (rwlock))) { \ + sprintf (status, DEBUG_RED "%s():%d" DEBUG_NORM, curr->func, curr->line); \ + } else \ + sprintf (status, DEBUG_RED "unknown" DEBUG_NORM); \ + } \ + else { \ + __gthread_rwlock_unlock (rwlock); \ + sprintf (status, DEBUG_GREEN "rwunlocked" DEBUG_NORM); \ + } \ + INTERN_RWUNLOCK (&debug_queue_rwlock); \ + }while (0) + +#define TAIL_RWLOCK_DEBUG_QUEUE ({ \ + aio_rwlock_debug *curr = aio_rwlock_debug_head; \ + while (curr && curr->next) { \ + curr = curr->next; \ + } \ + curr; \ + }) + +#define IN_RWLOCK_DEBUG_QUEUE(rwlock) ({ \ + __label__ end; \ + aio_rwlock_debug *curr = aio_rwlock_debug_head; \ + while (curr) { \ + if (curr->rw == rwlock) { \ + goto end; \ + } \ + curr = curr->next; \ + } \ + end:; \ + curr; \ + }) + +#define RDLOCK(rwlock) do { \ + char status[200]; \ + CHECK_RDLOCK (rwlock, status); \ + DEBUG_PRINTF ("%s%-42s prev: %-35s %20s():%-5d %18p\n", aio_prefix, \ + DEBUG_RED "RDLOCK: " DEBUG_NORM #rwlock, status, __FUNCTION__, __LINE__, (void *) rwlock); \ + INTERN_RDLOCK (rwlock); \ + INTERN_WRLOCK (&debug_queue_rwlock); \ + RWLOCK_DEBUG_ADD (rwlock); \ + INTERN_RWUNLOCK (&debug_queue_rwlock); \ + DEBUG_PRINTF ("%s" DEBUG_RED "ACQ:" DEBUG_NORM " %-30s %78p\n", aio_prefix, #rwlock, rwlock); \ + } while (0) + +#define WRLOCK(rwlock) do { \ + char status[200]; \ + CHECK_WRLOCK (rwlock, status); \ + DEBUG_PRINTF ("%s%-42s prev: %-35s %20s():%-5d %18p\n", aio_prefix, \ + DEBUG_RED "WRLOCK: " DEBUG_NORM #rwlock, status, __FUNCTION__, __LINE__, (void *) rwlock); \ + INTERN_WRLOCK (rwlock); \ + INTERN_WRLOCK (&debug_queue_rwlock); \ + RWLOCK_DEBUG_ADD (rwlock); \ + INTERN_RWUNLOCK (&debug_queue_rwlock); \ + DEBUG_PRINTF ("%s" DEBUG_RED "ACQ:" DEBUG_NORM " %-30s %78p\n", aio_prefix, #rwlock, rwlock); \ + } while (0) + +#define RWUNLOCK(rwlock) do { \ + aio_rwlock_debug *curr; \ + DEBUG_PRINTF ("%s%-75s %20s():%-5d %18p\n", aio_prefix, DEBUG_GREEN "RWUNLOCK: " DEBUG_NORM #rwlock, \ + __FUNCTION__, __LINE__, (void *) rwlock); \ + INTERN_WRLOCK (&debug_queue_rwlock); \ + curr = IN_RWLOCK_DEBUG_QUEUE (rwlock); \ + if (curr) \ + { \ + if (curr->prev) \ + curr->prev->next = curr->next; \ + if (curr->next) { \ + curr->next->prev = curr->prev; \ + if (curr == aio_rwlock_debug_head) \ + aio_rwlock_debug_head = curr->next; \ + } else { \ + if (curr == aio_rwlock_debug_head) \ + aio_rwlock_debug_head = NULL; \ + } \ + free (curr); \ + } \ + INTERN_RWUNLOCK (&debug_queue_rwlock); \ + INTERN_RWUNLOCK (rwlock); \ + } while (0) + +#define RD_TO_WRLOCK(rwlock) \ + RWUNLOCK (rwlock);\ + WRLOCK (rwlock); +#endif + #define DEBUG_LINE(...) __VA_ARGS__ #else @@ -221,12 +343,31 @@ #define LOCK(mutex) INTERN_LOCK (mutex) #define UNLOCK(mutex) INTERN_UNLOCK (mutex) #define TRYLOCK(mutex) (__gthread_mutex_trylock (mutex)) +#ifdef __GTHREAD_RWLOCK_INIT +#define RDLOCK(rwlock) INTERN_RDLOCK (rwlock) +#define WRLOCK(rwlock) INTERN_WRLOCK (rwlock) +#define RWUNLOCK(rwlock) INTERN_RWUNLOCK (rwlock) +#define RD_TO_WRLOCK(rwlock) \ + RWUNLOCK (rwlock);\ + WRLOCK (rwlock); +#endif +#endif + +#ifndef __GTHREAD_RWLOCK_INIT +#define RDLOCK(rwlock) LOCK (rwlock) +#define WRLOCK(rwlock) LOCK (rwlock) +#define RWUNLOCK(rwlock) UNLOCK (rwlock) +#define RD_TO_WRLOCK(rwlock) {} #endif #define INTERN_LOCK(mutex) T_ERROR (__gthread_mutex_lock, mutex); #define INTERN_UNLOCK(mutex) T_ERROR (__gthread_mutex_unlock, mutex); +#define INTERN_RDLOCK(rwlock) T_ERROR (__gthread_rwlock_rdlock, rwlock); +#define INTERN_WRLOCK(rwlock) T_ERROR (__gthread_rwlock_wrlock, rwlock); +#define INTERN_RWUNLOCK(rwlock) T_ERROR (__gthread_rwlock_unlock, rwlock); + #if ASYNC_IO /* au->lock has to be held when calling this macro. */ @@ -288,8 +429,18 @@ DEBUG_LINE (typedef struct aio_lock_debug{ struct aio_lock_debug *prev; } aio_lock_debug;) +DEBUG_LINE (typedef struct aio_rwlock_debug{ + __gthread_rwlock_t *rw; + int line; + const char *func; + struct aio_rwlock_debug *next; + struct aio_rwlock_debug *prev; +} aio_rwlock_debug;) + DEBUG_LINE (extern aio_lock_debug *aio_debug_head;) DEBUG_LINE (extern __gthread_mutex_t debug_queue_lock;) +DEBUG_LINE (extern aio_rwlock_debug *aio_rwlock_debug_head;) +DEBUG_LINE (extern __gthread_rwlock_t debug_queue_rwlock;) /* Thread - local storage of the current unit we are looking at. Needed for error reporting. */ diff --git a/libgfortran/io/io.h b/libgfortran/io/io.h index ecdf1dd3f05..15daa0995b1 100644 --- a/libgfortran/io/io.h +++ b/libgfortran/io/io.h @@ -690,7 +690,7 @@ typedef struct gfc_unit from the UNIT_ROOT tree, but doesn't free it and the last of the waiting threads will do that. This must be either atomically increased/decreased, or - always guarded by UNIT_LOCK. */ + always guarded by UNIT_RWLOCK. */ int waiting; /* Flag set by close_unit if the unit as been closed. Must be manipulated under unit's lock. */ @@ -769,8 +769,13 @@ internal_proto(default_recl); extern gfc_unit *unit_root; internal_proto(unit_root); -extern __gthread_mutex_t unit_lock; -internal_proto(unit_lock); +#ifdef __GTHREAD_RWLOCK_INIT +extern __gthread_rwlock_t unit_rwlock; +internal_proto(unit_rwlock); +#else +extern __gthread_mutex_t unit_rwlock; +internal_proto(unit_rwlock); +#endif extern int close_unit (gfc_unit *); internal_proto(close_unit); @@ -1015,9 +1020,9 @@ dec_waiting_unlocked (gfc_unit *u) #ifdef HAVE_ATOMIC_FETCH_ADD (void) __atomic_fetch_add (&u->waiting, -1, __ATOMIC_RELAXED); #else - __gthread_mutex_lock (&unit_lock); + WRLOCK (&unit_rwlock); u->waiting--; - __gthread_mutex_unlock (&unit_lock); + RWUNLOCK (&unit_rwlock); #endif } diff --git a/libgfortran/io/transfer.c b/libgfortran/io/transfer.c index 8bb5d1101ca..b01f45b80f6 100644 --- a/libgfortran/io/transfer.c +++ b/libgfortran/io/transfer.c @@ -4539,9 +4539,9 @@ st_read_done_worker (st_parameter_dt *dtp, bool unlock) if (free_newunit) { /* Avoid inverse lock issues by placing after unlock_unit. */ - LOCK (&unit_lock); + WRLOCK (&unit_rwlock); newunit_free (dtp->common.unit); - UNLOCK (&unit_lock); + RWUNLOCK (&unit_rwlock); } } @@ -4636,9 +4636,9 @@ st_write_done_worker (st_parameter_dt *dtp, bool unlock) if (free_newunit) { /* Avoid inverse lock issues by placing after unlock_unit. */ - LOCK (&unit_lock); + WRLOCK (&unit_rwlock); newunit_free (dtp->common.unit); - UNLOCK (&unit_lock); + RWUNLOCK (&unit_rwlock); } } diff --git a/libgfortran/io/unit.c b/libgfortran/io/unit.c index 82664dc5f98..4312c5f36de 100644 --- a/libgfortran/io/unit.c +++ b/libgfortran/io/unit.c @@ -33,34 +33,36 @@ see the files COPYING3 and COPYING.RUNTIME respectively. If not, see /* IO locking rules: - UNIT_LOCK is a master lock, protecting UNIT_ROOT tree and UNIT_CACHE. + UNIT_RWLOCK is a master lock, protecting UNIT_ROOT tree and UNIT_CACHE. + And use the rwlock to spilt read and write phase to UNIT_ROOT tree + and UNIT_CACHE to increase CPU efficiency. Concurrent use of different units should be supported, so each unit has its own lock, LOCK. Open should be atomic with its reopening of units and list_read.c in several places needs find_unit another unit while holding stdin - unit's lock, so it must be possible to acquire UNIT_LOCK while holding + unit's lock, so it must be possible to acquire UNIT_RWLOCK while holding some unit's lock. Therefore to avoid deadlocks, it is forbidden - to acquire unit's private locks while holding UNIT_LOCK, except + to acquire unit's private locks while holding UNIT_RWLOCK, except for freshly created units (where no other thread can get at their address yet) or when using just trylock rather than lock operation. In addition to unit's private lock each unit has a WAITERS counter and CLOSED flag. WAITERS counter must be either only atomically incremented/decremented in all places (if atomic builtins - are supported), or protected by UNIT_LOCK in all places (otherwise). + are supported), or protected by UNIT_RWLOCK in all places (otherwise). CLOSED flag must be always protected by unit's LOCK. - After finding a unit in UNIT_CACHE or UNIT_ROOT with UNIT_LOCK held, + After finding a unit in UNIT_CACHE or UNIT_ROOT with UNIT_RWLOCK held, WAITERS must be incremented to avoid concurrent close from freeing - the unit between unlocking UNIT_LOCK and acquiring unit's LOCK. - Unit freeing is always done under UNIT_LOCK. If close_unit sees any + the unit between unlocking UNIT_RWLOCK and acquiring unit's LOCK. + Unit freeing is always done under UNIT_RWLOCK. If close_unit sees any WAITERS, it doesn't free the unit but instead sets the CLOSED flag and the thread that decrements WAITERS to zero while CLOSED flag is - set is responsible for freeing it (while holding UNIT_LOCK). + set is responsible for freeing it (while holding UNIT_RWLOCK). flush_all_units operation is iterating over the unit tree with - increasing UNIT_NUMBER while holding UNIT_LOCK and attempting to + increasing UNIT_NUMBER while holding UNIT_RWLOCK and attempting to flush each unit (and therefore needs the unit's LOCK held as well). To avoid deadlocks, it just trylocks the LOCK and if unsuccessful, - remembers the current unit's UNIT_NUMBER, unlocks UNIT_LOCK, acquires - unit's LOCK and after flushing reacquires UNIT_LOCK and restarts with + remembers the current unit's UNIT_NUMBER, unlocks UNIT_RWLOCK, acquires + unit's LOCK and after flushing reacquires UNIT_RWLOCK and restarts with the smallest UNIT_NUMBER above the last one flushed. If find_unit/find_or_create_unit/find_file/get_unit routines return @@ -101,10 +103,14 @@ gfc_offset max_offset; gfc_offset default_recl; gfc_unit *unit_root; +#ifdef __GTHREAD_RWLOCK_INIT +__gthread_rwlock_t unit_rwlock = __GTHREAD_RWLOCK_INIT; +#else #ifdef __GTHREAD_MUTEX_INIT -__gthread_mutex_t unit_lock = __GTHREAD_MUTEX_INIT; +__gthread_mutex_t unit_rwlock = __GTHREAD_MUTEX_INIT; #else -__gthread_mutex_t unit_lock; +__gthread_mutex_t unit_rwlock; +#endif #endif /* We use these filenames for error reporting. */ @@ -329,7 +335,7 @@ get_gfc_unit (int n, int do_create) int c, created = 0; NOTE ("Unit n=%d, do_create = %d", n, do_create); - LOCK (&unit_lock); + RDLOCK (&unit_rwlock); retry: for (c = 0; c < CACHE_SIZE; c++) @@ -350,6 +356,7 @@ retry: if (c == 0) break; } + RD_TO_WRLOCK (&unit_rwlock); if (p == NULL && do_create) { @@ -368,8 +375,8 @@ retry: if (created) { /* Newly created units have their lock held already - from insert_unit. Just unlock UNIT_LOCK and return. */ - UNLOCK (&unit_lock); + from insert_unit. Just unlock UNIT_RWLOCK and return. */ + RWUNLOCK (&unit_rwlock); return p; } @@ -380,7 +387,7 @@ found: if (! TRYLOCK (&p->lock)) { /* assert (p->closed == 0); */ - UNLOCK (&unit_lock); + RWUNLOCK (&unit_rwlock); return p; } @@ -388,14 +395,14 @@ found: } - UNLOCK (&unit_lock); + RWUNLOCK (&unit_rwlock); if (p != NULL && (p->child_dtio == 0)) { LOCK (&p->lock); if (p->closed) { - LOCK (&unit_lock); + WRLOCK (&unit_rwlock); UNLOCK (&p->lock); if (predec_waiting_locked (p) == 0) destroy_unit_mutex (p); @@ -593,8 +600,8 @@ init_units (void) #endif #endif -#ifndef __GTHREAD_MUTEX_INIT - __GTHREAD_MUTEX_INIT_FUNCTION (&unit_lock); +#if (!defined(__GTHREAD_RWLOCK_INIT) && !defined(__GTHREAD_MUTEX_INIT)) + __GTHREAD_MUTEX_INIT_FUNCTION (&unit_rwlock); #endif if (sizeof (max_offset) == 8) @@ -731,7 +738,7 @@ close_unit_1 (gfc_unit *u, int locked) u->closed = 1; if (!locked) - LOCK (&unit_lock); + WRLOCK (&unit_rwlock); for (i = 0; i < CACHE_SIZE; i++) if (unit_cache[i] == u) @@ -758,7 +765,7 @@ close_unit_1 (gfc_unit *u, int locked) destroy_unit_mutex (u); if (!locked) - UNLOCK (&unit_lock); + RWUNLOCK (&unit_rwlock); return rc; } @@ -795,10 +802,10 @@ close_unit (gfc_unit *u) void close_units (void) { - LOCK (&unit_lock); + WRLOCK (&unit_rwlock); while (unit_root != NULL) close_unit_1 (unit_root, 1); - UNLOCK (&unit_lock); + RWUNLOCK (&unit_rwlock); free (newunits); @@ -905,7 +912,7 @@ finish_last_advance_record (gfc_unit *u) int newunit_alloc (void) { - LOCK (&unit_lock); + WRLOCK (&unit_rwlock); if (!newunits) { newunits = xcalloc (16, 1); @@ -919,7 +926,7 @@ newunit_alloc (void) { newunits[ii] = true; newunit_lwi = ii + 1; - UNLOCK (&unit_lock); + RWUNLOCK (&unit_rwlock); return -ii + NEWUNIT_START; } } @@ -932,12 +939,12 @@ newunit_alloc (void) memset (newunits + old_size, 0, old_size); newunits[old_size] = true; newunit_lwi = old_size + 1; - UNLOCK (&unit_lock); + RWUNLOCK (&unit_rwlock); return -old_size + NEWUNIT_START; } -/* Free a previously allocated newunit= unit number. unit_lock must +/* Free a previously allocated newunit= unit number. unit_rwlock must be held when calling. */ void diff --git a/libgfortran/io/unix.c b/libgfortran/io/unix.c index ba12be08252..d53003919ab 100644 --- a/libgfortran/io/unix.c +++ b/libgfortran/io/unix.c @@ -1774,7 +1774,7 @@ find_file (const char *file, gfc_charlen_type file_len) id = id_from_path (path); #endif - LOCK (&unit_lock); + RDLOCK (&unit_rwlock); retry: u = find_file0 (unit_root, FIND_FILE0_ARGS); if (u != NULL) @@ -1783,19 +1783,19 @@ retry: if (! __gthread_mutex_trylock (&u->lock)) { /* assert (u->closed == 0); */ - UNLOCK (&unit_lock); + RWUNLOCK (&unit_rwlock); goto done; } inc_waiting_locked (u); } - UNLOCK (&unit_lock); + RWUNLOCK (&unit_rwlock); if (u != NULL) { LOCK (&u->lock); if (u->closed) { - LOCK (&unit_lock); + RDLOCK (&unit_rwlock); UNLOCK (&u->lock); if (predec_waiting_locked (u) == 0) free (u); @@ -1839,13 +1839,13 @@ flush_all_units (void) gfc_unit *u; int min_unit = 0; - LOCK (&unit_lock); + WRLOCK (&unit_rwlock); do { u = flush_all_units_1 (unit_root, min_unit); if (u != NULL) inc_waiting_locked (u); - UNLOCK (&unit_lock); + RWUNLOCK (&unit_rwlock); if (u == NULL) return; @@ -1856,13 +1856,13 @@ flush_all_units (void) if (u->closed == 0) { sflush (u->s); - LOCK (&unit_lock); + WRLOCK (&unit_rwlock); UNLOCK (&u->lock); (void) predec_waiting_locked (u); } else { - LOCK (&unit_lock); + WRLOCK (&unit_rwlock); UNLOCK (&u->lock); if (predec_waiting_locked (u) == 0) free (u); -- 2.39.1 ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH v3] libgfortran: Replace mutex with rwlock 2023-04-19 7:06 ` Lipeng Zhu @ 2023-04-19 12:51 ` Bernhard Reutner-Fischer 2023-04-19 14:50 ` Bernhard Reutner-Fischer 2023-04-19 21:49 ` Bernhard Reutner-Fischer 1 sibling, 1 reply; 8+ messages in thread From: Bernhard Reutner-Fischer @ 2023-04-19 12:51 UTC (permalink / raw) To: Lipeng Zhu, Lipeng Zhu via Fortran, tkoenig, fortran, gcc-patches Cc: hongjiu.lu, tianyou.li, pan.deng, wangyang.guo On 19 April 2023 09:06:28 CEST, Lipeng Zhu via Fortran <fortran@gcc.gnu.org> wrote: >+#ifdef __GTHREAD_RWLOCK_INIT >+#define RWLOCK_DEBUG_ADD(rwlock) do { \ >+ aio_rwlock_debug *n; \ >+ n = malloc (sizeof(aio_rwlock_debug)); \ My malloc can fail. >+ n->prev = TAIL_RWLOCK_DEBUG_QUEUE; \ >+ if (n->prev) \ >+ n->prev->next = n; \ >+ n->next = NULL; \ >+ n->line = __LINE__; \ >+ n->func = __FUNCTION__; \ >+ n->rw = rwlock; \ >+ if (!aio_rwlock_debug_head) { \ >+ aio_rwlock_debug_head = n; \ >+ } \ >+ } while (0) >+ ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH v3] libgfortran: Replace mutex with rwlock 2023-04-19 12:51 ` Bernhard Reutner-Fischer @ 2023-04-19 14:50 ` Bernhard Reutner-Fischer 0 siblings, 0 replies; 8+ messages in thread From: Bernhard Reutner-Fischer @ 2023-04-19 14:50 UTC (permalink / raw) To: Lipeng Zhu, Lipeng Zhu via Fortran, tkoenig, gcc-patches Cc: hongjiu.lu, tianyou.li, pan.deng, wangyang.guo On Wed, 19 Apr 2023 at 14:51, Bernhard Reutner-Fischer <rep.dot.nop@gmail.com> wrote: > > On 19 April 2023 09:06:28 CEST, Lipeng Zhu via Fortran <fortran@gcc.gnu.org> wrote: > > >+#ifdef __GTHREAD_RWLOCK_INIT > >+#define RWLOCK_DEBUG_ADD(rwlock) do { \ > >+ aio_rwlock_debug *n; \ > >+ n = malloc (sizeof(aio_rwlock_debug)); \ > > My malloc can fail. Sorry, i hit send too early. Please use xmalloc as defined in libgfortran/runtime/memory.c PS: IIRC we have likely() / unlikely() macros in libgfortran, so you may want to check if you want to annotate some conditions accordingly if predict gets it wrong. thanks, > > >+ n->prev = TAIL_RWLOCK_DEBUG_QUEUE; \ > >+ if (n->prev) \ > >+ n->prev->next = n; \ > >+ n->next = NULL; \ > >+ n->line = __LINE__; \ > >+ n->func = __FUNCTION__; \ > >+ n->rw = rwlock; \ > >+ if (!aio_rwlock_debug_head) { \ > >+ aio_rwlock_debug_head = n; \ > >+ } \ > >+ } while (0) > >+ > ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH v3] libgfortran: Replace mutex with rwlock 2023-04-19 7:06 ` Lipeng Zhu 2023-04-19 12:51 ` Bernhard Reutner-Fischer @ 2023-04-19 21:49 ` Bernhard Reutner-Fischer 1 sibling, 0 replies; 8+ messages in thread From: Bernhard Reutner-Fischer @ 2023-04-19 21:49 UTC (permalink / raw) To: Lipeng Zhu, Lipeng Zhu via Fortran, tkoenig, fortran, gcc-patches Cc: hongjiu.lu, tianyou.li, pan.deng, wangyang.guo On 19 April 2023 09:06:28 CEST, Lipeng Zhu via Fortran <fortran@gcc.gnu.org> wrote: >This patch try to introduce the rwlock and split the read/write to >unit_root tree and unit_cache with rwlock instead of the mutex to >increase CPU efficiency. In the get_gfc_unit function, the percentage >to step into the insert_unit function is around 30%, in most instances, >we can get the unit in the phase of reading the unit_cache or unit_root >tree. So split the read/write phase by rwlock would be an approach to >make it more parallel. > >BTW, the IPC metrics can gain around 9x in our test >server with 220 cores. The benchmark we used is >https://github.com/rwesson/NEAT > >+#define RD_TO_WRLOCK(rwlock) \ >+ RWUNLOCK (rwlock);\ >+ WRLOCK (rwlock); >+#endif >+ >diff --git a/libgfortran/io/unit.c b/libgfortran/io/unit.c >index 82664dc5f98..4312c5f36de 100644 >--- a/libgfortran/io/unit.c >+++ b/libgfortran/io/unit.c >@@ -329,7 +335,7 @@ get_gfc_unit (int n, int do_create) > int c, created = 0; > > NOTE ("Unit n=%d, do_create = %d", n, do_create); >- LOCK (&unit_lock); >+ RDLOCK (&unit_rwlock); > > retry: > for (c = 0; c < CACHE_SIZE; c++) >@@ -350,6 +356,7 @@ retry: > if (c == 0) > break; > } >+ RD_TO_WRLOCK (&unit_rwlock); So I'm trying to convince myself why it's safe to unlock and only then take the write lock. Can you please elaborate/confirm why that's ok? I wouldn't mind a comment like We can release the unit and cache read lock now. We might have to allocate a (locked) unit, below in do_create. Either way, we will manipulate the cache and/or the unit list so we have to take a write lock now. We don't take the write bit in *addition* to the read lock because.. (that needlessly complicates releasing the respective locks / it triggers too much contention when we.. / ...?) thanks, > > if (p == NULL && do_create) > { >@@ -368,8 +375,8 @@ retry: > if (created) > { > /* Newly created units have their lock held already >- from insert_unit. Just unlock UNIT_LOCK and return. */ >- UNLOCK (&unit_lock); >+ from insert_unit. Just unlock UNIT_RWLOCK and return. */ >+ RWUNLOCK (&unit_rwlock); > return p; > } > >@@ -380,7 +387,7 @@ found: > if (! TRYLOCK (&p->lock)) > { > /* assert (p->closed == 0); */ >- UNLOCK (&unit_lock); >+ RWUNLOCK (&unit_rwlock); > return p; > } > >@@ -388,14 +395,14 @@ found: > } > > >- UNLOCK (&unit_lock); >+ RWUNLOCK (&unit_rwlock); > > if (p != NULL && (p->child_dtio == 0)) > { > LOCK (&p->lock); > if (p->closed) > { >- LOCK (&unit_lock); >+ WRLOCK (&unit_rwlock); > UNLOCK (&p->lock); > if (predec_waiting_locked (p) == 0) > destroy_unit_mutex (p); >@@ -593,8 +600,8 @@ init_units (void) > #endif > #endif > >-#ifndef __GTHREAD_MUTEX_INIT >- __GTHREAD_MUTEX_INIT_FUNCTION (&unit_lock); >+#if (!defined(__GTHREAD_RWLOCK_INIT) && !defined(__GTHREAD_MUTEX_INIT)) >+ __GTHREAD_MUTEX_INIT_FUNCTION (&unit_rwlock); > #endif > > if (sizeof (max_offset) == 8) ^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2023-05-08 10:04 UTC | newest] Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2023-04-20 13:13 [PATCH v3] libgfortran: Replace mutex with rwlock Zhu, Lipeng 2023-04-24 19:45 ` Bernhard Reutner-Fischer -- strict thread matches above, loose matches on Subject: below -- 2023-05-08 9:31 Zhu, Lipeng 2023-05-08 10:04 ` Bernhard Reutner-Fischer [not found] <20221230001607.2232962-1-lipeng.zhu () intel ! com> 2023-04-19 7:06 ` Lipeng Zhu 2023-04-19 12:51 ` Bernhard Reutner-Fischer 2023-04-19 14:50 ` Bernhard Reutner-Fischer 2023-04-19 21:49 ` Bernhard Reutner-Fischer
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).