From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mga14.intel.com (mga14.intel.com [192.55.52.115]) by sourceware.org (Postfix) with ESMTPS id 694423858D35; Mon, 8 May 2023 09:31:33 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 694423858D35 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=intel.com DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1683538293; x=1715074293; h=message-id:date:mime-version:subject:to:cc:from: content-transfer-encoding; bh=PNJY1/RSGSEURlxF4trDxTdjlk1+uZtrOxv5hFmrsKw=; b=TZVMtKr+3dNJVJGpPYT5M2jmpxy6JnMOZaH4tivTkpac//9W7SfesaKN TPOJa2SVhbEU/YlQvhDrQVPf05u3OKI1yO9bhgDqevao7uikGwvAhsQLX V9kv2aHo5GnpUuMuxx6SXK+JQxxwxkepoXi4jPquUuvj4jT5y8Kdi9PMO PpAVVokQjE7xmjNpwR63c+7vNdaZFkRLJstpCttiQ7C1tZl6F+ZIU9cR7 1RYbXjyedlKJYtFZKEJ0aLUPgTlplWrxxA2cOucMFmlBSNDCTWxDH7GW8 jszsyJDkvGEtj7myGehL1wSBuerqV4o1iv73L6SXPwQbYlK2MJcSk6xWy A==; X-IronPort-AV: E=McAfee;i="6600,9927,10703"; a="349631676" X-IronPort-AV: E=Sophos;i="5.99,258,1677571200"; d="scan'208";a="349631676" Received: from fmsmga007.fm.intel.com ([10.253.24.52]) by fmsmga103.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 08 May 2023 02:31:31 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10703"; a="701351898" X-IronPort-AV: E=Sophos;i="5.99,258,1677571200"; d="scan'208";a="701351898" Received: from zhulipeng-win.ccr.corp.intel.com (HELO [10.238.2.93]) ([10.238.2.93]) by fmsmga007-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 08 May 2023 02:31:30 -0700 Message-ID: Date: Mon, 8 May 2023 17:31:27 +0800 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:102.0) Gecko/20100101 Firefox/102.0 Thunderbird/102.10.0 Subject: Re: [PATCH v3] libgfortran: Replace mutex with rwlock Content-Language: en-US To: Bernhard Reutner-Fischer Cc: tkoenig@netcologne.de, fortran@gcc.gnu.org, gcc-patches@gcc.gnu.org, "Lu, Hongjiu" , "Li, Tianyou" , "Deng, Pan" , "Guo, Wangyang" From: "Zhu, Lipeng" Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-10.7 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,GIT_PATCH_0,SPF_HELO_NONE,SPF_NONE,TXREP,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: On 1/1/1970 8:00 AM, Bernhard Reutner-Fischer wrote: > Hi! > > [please do not top-post] > Sure, thanks for the reminder. > On Thu, 20 Apr 2023 21:13:08 +0800 > "Zhu, Lipeng" wrote: > >> Hi Bernhard, >> >> Thanks for your questions and suggestions. >> The rwlock could allow multiple threads to have concurrent read-only >> access to the cache/unit list, only a single writer is allowed. > > right. > >> Write lock will not be acquired until all read lock are released. > > So i must have confused rwlock with something else, something that allows self to hold a read-lock and > upgrade that to a write-lock, purposely starving all successive incoming readers. I.e. just toggle your > RD_TO_WRLOCK impl, here, atomically. This proved to be benefical in some situations in the past. > Doesn't seem to work with your rwlock, does it > Pthread API doesn't support the upgrade rdlock to wrlock directly, the calling thread may deadlock if at the time the call is made it holds the read-write lock (whether a read or write lock). https://linux.die.net/man/3/pthread_rwlock_wrlock. That is why we implement RD_TO_WRLOCK like this: release read lock firstly and then acquire write lock. >> And I didn't change the mutex scope when refactor the code, only make >> a more fine-grained distinction for the read/write cache/unit list. > > Yes of course, i can see you did that. > >> I complete the comment according to your template, I will insert the >> comment in the source code in next version patch with other refinement >> by your suggestions. >> " >> Now we did not get a unit in cache and unit list, so we need to create >> a new unit, and update it to cache and unit list. > > s/Now/By now/ or s/Now w/W/ and s/get/find/ " > We did not find a unit in the cache nor in the unit list, create a new > (locked) unit and insert into the unit list and cache. > Manipulating either or both the unit list and the unit cache requires to hold a write-lock [for obvious > reasons]" > > Superfluous when talking about pthread_rwlock_wrlock since that implies that even the process > acquiring the wrlock has to first release it's very own rdlock. > Thanks for the correction, I will update the comments in next v4 patch. >> Prior to update the cache and list, we need to release all read locks, >> and then immediately to acquire write lock, thus ensure the exclusive >> update to the cache and unit list. >> Either way, we will manipulate the cache and/or the unit list so we >> must take a write lock now. >> We don't take the write bit in *addition* to the read lock because: >> 1. It will needlessly complicate releasing the respective lock; > > Under pthread_rwlock_wrlock it will deadlock, so that's wrong? > Drop that point then? If not, what's your reasoning / observation? > > Under my lock, you hold the R, additionally take the W and then immediately release the R because you > yourself won't read, just write. > But mine's not the pthread_rwlock you talk about, admittedly. > Yes, pthread_rwlock doesn't support upgrade rdlock to wrlock directly, so we can’t hold rdlock while trying to acquire wrlock. I will drop the first point and update the comment accordingly. >> 2. By separate the read/write lock, it will greatly reduce the >> contention at the read part, while write part is not always necessary >> or most unlikely once the unit hit in cache; > > We know that. > >> 3. We try to balance the implementation complexity and the performance >> gains that fit into current cases we observed. > > .. by just using a pthread_rwlock. And that's the top point iff you keep it at that. That's a fair step, sure. > BTW, did you look at the RELEASE semantics, respectively the note that one day (and now is that very > day), we might improve on the release semantics? Can of course be incremental AFAIC > Not quite understand your question about looking at the RELEASE semantics. Can you be more specific? >> " > > If folks agree on this first step then you have my OK with a catchy malloc and the discussion recorded > here on the list. A second step would be RELEASE. > And, depending on the underlying capabilities of available locks, further tweaks, obviously. > > PS: and, please, don't top-post > > thanks, > >> >> Best Regards, >> Zhu, Lipeng >> >> On 1/1/1970 8:00 AM, Bernhard Reutner-Fischer wrote: >>> On 19 April 2023 09:06:28 CEST, Lipeng Zhu via Fortran wrote: >>>> This patch try to introduce the rwlock and split the read/write to >>>> unit_root tree and unit_cache with rwlock instead of the mutex to >>>> increase CPU efficiency. In the get_gfc_unit function, the >>>> percentage to step into the insert_unit function is around 30%, in >>>> most instances, we can get the unit in the phase of reading the >>>> unit_cache or unit_root tree. So split the read/write phase by >>>> rwlock would be an approach to make it more parallel. >>>> >>>> BTW, the IPC metrics can gain around 9x in our test server with 220 >>>> cores. The benchmark we used is https://github.com/rwesson/NEAT >>>> >>> >>>> +#define RD_TO_WRLOCK(rwlock) \ >>>> + RWUNLOCK (rwlock);\ >>>> + WRLOCK (rwlock); >>>> +#endif >>>> + >>> >>> >>>> diff --git a/libgfortran/io/unit.c b/libgfortran/io/unit.c index >>>> 82664dc5f98..4312c5f36de 100644 >>>> --- a/libgfortran/io/unit.c >>>> +++ b/libgfortran/io/unit.c >>> >>>> @@ -329,7 +335,7 @@ get_gfc_unit (int n, int do_create) >>>> int c, created = 0; >>>> >>>> NOTE ("Unit n=%d, do_create = %d", n, do_create); >>>> - LOCK (&unit_lock); >>>> + RDLOCK (&unit_rwlock); >>>> >>>> retry: >>>> for (c = 0; c < CACHE_SIZE; c++) @@ -350,6 +356,7 @@ retry: >>>> if (c == 0) >>>> break; >>>> } >>>> + RD_TO_WRLOCK (&unit_rwlock); >>> >>> So I'm trying to convince myself why it's safe to unlock and only then take the write lock. >>> >>> Can you please elaborate/confirm why that's ok? >>> >>> I wouldn't mind a comment like >>> We can release the unit and cache read lock now. We might have to allocate a (locked) unit, below in >>> do_create. >>> Either way, we will manipulate the cache and/or the unit list so we have to take a write lock now. >>> >>> We don't take the write bit in *addition* to the read lock because.. >>> >>> (that needlessly complicates releasing the respective locks / it triggers too much contention when > we.. >>> / ...?) >>> >>> thanks, >>> >>>> >>>> if (p == NULL && do_create) >>>> { >>>> @@ -368,8 +375,8 @@ retry: >>>> if (created) >>>> { >>>> /* Newly created units have their lock held already >>>> - from insert_unit. Just unlock UNIT_LOCK and return. */ >>>> - UNLOCK (&unit_lock); >>>> + from insert_unit. Just unlock UNIT_RWLOCK and return. */ >>>> + RWUNLOCK (&unit_rwlock); >>>> return p; >>>> } >>>> >>>> @@ -380,7 +387,7 @@ found: >>>> if (! TRYLOCK (&p->lock)) >>>> { >>>> /* assert (p->closed == 0); */ >>>> - UNLOCK (&unit_lock); >>>> + RWUNLOCK (&unit_rwlock); >>>> return p; >>>> } >>>> >>>> @@ -388,14 +395,14 @@ found: >>>> } >>>> >>>> >>>> - UNLOCK (&unit_lock); >>>> + RWUNLOCK (&unit_rwlock); >>>> >>>> if (p != NULL && (p->child_dtio == 0)) >>>> { >>>> LOCK (&p->lock); >>>> if (p->closed) >>>> { >>>> - LOCK (&unit_lock); >>>> + WRLOCK (&unit_rwlock); >>>> UNLOCK (&p->lock); >>>> if (predec_waiting_locked (p) == 0) >>>> destroy_unit_mutex (p); >>>> @@ -593,8 +600,8 @@ init_units (void) >>>> #endif >>>> #endif >>>> >>>> -#ifndef __GTHREAD_MUTEX_INIT >>>> - __GTHREAD_MUTEX_INIT_FUNCTION (&unit_lock); >>>> +#if (!defined(__GTHREAD_RWLOCK_INIT) && >>>> +!defined(__GTHREAD_MUTEX_INIT)) >>>> + __GTHREAD_MUTEX_INIT_FUNCTION (&unit_rwlock); >>>> #endif >>>> >>>> if (sizeof (max_offset) == 8) >>> >