From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mga06.intel.com (mga06b.intel.com [134.134.136.31]) by sourceware.org (Postfix) with ESMTPS id 9782E3858D37; Thu, 20 Apr 2023 13:13:15 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 9782E3858D37 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=intel.com DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1681996395; x=1713532395; h=message-id:date:mime-version:subject:to:cc:from: content-transfer-encoding; bh=BdvFbl7WToeu8qvMYmVduNm6Ysn2GKG7rDvWRBA2a1g=; b=YzbwUZTg4V2bCu1OnvthoBbcaNVdTZV1ryHpoHmYTuAqx3V3Kjy+riZ4 y/t62RZ9+ldY1GcFq5rnPs+mFnumW3q4T64g2jIB4HPvDbTwdyIFbgK+4 JQjdJOyDBOcdHr8gILNpwKH8R35wlp2VL+qJP22lblK57cIF2B7/JZfHG FUtvT7f3Knokf0NjpZ76qh8Kp0xnLoLlH8XcZAXwud8nmmrOdY1zE1y66 n1/FtI6mDNIQ69R0QouLs7kjMqcTozpiMRKplJQto3+5pj+bmL3lWwh+a Vn2+UfBuEyxoTxyPb4zaY6PXzhstQxohN8UM4d6f+N3Yicl5pvPwJhq2F A==; X-IronPort-AV: E=McAfee;i="6600,9927,10686"; a="408642532" X-IronPort-AV: E=Sophos;i="5.99,212,1677571200"; d="scan'208";a="408642532" Received: from orsmga001.jf.intel.com ([10.7.209.18]) by orsmga104.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 20 Apr 2023 06:13:14 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10686"; a="724417653" X-IronPort-AV: E=Sophos;i="5.99,212,1677571200"; d="scan'208";a="724417653" Received: from jlu19-mobl1.ccr.corp.intel.com (HELO [10.255.29.145]) ([10.255.29.145]) by orsmga001-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 20 Apr 2023 06:13:11 -0700 Message-ID: Date: Thu, 20 Apr 2023 21:13:08 +0800 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:102.0) Gecko/20100101 Firefox/102.0 Thunderbird/102.10.0 Subject: Re: [PATCH v3] libgfortran: Replace mutex with rwlock Content-Language: en-US To: Bernhard Reutner-Fischer , tkoenig@netcologne.de, fortran@gcc.gnu.org, gcc-patches@gcc.gnu.org Cc: hongjiu.lu@intel.com, tianyou.li@intel.com, pan.deng@intel.com, wangyang.guo@intel.com From: "Zhu, Lipeng" Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-Spam-Status: No, score=-10.5 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,GIT_PATCH_0,SPF_HELO_NONE,SPF_NONE,TXREP,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: Hi Bernhard, Thanks for your questions and suggestions. The rwlock could allow multiple threads to have concurrent read-only access to the cache/unit list, only a single writer is allowed. Write lock will not be acquired until all read lock are released. And I didn't change the mutex scope when refactor the code, only make a more fine-grained distinction for the read/write cache/unit list. I complete the comment according to your template, I will insert the comment in the source code in next version patch with other refinement by your suggestions. " Now we did not get a unit in cache and unit list, so we need to create a new unit, and update it to cache and unit list. Prior to update the cache and list, we need to release all read locks, and then immediately to acquire write lock, thus ensure the exclusive update to the cache and unit list. Either way, we will manipulate the cache and/or the unit list so we must take a write lock now. We don't take the write bit in *addition* to the read lock because: 1. It will needlessly complicate releasing the respective lock; 2. By separate the read/write lock, it will greatly reduce the contention at the read part, while write part is not always necessary or most unlikely once the unit hit in cache; 3. We try to balance the implementation complexity and the performance gains that fit into current cases we observed. " Best Regards, Zhu, Lipeng On 1/1/1970 8:00 AM, Bernhard Reutner-Fischer wrote: > On 19 April 2023 09:06:28 CEST, Lipeng Zhu via Fortran wrote: >> This patch try to introduce the rwlock and split the read/write to >> unit_root tree and unit_cache with rwlock instead of the mutex to >> increase CPU efficiency. In the get_gfc_unit function, the percentage >> to step into the insert_unit function is around 30%, in most instances, >> we can get the unit in the phase of reading the unit_cache or unit_root >> tree. So split the read/write phase by rwlock would be an approach to >> make it more parallel. >> >> BTW, the IPC metrics can gain around 9x in our test server with 220 >> cores. The benchmark we used is https://github.com/rwesson/NEAT >> > >> +#define RD_TO_WRLOCK(rwlock) \ >> + RWUNLOCK (rwlock);\ >> + WRLOCK (rwlock); >> +#endif >> + > > >> diff --git a/libgfortran/io/unit.c b/libgfortran/io/unit.c index >> 82664dc5f98..4312c5f36de 100644 >> --- a/libgfortran/io/unit.c >> +++ b/libgfortran/io/unit.c > >> @@ -329,7 +335,7 @@ get_gfc_unit (int n, int do_create) >> int c, created = 0; >> >> NOTE ("Unit n=%d, do_create = %d", n, do_create); >> - LOCK (&unit_lock); >> + RDLOCK (&unit_rwlock); >> >> retry: >> for (c = 0; c < CACHE_SIZE; c++) >> @@ -350,6 +356,7 @@ retry: >> if (c == 0) >> break; >> } >> + RD_TO_WRLOCK (&unit_rwlock); > > So I'm trying to convince myself why it's safe to unlock and only then take the write lock. > > Can you please elaborate/confirm why that's ok? > > I wouldn't mind a comment like > We can release the unit and cache read lock now. We might have to allocate a (locked) unit, below in > do_create. > Either way, we will manipulate the cache and/or the unit list so we have to take a write lock now. > > We don't take the write bit in *addition* to the read lock because.. > > (that needlessly complicates releasing the respective locks / it triggers too much contention when we.. > / ...?) > > thanks, > >> >> if (p == NULL && do_create) >> { >> @@ -368,8 +375,8 @@ retry: >> if (created) >> { >> /* Newly created units have their lock held already >> - from insert_unit. Just unlock UNIT_LOCK and return. */ >> - UNLOCK (&unit_lock); >> + from insert_unit. Just unlock UNIT_RWLOCK and return. */ >> + RWUNLOCK (&unit_rwlock); >> return p; >> } >> >> @@ -380,7 +387,7 @@ found: >> if (! TRYLOCK (&p->lock)) >> { >> /* assert (p->closed == 0); */ >> - UNLOCK (&unit_lock); >> + RWUNLOCK (&unit_rwlock); >> return p; >> } >> >> @@ -388,14 +395,14 @@ found: >> } >> >> >> - UNLOCK (&unit_lock); >> + RWUNLOCK (&unit_rwlock); >> >> if (p != NULL && (p->child_dtio == 0)) >> { >> LOCK (&p->lock); >> if (p->closed) >> { >> - LOCK (&unit_lock); >> + WRLOCK (&unit_rwlock); >> UNLOCK (&p->lock); >> if (predec_waiting_locked (p) == 0) >> destroy_unit_mutex (p); >> @@ -593,8 +600,8 @@ init_units (void) >> #endif >> #endif >> >> -#ifndef __GTHREAD_MUTEX_INIT >> - __GTHREAD_MUTEX_INIT_FUNCTION (&unit_lock); >> +#if (!defined(__GTHREAD_RWLOCK_INIT) && >> +!defined(__GTHREAD_MUTEX_INIT)) >> + __GTHREAD_MUTEX_INIT_FUNCTION (&unit_rwlock); >> #endif >> >> if (sizeof (max_offset) == 8) >