From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <SRS0=93gl=BM=intel.com=lipeng.zhu@sourceware.org>
Received: from mga17.intel.com (mga17.intel.com [192.55.52.151])
	by sourceware.org (Postfix) with ESMTPS id B8B373858D1E;
	Tue, 23 May 2023 02:54:11 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org B8B373858D1E
Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=intel.com
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=intel.com
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple;
  d=intel.com; i=@intel.com; q=dns/txt; s=Intel;
  t=1684810451; x=1716346451;
  h=message-id:date:mime-version:subject:from:to:cc:
   references:in-reply-to:content-transfer-encoding;
  bh=+h2o9sQJhnidwspeNEKO1VQFxB4ApGc5QKuUeb/16K8=;
  b=F9k6QHcxbKsBywZ4wIPun+y3ScRKGMKMA8T36Smw/MEIFneliz1aO4Nr
   +TxHVHrEW7b2TBCVI+aAKdJ14uwDf50dcnd3AcO5ZxIJ40x8/9ULdrttX
   ywI9G+ErnuzLeHFLbitxxZgxVkBGKdmHCa6PWSyR6ZDOtV13wy5LOctCy
   e1MNHw4lce5HTrS6x9qlB3cCdn1jMK7J6MXlKUZ247Lx9Z6/FDCbQfD9C
   gnWC9tH87oPD0Ql1j8I1vNttWi/o4UsEiQOByTg474d2vblo7bdFEiXSg
   YjR7JeoMFjycRrxnAVM3L/WtDKUrGQ95wN3GXfb66XdfZyCccxXj8v+hV
   w==;
X-IronPort-AV: E=McAfee;i="6600,9927,10718"; a="333472421"
X-IronPort-AV: E=Sophos;i="6.00,185,1681196400"; 
   d="scan'208";a="333472421"
Received: from fmsmga003.fm.intel.com ([10.253.24.29])
  by fmsmga107.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 22 May 2023 19:54:10 -0700
X-ExtLoop1: 1
X-IronPort-AV: E=McAfee;i="6600,9927,10718"; a="793568864"
X-IronPort-AV: E=Sophos;i="6.00,185,1681196400"; 
   d="scan'208";a="793568864"
Received: from zhulipeng-win.ccr.corp.intel.com (HELO [10.254.214.96]) ([10.254.214.96])
  by fmsmga003-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 22 May 2023 19:53:55 -0700
Message-ID: <9e30db8a-2a6f-89d0-84fb-2f549f61954c@intel.com>
Date: Tue, 23 May 2023 10:53:52 +0800
MIME-Version: 1.0
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:102.0) Gecko/20100101
 Firefox/102.0 Thunderbird/102.11.0
Subject: Re: [PATCH v4] libgfortran: Replace mutex with rwlock
From: "Zhu, Lipeng" <lipeng.zhu@intel.com>
To: Bernhard Reutner-Fischer <rep.dot.nop@gmail.com>
Cc: fortran@gcc.gnu.org, gcc-patches@gcc.gnu.org, hongjiu.lu@intel.com,
 tianyou.li@intel.com, pan.deng@intel.com, wangyang.guo@intel.com
References: <81c359ae-ab69-7f03-f113-4b865441de44@intel.com>
 <dbaa667c-c0f0-ab53-ca27-a4239b55907c@intel.com>
Content-Language: en-US
In-Reply-To: <dbaa667c-c0f0-ab53-ca27-a4239b55907c@intel.com>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
X-Spam-Status: No, score=-9.9 required=5.0 tests=BAYES_00,BODY_8BITS,DKIMWL_WL_HIGH,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,GIT_PATCH_0,NICE_REPLY_A,SPF_HELO_NONE,SPF_NONE,TXREP,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org
List-Id: <fortran.gcc.gnu.org>


On 5/16/2023 3:08 PM, Zhu, Lipeng wrote:
> 
> 
> On 5/9/2023 10:32 AM, Zhu, Lipeng wrote:
>>
>>
>> On 1/1/1970 8:00 AM, Bernhard Reutner-Fischer wrote:
>>> On Mon,  8 May 2023 17:44:43 +0800
>>> Lipeng Zhu <lipeng.zhu@intel.com> wrote:
>>>
>>>> This patch try to introduce the rwlock and split the read/write to
>>>> unit_root tree and unit_cache with rwlock instead of the mutex to
>>>> increase CPU efficiency. In the get_gfc_unit function, the percentage
>>>> to step into the insert_unit function is around 30%, in most
>>>> instances, we can get the unit in the phase of reading the unit_cache
>>>> or unit_root tree. So split the read/write phase by rwlock would be an
>>>> approach to make it more parallel.
>>>>
>>>> BTW, the IPC metrics can gain around 9x in our test server with 220
>>>> cores. The benchmark we used is https://github.com/rwesson/NEAT
>>>
>>> See commentary typos below.
>>> You did not state if you regression tested the patch?
>> I use valgrind --tool=helgrind or --tool=drd to test 'make 
>> check-fortran'. Is it necessary to add an additional unit test for 
>> this patch?
>>
>>> Other than that it LGTM but i cannot approve it.
>> Thank you for your kind help for this patch, is there anything that I 
>> can do or can you help to push this patch forward?
>>
> Hi Bernhard,
> 
> Is there any other refinement that need I to do for this patch?
> 
> Thanks.
> 
> 

May I know any comment or concern on this patch, thanks for your time :)

>>>
>>>> diff --git a/libgfortran/io/async.h b/libgfortran/io/async.h index
>>>> ad226c8e856..0033cc74252 100644
>>>> --- a/libgfortran/io/async.h
>>>> +++ b/libgfortran/io/async.h
>>>> @@ -210,6 +210,128 @@
>>>>       DEBUG_PRINTF ("%s" DEBUG_RED "ACQ:" DEBUG_NORM " %-30s 
>>>> %78p\n", aio_prefix, #mutex,
>>> mutex); \
>> Thanks, corrected in Patch v5.
>>
>>>>     } while (0)
>>>> +#ifdef __GTHREAD_RWLOCK_INIT
>>>> +#define RWLOCK_DEBUG_ADD(rwlock) do {        \
>>>> +    aio_rwlock_debug *n;                \
>>>> +    n = xmalloc (sizeof(aio_rwlock_debug));    \
>>>
>>> Missing space before the open brace: sizeof (
>>>
>> Thanks, corrected in Patch v5.
>>
>>>> diff --git a/libgfortran/io/unit.c b/libgfortran/io/unit.c index
>>>> 82664dc5f98..62f1db21d34 100644
>>>> --- a/libgfortran/io/unit.c
>>>> +++ b/libgfortran/io/unit.c
>>>> @@ -33,34 +33,36 @@ see the files COPYING3 and COPYING.RUNTIME
>>>> respectively.  If not, see
>>>>   /* IO locking rules:
>>>> -   UNIT_LOCK is a master lock, protecting UNIT_ROOT tree and 
>>>> UNIT_CACHE.
>>>> +   UNIT_RWLOCK is a master lock, protecting UNIT_ROOT tree and 
>>>> UNIT_CACHE.
>>>> +   And use the rwlock to spilt read and write phase to UNIT_ROOT tree
>>>> +   and UNIT_CACHE to increase CPU efficiency.
>>>
>>> s/spilt/split. Maybe:
>>>
>>> Using an rwlock improves efficiency by allowing us to separate 
>>> readers and writers of both UNIT_ROOT
>>> and UNIT_CACHE.
>>>
>> Thanks, corrected in Patch v5.
>>
>>>> @@ -350,6 +356,17 @@ retry:
>>>>         if (c == 0)
>>>>       break;
>>>>       }
>>>> +  /* We did not find a unit in the cache nor in the unit list, 
>>>> create a new
>>>> +    (locked) unit and insert into the unit list and cache.
>>>> +    Manipulating either or both the unit list and the unit cache 
>>>> requires to
>>>> +    hold a write-lock [for obvious reasons]:
>>>> +    1. By separating the read/write lock, it will greatly reduce 
>>>> the contention
>>>> +       at the read part, while write part is not always necessary 
>>>> or most
>>>> +       unlikely once the unit hit in cache.
>>>
>>> +    By separating the read/write lock, we will greatly reduce the 
>>> contention
>>> +    on the read part, while the write part is unlikely once the unit 
>>> hits
>>> +    the cache.
>>>
>>>> +    2. We try to balance the implementation complexity and the 
>>>> performance
>>>> +       gains that fit into current cases we observed by just using a
>>>> +       pthread_rwlock. */
>>>
>>> Let's drop 2.
>>
>> Got it, thanks!
>>> thanks,