From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mga17.intel.com (mga17.intel.com [192.55.52.151]) by sourceware.org (Postfix) with ESMTPS id B8B373858D1E; Tue, 23 May 2023 02:54:11 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org B8B373858D1E Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=intel.com DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1684810451; x=1716346451; h=message-id:date:mime-version:subject:from:to:cc: references:in-reply-to:content-transfer-encoding; bh=+h2o9sQJhnidwspeNEKO1VQFxB4ApGc5QKuUeb/16K8=; b=F9k6QHcxbKsBywZ4wIPun+y3ScRKGMKMA8T36Smw/MEIFneliz1aO4Nr +TxHVHrEW7b2TBCVI+aAKdJ14uwDf50dcnd3AcO5ZxIJ40x8/9ULdrttX ywI9G+ErnuzLeHFLbitxxZgxVkBGKdmHCa6PWSyR6ZDOtV13wy5LOctCy e1MNHw4lce5HTrS6x9qlB3cCdn1jMK7J6MXlKUZ247Lx9Z6/FDCbQfD9C gnWC9tH87oPD0Ql1j8I1vNttWi/o4UsEiQOByTg474d2vblo7bdFEiXSg YjR7JeoMFjycRrxnAVM3L/WtDKUrGQ95wN3GXfb66XdfZyCccxXj8v+hV w==; X-IronPort-AV: E=McAfee;i="6600,9927,10718"; a="333472421" X-IronPort-AV: E=Sophos;i="6.00,185,1681196400"; d="scan'208";a="333472421" Received: from fmsmga003.fm.intel.com ([10.253.24.29]) by fmsmga107.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 22 May 2023 19:54:10 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10718"; a="793568864" X-IronPort-AV: E=Sophos;i="6.00,185,1681196400"; d="scan'208";a="793568864" Received: from zhulipeng-win.ccr.corp.intel.com (HELO [10.254.214.96]) ([10.254.214.96]) by fmsmga003-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 22 May 2023 19:53:55 -0700 Message-ID: <9e30db8a-2a6f-89d0-84fb-2f549f61954c@intel.com> Date: Tue, 23 May 2023 10:53:52 +0800 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:102.0) Gecko/20100101 Firefox/102.0 Thunderbird/102.11.0 Subject: Re: [PATCH v4] libgfortran: Replace mutex with rwlock From: "Zhu, Lipeng" To: Bernhard Reutner-Fischer Cc: fortran@gcc.gnu.org, gcc-patches@gcc.gnu.org, hongjiu.lu@intel.com, tianyou.li@intel.com, pan.deng@intel.com, wangyang.guo@intel.com References: <81c359ae-ab69-7f03-f113-4b865441de44@intel.com> Content-Language: en-US In-Reply-To: Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-9.9 required=5.0 tests=BAYES_00,BODY_8BITS,DKIMWL_WL_HIGH,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,GIT_PATCH_0,NICE_REPLY_A,SPF_HELO_NONE,SPF_NONE,TXREP,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: On 5/16/2023 3:08 PM, Zhu, Lipeng wrote: > > > On 5/9/2023 10:32 AM, Zhu, Lipeng wrote: >> >> >> On 1/1/1970 8:00 AM, Bernhard Reutner-Fischer wrote: >>> On Mon,  8 May 2023 17:44:43 +0800 >>> Lipeng Zhu wrote: >>> >>>> This patch try to introduce the rwlock and split the read/write to >>>> unit_root tree and unit_cache with rwlock instead of the mutex to >>>> increase CPU efficiency. In the get_gfc_unit function, the percentage >>>> to step into the insert_unit function is around 30%, in most >>>> instances, we can get the unit in the phase of reading the unit_cache >>>> or unit_root tree. So split the read/write phase by rwlock would be an >>>> approach to make it more parallel. >>>> >>>> BTW, the IPC metrics can gain around 9x in our test server with 220 >>>> cores. The benchmark we used is https://github.com/rwesson/NEAT >>> >>> See commentary typos below. >>> You did not state if you regression tested the patch? >> I use valgrind --tool=helgrind or --tool=drd to test 'make >> check-fortran'. Is it necessary to add an additional unit test for >> this patch? >> >>> Other than that it LGTM but i cannot approve it. >> Thank you for your kind help for this patch, is there anything that I >> can do or can you help to push this patch forward? >> > Hi Bernhard, > > Is there any other refinement that need I to do for this patch? > > Thanks. > > May I know any comment or concern on this patch, thanks for your time :) >>> >>>> diff --git a/libgfortran/io/async.h b/libgfortran/io/async.h index >>>> ad226c8e856..0033cc74252 100644 >>>> --- a/libgfortran/io/async.h >>>> +++ b/libgfortran/io/async.h >>>> @@ -210,6 +210,128 @@ >>>>       DEBUG_PRINTF ("%s" DEBUG_RED "ACQ:" DEBUG_NORM " %-30s >>>> %78p\n", aio_prefix, #mutex, >>> mutex); \ >> Thanks, corrected in Patch v5. >> >>>>     } while (0) >>>> +#ifdef __GTHREAD_RWLOCK_INIT >>>> +#define RWLOCK_DEBUG_ADD(rwlock) do {        \ >>>> +    aio_rwlock_debug *n;                \ >>>> +    n = xmalloc (sizeof(aio_rwlock_debug));    \ >>> >>> Missing space before the open brace: sizeof ( >>> >> Thanks, corrected in Patch v5. >> >>>> diff --git a/libgfortran/io/unit.c b/libgfortran/io/unit.c index >>>> 82664dc5f98..62f1db21d34 100644 >>>> --- a/libgfortran/io/unit.c >>>> +++ b/libgfortran/io/unit.c >>>> @@ -33,34 +33,36 @@ see the files COPYING3 and COPYING.RUNTIME >>>> respectively.  If not, see >>>>   /* IO locking rules: >>>> -   UNIT_LOCK is a master lock, protecting UNIT_ROOT tree and >>>> UNIT_CACHE. >>>> +   UNIT_RWLOCK is a master lock, protecting UNIT_ROOT tree and >>>> UNIT_CACHE. >>>> +   And use the rwlock to spilt read and write phase to UNIT_ROOT tree >>>> +   and UNIT_CACHE to increase CPU efficiency. >>> >>> s/spilt/split. Maybe: >>> >>> Using an rwlock improves efficiency by allowing us to separate >>> readers and writers of both UNIT_ROOT >>> and UNIT_CACHE. >>> >> Thanks, corrected in Patch v5. >> >>>> @@ -350,6 +356,17 @@ retry: >>>>         if (c == 0) >>>>       break; >>>>       } >>>> +  /* We did not find a unit in the cache nor in the unit list, >>>> create a new >>>> +    (locked) unit and insert into the unit list and cache. >>>> +    Manipulating either or both the unit list and the unit cache >>>> requires to >>>> +    hold a write-lock [for obvious reasons]: >>>> +    1. By separating the read/write lock, it will greatly reduce >>>> the contention >>>> +       at the read part, while write part is not always necessary >>>> or most >>>> +       unlikely once the unit hit in cache. >>> >>> +    By separating the read/write lock, we will greatly reduce the >>> contention >>> +    on the read part, while the write part is unlikely once the unit >>> hits >>> +    the cache. >>> >>>> +    2. We try to balance the implementation complexity and the >>>> performance >>>> +       gains that fit into current cases we observed by just using a >>>> +       pthread_rwlock. */ >>> >>> Let's drop 2. >> >> Got it, thanks! >>> thanks,