From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mga07.intel.com (mga07.intel.com [134.134.136.100]) by sourceware.org (Postfix) with ESMTPS id 92CA33858D35; Tue, 9 May 2023 02:32:30 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 92CA33858D35 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=intel.com DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1683599550; x=1715135550; h=message-id:date:mime-version:subject:to:cc:from: content-transfer-encoding; bh=QbEp87na7MeY2wvY4EDLQf5Cf8xuqVsPKAbWmHJiNrY=; b=ENYBFVWPPDrX5EOgrk9IN+ddCsdB1Bgtahs8RZkEw58zuLxC91HQckr2 mtUVLvlOfvMiX3kOTyPBU54pjhS0a9mcY4rm78sXDGdYQFgeWhHpF0M3d g7tVIZ8rfw1shbNAI0HAa1BNW1+SWEj7Dc2BTl6vKpwtzRS+PCC05pW91 apiYRDrLlW050t2gOWQRYTKPzf+wUnTyhj3KqOhHB0boVC21tFyZEt+cc 64VA398MTB9GytUVSWSXg/05swrrdFznkGWbshF99XiBpuBqjrGOz3mtG WtAuipd0rwoBEjyZTayvdZxN4IqOxTJCm82sdKi2iJzkllmjrFl1WyDhq Q==; X-IronPort-AV: E=McAfee;i="6600,9927,10704"; a="415359381" X-IronPort-AV: E=Sophos;i="5.99,259,1677571200"; d="scan'208";a="415359381" Received: from fmsmga004.fm.intel.com ([10.253.24.48]) by orsmga105.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 08 May 2023 19:32:29 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10704"; a="768281643" X-IronPort-AV: E=Sophos;i="5.99,259,1677571200"; d="scan'208";a="768281643" Received: from zhulipeng-win.ccr.corp.intel.com (HELO [10.238.2.93]) ([10.238.2.93]) by fmsmga004-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 08 May 2023 19:32:27 -0700 Message-ID: <81c359ae-ab69-7f03-f113-4b865441de44@intel.com> Date: Tue, 9 May 2023 10:32:25 +0800 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:102.0) Gecko/20100101 Firefox/102.0 Thunderbird/102.10.0 Subject: Re: [PATCH v4] libgfortran: Replace mutex with rwlock Content-Language: en-US To: Bernhard Reutner-Fischer Cc: fortran@gcc.gnu.org, gcc-patches@gcc.gnu.org, hongjiu.lu@intel.com, tianyou.li@intel.com, pan.deng@intel.com, wangyang.guo@intel.com From: "Zhu, Lipeng" Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-Spam-Status: No, score=-11.0 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,GIT_PATCH_0,RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_NONE,TXREP,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: On 1/1/1970 8:00 AM, Bernhard Reutner-Fischer wrote: > On Mon, 8 May 2023 17:44:43 +0800 > Lipeng Zhu wrote: > >> This patch try to introduce the rwlock and split the read/write to >> unit_root tree and unit_cache with rwlock instead of the mutex to >> increase CPU efficiency. In the get_gfc_unit function, the percentage >> to step into the insert_unit function is around 30%, in most >> instances, we can get the unit in the phase of reading the unit_cache >> or unit_root tree. So split the read/write phase by rwlock would be an >> approach to make it more parallel. >> >> BTW, the IPC metrics can gain around 9x in our test server with 220 >> cores. The benchmark we used is https://github.com/rwesson/NEAT > > See commentary typos below. > You did not state if you regression tested the patch? I use valgrind --tool=helgrind or --tool=drd to test 'make check-fortran'. Is it necessary to add an additional unit test for this patch? > Other than that it LGTM but i cannot approve it. Thank you for your kind help for this patch, is there anything that I can do or can you help to push this patch forward? > >> diff --git a/libgfortran/io/async.h b/libgfortran/io/async.h index >> ad226c8e856..0033cc74252 100644 >> --- a/libgfortran/io/async.h >> +++ b/libgfortran/io/async.h >> @@ -210,6 +210,128 @@ >> DEBUG_PRINTF ("%s" DEBUG_RED "ACQ:" DEBUG_NORM " %-30s %78p\n", aio_prefix, #mutex, > mutex); \ Thanks, corrected in Patch v5. >> } while (0) >> >> +#ifdef __GTHREAD_RWLOCK_INIT >> +#define RWLOCK_DEBUG_ADD(rwlock) do { \ >> + aio_rwlock_debug *n; \ >> + n = xmalloc (sizeof(aio_rwlock_debug)); \ > > Missing space before the open brace: sizeof ( > Thanks, corrected in Patch v5. >> diff --git a/libgfortran/io/unit.c b/libgfortran/io/unit.c index >> 82664dc5f98..62f1db21d34 100644 >> --- a/libgfortran/io/unit.c >> +++ b/libgfortran/io/unit.c >> @@ -33,34 +33,36 @@ see the files COPYING3 and COPYING.RUNTIME >> respectively. If not, see >> >> >> /* IO locking rules: >> - UNIT_LOCK is a master lock, protecting UNIT_ROOT tree and UNIT_CACHE. >> + UNIT_RWLOCK is a master lock, protecting UNIT_ROOT tree and UNIT_CACHE. >> + And use the rwlock to spilt read and write phase to UNIT_ROOT tree >> + and UNIT_CACHE to increase CPU efficiency. > > s/spilt/split. Maybe: > > Using an rwlock improves efficiency by allowing us to separate readers and writers of both UNIT_ROOT > and UNIT_CACHE. > Thanks, corrected in Patch v5. >> @@ -350,6 +356,17 @@ retry: >> if (c == 0) >> break; >> } >> + /* We did not find a unit in the cache nor in the unit list, create a new >> + (locked) unit and insert into the unit list and cache. >> + Manipulating either or both the unit list and the unit cache requires to >> + hold a write-lock [for obvious reasons]: >> + 1. By separating the read/write lock, it will greatly reduce the contention >> + at the read part, while write part is not always necessary or most >> + unlikely once the unit hit in cache. > > + By separating the read/write lock, we will greatly reduce the contention > + on the read part, while the write part is unlikely once the unit hits > + the cache. > >> + 2. We try to balance the implementation complexity and the performance >> + gains that fit into current cases we observed by just using a >> + pthread_rwlock. */ > > Let's drop 2. Got it, thanks! > thanks,