From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mga12.intel.com (mga12.intel.com [192.55.52.136]) by sourceware.org (Postfix) with ESMTPS id 880703857722; Tue, 16 May 2023 07:08:35 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 880703857722 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=intel.com DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1684220915; x=1715756915; h=message-id:date:mime-version:subject:from:to:cc: references:in-reply-to:content-transfer-encoding; bh=VHXREn9e+/Cir3+y5xsj5nIS9y2URFgvIkXEav+t3e0=; b=aQfZOyBUAAwzkrLxnY/q57q1RjIeRAn3CpBn1cYEJR0Kh0IiSOWK6Lgp RlFu7tMd0rzSWiOgBDqaGQqrJNHVXCq4hGGJ+VXVcsM43hY/+yjcD4/GJ XdkENDdKRDWV6hrN0WYudOLHm+QZuIVwXeaDMaLL8wY3ou+ime36f1x9g YOItK18CRsdUbga+npO1z/2xIqr+a2SZgFY7IJMbDjD+RauquBSHe4f68 x5j5lWynY+Cmy3Wjvnc4mFTtgYvxCFhbNegMkTluhjx3/M0zfy+2EFO1U nAdz3KQqZmsIFsvaAq4Vl3HG8+DUiNh19dtzS7hAFGTWzZemdVDP4zK61 A==; X-IronPort-AV: E=McAfee;i="6600,9927,10711"; a="331020421" X-IronPort-AV: E=Sophos;i="5.99,278,1677571200"; d="scan'208";a="331020421" Received: from orsmga003.jf.intel.com ([10.7.209.27]) by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 May 2023 00:08:34 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10711"; a="651720399" X-IronPort-AV: E=Sophos;i="5.99,278,1677571200"; d="scan'208";a="651720399" Received: from zhulipeng-win.ccr.corp.intel.com (HELO [10.238.2.93]) ([10.238.2.93]) by orsmga003-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 May 2023 00:08:25 -0700 Message-ID: Date: Tue, 16 May 2023 15:08:23 +0800 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:102.0) Gecko/20100101 Firefox/102.0 Thunderbird/102.10.0 Subject: Re: [PATCH v4] libgfortran: Replace mutex with rwlock From: "Zhu, Lipeng" To: Bernhard Reutner-Fischer Cc: fortran@gcc.gnu.org, gcc-patches@gcc.gnu.org, hongjiu.lu@intel.com, tianyou.li@intel.com, pan.deng@intel.com, wangyang.guo@intel.com References: <81c359ae-ab69-7f03-f113-4b865441de44@intel.com> Content-Language: en-US In-Reply-To: <81c359ae-ab69-7f03-f113-4b865441de44@intel.com> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-11.0 required=5.0 tests=BAYES_00,BODY_8BITS,DKIMWL_WL_HIGH,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,GIT_PATCH_0,NICE_REPLY_A,SPF_HELO_PASS,SPF_NONE,TXREP,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: On 5/9/2023 10:32 AM, Zhu, Lipeng wrote: > > > On 1/1/1970 8:00 AM, Bernhard Reutner-Fischer wrote: >> On Mon,  8 May 2023 17:44:43 +0800 >> Lipeng Zhu wrote: >> >>> This patch try to introduce the rwlock and split the read/write to >>> unit_root tree and unit_cache with rwlock instead of the mutex to >>> increase CPU efficiency. In the get_gfc_unit function, the percentage >>> to step into the insert_unit function is around 30%, in most >>> instances, we can get the unit in the phase of reading the unit_cache >>> or unit_root tree. So split the read/write phase by rwlock would be an >>> approach to make it more parallel. >>> >>> BTW, the IPC metrics can gain around 9x in our test server with 220 >>> cores. The benchmark we used is https://github.com/rwesson/NEAT >> >> See commentary typos below. >> You did not state if you regression tested the patch? > I use valgrind --tool=helgrind or --tool=drd to test 'make > check-fortran'. Is it necessary to add an additional unit test for this > patch? > >> Other than that it LGTM but i cannot approve it. > Thank you for your kind help for this patch, is there anything that I > can do or can you help to push this patch forward? > Hi Bernhard, Is there any other refinement that need I to do for this patch? Thanks. >> >>> diff --git a/libgfortran/io/async.h b/libgfortran/io/async.h index >>> ad226c8e856..0033cc74252 100644 >>> --- a/libgfortran/io/async.h >>> +++ b/libgfortran/io/async.h >>> @@ -210,6 +210,128 @@ >>>       DEBUG_PRINTF ("%s" DEBUG_RED "ACQ:" DEBUG_NORM " %-30s %78p\n", >>> aio_prefix, #mutex, >> mutex); \ > Thanks, corrected in Patch v5. > >>>     } while (0) >>> +#ifdef __GTHREAD_RWLOCK_INIT >>> +#define RWLOCK_DEBUG_ADD(rwlock) do {        \ >>> +    aio_rwlock_debug *n;                \ >>> +    n = xmalloc (sizeof(aio_rwlock_debug));    \ >> >> Missing space before the open brace: sizeof ( >> > Thanks, corrected in Patch v5. > >>> diff --git a/libgfortran/io/unit.c b/libgfortran/io/unit.c index >>> 82664dc5f98..62f1db21d34 100644 >>> --- a/libgfortran/io/unit.c >>> +++ b/libgfortran/io/unit.c >>> @@ -33,34 +33,36 @@ see the files COPYING3 and COPYING.RUNTIME >>> respectively.  If not, see >>>   /* IO locking rules: >>> -   UNIT_LOCK is a master lock, protecting UNIT_ROOT tree and >>> UNIT_CACHE. >>> +   UNIT_RWLOCK is a master lock, protecting UNIT_ROOT tree and >>> UNIT_CACHE. >>> +   And use the rwlock to spilt read and write phase to UNIT_ROOT tree >>> +   and UNIT_CACHE to increase CPU efficiency. >> >> s/spilt/split. Maybe: >> >> Using an rwlock improves efficiency by allowing us to separate readers >> and writers of both UNIT_ROOT >> and UNIT_CACHE. >> > Thanks, corrected in Patch v5. > >>> @@ -350,6 +356,17 @@ retry: >>>         if (c == 0) >>>       break; >>>       } >>> +  /* We did not find a unit in the cache nor in the unit list, >>> create a new >>> +    (locked) unit and insert into the unit list and cache. >>> +    Manipulating either or both the unit list and the unit cache >>> requires to >>> +    hold a write-lock [for obvious reasons]: >>> +    1. By separating the read/write lock, it will greatly reduce the >>> contention >>> +       at the read part, while write part is not always necessary or >>> most >>> +       unlikely once the unit hit in cache. >> >> +    By separating the read/write lock, we will greatly reduce the >> contention >> +    on the read part, while the write part is unlikely once the unit >> hits >> +    the cache. >> >>> +    2. We try to balance the implementation complexity and the >>> performance >>> +       gains that fit into current cases we observed by just using a >>> +       pthread_rwlock. */ >> >> Let's drop 2. > > Got it, thanks! >> thanks,