Date: Mon, 24 Apr 2023 21:45:34 +0200
From: Bernhard Reutner-Fischer
To: "Zhu, Lipeng"
Cc: rep.dot.nop@gmail.com, tkoenig@netcologne.de, fortran@gcc.gnu.org,
 gcc-patches@gcc.gnu.org, hongjiu.lu@intel.com, tianyou.li@intel.com,
 pan.deng@intel.com, wangyang.guo@intel.com
Subject: Re: [PATCH v3] libgfortran: Replace mutex with rwlock
Message-ID: <20230424214534.77117b73@nbbrfq>

Hi!
[please do not top-post]

On Thu, 20 Apr 2023 21:13:08 +0800
"Zhu, Lipeng" wrote:

> Hi Bernhard,
>
> Thanks for your questions and suggestions.
> The rwlock could allow multiple threads to have concurrent read-only
> access to the cache/unit list; only a single writer is allowed.

right.

> Write lock will not be acquired until all read locks are released.

So i must have confused rwlock with something else, something that
allows self to hold a read-lock and upgrade that to a write-lock,
purposely starving all successive incoming readers. I.e. just your
RD_TO_WRLOCK impl here, but done atomically. This proved to be
beneficial in some situations in the past. Doesn't seem to work with
your rwlock, does it?

> And I didn't change the mutex scope when refactoring the code, only
> made a more fine-grained distinction for the read/write of the
> cache/unit list.

Yes of course, i can see you did that.

> I completed the comment according to your template; I will insert the
> comment in the source code in the next version of the patch, with the
> other refinements you suggested.
> "
> Now we did not get a unit in cache and unit list, so we need to
> create a new unit, and update it to cache and unit list.

s/Now/By now/ or s/Now w/W/
and s/get/find/

"
We did not find a unit in the cache nor in the unit list, create a new
(locked) unit and insert into the unit list and cache.
Manipulating either or both the unit list and the unit cache requires
holding a write-lock [for obvious reasons].
"

Superfluous when talking about pthread_rwlock_wrlock since that
implies that even the thread acquiring the wrlock has to first release
its very own rdlock.

> Prior to updating the cache and list, we need to release all read
> locks and then immediately acquire the write lock, thus ensuring an
> exclusive update of the cache and unit list.
> Either way, we will manipulate the cache and/or the unit list, so we
> must take a write lock now.
> We don't take the write bit in *addition* to the read lock because:
> 1. It will needlessly complicate releasing the respective lock;

Under pthread_rwlock_wrlock it will deadlock, so that's wrong? Drop
that point then? If not, what's your reasoning / observation?
Under my lock, you hold the R, additionally take the W and then
immediately release the R because you yourself won't read, just write.
But mine's not the pthread_rwlock you talk about, admittedly.

> 2. By separating the read/write lock, it will greatly reduce the
> contention on the read part, while the write part is not always
> necessary, or most unlikely once the unit hits in the cache;

We know that.

> 3. We try to balance the implementation complexity and the
> performance gains that fit the current cases we observed.

.. by just using a pthread_rwlock. And that's the top point iff you
keep it at that. That's a fair step, sure.
BTW, did you look at the RELEASE semantics, respectively the note that
one day (and now is that very day) we might improve on the release
semantics? Can of course be incremental AFAIC.

> "

If folks agree on this first step then you have my OK, with a catchy
comment and the discussion recorded here on the list. A second step
would be RELEASE. And, depending on the underlying capabilities of
available locks, further tweaks, obviously.
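To spell out the upgrade semantics i had in mind, here is a toy sketch
(purely illustrative, not a proposal for libgfortran; the uplock_*
names are made up). It assumes a single upgrader at a time; two
concurrent upgraders would deadlock on each other's read side, which
is presumably one reason POSIX offers no upgrade operation:

#include <pthread.h>

typedef struct
{
  pthread_mutex_t m;
  pthread_cond_t cv;
  int readers;     /* active read-side holders */
  int write_bit;   /* set while a writer owns or awaits the lock */
} uplock_t;

static uplock_t ulock
  = { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, 0, 0 };

void
uplock_rdlock (uplock_t *l)
{
  pthread_mutex_lock (&l->m);
  while (l->write_bit)   /* incoming readers yield to the upgrader */
    pthread_cond_wait (&l->cv, &l->m);
  l->readers++;
  pthread_mutex_unlock (&l->m);
}

void
uplock_rdunlock (uplock_t *l)
{
  pthread_mutex_lock (&l->m);
  if (--l->readers == 0)
    pthread_cond_broadcast (&l->cv);
  pthread_mutex_unlock (&l->m);
}

/* Atomically turn our read lock into a write lock: there is no window
   in which we hold no lock at all, so whatever was observed under the
   read lock is still valid once we own the write side.  */
void
uplock_upgrade (uplock_t *l)
{
  pthread_mutex_lock (&l->m);
  l->write_bit = 1;      /* from here on, starve incoming readers */
  l->readers--;          /* our read side folds into the write side */
  while (l->readers > 0) /* wait for the other readers to drain */
    pthread_cond_wait (&l->cv, &l->m);
  pthread_mutex_unlock (&l->m);
}

void
uplock_wrunlock (uplock_t *l)
{
  pthread_mutex_lock (&l->m);
  l->write_bit = 0;
  pthread_cond_broadcast (&l->cv);
  pthread_mutex_unlock (&l->m);
}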
PS: and, please, don't top-post

thanks,

> Best Regards,
> Zhu, Lipeng
>
> On 1/1/1970 8:00 AM, Bernhard Reutner-Fischer wrote:
> > On 19 April 2023 09:06:28 CEST, Lipeng Zhu via Fortran wrote:
> >> This patch tries to introduce the rwlock and split the read/write
> >> to the unit_root tree and unit_cache with a rwlock instead of the
> >> mutex, to increase CPU efficiency. In the get_gfc_unit function,
> >> the percentage of calls that step into the insert_unit function is
> >> around 30%; in most instances, we can get the unit in the phase of
> >> reading the unit_cache or unit_root tree. So splitting the
> >> read/write phase with a rwlock would be an approach to make it
> >> more parallel.
> >>
> >> BTW, the IPC metrics can gain around 9x in our test server with 220
> >> cores. The benchmark we used is https://github.com/rwesson/NEAT
> >>
> >
> >> +#define RD_TO_WRLOCK(rwlock) \
> >> +  RWUNLOCK (rwlock);\
> >> +  WRLOCK (rwlock);
> >> +#endif
> >> +
> >
> >
> >> diff --git a/libgfortran/io/unit.c b/libgfortran/io/unit.c
> >> index 82664dc5f98..4312c5f36de 100644
> >> --- a/libgfortran/io/unit.c
> >> +++ b/libgfortran/io/unit.c
> >
> >> @@ -329,7 +335,7 @@ get_gfc_unit (int n, int do_create)
> >>    int c, created = 0;
> >>
> >>    NOTE ("Unit n=%d, do_create = %d", n, do_create);
> >> -  LOCK (&unit_lock);
> >> +  RDLOCK (&unit_rwlock);
> >>
> >> retry:
> >>    for (c = 0; c < CACHE_SIZE; c++)
> >> @@ -350,6 +356,7 @@ retry:
> >>        if (c == 0)
> >>          break;
> >>      }
> >> +  RD_TO_WRLOCK (&unit_rwlock);
> >
> > So I'm trying to convince myself why it's safe to unlock and only
> > then take the write lock.
> >
> > Can you please elaborate/confirm why that's ok?
> >
> > I wouldn't mind a comment like
> > We can release the unit and cache read lock now. We might have to
> > allocate a (locked) unit, below in do_create.
> > Either way, we will manipulate the cache and/or the unit list so we
> > have to take a write lock now.
> >
> > We don't take the write bit in *addition* to the read lock because..
> >
> > (that needlessly complicates releasing the respective locks / it
> > triggers too much contention when we.. / ...?)
> >
> > thanks,
> >
> >>
> >>    if (p == NULL && do_create)
> >>      {
> >> @@ -368,8 +375,8 @@ retry:
> >>    if (created)
> >>      {
> >>        /* Newly created units have their lock held already
> >>           from insert_unit.  Just unlock UNIT_RWLOCK and return.  */
> >> -      UNLOCK (&unit_lock);
> >> +      RWUNLOCK (&unit_rwlock);
> >>        return p;
> >>      }
> >>
> >> @@ -380,7 +387,7 @@ found:
> >>    if (! TRYLOCK (&p->lock))
> >>      {
> >>        /* assert (p->closed == 0); */
> >> -      UNLOCK (&unit_lock);
> >> +      RWUNLOCK (&unit_rwlock);
> >>        return p;
> >>      }
> >>
> >> @@ -388,14 +395,14 @@ found:
> >>  }
> >>
> >>
> >> -  UNLOCK (&unit_lock);
> >> +  RWUNLOCK (&unit_rwlock);
> >>
> >>    if (p != NULL && (p->child_dtio == 0))
> >>      {
> >>        LOCK (&p->lock);
> >>        if (p->closed)
> >>          {
> >> -          LOCK (&unit_lock);
> >> +          WRLOCK (&unit_rwlock);
> >>            UNLOCK (&p->lock);
> >>            if (predec_waiting_locked (p) == 0)
> >>              destroy_unit_mutex (p);
> >> @@ -593,8 +600,8 @@ init_units (void)
> >>  #endif
> >>  #endif
> >>
> >> -#ifndef __GTHREAD_MUTEX_INIT
> >> -  __GTHREAD_MUTEX_INIT_FUNCTION (&unit_lock);
> >> +#if (!defined(__GTHREAD_RWLOCK_INIT) && !defined(__GTHREAD_MUTEX_INIT))
> >> +  __GTHREAD_MUTEX_INIT_FUNCTION (&unit_rwlock);
> >>  #endif
> >>
> >>    if (sizeof (max_offset) == 8)
> >
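For reference, on the question above of why it's safe to unlock and
only then take the write lock: POSIX leaves pthread_rwlock_wrlock()
undefined when the calling thread already holds the lock (read or
write), so with a plain pthread rwlock the only portable "upgrade" is
exactly the unlock-then-wrlock of RD_TO_WRLOCK above, plus a
re-validation of whatever was observed under the read lock, since
another writer can slip into the window. A standalone sketch of that
idiom (find_entry / insert_entry are hypothetical stand-ins, not the
libgfortran functions):

#include <pthread.h>
#include <stddef.h>

static pthread_rwlock_t tab_lock = PTHREAD_RWLOCK_INITIALIZER;

/* Hypothetical accessors for some shared lookup structure; the
   caller of insert_entry is assumed to hold the write lock.  */
extern void *find_entry (int key);
extern void *insert_entry (int key);

void *
lookup_or_create (int key)
{
  void *p;

  pthread_rwlock_rdlock (&tab_lock);
  p = find_entry (key);
  if (p == NULL)
    {
      /* No atomic upgrade in POSIX: calling pthread_rwlock_wrlock()
         while still holding the rdlock is undefined behaviour
         (deadlock on common implementations).  Drop and reacquire
         instead...  */
      pthread_rwlock_unlock (&tab_lock);
      pthread_rwlock_wrlock (&tab_lock);
      /* ...and re-validate: another writer may have inserted the
         entry between our unlock and our wrlock.  */
      p = find_entry (key);
      if (p == NULL)
        p = insert_entry (key);
    }
  pthread_rwlock_unlock (&tab_lock);
  return p;
}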