Date: Mon, 8 May 2023 12:28:48 +0200
From: Bernhard Reutner-Fischer
To: Lipeng Zhu
Cc: fortran@gcc.gnu.org, gcc-patches@gcc.gnu.org, hongjiu.lu@intel.com,
 tianyou.li@intel.com, pan.deng@intel.com, wangyang.guo@intel.com
Subject: Re: [PATCH v4] libgfortran: Replace mutex with rwlock
Message-ID: <20230508122848.1dbf109a@nbbrfq>
In-Reply-To: <20230508094442.1413139-1-lipeng.zhu@intel.com>
References: <20230424214534.77117b73 () nbbrfq> <20230508094442.1413139-1-lipeng.zhu@intel.com>

On Mon, 8 May 2023 17:44:43 +0800
Lipeng Zhu wrote:

> This patch tries to introduce an rwlock and split the read/write
> accesses to the unit_root tree and unit_cache, using the rwlock
> instead of the mutex to increase CPU efficiency. In the get_gfc_unit
> function, the percentage of calls that step into the insert_unit
> function is around 30%; in most instances we can get the unit in the
> phase of reading the unit_cache or unit_root tree.
> So splitting the read/write phases with an rwlock would be an
> approach to make it more parallel.
>
> BTW, the IPC metric can gain around 9x in our test
> server with 220 cores. The benchmark we used is
> https://github.com/rwesson/NEAT

See commentary typos below.
You did not state whether you regression-tested the patch?
Other than that it LGTM, but I cannot approve it.

> diff --git a/libgfortran/io/async.h b/libgfortran/io/async.h
> index ad226c8e856..0033cc74252 100644
> --- a/libgfortran/io/async.h
> +++ b/libgfortran/io/async.h
> @@ -210,6 +210,128 @@
>      DEBUG_PRINTF ("%s" DEBUG_RED "ACQ:" DEBUG_NORM " %-30s %78p\n", aio_prefix, #mutex, mutex); \
>    } while (0)
>
> +#ifdef __GTHREAD_RWLOCK_INIT
> +#define RWLOCK_DEBUG_ADD(rwlock) do {		\
> +    aio_rwlock_debug *n;			\
> +    n = xmalloc (sizeof(aio_rwlock_debug));	\

Missing space before the open brace: sizeof (

> diff --git a/libgfortran/io/unit.c b/libgfortran/io/unit.c
> index 82664dc5f98..62f1db21d34 100644
> --- a/libgfortran/io/unit.c
> +++ b/libgfortran/io/unit.c
> @@ -33,34 +33,36 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
>
>
>  /* IO locking rules:
> -   UNIT_LOCK is a master lock, protecting UNIT_ROOT tree and UNIT_CACHE.
> +   UNIT_RWLOCK is a master lock, protecting UNIT_ROOT tree and UNIT_CACHE.
> +   And use the rwlock to spilt read and write phase to UNIT_ROOT tree
> +   and UNIT_CACHE to increase CPU efficiency.

s/spilt/split. Maybe:
Using an rwlock improves efficiency by allowing us to separate
readers and writers of both UNIT_ROOT and UNIT_CACHE.

> @@ -350,6 +356,17 @@ retry:
>        if (c == 0)
>  	break;
>      }
> +  /* We did not find a unit in the cache nor in the unit list, create a new
> +    (locked) unit and insert into the unit list and cache.
> +    Manipulating either or both the unit list and the unit cache requires to
> +    hold a write-lock [for obvious reasons]:
> +    1. By separating the read/write lock, it will greatly reduce the contention
> +       at the read part, while write part is not always necessary or most
> +       unlikely once the unit hit in cache.

+    By separating the read/write lock, we will greatly reduce the contention
+    on the read part, while the write part is unlikely once the unit hits
+    the cache.

> +    2. We try to balance the implementation complexity and the performance
> +       gains that fit into current cases we observed by just using a
> +       pthread_rwlock. */

Let's drop 2.

thanks,