Date: Mon, 8 May 2023 12:28:48 +0200
From: Bernhard Reutner-Fischer
To: Lipeng Zhu
Cc: fortran@gcc.gnu.org, gcc-patches@gcc.gnu.org, hongjiu.lu@intel.com,
 tianyou.li@intel.com, pan.deng@intel.com, wangyang.guo@intel.com
Subject: Re: [PATCH v4] libgfortran: Replace mutex with rwlock
Message-ID: <20230508122848.1dbf109a@nbbrfq>
In-Reply-To: <20230508094442.1413139-1-lipeng.zhu@intel.com>
References: <20230424214534.77117b73 () nbbrfq> <20230508094442.1413139-1-lipeng.zhu@intel.com>

On Mon, 8 May 2023 17:44:43 +0800
Lipeng Zhu wrote:

> This patch tries to introduce an rwlock and split the read/write
> accesses to the unit_root tree and unit_cache, using the rwlock
> instead of the mutex to increase CPU efficiency. In the get_gfc_unit
> function, the percentage of calls that step into the insert_unit
> function is around 30%; in most instances we can get the unit in the
> phase of reading the unit_cache or unit_root tree.
> So splitting the read/write phases with an rwlock would be an
> approach to make it more parallel.
>
> BTW, the IPC metric can gain around 9x in our test
> server with 220 cores. The benchmark we used is
> https://github.com/rwesson/NEAT

See commentary typos below.
You did not state whether you regression-tested the patch?
Other than that it LGTM, but I cannot approve it.

> diff --git a/libgfortran/io/async.h b/libgfortran/io/async.h
> index ad226c8e856..0033cc74252 100644
> --- a/libgfortran/io/async.h
> +++ b/libgfortran/io/async.h
> @@ -210,6 +210,128 @@
>      DEBUG_PRINTF ("%s" DEBUG_RED "ACQ:" DEBUG_NORM " %-30s %78p\n", aio_prefix, #mutex, mutex); \
>    } while (0)
>
> +#ifdef __GTHREAD_RWLOCK_INIT
> +#define RWLOCK_DEBUG_ADD(rwlock) do {		\
> +    aio_rwlock_debug *n;			\
> +    n = xmalloc (sizeof(aio_rwlock_debug));	\

Missing space before the open brace: sizeof (

> diff --git a/libgfortran/io/unit.c b/libgfortran/io/unit.c
> index 82664dc5f98..62f1db21d34 100644
> --- a/libgfortran/io/unit.c
> +++ b/libgfortran/io/unit.c
> @@ -33,34 +33,36 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
>
>
>  /* IO locking rules:
> -   UNIT_LOCK is a master lock, protecting UNIT_ROOT tree and UNIT_CACHE.
> +   UNIT_RWLOCK is a master lock, protecting UNIT_ROOT tree and UNIT_CACHE.
> +   And use the rwlock to spilt read and write phase to UNIT_ROOT tree
> +   and UNIT_CACHE to increase CPU efficiency.

s/spilt/split. Maybe:
Using an rwlock improves efficiency by allowing us to separate
readers and writers of both UNIT_ROOT and UNIT_CACHE.

> @@ -350,6 +356,17 @@ retry:
>        if (c == 0)
>  	break;
>      }
> +  /* We did not find a unit in the cache nor in the unit list, create a new
> +    (locked) unit and insert into the unit list and cache.
> +    Manipulating either or both the unit list and the unit cache requires to
> +    hold a write-lock [for obvious reasons]:
> +    1. By separating the read/write lock, it will greatly reduce the contention
> +       at the read part, while write part is not always necessary or most
> +       unlikely once the unit hit in cache.

+    By separating the read/write lock, we will greatly reduce the contention
+    on the read part, while the write part is unlikely once the unit hits
+    the cache.

> +    2. We try to balance the implementation complexity and the performance
> +       gains that fit into current cases we observed by just using a
> +       pthread_rwlock. */

Let's drop 2.

thanks,