From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <hjl.tools@gmail.com>
Received: from mail-pj1-x102e.google.com (mail-pj1-x102e.google.com
 [IPv6:2607:f8b0:4864:20::102e])
 by sourceware.org (Postfix) with ESMTPS id 04BB93858D28
 for <libc-alpha@sourceware.org>; Wed, 27 Apr 2022 20:38:41 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 04BB93858D28
Received: by mail-pj1-x102e.google.com with SMTP id
 cu23-20020a17090afa9700b001d98d8e53b7so4668087pjb.0
 for <libc-alpha@sourceware.org>; Wed, 27 Apr 2022 13:38:40 -0700 (PDT)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20210112;
 h=x-gm-message-state:mime-version:references:in-reply-to:from:date
 :message-id:subject:to:cc;
 bh=pr1dFSs/I98CGKftQgSMTXWVw2NWi4z/qj6bD9eYuSM=;
 b=4+uk57VRXtEyk6klBuhhOgFfh5tvZf9zfotSaXBgslMyfvWhPX2+Ck5TRqZznPJMPy
 jA1kz5ETqJqNqJKTvw4kO0p+Z+FBQOgVPmlnBHWjyYU+zHMtPWzAnAtEwQB7IXJDOcgy
 r7/LOQFMRpIrng8l2OSV+Mn6B4x9g6FDLeopyX2SsHp650aOl8rPlStyJZPs2CgToCv0
 UGrLYh+t8st40MQ+BKPjkwMQgD/G63l3AzBBUAsEnjofwUyL1tgOf+b11hu7g8U0MtgH
 ehxVWOU53orYmYNUuF3wlGbBu0zY4cwMzpj+YFR+5gaqsYqUeV/irYD9sx8tl34hgkos
 p1Bw==
X-Gm-Message-State: AOAM530k5BiTbNPzwa5hKoI/URi+KiNquBidtJk3oGUBVkB3qHNJi4xV
 rOOAWf6U3P5XDg0SHptBNkFgtLpb/hjwVslEcoo=
X-Google-Smtp-Source: ABdhPJyFN5bgoOQCjm6uTUhSnEVvRICVqL1FgAqZYVpqE2EYRxh9dgtXOn7qHHu1HzvZDYt45UjvQSO/e7XXN5PUVHg=
X-Received: by 2002:a17:903:1108:b0:156:73a7:7c1 with SMTP id
 n8-20020a170903110800b0015673a707c1mr30029115plh.101.1651091919721; Wed, 27
 Apr 2022 13:38:39 -0700 (PDT)
MIME-Version: 1.0
References: <20220420054848.2774374-1-wangyang.guo@intel.com>
 <20220421032841.3004316-1-wangyang.guo@intel.com>
 <CAFUsyfLhsyA4Gtjt0xnEKbTm76odfDabKyPJK+NHUgEvVOp8mA@mail.gmail.com>
 <bbf1f488-1c90-3f72-d806-a1a96482e71d@intel.com>
 <CAFUsyfKrXH9rQFMCoLk-JxwPtLu-qMvCBUuPL1aStQXoAa0jSw@mail.gmail.com>
In-Reply-To: <CAFUsyfKrXH9rQFMCoLk-JxwPtLu-qMvCBUuPL1aStQXoAa0jSw@mail.gmail.com>
From: "H.J. Lu" <hjl.tools@gmail.com>
Date: Wed, 27 Apr 2022 13:38:03 -0700
Message-ID: <CAMe9rOpS21OcXNLwEYE02-mSwh1n3o0jDb11ELqQs1rL355U5Q@mail.gmail.com>
Subject: Re: [PATCH v2] benchtests: Add pthread-mutex-locks bench
To: Noah Goldstein <goldstein.w.n@gmail.com>
Cc: "Guo, Wangyang" <wangyang.guo@intel.com>,
 GNU C Library <libc-alpha@sourceware.org>
Content-Type: text/plain; charset="UTF-8"
X-Spam-Status: No, score=-3026.0 required=5.0 tests=BAYES_00, DKIM_SIGNED,
 DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, FREEMAIL_FROM, GIT_PATCH_0,
 KAM_SHORT, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS,
 TXREP autolearn=ham autolearn_force=no version=3.4.4
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on
 server2.sourceware.org
X-BeenThere: libc-alpha@sourceware.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Libc-alpha mailing list <libc-alpha.sourceware.org>
List-Unsubscribe: <https://sourceware.org/mailman/options/libc-alpha>,
 <mailto:libc-alpha-request@sourceware.org?subject=unsubscribe>
List-Archive: <https://sourceware.org/pipermail/libc-alpha/>
List-Post: <mailto:libc-alpha@sourceware.org>
List-Help: <mailto:libc-alpha-request@sourceware.org?subject=help>
List-Subscribe: <https://sourceware.org/mailman/listinfo/libc-alpha>,
 <mailto:libc-alpha-request@sourceware.org?subject=subscribe>
X-List-Received-Date: Wed, 27 Apr 2022 20:38:43 -0000

On Sat, Apr 23, 2022 at 8:04 PM Noah Goldstein via Libc-alpha
<libc-alpha@sourceware.org> wrote:
>
> On Thu, Apr 21, 2022 at 5:58 PM Guo, Wangyang <wangyang.guo@intel.com> wrote:
> >
> > On 4/21/2022 9:13 PM, Noah Goldstein via Libc-alpha wrote:
> > > On Wed, Apr 20, 2022 at 10:29 PM Wangyang Guo <wangyang.guo@intel.com> wrote:
> > >>
> > >> Benchmark for testing pthread mutex locks performance with different
> > >> threads and critical sections.
> > >>
> > >> The test configuration consists of 3 parts:
> > >> 1. thread number
> > >> 2. critical-section length
> > >> 3. non-critical-section length
> > >>
> > >> Thread number starts from 1 and increased by 2x until num of CPU cores
> > >> (nprocs). An additional over-saturation case (1.25 * nprocs) is also
> > >> included.
> > >> Critical-section is represented by a loop of shared do_filler(),
> > >> length can be determined by the loop iters.
> > >> Non-critical-section is similiar to the critical-section, except it's
> > >> based on non-shared do_filler().
> > >>
> > >> Currently, adaptive pthread_mutex lock is tested.
> > >>
> > >> v2: Fix benchout json schema validation error.
> > >> ---
> > >>   benchtests/Makefile                    |   2 +
> > >>   benchtests/bench-pthread-mutex-locks.c | 288 +++++++++++++++++++++++++
> > >>   2 files changed, 290 insertions(+)
> > >>   create mode 100644 benchtests/bench-pthread-mutex-locks.c
> > >>
> > >> diff --git a/benchtests/Makefile b/benchtests/Makefile
> > >> index 8dfca592fd..b477042e6c 100644
> > >> --- a/benchtests/Makefile
> > >> +++ b/benchtests/Makefile
> > >> @@ -102,6 +102,7 @@ endif
> > >>
> > >>   bench-pthread := \
> > >>     pthread-locks \
> > >> +  pthread-mutex-locks \
> > >>     pthread_once \
> > >>     thread_create \
> > >>   # bench-pthread
> > >> @@ -281,6 +282,7 @@ $(addprefix $(objpfx)bench-,$(math-benchset)): $(libm-benchtests)
> > >>   $(addprefix $(objpfx)bench-,$(bench-pthread)): $(thread-library-benchtests)
> > >>   $(addprefix $(objpfx)bench-,$(bench-malloc)): $(thread-library-benchtests)
> > >>   $(addprefix $(objpfx)bench-,pthread-locks): $(libm-benchtests)
> > >> +$(addprefix $(objpfx)bench-,pthread-mutex-locks): $(libm-benchtests)
> > >>
> > >>
> > >>
> > >> diff --git a/benchtests/bench-pthread-mutex-locks.c b/benchtests/bench-pthread-mutex-locks.c
> > >> new file mode 100644
> > >> index 0000000000..e934b0001a
> > >> --- /dev/null
> > >> +++ b/benchtests/bench-pthread-mutex-locks.c
> > >> @@ -0,0 +1,288 @@
> > >> +/* Measure mutex_lock for different threads and critical sections.
> > >> +   Copyright (C) 2020-2022 Free Software Foundation, Inc.
> > >> +   This file is part of the GNU C Library.
> > >> +
> > >> +   The GNU C Library is free software; you can redistribute it and/or
> > >> +   modify it under the terms of the GNU Lesser General Public
> > >> +   License as published by the Free Software Foundation; either
> > >> +   version 2.1 of the License, or (at your option) any later version.
> > >> +
> > >> +   The GNU C Library is distributed in the hope that it will be useful,
> > >> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> > >> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> > >> +   Lesser General Public License for more details.
> > >> +
> > >> +   You should have received a copy of the GNU Lesser General Public
> > >> +   License along with the GNU C Library; if not, see
> > >> +   <https://www.gnu.org/licenses/>.  */
> > >> +
> > >> +#define TEST_MAIN
> > >> +#define TEST_NAME "pthread-mutex-locks"
> > >> +#define TIMEOUT (20 * 60)
> > >> +
> > >> +#include <stdio.h>
> > >> +#include <stdlib.h>
> > >> +#include <string.h>
> > >> +#include <unistd.h>
> > >> +#include <math.h>
> > >> +#include <pthread.h>
> > >> +#include <sys/time.h>
> > >> +#include <sys/sysinfo.h>
> > >> +#include "bench-timing.h"
> > >> +#include "json-lib.h"
> > >> +
> > >> +static pthread_mutex_t lock;
> > >> +static pthread_mutexattr_t attr;
> > >> +static pthread_barrier_t barrier;
> > >> +
> > >> +#define START_ITERS 1000
> > >> +
> > >> +#pragma GCC push_options
> > >> +#pragma GCC optimize(1)
> > >> +
> > >> +static int __attribute__ ((noinline)) fibonacci (int i)
> > >> +{
> > >> +  asm("");
> > >> +  if (i > 2)
> > >> +    return fibonacci (i - 1) + fibonacci (i - 2);
> > >> +  return 10 + i;
> > >> +}
> > >> +
> > >> +static void
> > >> +do_filler (void)
> > >> +{
> > >> +  char buf1[512], buf2[512];
> > >> +  int f = fibonacci (4);
> > >> +  memcpy (buf1, buf2, f);
> > >> +}
> > >> +
> > >> +static void
> > >> +do_filler_shared (void)
> > >> +{
> > >> +  static char buf1[512], buf2[512];
> > >> +  int f = fibonacci (4);
> > >> +  memcpy (buf1, buf2, f);
> > >> +}
> > >> +
> > >> +#pragma GCC pop_options
> > >> +
> > >> +#define UNIT_WORK_CRT do_filler_shared ()
> > >> +#define UNIT_WORK_NON_CRT do_filler ()
> > >> +
> > >> +static inline void
> > >> +critical_section (int length)
> > >> +{
> > >> +  for (int i = length; i >= 0; i--)
> > >> +    UNIT_WORK_CRT;
> > >> +}
> > >> +
> > >> +static inline void
> > >> +non_critical_section (int length)
> > >> +{
> > >> +  for (int i = length; i >= 0; i--)
> > >> +    UNIT_WORK_NON_CRT;
> > >> +}
> > >> +
> > >> +typedef struct Worker_Params
> > >> +{
> > >> +  long iters;
> > >> +  int crt_len;
> > >> +  int non_crt_len;
> > >> +  timing_t duration;
> > >> +} Worker_Params;
> > >> +
> > >> +static void *
> > >> +worker (void *v)
> > >> +{
> > >> +  timing_t start, stop;
> > >> +  Worker_Params *p = (Worker_Params *) v;
> > >> +  long iters = p->iters;
> > >> +  int crt_len = p->crt_len;
> > >> +  int non_crt_len = p->non_crt_len;
> > >> +
> > >> +  pthread_barrier_wait (&barrier);
> > >> +  TIMING_NOW (start);
> > >> +  while (iters--)
> > >> +    {
> > >> +      pthread_mutex_lock (&lock);
> > >> +      critical_section (crt_len);
> > >> +      pthread_mutex_unlock (&lock);
> > >> +      non_critical_section (non_crt_len);
> > >> +    }
> > >> +  TIMING_NOW (stop);
> > >> +
> > >> +  TIMING_DIFF (p->duration, start, stop);
> > >> +  return NULL;
> > >> +}
> > >> +
> > >> +static double
> > >> +do_one_test (int num_threads, int crt_len, int non_crt_len, long iters)
> > >> +{
> > >> +  int i;
> > >> +  timing_t mean;
> > >> +  Worker_Params *p, params[num_threads];
> > >> +  pthread_t threads[num_threads];
> > >> +
> > >> +  pthread_mutex_init (&lock, &attr);
> > >> +  pthread_barrier_init (&barrier, NULL, num_threads);
> > >> +
> > >> +  for (i = 0; i < num_threads; i++)
> > >> +    {
> > >> +      p = &params[i];
> > >> +      p->iters = iters;
> > >> +      p->crt_len = crt_len;
> > >> +      p->non_crt_len = non_crt_len;
> > >> +      pthread_create (&threads[i], NULL, worker, (void *) p);
> > >> +    }
> > >> +  for (i = 0; i < num_threads; i++)
> > >> +    pthread_join (threads[i], NULL);
> > >> +
> > >> +  pthread_mutex_destroy (&lock);
> > >> +  pthread_barrier_destroy (&barrier);
> > >> +
> > >> +  mean = 0;
> > >> +  for (i = 0; i < num_threads; i++)
> > >> +    mean += params[i].duration;
> > >> +  mean /= num_threads;
> > >> +  return mean;
> > >> +}
> > >> +
> > >> +#define RUN_COUNT 10
> > >> +#define MIN_TEST_SEC 0.01
> > >> +
> > >> +static void
> > >> +do_bench_one (const char *name, int num_threads, int crt_len, int non_crt_len,
> > >> +             json_ctx_t *js)
> > >> +{
> > >> +  timing_t cur;
> > >> +  struct timeval ts, te;
> > >> +  double tsd, ted, td;
> > >> +  long iters, iters_limit, total_iters;
> > >> +  timing_t curs[RUN_COUNT + 2];
> > >> +  int i, j;
> > >> +  double mean, stdev;
> > >> +
> > >> +  iters = START_ITERS;
> > >> +  iters_limit = LONG_MAX / 100;
> > >> +
> > >> +  while (1)
> > >> +    {
> > >> +      gettimeofday (&ts, NULL);
> > >> +      cur = do_one_test (num_threads, crt_len, non_crt_len, iters);
> > >> +      gettimeofday (&te, NULL);
> > >> +      /* Make sure the test to run at least MIN_TEST_SEC.  */
> > >> +      tsd = ts.tv_sec + ts.tv_usec / 1000000.0;
> > >> +      ted = te.tv_sec + te.tv_usec / 1000000.0;
> > >> +      td = ted - tsd;
> > >> +      if (td >= MIN_TEST_SEC || iters >= iters_limit)
> > >> +       break;
> > >> +
> > >> +      iters *= 10;
> > >> +    }
> > >> +
> > >> +  curs[0] = cur;
> > >> +  for (i = 1; i < RUN_COUNT + 2; i++)
> > >> +    curs[i] = do_one_test (num_threads, crt_len, non_crt_len, iters);
> > >> +
> > >> +  /* Sort the results so we can discard the fastest and slowest
> > >> +     times as outliers.  */
> > >> +  for (i = 0; i < RUN_COUNT + 1; i++)
> > >> +    for (j = i + 1; j < RUN_COUNT + 2; j++)
> > >> +      if (curs[i] > curs[j])
> > >> +       {
> > >> +         timing_t temp = curs[i];
> > >> +         curs[i] = curs[j];
> > >> +         curs[j] = temp;
> > >> +       }
> > >> +
> > >> +  /* Calculate mean and standard deviation.  */
> > >> +  mean = 0.0;
> > >> +  total_iters = iters * num_threads;
> > >> +  for (i = 1; i < RUN_COUNT + 1; i++)
> > >> +    mean += (double) curs[i] / (double) total_iters;
> > >> +  mean /= RUN_COUNT;
> > >> +
> > >> +  stdev = 0.0;
> > >> +  for (i = 1; i < RUN_COUNT + 1; i++)
> > >> +    {
> > >> +      double s = (double) curs[i] / (double) total_iters - mean;
> > >> +      stdev += s * s;
> > >> +    }
> > >> +  stdev = sqrt (stdev / (RUN_COUNT - 1));
> > >> +
> > >> +  char buf[256];
> > >> +  snprintf (buf, sizeof buf, "%s,non_crt_len=%d,crt_len=%d,threads=%d", name,
> > >> +           non_crt_len, crt_len, num_threads);
> > >> +
> > >> +  json_attr_object_begin (js, buf);
> > >> +
> > >> +  json_attr_double (js, "duration", (double) cur);
> > >> +  json_attr_double (js, "iterations", (double) total_iters);
> > >> +  json_attr_double (js, "mean", mean);
> > >> +  json_attr_double (js, "stdev", stdev);
> > >> +  json_attr_double (js, "min-outlier",
> > >> +                   (double) curs[0] / (double) total_iters);
> > >> +  json_attr_double (js, "min", (double) curs[1] / (double) total_iters);
> > >> +  json_attr_double (js, "max",
> > >> +                   (double) curs[RUN_COUNT] / (double) total_iters);
> > >> +  json_attr_double (js, "max-outlier",
> > >> +                   (double) curs[RUN_COUNT + 1] / (double) total_iters);
> > >> +
> > >> +  json_attr_object_end (js);
> > >> +}
> > >> +
> > >> +#define TH_CONF_MAX 10
> > >> +
> > >> +int
> > >> +do_bench (void)
> > >> +{
> > >> +  int rv = 0;
> > >> +  json_ctx_t json_ctx;
> > >> +  int i, j, k;
> > >> +  int th_num, th_conf, nprocs;
> > >> +  int threads[TH_CONF_MAX];
> > >> +  int crt_lens[] = { 0, 1, 2, 4, 8, 16, 32, 64, 128 };
> > >> +  int non_crt_lens[] = { 1, 32, 128 };
> > >> +  char name[128];
> > >> +
> > >> +  json_init (&json_ctx, 2, stdout);
> > >> +  json_attr_object_begin (&json_ctx, "pthread_mutex_locks");
> > >> +
> > >> +  /* The thread config begins from 1, and increases by 2x until nprocs.
> > >> +     We also wants to test over-saturation case (1.25*nprocs).  */
> > >> +  nprocs = get_nprocs ();
> > >> +  th_num = 1;
> > >> +  for (th_conf = 0; th_conf < (TH_CONF_MAX - 2) && th_num < nprocs; th_conf++)
> > >> +    {
> > >> +      threads[th_conf] = th_num;
> > >> +      th_num <<= 1;
> > >> +    }
> > >> +  threads[th_conf++] = nprocs;
> > >> +  threads[th_conf++] = nprocs + nprocs / 4;
> > >> +
> > >> +  pthread_mutexattr_init (&attr);
> > >> +  pthread_mutexattr_settype (&attr, PTHREAD_MUTEX_ADAPTIVE_NP);
> > >> +  snprintf (name, sizeof name, "type=adaptive");
> > >> +
> > >> +  for (k = 0; k < (sizeof (non_crt_lens) / sizeof (int)); k++)
> > >> +    {
> > >> +      int non_crt_len = non_crt_lens[k];
> > >> +      for (j = 0; j < (sizeof (crt_lens) / sizeof (int)); j++)
> > >> +       {
> > >> +         int crt_len = crt_lens[j];
> > >> +         for (i = 0; i < th_conf; i++)
> > >> +           {
> > >> +             th_num = threads[i];
> > >> +             do_bench_one (name, th_num, crt_len, non_crt_len, &json_ctx);
> > >> +           }
> > >> +       }
> > >> +    }
> > >> +
> > >> +  json_attr_object_end (&json_ctx);
> > >> +
> > >> +  return rv;
> > >> +}
> > >> +
> > >> +#define TEST_FUNCTION do_bench ()
> > >> +
> > >> +#include "../test-skeleton.c"
> > >> --
> > >> 2.35.1
> > >>
> > >
> > > Can you run clang-format on this? Otherwise
> > > LGTM.
> > >
> >
> > clang-format done.
> > Nothing needs to change for this patch.
>
>
> Woops.
>
> LGTM.

I am pushing it now.

Thanks.

-- 
H.J.