From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <skpgkp2@gmail.com>
Received: from mail-qt1-x82e.google.com (mail-qt1-x82e.google.com
 [IPv6:2607:f8b0:4864:20::82e])
 by sourceware.org (Postfix) with ESMTPS id 8C1A93858C39
 for <libc-alpha@sourceware.org>; Sun, 14 Nov 2021 03:00:12 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 8C1A93858C39
Received: by mail-qt1-x82e.google.com with SMTP id t11so12281795qtw.3
 for <libc-alpha@sourceware.org>; Sat, 13 Nov 2021 19:00:12 -0800 (PST)
X-Received: by 2002:a05:622a:40a:: with SMTP id
 n10mr29503881qtx.161.1636858812046; 
 Sat, 13 Nov 2021 19:00:12 -0800 (PST)
MIME-Version: 1.0
References: <CAMAf5_fSS_dtMz-z-0edT6vgOxMg9dz4CUY+xCXRjPX5NhhURw@mail.gmail.com>
 <20211112191800.790574-1-skpgkp2@gmail.com>
 <20211112191800.790574-2-skpgkp2@gmail.com>
 <CAFUsyf+syaV9Vk6WhPdJ+OSbDtCaU3LwWvXB9JqKoAjkg1u4HA@mail.gmail.com>
 <CAMAf5_dBK1msQ+tUcJiNE45n7ZzOR8C53y=E9iLK1NrVrSCFsw@mail.gmail.com>
 <CAMe9rOqVF4-ocCw6KYeiaAmM67FWKDEuYHiRgHO+uvchfFPyDg@mail.gmail.com>
In-Reply-To: <CAMe9rOqVF4-ocCw6KYeiaAmM67FWKDEuYHiRgHO+uvchfFPyDg@mail.gmail.com>
From: Sunil Pandey <skpgkp2@gmail.com>
Date: Sat, 13 Nov 2021 18:59:36 -0800
Message-ID: <CAMAf5_eLAt+3VxoPhb-nFUE+54hCGWqqApnyqLOMF4quxVfx0g@mail.gmail.com>
Subject: Re: [PATCH v2 1/6] x86-64: Create microbenchmark infrastructure for
 libmvec
To: "H.J. Lu" <hjl.tools@gmail.com>
Cc: Noah Goldstein <goldstein.w.n@gmail.com>,
 GNU C Library <libc-alpha@sourceware.org>
Content-Type: text/plain; charset="UTF-8"
X-Content-Filtered-By: Mailman/MimeDel 2.1.29
X-BeenThere: libc-alpha@sourceware.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Libc-alpha mailing list <libc-alpha.sourceware.org>
List-Unsubscribe: <https://sourceware.org/mailman/options/libc-alpha>,
 <mailto:libc-alpha-request@sourceware.org?subject=unsubscribe>
List-Archive: <https://sourceware.org/pipermail/libc-alpha/>
List-Post: <mailto:libc-alpha@sourceware.org>
List-Help: <mailto:libc-alpha-request@sourceware.org?subject=help>
List-Subscribe: <https://sourceware.org/mailman/listinfo/libc-alpha>,
 <mailto:libc-alpha-request@sourceware.org?subject=subscribe>
X-List-Received-Date: Sun, 14 Nov 2021 03:00:15 -0000

On Sat, Nov 13, 2021 at 11:48 AM H.J. Lu <hjl.tools@gmail.com> wrote:

> On Fri, Nov 12, 2021 at 2:51 PM Sunil Pandey via Libc-alpha
> <libc-alpha@sourceware.org> wrote:
> >
> > On Fri, Nov 12, 2021 at 1:02 PM Noah Goldstein <goldstein.w.n@gmail.com>
> > wrote:
> >
> > > On Fri, Nov 12, 2021 at 1:19 PM Sunil K Pandey via Libc-alpha
> > > <libc-alpha@sourceware.org> wrote:
> > > >
> > > > Add a Python script to generate libmvec microbenchmarks from the input
> > > > values for each libmvec function, using a skeleton benchmark template.
> > > >
> > > > Create double and float benchmarks with vector lengths 1, 2, 4, 8,
> > > > and 16 for each libmvec function.  Vector length 1 corresponds to the
> > > > scalar version of the function and is included for performance
> > > > comparison with the vector versions.
> > > > ---
> > > >  sysdeps/x86_64/fpu/Makeconfig               |  35 ++
> > > >  sysdeps/x86_64/fpu/Makefile                 |  40 ++
> > > >  sysdeps/x86_64/fpu/bench-libmvec-skeleton.c | 104 +++++
> > > >  sysdeps/x86_64/fpu/scripts/bench_libmvec.py | 464 ++++++++++++++++++++
> > > >  4 files changed, 643 insertions(+)
> > > >  create mode 100644 sysdeps/x86_64/fpu/bench-libmvec-skeleton.c
> > > >  create mode 100755 sysdeps/x86_64/fpu/scripts/bench_libmvec.py
> > > >
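The commit message lists vector lengths 1, 2, 4, 8 and 16, but the Makeconfig hunk splits them by type: double gets vlen1, vlen2, vlen4, vlen4-avx2 and vlen8, while float gets vlen1, vlen4, vlen8, vlen8-avx2 and vlen16, each with an "f" suffix. The C sketch below is illustrative only and not part of the patch: it expands those prefix lists for one function name. The function name "sin" and the helper expand_bench_names are hypothetical, since libmvec-bench-funcs is still empty in this patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define MAX_BENCH_NAMES 16
#define BENCH_NAME_LEN 64

/* Prefix lists copied from bench-libmvec-double and bench-libmvec-float
   in the Makeconfig hunk.  */
static const char *const double_prefixes[] = {
  "double-vlen1-", "double-vlen2-", "double-vlen4-",
  "double-vlen4-avx2-", "double-vlen8-",
};

static const char *const float_prefixes[] = {
  "float-vlen1-", "float-vlen4-", "float-vlen8-",
  "float-vlen8-avx2-", "float-vlen16-",
};

/* Hypothetical helper: write the benchmark names generated for FUNC into
   NAMES and return how many were written.  Float names get an "f"
   suffix, as $(addsuffix f, ...) adds in the Makeconfig.  */
static size_t
expand_bench_names (const char *func, char names[][BENCH_NAME_LEN])
{
  size_t nd = sizeof double_prefixes / sizeof double_prefixes[0];
  size_t nf = sizeof float_prefixes / sizeof float_prefixes[0];
  size_t n = 0;

  assert (nd + nf <= MAX_BENCH_NAMES);
  for (size_t i = 0; i < nd; i++)
    snprintf (names[n++], BENCH_NAME_LEN, "%s%s", double_prefixes[i], func);
  for (size_t i = 0; i < nf; i++)
    snprintf (names[n++], BENCH_NAME_LEN, "%s%sf", float_prefixes[i], func);
  return n;
}
```

For example, expand_bench_names ("sin", names) yields ten names, from double-vlen1-sin to float-vlen16-sinf; prefixing each with bench- and appending .c gives the source files that the $(objpfx)bench-double-%.c and $(objpfx)bench-float-%.c pattern rules in the Makefile hunk would generate.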
> > > > diff --git a/sysdeps/x86_64/fpu/Makeconfig b/sysdeps/x86_64/fpu/Makeconfig
> > > > index 24aaee1a43..503e9b5ffa 100644
> > > > --- a/sysdeps/x86_64/fpu/Makeconfig
> > > > +++ b/sysdeps/x86_64/fpu/Makeconfig
> > > > @@ -29,6 +29,23 @@ libmvec-funcs = \
> > > >    sin \
> > > >    sincos \
> > > >
> > > > +# Define libmvec functions for the benchtests directory.
> > > > +libmvec-bench-funcs = \
> > > > +
> > > > +bench-libmvec-double = \
> > > > +  $(addprefix double-vlen1-, $(libmvec-bench-funcs)) \
> > > > +  $(addprefix double-vlen2-, $(libmvec-bench-funcs)) \
> > > > +  $(addprefix double-vlen4-, $(libmvec-bench-funcs)) \
> > > > +  $(addprefix double-vlen4-avx2-, $(libmvec-bench-funcs)) \
> > > > +  $(addprefix double-vlen8-, $(libmvec-bench-funcs)) \
> > > > +
> > > > +bench-libmvec-float = \
> > > > +  $(addsuffix f, $(addprefix float-vlen1-, $(libmvec-bench-funcs))) \
> > > > +  $(addsuffix f, $(addprefix float-vlen4-, $(libmvec-bench-funcs))) \
> > > > +  $(addsuffix f, $(addprefix float-vlen8-, $(libmvec-bench-funcs))) \
> > > > +  $(addsuffix f, $(addprefix float-vlen8-avx2-, $(libmvec-bench-funcs))) \
> > > > +  $(addsuffix f, $(addprefix float-vlen16-, $(libmvec-bench-funcs))) \
> > > > +
> > > >  # The base libmvec ABI tests.
> > > >  libmvec-abi-func-tests = \
> > > >    $(addprefix test-double-libmvec-,$(libmvec-funcs)) \
> > > > @@ -83,5 +100,23 @@ $(common-objpfx)libmvec.mk: $(common-objpfx)config.make
> > > >            echo "  \$$(float-vlen16-arch-ext-cflags)"; \
> > > >            echo; \
> > > >          done; \
> > > > +        echo "endif"; \
> > > > +        echo "ifeq (\$$(subdir),benchtests)"; \
> > > > +        for t in $(libmvec-bench-funcs); do \
> > > > +          echo "CFLAGS-bench-double-vlen4-$$t.c = \\"; \
> > > > +          echo "  \$$(double-vlen4-arch-ext-cflags)"; \
> > > > +          echo "CFLAGS-bench-double-vlen4-avx2-$$t.c = \\"; \
> > > > +          echo "  \$$(double-vlen4-arch-ext2-cflags)"; \
> > > > +          echo "CFLAGS-bench-double-vlen8-$$t.c = \\"; \
> > > > +          echo "  \$$(double-vlen8-arch-ext-cflags)"; \
> > > > +          echo; \
> > > > +          echo "CFLAGS-bench-float-vlen8-$${t}f.c = \\"; \
> > > > +          echo "  \$$(float-vlen8-arch-ext-cflags)"; \
> > > > +          echo "CFLAGS-bench-float-vlen8-avx2-$${t}f.c = \\"; \
> > > > +          echo "  \$$(float-vlen8-arch-ext2-cflags)"; \
> > > > +          echo "CFLAGS-bench-float-vlen16-$${t}f.c = \\"; \
> > > > +          echo "  \$$(float-vlen16-arch-ext-cflags)"; \
> > > > +          echo; \
> > > > +        done; \
> > > >          echo "endif") > $@T
> > > >         mv -f $@T $@
> > > > diff --git a/sysdeps/x86_64/fpu/Makefile b/sysdeps/x86_64/fpu/Makefile
> > > > index d172ae815d..9fb587cf8f 100644
> > > > --- a/sysdeps/x86_64/fpu/Makefile
> > > > +++ b/sysdeps/x86_64/fpu/Makefile
> > > > @@ -72,3 +72,43 @@ ifeq ($(subdir)$(config-cflags-mprefer-vector-width),mathyes)
> > > >  # performance of sin and cos by more than 40% on Skylake.
> > > >  CFLAGS-branred.c = -mprefer-vector-width=128
> > > >  endif
> > > > +
> > > > +ifeq ($(subdir),benchtests)
> > > > +double-vlen4-arch-ext-cflags = -mavx
> > > > +double-vlen4-arch-ext2-cflags = -mavx2
> > > > +double-vlen8-arch-ext-cflags = -mavx512f
> > > > +
> > > > +float-vlen8-arch-ext-cflags = -mavx
> > > > +float-vlen8-arch-ext2-cflags = -mavx2
> > > > +float-vlen16-arch-ext-cflags = -mavx512f
> > > > +
> > > > +bench-libmvec := $(bench-libmvec-double) $(bench-libmvec-float)
> > > > +
> > > > +ifeq (${BENCHSET},)
> > > > +bench += $(bench-libmvec)
> > > > +endif
> > > > +
> > > > +ifeq (${STATIC-BENCHTESTS},yes)
> > > > +libmvec-benchtests = $(common-objpfx)mathvec/libmvec.a $(common-objpfx)math/libm.a
> > > > +else
> > > > +libmvec-benchtests = $(libmvec) $(libm)
> > > > +endif
> > > > +
> > > > +$(addprefix $(objpfx)bench-,$(bench-libmvec-double)): $(libmvec-benchtests)
> > > > +$(addprefix $(objpfx)bench-,$(bench-libmvec-float)): $(libmvec-benchtests)
> > > > +bench-libmvec-deps = $(..)sysdeps/x86_64/fpu/bench-libmvec-skeleton.c bench-timing.h Makefile
> > > > +
> > > > +$(objpfx)bench-float-%.c: $(bench-libmvec-deps)
> > > > +       { if [ -n "$($*-INCLUDE)" ]; then \
> > > > +         cat $($*-INCLUDE); \
> > > > +       fi; \
> > > > +       $(PYTHON) $(..)sysdeps/x86_64/fpu/scripts/bench_libmvec.py $(basename $(@F)); } > $@-tmp
> > > > +       mv -f $@-tmp $@
> > > > +
> > > > +$(objpfx)bench-double-%.c: $(bench-libmvec-deps)
> > > > +       { if [ -n "$($*-INCLUDE)" ]; then \
> > > > +         cat $($*-INCLUDE); \
> > > > +       fi; \
> > > > +       $(PYTHON) $(..)sysdeps/x86_64/fpu/scripts/bench_libmvec.py $(basename $(@F)); } > $@-tmp
> > > > +       mv -f $@-tmp $@
> > > > +endif
> > > > diff --git a/sysdeps/x86_64/fpu/bench-libmvec-skeleton.c b/sysdeps/x86_64/fpu/bench-libmvec-skeleton.c
> > > > new file mode 100644
> > > > index 0000000000..d56a0c4462
> > > > --- /dev/null
> > > > +++ b/sysdeps/x86_64/fpu/bench-libmvec-skeleton.c
> > > > @@ -0,0 +1,104 @@
> > > > +/* Skeleton for libmvec benchmark programs.
> > > > +   Copyright (C) 2021 Free Software Foundation, Inc.
> > > > +   This file is part of the GNU C Library.
> > > > +
> > > > +   The GNU C Library is free software; you can redistribute it and/or
> > > > +   modify it under the terms of the GNU Lesser General Public
> > > > +   License as published by the Free Software Foundation; either
> > > > +   version 2.1 of the License, or (at your option) any later version.
> > > > +
> > > > +   The GNU C Library is distributed in the hope that it will be useful,
> > > > +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> > > > +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> > > > +   Lesser General Public License for more details.
> > > > +
> > > > +   You should have received a copy of the GNU Lesser General Public
> > > > +   License along with the GNU C Library; if not, see
> > > > +   <https://www.gnu.org/licenses/>.  */
> > > > +
> > > > +#include <string.h>
> > > > +#include <stdint.h>
> > > > +#include <stdbool.h>
> > > > +#include <stdio.h>
> > > > +#include <time.h>
> > > > +#include <inttypes.h>
> > > > +#include <bench-timing.h>
> > > > +#include <json-lib.h>
> > > > +#include <bench-util.h>
> > > > +
> > > > +#include <bench-util.c>
> > > > +#include <math-tests-arch.h>
> > > > +#define D_ITERS 10000
> > > > +
> > > > +int
> > > > +main (int argc, char **argv)
> > > > +{
> > > > +  unsigned long i, k;
> > > > +  timing_t start, end;
> > > > +  json_ctx_t json_ctx;
> > > > +
> > > > +#if defined REQUIRE_AVX
> > > > +  if (!CPU_FEATURE_ACTIVE (AVX))
> > > > +    {
> > > > +      printf ("AVX not supported.\n");
> > > > +      return 0;
> > > > +    }
> > > > +#elif defined REQUIRE_AVX2
> > > > +  if (!CPU_FEATURE_ACTIVE (AVX2))
> > > > +    {
> > > > +      printf ("AVX2 not supported.\n");
> > > > +      return 0;
> > > > +    }
> > > > +#elif defined REQUIRE_AVX512F
> > > > +  if (!CPU_FEATURE_ACTIVE (AVX512F))
> > > > +    {
> > > > +      printf ("AVX512F not supported.\n");
> > > > +      return 0;
> > > > +    }
> > > > +#endif
> > > > +
> > > > +  bench_start ();
> > > > +
> > > > +#ifdef BENCH_INIT
> > > > +  BENCH_INIT ();
> > > > +#endif
> > > > +
> > > > +  json_init (&json_ctx, 2, stdout);
> > > > +
> > > > +  /* Begin function.  */
> > > > +  json_attr_object_begin (&json_ctx, FUNCNAME);
> > > > +
> > > > +  for (int v = 0; v < NUM_VARIANTS; v++)
> > > > +    {
> > > > +      double d_total_time = 0;
> > > > +      uint64_t cur;
> > >
> > > Think these should also be type `timing_t`
> > >
> >
> > I do not see a difference whether I use timing_t or uint64_t. In any case,
> > the variable cur stores the difference between the start and end times,
> > not an absolute time.
> >
> >
> > >
> > > > +      for (k = 0; k < D_ITERS; k++)
> > > > +       {
> > > > +         TIMING_NOW (start);
> > > > +         for (i = 0; i < NUM_SAMPLES (v); i++)
> > >
> > > What is the rationale for both `D_ITERS` and `NUM_SAMPLES (v)`? Why not
> > > one loop that iterates for `D_ITERS * NUM_SAMPLES (v)`?
> > >
> >
> > D_ITERS defines how many times the full data set of each variant is run.
> > NUM_SAMPLES (v) is the number of data sets in variant v.  Indices v and i
> > select the i'th data set of variant v, which is passed to the vector
> > function.  Having two loops simplifies the logic.
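To make the loop structure concrete, here is a minimal, self-contained C sketch of the skeleton's timing loop. D_ITERS is taken from the patch; NUM_VARIANTS, the num_samples table and bench_func are hypothetical stand-ins for what the generated benchmark defines, and clock_gettime replaces glibc's internal TIMING_NOW, so the accumulation here is plain arithmetic rather than the bench-timing.h macros. The nested loops issue exactly D_ITERS * NUM_SAMPLES (v) calls, the same as the single loop suggested above; one plausible reading of "simplifies the logic" is that i indexes the data set directly, with no i % NUM_SAMPLES (v), and each timestamp pair brackets one complete pass over the data.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* D_ITERS matches the patch; everything else is a hypothetical stand-in.  */
#define D_ITERS 10000
#define NUM_VARIANTS 2

/* Number of input data sets per variant (NUM_SAMPLES (v) in the skeleton).  */
static const size_t num_samples[NUM_VARIANTS] = { 4, 7 };

static volatile double sink;  /* Keeps the stand-in calls from being optimized out.  */
static unsigned long calls;   /* Counts how many times bench_func runs.  */

/* Stand-in for BENCH_FUNC (v, i): would call the vector function on the
   i'th data set of variant v.  */
static void
bench_func (int v, size_t i)
{
  sink = (double) v + (double) i;
  calls++;
}

/* Monotonic timestamp in nanoseconds, standing in for TIMING_NOW.  */
static uint64_t
now_ns (void)
{
  struct timespec ts;
  clock_gettime (CLOCK_MONOTONIC, &ts);
  return (uint64_t) ts.tv_sec * 1000000000u + (uint64_t) ts.tv_nsec;
}

/* Time variant V: the outer loop repeats the full data set D_ITERS times,
   the inner loop walks its num_samples[v] data sets.  One timestamp pair
   brackets each full pass.  Returns the total elapsed nanoseconds.  */
static uint64_t
run_variant (int v)
{
  assert (v >= 0 && v < NUM_VARIANTS);
  uint64_t total = 0;
  for (unsigned long k = 0; k < D_ITERS; k++)
    {
      uint64_t start = now_ns ();
      for (size_t i = 0; i < num_samples[v]; i++)
        bench_func (v, i);
      uint64_t end = now_ns ();
      total += end - start;
    }
  return total;
}
```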
> >
> >
> > > > +           BENCH_FUNC (v, i);
> > > > +         TIMING_NOW (end);
> > > > +
> > > > +         TIMING_DIFF (cur, start, end);
> > > > +
> > > > +         d_total_time += cur;
> > > Think this should be `TIMING_ACCUM(d_total_time, cur)`.
> > >
> >
> > There is not much difference whether I use TIMING_ACCUM or simply add cur
> > to d_total_time.
> >
>
> Please use TIMING_ACCUM (d_total_time, cur) to be consistent with
> TIMING_DIFF (cur, start, end).
>

Sure, I will fix it in the next version.
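For reference, a minimal sketch of the loop body with the change agreed here. The macro definitions are stand-ins modeled on benchtests/bench-timing.h and are an assumption: as far as we know, on x86-64 timing_t there is a 64-bit integer and TIMING_DIFF and TIMING_ACCUM reduce to a subtraction and an addition, which is why plain += and TIMING_ACCUM behave the same in practice. Using the macro keeps the accumulation consistent with TIMING_DIFF should a target's timing definitions differ. The helper total_time is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed stand-ins modeled on benchtests/bench-timing.h; check the real
   header before relying on these definitions.  */
typedef uint64_t timing_t;
#define TIMING_DIFF(diff, start, end) ((diff) = (end) - (start))
#define TIMING_ACCUM(sum, diff) ((sum) += (diff))

/* Hypothetical helper: sum the durations of N passes, given their START
   and END timestamps, using TIMING_DIFF and TIMING_ACCUM together as
   agreed in review.  */
static double
total_time (const timing_t *start, const timing_t *end, int n)
{
  double d_total_time = 0;
  timing_t cur;

  for (int k = 0; k < n; k++)
    {
      TIMING_DIFF (cur, start[k], end[k]);
      TIMING_ACCUM (d_total_time, cur);
    }
  return d_total_time;
}
```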


>
> Thanks.
>
>
> --
> H.J.
>