From: Sunil Pandey
Date: Sat, 13 Nov 2021 18:59:36 -0800
Subject: Re: [PATCH v2 1/6] x86-64: Create microbenchmark infrastructure for libmvec
To: "H.J. Lu"
Cc: Noah Goldstein, GNU C Library
References: <20211112191800.790574-1-skpgkp2@gmail.com> <20211112191800.790574-2-skpgkp2@gmail.com>
List-Id: Libc-alpha mailing list

On Sat, Nov 13, 2021 at 11:48 AM H.J. Lu wrote:

> On Fri, Nov 12, 2021 at 2:51 PM Sunil Pandey via Libc-alpha wrote:
> >
> > On Fri, Nov 12, 2021 at 1:02 PM Noah Goldstein wrote:
> >
> > > On Fri, Nov 12, 2021 at 1:19 PM Sunil K Pandey via Libc-alpha wrote:
> > > >
> > > > Add python script to generate libmvec microbenchmark from the input
> > > > values for each libmvec function using skeleton benchmark template.
> > > >
> > > > Creates double and float benchmarks with vector length 1, 2, 4, 8,
> > > > and 16 for each libmvec function. Vector length 1 corresponds to
> > > > scalar version of function and is included for vector function perf
> > > > comparison.
> > > > ---
> > > >  sysdeps/x86_64/fpu/Makeconfig               |  35 ++
> > > >  sysdeps/x86_64/fpu/Makefile                 |  40 ++
> > > >  sysdeps/x86_64/fpu/bench-libmvec-skeleton.c | 104 +++++
> > > >  sysdeps/x86_64/fpu/scripts/bench_libmvec.py | 464 ++++++++++++++++++++
> > > >  4 files changed, 643 insertions(+)
> > > >  create mode 100644 sysdeps/x86_64/fpu/bench-libmvec-skeleton.c
> > > >  create mode 100755 sysdeps/x86_64/fpu/scripts/bench_libmvec.py
> > > >
> > > > diff --git a/sysdeps/x86_64/fpu/Makeconfig b/sysdeps/x86_64/fpu/Makeconfig
> > > > index 24aaee1a43..503e9b5ffa 100644
> > > > --- a/sysdeps/x86_64/fpu/Makeconfig
> > > > +++ b/sysdeps/x86_64/fpu/Makeconfig
> > > > @@ -29,6 +29,23 @@ libmvec-funcs = \
> > > >    sin \
> > > >    sincos \
> > > >
> > > > +# Define libmvec function for benchtests directory.
> > > > +libmvec-bench-funcs = \
> > > > +
> > > > +bench-libmvec-double = \
> > > > +  $(addprefix double-vlen1-, $(libmvec-bench-funcs)) \
> > > > +  $(addprefix double-vlen2-, $(libmvec-bench-funcs)) \
> > > > +  $(addprefix double-vlen4-, $(libmvec-bench-funcs)) \
> > > > +  $(addprefix double-vlen4-avx2-, $(libmvec-bench-funcs)) \
> > > > +  $(addprefix double-vlen8-, $(libmvec-bench-funcs)) \
> > > > +
> > > > +bench-libmvec-float = \
> > > > +  $(addsuffix f, $(addprefix float-vlen1-, $(libmvec-bench-funcs))) \
> > > > +  $(addsuffix f, $(addprefix float-vlen4-, $(libmvec-bench-funcs))) \
> > > > +  $(addsuffix f, $(addprefix float-vlen8-, $(libmvec-bench-funcs))) \
> > > > +  $(addsuffix f, $(addprefix float-vlen8-avx2-, $(libmvec-bench-funcs))) \
> > > > +  $(addsuffix f, $(addprefix float-vlen16-, $(libmvec-bench-funcs))) \
> > > > +
> > > >  # The base libmvec ABI tests.
> > > >  libmvec-abi-func-tests = \
> > > >    $(addprefix test-double-libmvec-,$(libmvec-funcs)) \
> > > > @@ -83,5 +100,23 @@ $(common-objpfx)libmvec.mk: $(common-objpfx)config.make
> > > >        echo " \$$(float-vlen16-arch-ext-cflags)"; \
> > > >        echo; \
> > > >      done; \
> > > > +    echo "endif"; \
> > > > +    echo "ifeq (\$$(subdir),benchtests)"; \
> > > > +    for t in $(libmvec-bench-funcs); do \
> > > > +      echo "CFLAGS-bench-double-vlen4-$$t.c = \\"; \
> > > > +      echo " \$$(double-vlen4-arch-ext-cflags)"; \
> > > > +      echo "CFLAGS-bench-double-vlen4-avx2-$$t.c = \\"; \
> > > > +      echo " \$$(double-vlen4-arch-ext2-cflags)"; \
> > > > +      echo "CFLAGS-bench-double-vlen8-$$t.c = \\"; \
> > > > +      echo " \$$(double-vlen8-arch-ext-cflags)"; \
> > > > +      echo; \
> > > > +      echo "CFLAGS-bench-float-vlen8-$${t}f.c = \\"; \
> > > > +      echo " \$$(float-vlen8-arch-ext-cflags)"; \
> > > > +      echo "CFLAGS-bench-float-vlen8-avx2-$${t}f.c = \\"; \
> > > > +      echo " \$$(float-vlen8-arch-ext2-cflags)"; \
> > > > +      echo "CFLAGS-bench-float-vlen16-$${t}f.c = \\"; \
> > > > +      echo " \$$(float-vlen16-arch-ext-cflags)"; \
> > > > +      echo; \
> > > > +    done; \
> > > >      echo "endif") > $@T
> > > >    mv -f $@T $@
> > > > diff --git a/sysdeps/x86_64/fpu/Makefile b/sysdeps/x86_64/fpu/Makefile
> > > > index d172ae815d..9fb587cf8f 100644
> > > > --- a/sysdeps/x86_64/fpu/Makefile
> > > > +++ b/sysdeps/x86_64/fpu/Makefile
> > > > @@ -72,3 +72,43 @@ ifeq ($(subdir)$(config-cflags-mprefer-vector-width),mathyes)
> > > >  # performance of sin and cos by more than 40% on Skylake.
> > > >  CFLAGS-branred.c = -mprefer-vector-width=128
> > > >  endif
> > > > +
> > > > +ifeq ($(subdir),benchtests)
> > > > +double-vlen4-arch-ext-cflags = -mavx
> > > > +double-vlen4-arch-ext2-cflags = -mavx2
> > > > +double-vlen8-arch-ext-cflags = -mavx512f
> > > > +
> > > > +float-vlen8-arch-ext-cflags = -mavx
> > > > +float-vlen8-arch-ext2-cflags = -mavx2
> > > > +float-vlen16-arch-ext-cflags = -mavx512f
> > > > +
> > > > +bench-libmvec := $(bench-libmvec-double) $(bench-libmvec-float)
> > > > +
> > > > +ifeq (${BENCHSET},)
> > > > +bench += $(bench-libmvec)
> > > > +endif
> > > > +
> > > > +ifeq (${STATIC-BENCHTESTS},yes)
> > > > +libmvec-benchtests = $(common-objpfx)mathvec/libmvec.a $(common-objpfx)math/libm.a
> > > > +else
> > > > +libmvec-benchtests = $(libmvec) $(libm)
> > > > +endif
> > > > +
> > > > +$(addprefix $(objpfx)bench-,$(bench-libmvec-double)): $(libmvec-benchtests)
> > > > +$(addprefix $(objpfx)bench-,$(bench-libmvec-float)): $(libmvec-benchtests)
> > > > +bench-libmvec-deps = $(..)sysdeps/x86_64/fpu/bench-libmvec-skeleton.c bench-timing.h Makefile
> > > > +
> > > > +$(objpfx)bench-float-%.c: $(bench-libmvec-deps)
> > > > +  { if [ -n "$($*-INCLUDE)" ]; then \
> > > > +    cat $($*-INCLUDE); \
> > > > +  fi; \
> > > > +  $(PYTHON) $(..)sysdeps/x86_64/fpu/scripts/bench_libmvec.py $(basename $(@F)); } > $@-tmp
> > > > +  mv -f $@-tmp $@
> > > > +
> > > > +$(objpfx)bench-double-%.c: $(bench-libmvec-deps)
> > > > +  { if [ -n "$($*-INCLUDE)" ]; then \
> > > > +    cat $($*-INCLUDE); \
> > > > +  fi; \
> > > > +  $(PYTHON) $(..)sysdeps/x86_64/fpu/scripts/bench_libmvec.py $(basename $(@F)); } > $@-tmp
> > > > +  mv -f $@-tmp $@
> > > > +endif
> > > > diff --git a/sysdeps/x86_64/fpu/bench-libmvec-skeleton.c b/sysdeps/x86_64/fpu/bench-libmvec-skeleton.c
> > > > new file mode 100644
> > > > index 0000000000..d56a0c4462
> > > > --- /dev/null
> > > > +++ b/sysdeps/x86_64/fpu/bench-libmvec-skeleton.c
> > > > @@ -0,0 +1,104 @@
> > > > +/* Skeleton for libmvec benchmark programs.
> > > > +   Copyright (C) 2021 Free Software Foundation, Inc.
> > > > +   This file is part of the GNU C Library.
> > > > +
> > > > +   The GNU C Library is free software; you can redistribute it and/or
> > > > +   modify it under the terms of the GNU Lesser General Public
> > > > +   License as published by the Free Software Foundation; either
> > > > +   version 2.1 of the License, or (at your option) any later version.
> > > > +
> > > > +   The GNU C Library is distributed in the hope that it will be useful,
> > > > +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> > > > +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> > > > +   Lesser General Public License for more details.
> > > > +
> > > > +   You should have received a copy of the GNU Lesser General Public
> > > > +   License along with the GNU C Library; if not, see
> > > > +   . */
> > > > +
> > > > +#include
> > > > +#include
> > > > +#include
> > > > +#include
> > > > +#include
> > > > +#include
> > > > +#include
> > > > +#include
> > > > +#include
> > > > +
> > > > +#include
> > > > +#include
> > > > +#define D_ITERS 10000
> > > > +
> > > > +int
> > > > +main (int argc, char **argv)
> > > > +{
> > > > +  unsigned long i, k;
> > > > +  timing_t start, end;
> > > > +  json_ctx_t json_ctx;
> > > > +
> > > > +#if defined REQUIRE_AVX
> > > > +  if (!CPU_FEATURE_ACTIVE (AVX))
> > > > +    {
> > > > +      printf ("AVX not supported.\n");
> > > > +      return 0;
> > > > +    }
> > > > +#elif defined REQUIRE_AVX2
> > > > +  if (!CPU_FEATURE_ACTIVE (AVX2))
> > > > +    {
> > > > +      printf ("AVX2 not supported.\n");
> > > > +      return 0;
> > > > +    }
> > > > +#elif defined REQUIRE_AVX512F
> > > > +  if (!CPU_FEATURE_ACTIVE (AVX512F))
> > > > +    {
> > > > +      printf ("AVX512F not supported.\n");
> > > > +      return 0;
> > > > +    }
> > > > +#endif
> > > > +
> > > > +  bench_start ();
> > > > +
> > > > +#ifdef BENCH_INIT
> > > > +  BENCH_INIT ();
> > > > +#endif
> > > > +
> > > > +  json_init (&json_ctx, 2, stdout);
> > > > +
> > > > +  /* Begin function. */
> > > > +  json_attr_object_begin (&json_ctx, FUNCNAME);
> > > > +
> > > > +  for (int v = 0; v < NUM_VARIANTS; v++)
> > > > +    {
> > > > +      double d_total_time = 0;
> > > > +      uint64_t cur;
> > >
> > > Think these should also be type `timing_t`
> > >
> >
> > I do not see a difference whether I use timing_t or uint64_t. In any
> > case, the variable cur stores the difference between the start and end
> > times, not a time.
> >
> > > > +      for (k = 0; k < D_ITERS; k++)
> > > > +        {
> > > > +          TIMING_NOW (start);
> > > > +          for (i = 0; i < NUM_SAMPLES (v); i++)
> > >
> > > What is the rationale for both `D_ITERS` and `NUM_SAMPLES (v)`? Why not
> > > one loop that iterates for `D_ITERS * NUM_SAMPLES (v)`?
> > >
> >
> > D_ITERS defines how many times each variant's full data set is run.
> > NUM_SAMPLES (v) represents the number of data sets in variant v. The
> > indices v and i select the i'th data set from variant v and call the
> > vector function on it. Having two loops simplifies the logic.
> >
> > > > +            BENCH_FUNC (v, i);
> > > > +          TIMING_NOW (end);
> > > > +
> > > > +          TIMING_DIFF (cur, start, end);
> > > > +
> > > > +          d_total_time += cur;
> > >
> > > Think this should be `TIMING_ACCUM(d_total_time, cur)`.
> > >
> >
> > There is not much difference whether I use TIMING_ACCUM or simply add
> > cur to d_total_time.
> >
>
> Please use TIMING_ACCUM (d_total_time, cur) to be consistent with
> TIMING_DIFF (cur, start, end).
>

Sure, I will fix it in the next version.

> Thanks.
>
>
> --
> H.J.
>
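For reference, here is a rough standalone sketch of the timing loop once TIMING_ACCUM is used. It is only an approximation of the skeleton, not the patch itself. The TIMING_* macros below are simplified stand-ins built on clock_gettime (the real ones come from benchtests/bench-timing.h and may differ by target). bench_variant, sample_func, calls and sink are hypothetical names used only for illustration.

```c
#ifndef _POSIX_C_SOURCE
# define _POSIX_C_SOURCE 199309L
#endif

#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Simplified stand-ins for the macros in benchtests/bench-timing.h.
   Assumption: the real definitions may use a different clock source.  */
typedef uint64_t timing_t;

#define TIMING_NOW(var) \
  do \
    { \
      struct timespec ts_; \
      clock_gettime (CLOCK_MONOTONIC, &ts_); \
      (var) = (uint64_t) ts_.tv_sec * 1000000000ull \
              + (uint64_t) ts_.tv_nsec; \
    } \
  while (0)
#define TIMING_DIFF(diff, start, end) ((diff) = (end) - (start))
#define TIMING_ACCUM(sum, diff) ((sum) += (diff))

#define D_ITERS 10000

/* Hypothetical workload standing in for BENCH_FUNC (v, i).  */
unsigned long calls;
volatile double sink;

void
sample_func (unsigned long i)
{
  sink = (double) i * 0.5;
  calls++;
}

/* Run D_ITERS passes over num_samples data sets, timing each pass as a
   whole, and return the accumulated time in nanoseconds.  */
timing_t
bench_variant (unsigned long num_samples)
{
  timing_t start, end, cur, total = 0;

  assert (num_samples > 0);
  for (unsigned long k = 0; k < D_ITERS; k++)
    {
      TIMING_NOW (start);
      for (unsigned long i = 0; i < num_samples; i++)
        sample_func (i);
      TIMING_NOW (end);

      TIMING_DIFF (cur, start, end);
      TIMING_ACCUM (total, cur);
    }
  return total;
}
```

The outer loop repeats the whole data set D_ITERS times and the inner loop walks the samples, so the clock is read once per pass rather than once per call.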