Date: Thu, 10 Jan 2019 13:18:00 -0000
Subject: Re: [PATCH] NUMA spinlock [BZ #23962]
From: "马凌(彦军)" <ling.ml@antfin.com>
To: "H.J. Lu", Szabolcs Nagy
CC: libc-alpha, "Xiao, Wei3", nd, "ling.ma.program"
Message-ID: <3A1F9E3E-7654-4E54-AD1F-594CA21E8447@antfin.com>

Hi Florian,

Thanks for your comments!

We tested the NUMA spinlock on a 2-socket Kunpeng 920 platform (128 physical Arm cores, 256 GB RAM), as shown below.

1. Spinlock

$ ./tst-variable-overhead
Number of processors: 128, Single thread time 11657100
Number of threads: 2, Total time 33449020, Overhead: 1.43
Number of threads: 4, Total time 135449160, Overhead: 2.90
Number of threads: 8, Total time 1146508900, Overhead: 12.29
Number of threads: 16, Total time 6725395660, Overhead: 36.06
Number of threads: 32, Total time 37197114800, Overhead: 99.72
Number of threads: 64, Total time 501098134360, Overhead: 671.66
Number of threads: 128, Total time 2588795930500, Overhead: 1734.99
Number of threads: 256, Total time 14987969840860, Overhead: 5022.41
Number of threads: 384, Total time 31444706737160, Overhead: 7024.67
Number of threads: 512, Total time 60079858502060, Overhead: 10066.27

2. NUMA spinlock

$ ./tst-numa-variable-overhead
Number of processors: 128, Single thread time 12647780
Number of threads: 2, Total time 36606840, Overhead: 1.45
Number of threads: 4, Total time 115740060, Overhead: 2.29
Number of threads: 8, Total time 604662840, Overhead: 5.98
Number of threads: 16, Total time 2285066760, Overhead: 11.29
Number of threads: 32, Total time 8533264240, Overhead: 21.08
Number of threads: 64, Total time 72671073600, Overhead: 89.78
Number of threads: 128, Total time 287805932560, Overhead: 177.78
Number of threads: 256, Total time 837367226760, Overhead: 258.62
Number of threads: 384, Total time 1954243727660, Overhead: 402.38
Number of threads: 512, Total time 3523015939200, Overhead: 544.04

The above data show that the NUMA spinlock improves performance by up to 17x with 512 threads on this Arm platform, and the NUMA spinlock should improve spinlock performance on all multi-socket systems.

Thanks
Ling
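
(A note on reading these numbers: the Overhead column appears to be the total
wall time divided by the ideal time, i.e. the number of threads times the
single-thread time.  For example, with 2 threads on the plain spinlock:

    33449020 / (2 * 11657100) ~= 1.43

and the 17x figure follows from comparing the 512-thread totals:

    60079858502060 / 3523015939200 ~= 17.1)
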
On 2019/1/4, 3:59 AM, "H.J. Lu" wrote:

    On Thu, Jan 3, 2019 at 6:52 AM Szabolcs Nagy wrote:
    >
    > On 03/01/2019 05:35, 马凌(彦军) wrote:
    > >  create mode 100644 manual/examples/numa-spinlock.c
    > >  create mode 100644 sysdeps/unix/sysv/linux/numa-spinlock-private.h
    > >  create mode 100644 sysdeps/unix/sysv/linux/numa-spinlock.c
    > >  create mode 100644 sysdeps/unix/sysv/linux/numa-spinlock.h
    > >  create mode 100644 sysdeps/unix/sysv/linux/numa_spinlock_alloc.c
    > >  create mode 100644 sysdeps/unix/sysv/linux/x86/tst-numa-variable-overhead.c
    > >  create mode 100644 sysdeps/unix/sysv/linux/x86/tst-variable-overhead-skeleton.c
    > >  create mode 100644 sysdeps/unix/sysv/linux/x86/tst-variable-overhead.c
    >
    > as far as i can tell the new code is generic
    > (other than the presence of efficient getcpu),
    > so i think the test should be generic too.
    >
    > > --- /dev/null
    > > +++ b/sysdeps/unix/sysv/linux/x86/tst-variable-overhead-skeleton.c
    > > @@ -0,0 +1,384 @@
    > ...
    > > +/* Check spinlock overhead with large number threads.  Critical region is
    > > +   very small.  Critical region + spinlock overhead aren't noticeable
    > > +   when number of threads is small.  When thread number increases,
    > > +   spinlock overhead becomes the bottleneck.  It shows up in wall time
    > > +   of thread execution.  */
    >
    > yeah, this is not easy to do in a generic way, i think
    > even on x86 such measurement is problematic, you don't
    > know what goes on a system (or vm) when the glibc test
    > is running.
    >
    > but doing precise timing is not that important for
    > checking the correctness of the locks, so i think a
    > simplified version can be generic test code.

    Here is the updated patch to make tests generic.

    -- 
    H.J.
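
For readers following along, below is a minimal sketch of the kind of
measurement being discussed.  It is not the actual glibc
tst-variable-overhead-skeleton.c; it only assumes standard POSIX pthread
spinlocks and clock_gettime, and the thread and iteration counts are
illustrative.  Each thread hammers a tiny critical section, and the total wall
time is compared against the number of threads times the single-thread time,
which is how the Overhead figures above are scaled.

/* Sketch only: a generic spinlock-overhead measurement, NOT the glibc
   test skeleton.  Build with: gcc -O2 -pthread sketch.c  */
#include <pthread.h>
#include <stdio.h>
#include <time.h>

#define ITERS 100000

static pthread_spinlock_t lock;
static volatile unsigned long shared_counter;

/* Each thread repeatedly enters a very small critical section, so that
   with many threads the lock overhead dominates the wall time.  */
static void *
worker (void *arg)
{
  (void) arg;
  for (int i = 0; i < ITERS; i++)
    {
      pthread_spin_lock (&lock);
      shared_counter++;
      pthread_spin_unlock (&lock);
    }
  return NULL;
}

/* Run NTHREADS workers and return the elapsed wall time in nanoseconds.  */
static long long
run (int nthreads)
{
  struct timespec start, end;
  pthread_t threads[nthreads];

  clock_gettime (CLOCK_MONOTONIC, &start);
  for (int i = 0; i < nthreads; i++)
    pthread_create (&threads[i], NULL, worker, NULL);
  for (int i = 0; i < nthreads; i++)
    pthread_join (threads[i], NULL);
  clock_gettime (CLOCK_MONOTONIC, &end);

  return (end.tv_sec - start.tv_sec) * 1000000000LL
	 + (end.tv_nsec - start.tv_nsec);
}

int
main (void)
{
  pthread_spin_init (&lock, PTHREAD_PROCESS_PRIVATE);

  long long single = run (1);
  printf ("Single thread time %lld\n", single);

  /* Overhead = total wall time / (nthreads * single-thread time).  */
  for (int n = 2; n <= 16; n *= 2)
    {
      long long total = run (n);
      printf ("Number of threads: %d, Total time %lld, Overhead: %.2f\n",
	      n, total, (double) total / ((double) n * single));
    }

  pthread_spin_destroy (&lock);
  return 0;
}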