Date: Thu, 10 Jan 2019 13:18:00 -0000
Subject: Re: [PATCH] NUMA spinlock [BZ #23962]
From: "马凌(彦军)" <ling.ml@antfin.com>
To: "H.J. Lu", Szabolcs Nagy
CC: libc-alpha, "Xiao, Wei3", nd, "ling.ma.program"
Message-ID: <3A1F9E3E-7654-4E54-AD1F-594CA21E8447@antfin.com>

Hi Florian,

Thanks for your comments!

We tested the NUMA spinlock on a 2-socket Kunpeng 920 platform (128 physical Arm cores, 256 GB RAM), as shown below.

1. Spinlock

$ ./tst-variable-overhead
Number of processors: 128, Single thread time 11657100
Number of threads: 2, Total time 33449020, Overhead: 1.43
Number of threads: 4, Total time 135449160, Overhead: 2.90
Number of threads: 8, Total time 1146508900, Overhead: 12.29
Number of threads: 16, Total time 6725395660, Overhead: 36.06
Number of threads: 32, Total time 37197114800, Overhead: 99.72
Number of threads: 64, Total time 501098134360, Overhead: 671.66
Number of threads: 128, Total time 2588795930500, Overhead: 1734.99
Number of threads: 256, Total time 14987969840860, Overhead: 5022.41
Number of threads: 384, Total time 31444706737160, Overhead: 7024.67
Number of threads: 512, Total time 60079858502060, Overhead: 10066.27

2. NUMA spinlock

$ ./tst-numa-variable-overhead
Number of processors: 128, Single thread time 12647780
Number of threads: 2, Total time 36606840, Overhead: 1.45
Number of threads: 4, Total time 115740060, Overhead: 2.29
Number of threads: 8, Total time 604662840, Overhead: 5.98
Number of threads: 16, Total time 2285066760, Overhead: 11.29
Number of threads: 32, Total time 8533264240, Overhead: 21.08
Number of threads: 64, Total time 72671073600, Overhead: 89.78
Number of threads: 128, Total time 287805932560, Overhead: 177.78
Number of threads: 256, Total time 837367226760, Overhead: 258.62
Number of threads: 384, Total time 1954243727660, Overhead: 402.38
Number of threads: 512, Total time 3523015939200, Overhead: 544.04

The above data show that the NUMA spinlock improves performance by up to 17x with 512 threads on this Arm platform, and the NUMA spinlock should improve spinlock performance on all multi-socket systems.

Thanks
Ling
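
(A note on reading these numbers: the Overhead column appears to be the total
wall time divided by the ideal time, i.e. the number of threads times the
single-thread time.  For example, with 2 threads on the plain spinlock:

    33449020 / (2 * 11657100) ~= 1.43

and the 17x figure follows from comparing the 512-thread totals:

    60079858502060 / 3523015939200 ~= 17.1)
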
On 2019/1/4, 3:59 AM, "H.J. Lu" wrote:

    On Thu, Jan 3, 2019 at 6:52 AM Szabolcs Nagy wrote:
    >
    > On 03/01/2019 05:35, 马凌(彦军) wrote:
    > >  create mode 100644 manual/examples/numa-spinlock.c
    > >  create mode 100644 sysdeps/unix/sysv/linux/numa-spinlock-private.h
    > >  create mode 100644 sysdeps/unix/sysv/linux/numa-spinlock.c
    > >  create mode 100644 sysdeps/unix/sysv/linux/numa-spinlock.h
    > >  create mode 100644 sysdeps/unix/sysv/linux/numa_spinlock_alloc.c
    > >  create mode 100644 sysdeps/unix/sysv/linux/x86/tst-numa-variable-overhead.c
    > >  create mode 100644 sysdeps/unix/sysv/linux/x86/tst-variable-overhead-skeleton.c
    > >  create mode 100644 sysdeps/unix/sysv/linux/x86/tst-variable-overhead.c
    >
    > as far as i can tell the new code is generic
    > (other than the presence of efficient getcpu),
    > so i think the test should be generic too.
    >
    > > --- /dev/null
    > > +++ b/sysdeps/unix/sysv/linux/x86/tst-variable-overhead-skeleton.c
    > > @@ -0,0 +1,384 @@
    > ...
    > > +/* Check spinlock overhead with large number threads.  Critical region is
    > > +   very small.  Critical region + spinlock overhead aren't noticeable
    > > +   when number of threads is small.  When thread number increases,
    > > +   spinlock overhead becomes the bottleneck.  It shows up in wall time
    > > +   of thread execution.  */
    >
    > yeah, this is not easy to do in a generic way, i think
    > even on x86 such measurement is problematic, you don't
    > know what goes on a system (or vm) when the glibc test
    > is running.
    >
    > but doing precise timing is not that important for
    > checking the correctness of the locks, so i think a
    > simplified version can be generic test code.

    Here is the updated patch to make tests generic.

    -- 
    H.J.
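
For readers following along, below is a minimal sketch of the kind of
measurement being discussed.  It is not the actual glibc
tst-variable-overhead-skeleton.c; it only assumes standard POSIX pthread
spinlocks and clock_gettime, and the thread and iteration counts are
illustrative.  Each thread hammers a tiny critical section, and the total wall
time is compared against the number of threads times the single-thread time,
which is how the Overhead figures above are scaled.

/* Sketch only: a generic spinlock-overhead measurement, NOT the glibc
   test skeleton.  Build with: gcc -O2 -pthread sketch.c  */
#include <pthread.h>
#include <stdio.h>
#include <time.h>

#define ITERS 100000

static pthread_spinlock_t lock;
static volatile unsigned long shared_counter;

/* Each thread repeatedly enters a very small critical section, so that
   with many threads the lock overhead dominates the wall time.  */
static void *
worker (void *arg)
{
  (void) arg;
  for (int i = 0; i < ITERS; i++)
    {
      pthread_spin_lock (&lock);
      shared_counter++;
      pthread_spin_unlock (&lock);
    }
  return NULL;
}

/* Run NTHREADS workers and return the elapsed wall time in nanoseconds.  */
static long long
run (int nthreads)
{
  struct timespec start, end;
  pthread_t threads[nthreads];

  clock_gettime (CLOCK_MONOTONIC, &start);
  for (int i = 0; i < nthreads; i++)
    pthread_create (&threads[i], NULL, worker, NULL);
  for (int i = 0; i < nthreads; i++)
    pthread_join (threads[i], NULL);
  clock_gettime (CLOCK_MONOTONIC, &end);

  return (end.tv_sec - start.tv_sec) * 1000000000LL
	 + (end.tv_nsec - start.tv_nsec);
}

int
main (void)
{
  pthread_spin_init (&lock, PTHREAD_PROCESS_PRIVATE);

  long long single = run (1);
  printf ("Single thread time %lld\n", single);

  /* Overhead = total wall time / (nthreads * single-thread time).  */
  for (int n = 2; n <= 16; n *= 2)
    {
      long long total = run (n);
      printf ("Number of threads: %d, Total time %lld, Overhead: %.2f\n",
	      n, total, (double) total / ((double) n * single));
    }

  pthread_spin_destroy (&lock);
  return 0;
}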