From: "xuchunmei at linux dot alibaba.com"
To: glibc-bugs@sourceware.org
Subject: [Bug math/27461] New: Unixbench/whetstone-double performance regression on glibc2.32
Date: Wed, 24 Feb 2021 04:57:25 +0000

https://sourceware.org/bugzilla/show_bug.cgi?id=27461

            Bug ID: 27461
           Summary: Unixbench/whetstone-double performance regression on
                    glibc2.32
           Product: glibc
           Version: 2.32
            Status: UNCONFIRMED
          Severity: normal
          Priority: P2
         Component: math
          Assignee: unassigned at sourceware dot org
          Reporter: xuchunmei at linux dot alibaba.com
  Target Milestone: ---

I am comparing Unixbench performance between glibc 2.28 and glibc 2.32 on CentOS 8, and found that whetstone-double shows a performance regression on glibc 2.32.
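Because this comparison depends on which glibc the rebuilt benchmark actually loads, and (as described later in the report) on whether it was built with -ffast-math, a small probe can confirm both before trusting a result. This is a hypothetical sketch, not part of Unixbench; the function names are illustrative. It assumes gnu_get_libc_version() from glibc's <gnu/libc-version.h> and the __FINITE_MATH_ONLY__ macro that GCC and Clang predefine (1 under -ffinite-math-only, which -ffast-math implies).

```c
#include <stdio.h>
#ifdef __GLIBC__                 /* defined via <features.h>, pulled in by <stdio.h> */
#include <gnu/libc-version.h>
#endif

/* Hypothetical helper: glibc version the process runs against at run time
   (e.g. "2.28" or "2.32"), or "unknown" on a non-glibc C library. */
const char *runtime_libc_version(void)
{
#ifdef __GLIBC__
    return gnu_get_libc_version();
#else
    return "unknown";
#endif
}

/* Hypothetical helper: 1 if this file was compiled with -ffinite-math-only
   (implied by -ffast-math), 0 otherwise. */
int built_with_finite_math_only(void)
{
#if defined(__FINITE_MATH_ONLY__) && __FINITE_MATH_ONLY__ > 0
    return 1;
#else
    return 0;
#endif
}

/* Prints run-time glibc, compile-time glibc headers, and the fast-math state. */
void print_build_info(FILE *out)
{
    fprintf(out, "runtime glibc: %s\n", runtime_libc_version());
#ifdef __GLIBC__
    fprintf(out, "compiled against glibc headers: %d.%d\n",
            __GLIBC__, __GLIBC_MINOR__);
#endif
    fprintf(out, "finite-math-only: %s\n",
            built_with_finite_math_only() ? "yes" : "no");
}
```

Calling print_build_info(stdout) from the benchmark's startup (or from a separate program built with the same flags) would show, for example, whether a "glibc 2.32" run really loaded libm from 2.32 and whether the -ffast-math flag reached the compiler.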
I found the discussion at https://sourceware.org/legacy-ml/libc-alpha/2019-03/msg00395.html and the related patch at https://patchwork.ozlabs.org/project/glibc/patch/VI1PR0801MB212753501D9DA1AA00BC7BA583E20@VI1PR0801MB2127.eurprd08.prod.outlook.com/

Since glibc 2.31, math-finite.h has been removed and __log_finite is just an alias, and since glibc 2.29, exp and log have been optimized. However, when -ffast-math is used, unixbench/whetstone-double appears to be slower than it was before these optimizations.

The testcase is Unixbench, run with the command "./Run whetstone-double -c 8 -i 1". Before testing on each glibc, I recompiled whetstone-double.

glibc 2.28 test result:

8 CPUs in system; running 8 parallel copies of tests

Double-Precision Whetstone                    41535.6 MWIPS (9.6 s, 1 samples)

System Benchmarks Partial Index              BASELINE       RESULT    INDEX
Double-Precision Whetstone                       55.0      41535.6   7551.9
                                                                   ========
System Benchmarks Index Score (Partial Only)                         7551.9

glibc 2.32 test result:

8 CPUs in system; running 8 parallel copies of tests

Double-Precision Whetstone                    37152.0 MWIPS (10.0 s, 1 samples)

System Benchmarks Partial Index              BASELINE       RESULT    INDEX
Double-Precision Whetstone                       55.0      37152.0   6754.9
                                                                   ========
System Benchmarks Index Score (Partial Only)                         6754.9

perf record data:

glibc 2.32:
+   49.12%  whetstone-doubl  whetstone-double   [.] whetstones.constprop.1
+   10.57%  whetstone-doubl  libm-2.32.so       [.] __atan_fma
+   10.26%  whetstone-doubl  libm-2.32.so       [.] __ieee754_log_fma
+    8.76%  whetstone-doubl  libm-2.32.so       [.] __cos_fma
+    8.22%  whetstone-doubl  libm-2.32.so       [.] __ieee754_exp_fma
+    7.65%  whetstone-doubl  libm-2.32.so       [.] __sincos
+    3.52%  whetstone-doubl  libm-2.32.so       [.] log@@GLIBC_2.29
+    1.42%  whetstone-doubl  libm-2.32.so       [.] exp@@GLIBC_2.29
     0.14%  whetstone-doubl  whetstone-double   [.] log@plt
     0.11%  whetstone-doubl  whetstone-double   [.] cos@plt
     0.05%  whetstone-doubl  whetstone-double   [.] sincos@plt
     0.01%  whetstone-doubl  [kernel.kallsyms]  [k] _raw_spin_unlock_irqrestore

glibc 2.28:
+   53.86%  whetstone-doubl  whetstone-double   [.] whetstones.constprop.1
+   11.51%  whetstone-doubl  libm-2.28.so       [.] __ieee754_log_fma
+   11.16%  whetstone-doubl  libm-2.28.so       [.] __atan_fma
+    9.19%  whetstone-doubl  libm-2.28.so       [.] __cos_fma
+    8.75%  whetstone-doubl  libm-2.28.so       [.] __sincos
+    5.00%  whetstone-doubl  libm-2.28.so       [.] __ieee754_exp_fma
     0.17%  whetstone-doubl  whetstone-double   [.] __log_finite@plt
     0.11%  whetstone-doubl  whetstone-double   [.] cos@plt
     0.06%  whetstone-doubl  whetstone-double   [.] sincos@plt
     0.01%  whetstone-doubl  [kernel.kallsyms]  [k] _raw_spin_unlock_irqrestore

In the glibc 2.28 profile the benchmark reaches log through __log_finite@plt, and the log@@GLIBC_2.29 (3.52%) and exp@@GLIBC_2.29 (1.42%) entries seen on glibc 2.32 do not appear.

whetstone-double is compiled with -O3 -ffast-math by default. I found that on glibc 2.28, -ffast-math speeds up Whetstone; without it, the result is as slow as on glibc 2.32. On glibc 2.32, however, -ffast-math has no effect.

Without -ffast-math on glibc 2.28:

8 CPUs in system; running 8 parallel copies of tests

Double-Precision Whetstone                    37312.4 MWIPS (9.9 s, 1 samples)

System Benchmarks Partial Index              BASELINE       RESULT    INDEX
Double-Precision Whetstone                       55.0      37312.4   6784.1
                                                                   ========
System Benchmarks Index Score (Partial Only)                         6784.1

I also ran bench-log and bench-exp on glibc 2.28 and glibc 2.32. Their results show that log and exp are faster on glibc 2.32 than on glibc 2.28:
glibc 2.28:

# ./bench-log
"log": {
 "": {
  "duration": 2.51381e+09,
  "iterations": 3.212e+07,
  "max": 780.74,
  "min": 74.124,
  "mean": 78.2631
 }
}

# ./bench-exp
"exp": {
 "": {
  "duration": 2.50406e+09,
  "iterations": 2.57e+07,
  "max": 166.716,
  "min": 81.516,
  "mean": 97.4344
 },
 "144bits": {
  "duration": 2.50084e+09,
  "iterations": 2.7023e+07,
  "max": 2649.79,
  "min": 88.826,
  "mean": 92.5448
 },
 "768bits": {
  "duration": 2.49121e+09,
  "iterations": 2.714e+07,
  "max": 139.302,
  "min": 88.924,
  "mean": 91.7911
 }
}

glibc 2.32:

# ./bench-log
"log": {
 "": {
  "duration": 2.5072e+09,
  "iterations": 3.5332e+07,
  "max": 112.336,
  "min": 69.486,
  "mean": 70.9611
 }
}

# ./bench-exp
"exp": {
 "": {
  "duration": 2.50791e+09,
  "iterations": 2.9812e+07,
  "max": 2759.88,
  "min": 76.894,
  "mean": 84.1241
 },
 "144bits": {
  "duration": 2.4979e+09,
  "iterations": 2.8182e+07,
  "max": 479.622,
  "min": 77.824,
  "mean": 88.6346
 },
 "768bits": {
  "duration": 2.49041e+09,
  "iterations": 2.9995e+07,
  "max": 450.432,
  "min": 77.01,
  "mean": 83.0275
 }
}

CPU info:

# lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                8
On-line CPU(s) list:   0-7
Thread(s) per core:    2
Core(s) per socket:    2
Socket(s):             2
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 85
Model name:            Intel(R) Xeon(R) Platinum 8163 CPU @ 2.50GHz
Stepping:              4
CPU MHz:               2499.442
BogoMIPS:              4998.88
Hypervisor vendor:     KVM
Virtualization type:   full
L1d cache:             32K
L1i cache:             32K
L2 cache:              1024K
L3 cache:              33792K
NUMA node0 CPU(s):     0-7
Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
                       cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx
                       pdpe1gb rdtscp lm constant_tsc rep_good nopl cpuid
                       tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1
                       sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave
                       avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch
                       invpcid_single pti ibrs ibpb stibp fsgsbase tsc_adjust
                       bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx avx512f
                       avx512dq rdseed adx smap avx512cd avx512bw avx512vl
                       xsaveopt xsavec xgetbv1
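To put the Whetstone scores and the bench-log/bench-exp means reported above side by side, the arithmetic can be written out. The index formula below (10 x result / baseline) is inferred from the reported tables, where it reproduces 7551.9, 6754.9 and 6784.1; it is an assumption, not taken from the Unixbench source. The figures in the table are copied from this report, and the helper names are hypothetical.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: partial index as it appears to be derived in the
   Unixbench tables above (inferred from the reported numbers). */
double unixbench_index(double result, double baseline)
{
    return 10.0 * result / baseline;
}

/* Relative change from the old value to the new value, in percent. */
double percent_change(double old_value, double new_value)
{
    return 100.0 * (new_value - old_value) / old_value;
}

/* One figure from this report, measured on both glibc versions. */
struct comparison {
    const char *label;
    double glibc_228;
    double glibc_232;
};

/* Whetstone MWIPS (higher is better) and bench-log / bench-exp "" mean
   values (lower is better), copied from the results above. */
static const struct comparison reported[] = {
    { "whetstone-double MWIPS", 41535.6, 37152.0 },
    { "bench-log mean",         78.2631, 70.9611 },
    { "bench-exp mean",         97.4344, 84.1241 },
};

/* Prints each figure with its relative change from 2.28 to 2.32. */
void print_comparison(FILE *out)
{
    for (size_t i = 0; i < sizeof reported / sizeof reported[0]; i++)
        fprintf(out, "%-24s %10.4f -> %10.4f  (%+.2f%%)\n",
                reported[i].label, reported[i].glibc_228,
                reported[i].glibc_232,
                percent_change(reported[i].glibc_228, reported[i].glibc_232));
}
```

Read this way, Whetstone drops by about 10.6% in MWIPS while the mean time per log and exp call goes down (an improvement) in the microbenchmarks, which is the contrast the report describes.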