public inbox for glibc-bugs@sourceware.org
* [Bug math/27461] New: Unixbench/whetstone-double performance regression on glibc2.32
@ 2021-02-24  4:57 xuchunmei at linux dot alibaba.com
  2021-02-24  4:58 ` [Bug math/27461] " xuchunmei at linux dot alibaba.com
                   ` (7 more replies)
  0 siblings, 8 replies; 9+ messages in thread
From: xuchunmei at linux dot alibaba.com @ 2021-02-24  4:57 UTC
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=27461

            Bug ID: 27461
           Summary: Unixbench/whetstone-double performance regression on
                    glibc2.32
           Product: glibc
           Version: 2.32
            Status: UNCONFIRMED
          Severity: normal
          Priority: P2
         Component: math
          Assignee: unassigned at sourceware dot org
          Reporter: xuchunmei at linux dot alibaba.com
  Target Milestone: ---

I compared UnixBench performance between glibc 2.28 and glibc 2.32 on CentOS 8
and found that whetstone-double regresses on glibc 2.32.
I found the related discussion
https://sourceware.org/legacy-ml/libc-alpha/2019-03/msg00395.html and the
related commit
https://patchwork.ozlabs.org/project/glibc/patch/VI1PR0801MB212753501D9DA1AA00BC7BA583E20@VI1PR0801MB2127.eurprd08.prod.outlook.com/

Since glibc 2.31, math-finite.h has been removed and __log_finite is just an
alias. Since glibc 2.29, exp and log have been optimized. However, when
whetstone-double is built with -ffast-math, its result on glibc 2.32 is lower
than it was before those changes (on glibc 2.28).

The test case is UnixBench, run with “./Run whetstone-double -c 8 -i 1”.
whetstone-double was recompiled before testing on each glibc version.

Glibc2.28 test result:
8 CPUs in system; running 8 parallel copies of tests

Double-Precision Whetstone                    41535.6 MWIPS (9.6 s, 1 samples)

System Benchmarks Partial Index              BASELINE       RESULT    INDEX
Double-Precision Whetstone                       55.0      41535.6   7551.9
                                                                   ========
System Benchmarks Index Score (Partial Only)                         7551.9

glibc2.32 test result:
8 CPUs in system; running 8 parallel copies of tests

Double-Precision Whetstone                    37152.0 MWIPS (10.0 s, 1 samples)

System Benchmarks Partial Index              BASELINE       RESULT    INDEX
Double-Precision Whetstone                       55.0      37152.0   6754.9
                                                                   ========
System Benchmarks Index Score (Partial Only)                         6754.9

Perf record data:
Glibc2.32:
+   49.12%  whetstone-doubl  whetstone-double   [.] whetstones.constprop.1
+   10.57%  whetstone-doubl  libm-2.32.so       [.] __atan_fma
+   10.26%  whetstone-doubl  libm-2.32.so       [.] __ieee754_log_fma
+    8.76%  whetstone-doubl  libm-2.32.so       [.] __cos_fma
+    8.22%  whetstone-doubl  libm-2.32.so       [.] __ieee754_exp_fma
+    7.65%  whetstone-doubl  libm-2.32.so       [.] __sincos
+    3.52%  whetstone-doubl  libm-2.32.so       [.] log@@GLIBC_2.29
+    1.42%  whetstone-doubl  libm-2.32.so       [.] exp@@GLIBC_2.29
     0.14%  whetstone-doubl  whetstone-double   [.] log@plt
     0.11%  whetstone-doubl  whetstone-double   [.] cos@plt
     0.05%  whetstone-doubl  whetstone-double   [.] sincos@plt
     0.01%  whetstone-doubl  [kernel.kallsyms]  [k] _raw_spin_unlock_irqrestore

glibc2.28:
+   53.86%  whetstone-doubl  whetstone-double    [.] whetstones.constprop.1
+   11.51%  whetstone-doubl  libm-2.28.so        [.] __ieee754_log_fma
+   11.16%  whetstone-doubl  libm-2.28.so        [.] __atan_fma
+    9.19%  whetstone-doubl  libm-2.28.so        [.] __cos_fma
+    8.75%  whetstone-doubl  libm-2.28.so        [.] __sincos
+    5.00%  whetstone-doubl  libm-2.28.so        [.] __ieee754_exp_fma
     0.17%  whetstone-doubl  whetstone-double    [.] __log_finite@plt
     0.11%  whetstone-doubl  whetstone-double    [.] cos@plt
     0.06%  whetstone-doubl  whetstone-double    [.] sincos@plt
     0.01%  whetstone-doubl  [kernel.kallsyms]   [k] _raw_spin_unlock_irqrestore
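Comparing the two profiles: the glibc 2.32 run has two extra libm entries,
log@@GLIBC_2.29 and exp@@GLIBC_2.29, while the glibc 2.28 binary calls
__log_finite@plt directly. The sketch below only totals the libm rows copied
from the listings above; the struct, array and helper names are hypothetical,
and the percentages are shares of samples within each run, not absolute time.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One libm row of `perf report` output: share of samples and symbol name. */
struct perf_row {
    double percent;
    const char *symbol;
};

/* libm rows copied from the glibc 2.32 profile above. */
static const struct perf_row libm_232[] = {
    {10.57, "__atan_fma"},        {10.26, "__ieee754_log_fma"},
    {8.76, "__cos_fma"},          {8.22, "__ieee754_exp_fma"},
    {7.65, "__sincos"},           {3.52, "log@@GLIBC_2.29"},
    {1.42, "exp@@GLIBC_2.29"},
};

/* libm rows copied from the glibc 2.28 profile above. */
static const struct perf_row libm_228[] = {
    {11.51, "__ieee754_log_fma"}, {11.16, "__atan_fma"},
    {9.19, "__cos_fma"},          {8.75, "__sincos"},
    {5.00, "__ieee754_exp_fma"},
};

#define N_ROWS(a) (sizeof(a) / sizeof((a)[0]))

/* Sum the shares of all rows, or only of rows whose symbol contains
   `needle` when `needle` is non-NULL. */
static double sum_share(const struct perf_row *rows, size_t n,
                        const char *needle)
{
    assert(rows != NULL);
    double total = 0.0;
    for (size_t i = 0; i < n; i++)
        if (needle == NULL || strstr(rows[i].symbol, needle) != NULL)
            total += rows[i].percent;
    return total;
}
```

With these rows, libm accounts for about 50.40% of samples on glibc 2.32
versus 45.61% on glibc 2.28, and the two @@GLIBC_2.29 entries alone make up
4.94%.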


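whetstone-double is compiled with -O3 -ffast-math by default. On glibc 2.28,
-ffast-math speeds up whetstone; without it, the result is as slow as on
glibc 2.32. On glibc 2.32, -ffast-math has no effect. In the profiles above,
the glibc 2.28 -ffast-math build calls __log_finite@plt, whereas the glibc 2.32
build calls log@plt, which goes through log@@GLIBC_2.29.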
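For context: with GCC, -ffast-math implies -ffinite-math-only, which defines
__FINITE_MATH_ONLY__ to 1. Before glibc 2.31, <math.h> included
<bits/math-finite.h> when that macro was set, and that header redirected calls
such as log() to __log_finite(). The header is gone in glibc 2.31, so the flag
no longer changes which symbol is called. The sketch below roughly mirrors
that condition for the current build; the function names are hypothetical and
it does not reproduce the full logic of the old header.

```c
#include <assert.h>
#include <stdio.h> /* on glibc this also pulls in <features.h>, which defines __GLIBC__ */

/* 1 if this translation unit is compiled with finite-math-only semantics
   (GCC defines __FINITE_MATH_ONLY__ to 1 under -ffast-math), else 0. */
int finite_math_only(void)
{
#if defined(__FINITE_MATH_ONLY__) && __FINITE_MATH_ONLY__ > 0
    return 1;
#else
    return 0;
#endif
}

/* 1 if glibc headers older than 2.31 are in use, i.e. headers that still
   ship <bits/math-finite.h>; 0 otherwise (including non-glibc systems). */
int pre_231_glibc_headers(void)
{
#if defined(__GLIBC__) && (__GLIBC__ < 2 || (__GLIBC__ == 2 && __GLIBC_MINOR__ < 31))
    return 1;
#else
    return 0;
#endif
}

/* 1 if <math.h> in this build would (approximately) redirect log() to
   __log_finite(), 0 otherwise. */
int log_redirected_to_finite(void)
{
    int r = finite_math_only() && pre_231_glibc_headers();
    assert(r == 0 || r == 1);
    return r;
}

void report_math_finite(void)
{
    printf("__FINITE_MATH_ONLY__: %d, glibc headers < 2.31: %d, "
           "log -> __log_finite: %s\n",
           finite_math_only(), pre_231_glibc_headers(),
           log_redirected_to_finite() ? "yes" : "no");
}
```

Built against glibc 2.28 headers, toggling -ffast-math should flip the last
field; against glibc 2.32 headers it should stay "no", consistent with
-ffast-math making no difference there.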

Without -ffast-math on glibc 2.28:
8 CPUs in system; running 8 parallel copies of tests

Double-Precision Whetstone                    37312.4 MWIPS (9.9 s, 1 samples)

System Benchmarks Partial Index              BASELINE       RESULT    INDEX
Double-Precision Whetstone                       55.0      37312.4   6784.1
                                                                   ========
System Benchmarks Index Score (Partial Only)                         6784.1
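In all three tables the INDEX column equals RESULT / BASELINE * 10 (for
example 41535.6 / 55.0 * 10 = 7551.9); this formula is inferred from the
numbers here, not taken from the UnixBench sources. A small sketch that
recomputes the index and the size of the changes; the function names are
hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Index as shown in the tables above: RESULT / BASELINE * 10. */
double unixbench_index(double result, double baseline)
{
    assert(baseline > 0.0);
    return result / baseline * 10.0;
}

/* Relative change from `before` to `after` in percent; negative = slower. */
double percent_change(double before, double after)
{
    assert(before != 0.0);
    return (after - before) / before * 100.0;
}

void summarize_whetstone(void)
{
    const double baseline = 55.0;
    const double r228_fast = 41535.6;   /* glibc 2.28, -O3 -ffast-math */
    const double r232_fast = 37152.0;   /* glibc 2.32, -O3 -ffast-math */
    const double r228_nofast = 37312.4; /* glibc 2.28, without -ffast-math */

    printf("index: %.1f %.1f %.1f\n",
           unixbench_index(r228_fast, baseline),
           unixbench_index(r232_fast, baseline),
           unixbench_index(r228_nofast, baseline));
    printf("2.28 -> 2.32 (-ffast-math): %+.1f%%\n",
           percent_change(r228_fast, r232_fast));
    printf("2.28 without -ffast-math vs 2.32: %+.1f%%\n",
           percent_change(r228_nofast, r232_fast));
}
```

This reproduces the index values 7551.9, 6754.9 and 6784.1, a drop of about
10.6% from glibc 2.28 to glibc 2.32 with -ffast-math, and a difference of only
about 0.4% between glibc 2.28 without -ffast-math and glibc 2.32.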

I also ran bench-log and bench-exp on glibc 2.28 and glibc 2.32. Both show
that log and exp themselves are faster on glibc 2.32 than on glibc 2.28.

glibc2.28:
# ./bench-log
  "log": {
   "": {
    "duration": 2.51381e+09,
    "iterations": 3.212e+07,
    "max": 780.74,
    "min": 74.124,
    "mean": 78.2631
   }
  }
# ./bench-exp
  "exp": {
   "": {
    "duration": 2.50406e+09,
    "iterations": 2.57e+07,
    "max": 166.716,
    "min": 81.516,
    "mean": 97.4344
   },
   "144bits": {
    "duration": 2.50084e+09,
    "iterations": 2.7023e+07,
    "max": 2649.79,
    "min": 88.826,
    "mean": 92.5448
   },
   "768bits": {
    "duration": 2.49121e+09,
    "iterations": 2.714e+07,
    "max": 139.302,
    "min": 88.924,
    "mean": 91.7911
   }
  }

Glibc2.32:
# ./bench-log
  "log": {
   "": {
    "duration": 2.5072e+09,
    "iterations": 3.5332e+07,
    "max": 112.336,
    "min": 69.486,
    "mean": 70.9611
   }
  }
# ./bench-exp
  "exp": {
   "": {
    "duration": 2.50791e+09,
    "iterations": 2.9812e+07,
    "max": 2759.88,
    "min": 76.894,
    "mean": 84.1241
   },
   "144bits": {
    "duration": 2.4979e+09,
    "iterations": 2.8182e+07,
    "max": 479.622,
    "min": 77.824,
    "mean": 88.6346
   },
   "768bits": {
    "duration": 2.49041e+09,
    "iterations": 2.9995e+07,
    "max": 450.432,
    "min": 77.01,
    "mean": 83.0275
   }
  }
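In these benchtests outputs, "mean" equals duration / iterations (for log on
glibc 2.28, 2.51381e+09 / 3.212e+07 ≈ 78.263), in whatever timing unit the
harness reports. The sketch below checks that relation for the default ("")
inputs and computes per-call speedups; the struct and helper names are
hypothetical, and the numbers are copied from the output above.

```c
#include <assert.h>

/* One bench-log / bench-exp entry for the default ("") input set. */
struct bench_entry {
    const char *name;
    double duration;
    double iterations;
    double mean;
};

/* Recompute the mean time per call from the totals. */
double mean_from_totals(const struct bench_entry *e)
{
    assert(e->iterations > 0.0);
    return e->duration / e->iterations;
}

/* Speedup of `new_e` over `old_e` by mean time per call (> 1 = faster). */
double speedup(const struct bench_entry *old_e, const struct bench_entry *new_e)
{
    assert(new_e->mean > 0.0);
    return old_e->mean / new_e->mean;
}

const struct bench_entry log_228 = {"log", 2.51381e+09, 3.212e+07, 78.2631};
const struct bench_entry log_232 = {"log", 2.5072e+09, 3.5332e+07, 70.9611};
const struct bench_entry exp_228 = {"exp", 2.50406e+09, 2.57e+07, 97.4344};
const struct bench_entry exp_232 = {"exp", 2.50791e+09, 2.9812e+07, 84.1241};
```

By this measure, log is about 1.10x and exp about 1.16x faster on glibc 2.32,
so the standalone functions did not regress.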


Cpuinfo:
# lscpu
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              8
On-line CPU(s) list: 0-7
Thread(s) per core:  2
Core(s) per socket:  2
Socket(s):           2
NUMA node(s):        1
Vendor ID:           GenuineIntel
CPU family:          6
Model:               85
Model name:          Intel(R) Xeon(R) Platinum 8163 CPU @ 2.50GHz
Stepping:            4
CPU MHz:             2499.442
BogoMIPS:            4998.88
Hypervisor vendor:   KVM
Virtualization type: full
L1d cache:           32K
L1i cache:           32K
L2 cache:            1024K
L3 cache:            33792K
NUMA node0 CPU(s):   0-7
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm
constant_tsc rep_good nopl cpuid tsc_known_freq pni pclmulqdq ssse3 fma cx16
pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c
rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti ibrs ibpb stibp
fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx avx512f
avx512dq rdseed adx smap avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1
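The _fma suffix on __atan_fma, __ieee754_log_fma, __cos_fma and
__ieee754_exp_fma in both profiles marks x86_64 multiarch variants that glibc
selects at load time based on CPU features, and the flags above include both
fma and avx2. When repeating the comparison on another machine, a quick check
that it reports the same features could look like the sketch below. It uses
the GCC/Clang x86 builtins __builtin_cpu_init and __builtin_cpu_supports; the
wrapper function names are hypothetical, and the exact feature set glibc's
selector requires is not reproduced here.

```c
#include <assert.h>
#include <stdio.h>

/* 1 if the running CPU supports FMA, 0 if not, -1 on non-x86 targets
   where this builtin-based check is unavailable. */
int cpu_has_fma(void)
{
#if defined(__x86_64__) || defined(__i386__)
    __builtin_cpu_init();
    return __builtin_cpu_supports("fma") ? 1 : 0;
#else
    return -1;
#endif
}

/* Same convention as cpu_has_fma(), for AVX2. */
int cpu_has_avx2(void)
{
#if defined(__x86_64__) || defined(__i386__)
    __builtin_cpu_init();
    return __builtin_cpu_supports("avx2") ? 1 : 0;
#else
    return -1;
#endif
}

/* Map the -1/0/1 convention above to a readable string. */
static const char *feature_state(int v)
{
    assert(v >= -1 && v <= 1);
    return v == 1 ? "yes" : v == 0 ? "no" : "unknown";
}

void print_cpu_features(void)
{
    printf("fma: %s, avx2: %s\n",
           feature_state(cpu_has_fma()), feature_state(cpu_has_avx2()));
}
```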


Thread overview: 9+ messages
2021-02-24  4:57 [Bug math/27461] New: Unixbench/whetstone-double performance regression on glibc2.32 xuchunmei at linux dot alibaba.com
2021-02-24  4:58 ` [Bug math/27461] " xuchunmei at linux dot alibaba.com
2021-02-24  4:59 ` xuchunmei at linux dot alibaba.com
2021-02-24  5:36 ` siddhesh at sourceware dot org
2021-02-24  9:45 ` nsz at gcc dot gnu.org
2021-02-24 10:49 ` wdijkstr at arm dot com
2021-02-25 11:04 ` xuchunmei at linux dot alibaba.com
2021-03-01 12:06 ` xuchunmei at linux dot alibaba.com
2021-03-02  4:59 ` xuchunmei at linux dot alibaba.com
