From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <sourceware-bugzilla@sourceware.org>
Received: by sourceware.org (Postfix, from userid 48)
 id CEE1238708B5; Wed, 24 Feb 2021 04:57:25 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org CEE1238708B5
From: "xuchunmei at linux dot alibaba.com" <sourceware-bugzilla@sourceware.org>
To: glibc-bugs@sourceware.org
Subject: [Bug math/27461] New: Unixbench/whetstone-double performance
 regression on glibc2.32
Date: Wed, 24 Feb 2021 04:57:25 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: new
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: glibc
X-Bugzilla-Component: math
X-Bugzilla-Version: 2.32
X-Bugzilla-Keywords: 
X-Bugzilla-Severity: normal
X-Bugzilla-Who: xuchunmei at linux dot alibaba.com
X-Bugzilla-Status: UNCONFIRMED
X-Bugzilla-Resolution: 
X-Bugzilla-Priority: P2
X-Bugzilla-Assigned-To: unassigned at sourceware dot org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags: 
X-Bugzilla-Changed-Fields: bug_id short_desc product version bug_status
 bug_severity priority component assigned_to reporter target_milestone
Message-ID: <bug-27461-131@http.sourceware.org/bugzilla/>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://sourceware.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
X-BeenThere: glibc-bugs@sourceware.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Glibc-bugs mailing list <glibc-bugs.sourceware.org>
List-Unsubscribe: <https://sourceware.org/mailman/options/glibc-bugs>,
 <mailto:glibc-bugs-request@sourceware.org?subject=unsubscribe>
List-Archive: <https://sourceware.org/pipermail/glibc-bugs/>
List-Help: <mailto:glibc-bugs-request@sourceware.org?subject=help>
List-Subscribe: <https://sourceware.org/mailman/listinfo/glibc-bugs>,
 <mailto:glibc-bugs-request@sourceware.org?subject=subscribe>
X-List-Received-Date: Wed, 24 Feb 2021 04:57:25 -0000

https://sourceware.org/bugzilla/show_bug.cgi?id=27461

            Bug ID: 27461
           Summary: Unixbench/whetstone-double performance regression on
                    glibc2.32
           Product: glibc
           Version: 2.32
            Status: UNCONFIRMED
          Severity: normal
          Priority: P2
         Component: math
          Assignee: unassigned at sourceware dot org
          Reporter: xuchunmei at linux dot alibaba.com
  Target Milestone: ---

I am comparing UnixBench performance between glibc 2.28 and glibc 2.32 on
CentOS 8, and found that whetstone-double shows a performance regression.
I found the discussion
https://sourceware.org/legacy-ml/libc-alpha/2019-03/msg00395.html and the
related commit
https://patchwork.ozlabs.org/project/glibc/patch/VI1PR0801MB212753501D9DA1AA00BC7BA583E20@VI1PR0801MB2127.eurprd08.prod.outlook.com/

Since glibc 2.31, math-finite.h has been removed and __log_finite is just an
alias. And since glibc 2.29, exp and log have been optimized.
But the unixbench/whetstone-double result on glibc 2.32 shows a performance
regression compared with the glibc from before these optimizations when
-ffast-math is used.

The test case is UnixBench, run with the command
“./Run whetstone-double -c 8 -i 1”. whetstone-double was recompiled before
testing on each glibc version.

glibc2.28 test result:
8 CPUs in system; running 8 parallel copies of tests

Double-Precision Whetstone                    41535.6 MWIPS (9.6 s, 1 samples)

System Benchmarks Partial Index              BASELINE       RESULT    INDEX
Double-Precision Whetstone                       55.0      41535.6   7551.9
                                                                   ========
System Benchmarks Index Score (Partial Only)                         7551.9

glibc2.32 test result:
8 CPUs in system; running 8 parallel copies of tests

Double-Precision Whetstone                    37152.0 MWIPS (10.0 s, 1 samples)

System Benchmarks Partial Index              BASELINE       RESULT    INDEX
Double-Precision Whetstone                       55.0      37152.0   6754.9
                                                                   ========
System Benchmarks Index Score (Partial Only)                         6754.9

Perf record data:
glibc2.32:
+   49.12%  whetstone-doubl  whetstone-double   [.] whetstones.constprop.1
+   10.57%  whetstone-doubl  libm-2.32.so       [.] __atan_fma
+   10.26%  whetstone-doubl  libm-2.32.so       [.] __ieee754_log_fma
+    8.76%  whetstone-doubl  libm-2.32.so       [.] __cos_fma
+    8.22%  whetstone-doubl  libm-2.32.so       [.] __ieee754_exp_fma
+    7.65%  whetstone-doubl  libm-2.32.so       [.] __sincos
+    3.52%  whetstone-doubl  libm-2.32.so       [.] log@@GLIBC_2.29
+    1.42%  whetstone-doubl  libm-2.32.so       [.] exp@@GLIBC_2.29
     0.14%  whetstone-doubl  whetstone-double   [.] log@plt
     0.11%  whetstone-doubl  whetstone-double   [.] cos@plt
     0.05%  whetstone-doubl  whetstone-double   [.] sincos@plt
     0.01%  whetstone-doubl  [kernel.kallsyms]  [k] _raw_spin_unlock_irqrestore

glibc2.28:
+   53.86%  whetstone-doubl  whetstone-double    [.] whetstones.constprop.1
+   11.51%  whetstone-doubl  libm-2.28.so        [.] __ieee754_log_fma
+   11.16%  whetstone-doubl  libm-2.28.so        [.] __atan_fma
+    9.19%  whetstone-doubl  libm-2.28.so        [.] __cos_fma
+    8.75%  whetstone-doubl  libm-2.28.so        [.] __sincos
+    5.00%  whetstone-doubl  libm-2.28.so        [.] __ieee754_exp_fma
     0.17%  whetstone-doubl  whetstone-double    [.] __log_finite@plt
     0.11%  whetstone-doubl  whetstone-double    [.] cos@plt
     0.06%  whetstone-doubl  whetstone-double    [.] sincos@plt
     0.01%  whetstone-doubl  [kernel.kallsyms]   [k] _raw_spin_unlock_irqrestore
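
To make the two profiles easier to compare, the libm shares can be summed per
run. This is only a rough sketch: the percentages are copied from the perf
output above, each is a share of that run's own samples (so the totals are not
absolute times), and the variable names are mine. The two entries that appear
only under glibc 2.32 are log@@GLIBC_2.29 and exp@@GLIBC_2.29.

```python
# Percent of samples per libm symbol, copied from the perf output above.
perf_232 = {
    "__atan_fma": 10.57,
    "__ieee754_log_fma": 10.26,
    "__cos_fma": 8.76,
    "__ieee754_exp_fma": 8.22,
    "__sincos": 7.65,
    "log@@GLIBC_2.29": 3.52,
    "exp@@GLIBC_2.29": 1.42,
}
perf_228 = {
    "__ieee754_log_fma": 11.51,
    "__atan_fma": 11.16,
    "__cos_fma": 9.19,
    "__sincos": 8.75,
    "__ieee754_exp_fma": 5.00,
}

# Total share of samples spent inside libm for each run.
libm_232 = sum(perf_232.values())
libm_228 = sum(perf_228.values())

# Share taken by symbols that show up only in the glibc 2.32 profile.
only_232 = {name: pct for name, pct in perf_232.items() if name not in perf_228}
extra_232 = sum(only_232.values())

print(f"libm share on glibc 2.32: {libm_232:.2f}%")
print(f"libm share on glibc 2.28: {libm_228:.2f}%")
print(f"symbols only in 2.32: {sorted(only_232)} -> {extra_232:.2f}%")
```
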


When whetstone-double is compiled, -O3 -ffast-math is added by default. I
found that on glibc 2.28, -ffast-math speeds up whetstone-double: without
-ffast-math, the glibc 2.28 result is as slow as the glibc 2.32 result. But on
glibc 2.32, -ffast-math has no effect.

Without -ffast-math on glibc2.28:
8 CPUs in system; running 8 parallel copies of tests

Double-Precision Whetstone                    37312.4 MWIPS (9.9 s, 1 samples)

System Benchmarks Partial Index              BASELINE       RESULT    INDEX
Double-Precision Whetstone                       55.0      37312.4   6784.1
                                                                   ========
System Benchmarks Index Score (Partial Only)                         6784.1
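
As a rough check on the three runs above, the relative differences can be
computed directly from the MWIPS figures. This is a small sketch using only the
numbers reported in this bug; the helper name is mine, and each figure comes
from a single sample, so run-to-run noise is not accounted for.

```python
# MWIPS figures copied from the three UnixBench runs in this report.
mwips_228_fast = 41535.6     # glibc 2.28, -O3 -ffast-math
mwips_232_fast = 37152.0     # glibc 2.32, -O3 -ffast-math
mwips_228_nofast = 37312.4   # glibc 2.28, without -ffast-math


def percent_change(reference, value):
    """Return how far `value` is from `reference`, as a percentage of `reference`."""
    return (value - reference) / reference * 100.0


# Regression seen when moving from glibc 2.28 to 2.32 with -ffast-math.
regression = percent_change(mwips_228_fast, mwips_232_fast)

# Gain that -ffast-math gives on glibc 2.28.
fast_math_gain = percent_change(mwips_228_nofast, mwips_228_fast)

# Gap between glibc 2.32 (-ffast-math) and glibc 2.28 (no -ffast-math).
gap = percent_change(mwips_228_nofast, mwips_232_fast)

print(f"2.28 -> 2.32 with -ffast-math:     {regression:+.1f}%")
print(f"-ffast-math gain on 2.28:          {fast_math_gain:+.1f}%")
print(f"2.32 fast vs 2.28 no -ffast-math:  {gap:+.1f}%")
```
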

I also ran bench-log and bench-exp on glibc 2.28 and glibc 2.32. The results
show that glibc 2.32 is faster than glibc 2.28 for both log and exp.

glibc2.28:
# ./bench-log
  "log": {
   "": {
    "duration": 2.51381e+09,
    "iterations": 3.212e+07,
    "max": 780.74,
    "min": 74.124,
    "mean": 78.2631
   }
  }
# ./bench-exp
  "exp": {
   "": {
    "duration": 2.50406e+09,
    "iterations": 2.57e+07,
    "max": 166.716,
    "min": 81.516,
    "mean": 97.4344
   },
   "144bits": {
    "duration": 2.50084e+09,
    "iterations": 2.7023e+07,
    "max": 2649.79,
    "min": 88.826,
    "mean": 92.5448
   },
   "768bits": {
    "duration": 2.49121e+09,
    "iterations": 2.714e+07,
    "max": 139.302,
    "min": 88.924,
    "mean": 91.7911
   }
  }

glibc2.32:
# ./bench-log
  "log": {
   "": {
    "duration": 2.5072e+09,
    "iterations": 3.5332e+07,
    "max": 112.336,
    "min": 69.486,
    "mean": 70.9611
   }
  }
# ./bench-exp
  "exp": {
   "": {
    "duration": 2.50791e+09,
    "iterations": 2.9812e+07,
    "max": 2759.88,
    "min": 76.894,
    "mean": 84.1241
   },
   "144bits": {
    "duration": 2.4979e+09,
    "iterations": 2.8182e+07,
    "max": 479.622,
    "min": 77.824,
    "mean": 88.6346
   },
   "768bits": {
    "duration": 2.49041e+09,
    "iterations": 2.9995e+07,
    "max": 450.432,
    "min": 77.01,
    "mean": 83.0275
   }
  }
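
For reference, the "mean" values from the bench-log and bench-exp output above
can be compared as follows. This is just a sketch over the numbers already
shown; the dictionary keys are my own labels for the benchmark variants, and
the means are in whatever unit the benchmarks report.

```python
# "mean" values copied from the bench-log / bench-exp output above.
mean_228 = {
    "log": 78.2631,
    "exp": 97.4344,
    "exp/144bits": 92.5448,
    "exp/768bits": 91.7911,
}
mean_232 = {
    "log": 70.9611,
    "exp": 84.1241,
    "exp/144bits": 88.6346,
    "exp/768bits": 83.0275,
}

# Percentage reduction in mean cost per call going from glibc 2.28 to 2.32.
reduction = {
    name: (mean_228[name] - mean_232[name]) / mean_228[name] * 100.0
    for name in mean_228
}

for name, pct in reduction.items():
    print(f"{name:12s} {mean_228[name]:8.4f} -> {mean_232[name]:8.4f}  ({pct:.1f}% lower)")
```

Every variant is lower on glibc 2.32 in these numbers, which is why the
whetstone-double result is surprising.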


Cpuinfo:
# lscpu
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              8
On-line CPU(s) list: 0-7
Thread(s) per core:  2
Core(s) per socket:  2
Socket(s):           2
NUMA node(s):        1
Vendor ID:           GenuineIntel
CPU family:          6
Model:               85
Model name:          Intel(R) Xeon(R) Platinum 8163 CPU @ 2.50GHz
Stepping:            4
CPU MHz:             2499.442
BogoMIPS:            4998.88
Hypervisor vendor:   KVM
Virtualization type: full
L1d cache:           32K
L1i cache:           32K
L2 cache:            1024K
L3 cache:            33792K
NUMA node0 CPU(s):   0-7
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm
constant_tsc rep_good nopl cpuid tsc_known_freq pni pclmulqdq ssse3 fma cx16
pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c
rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti ibrs ibpb stibp
fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx avx512f
avx512dq rdseed adx smap avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1
