Mailing-List: contact libc-stable-help@sourceware.org; run by ezmlm
Received: from mail-oi0-f65.google.com (HELO mail-oi0-f65.google.com) (209.85.218.65)
 by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP;
 Fri, 04 Aug 2017 13:21:42 +0000
In-Reply-To: <20170802151213.GA12009@gmail.com>
References: <20170802151213.GA12009@gmail.com>
From: "H.J. Lu"
Date: Sun, 01 Jan 2017 00:00:00 -0000
Subject: Re: [PATCH] x86-64: Use _dl_runtime_resolve_opt only with AVX512F
To: GNU C Library, Libc-stable Mailing List
X-SW-Source: 2017-08/txt/msg00006.txt.bz2

On Wed, Aug 2, 2017 at 8:12 AM, H.J. Lu wrote:
> On AVX machines with XGETBV (ECX == 1) like Skylake processors,
>
> (gdb) disass _dl_runtime_resolve_avx_opt
> Dump of assembler code for function _dl_runtime_resolve_avx_opt:
>    0x0000000000015890 <+0>:     push   %rax
>    0x0000000000015891 <+1>:     push   %rcx
>    0x0000000000015892 <+2>:     push   %rdx
>    0x0000000000015893 <+3>:     mov    $0x1,%ecx
>    0x0000000000015898 <+8>:     xgetbv
>    0x000000000001589b <+11>:    mov    %eax,%r11d
>    0x000000000001589e <+14>:    pop    %rdx
>    0x000000000001589f <+15>:    pop    %rcx
>    0x00000000000158a0 <+16>:    pop    %rax
>    0x00000000000158a1 <+17>:    and    $0x4,%r11d
>    0x00000000000158a5 <+21>:    bnd je 0x16200 <_dl_runtime_resolve_sse_vex>
> End of assembler dump.
>
> is slower than:
>
> (gdb) disass _dl_runtime_resolve_avx_slow
> Dump of assembler code for function _dl_runtime_resolve_avx_slow:
>    0x0000000000015850 <+0>:     vorpd  %ymm0,%ymm1,%ymm8
>    0x0000000000015854 <+4>:     vorpd  %ymm2,%ymm3,%ymm9
>    0x0000000000015858 <+8>:     vorpd  %ymm4,%ymm5,%ymm10
>    0x000000000001585c <+12>:    vorpd  %ymm6,%ymm7,%ymm11
>    0x0000000000015860 <+16>:    vorpd  %ymm8,%ymm9,%ymm9
>    0x0000000000015865 <+21>:    vorpd  %ymm10,%ymm11,%ymm10
>    0x000000000001586a <+26>:    vpcmpeqd %xmm8,%xmm8,%xmm8
>    0x000000000001586f <+31>:    vorpd  %ymm9,%ymm10,%ymm10
>    0x0000000000015874 <+36>:    vptest %ymm10,%ymm8
>    0x0000000000015879 <+41>:    bnd jae 0x158b0 <_dl_runtime_resolve_avx>
>    0x000000000001587c <+44>:    vzeroupper
>    0x000000000001587f <+47>:    bnd jmpq 0x16200 <_dl_runtime_resolve_sse_vex>
> End of assembler dump.
> (gdb)
>
> since xgetbv takes many more cycles than single-cycle operations like
> vorpd/vpcmpeqd/vptest.  _dl_runtime_resolve_opt should be used only with
> AVX512, where AVX512 instructions lead to lower CPU frequency on Skylake
> server.
>
> Any comments or objections?
>
> H.J.
> ---
>         [BZ #21871]
>         * sysdeps/x86/cpu-features.c (init_cpu_features): Set
>         bit_arch_Use_dl_runtime_resolve_opt only with AVX512F.
> ---
>  sysdeps/x86/cpu-features.c | 7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/sysdeps/x86/cpu-features.c b/sysdeps/x86/cpu-features.c
> index 1d087ea732..6f900840d4 100644
> --- a/sysdeps/x86/cpu-features.c
> +++ b/sysdeps/x86/cpu-features.c
> @@ -244,10 +244,13 @@ init_cpu_features (struct cpu_features *cpu_features)
>                 |= bit_arch_Prefer_No_AVX512;
>
>           /* To avoid SSE transition penalty, use _dl_runtime_resolve_slow.
> -            If XGETBV suports ECX == 1, use _dl_runtime_resolve_opt.  */
> +            If XGETBV suports ECX == 1, use _dl_runtime_resolve_opt.
> +            Use _dl_runtime_resolve_opt only with AVX512F since it is
> +            slower than _dl_runtime_resolve_slow with AVX.  */
>           cpu_features->feature[index_arch_Use_dl_runtime_resolve_slow]
>             |= bit_arch_Use_dl_runtime_resolve_slow;
> -         if (cpu_features->max_cpuid >= 0xd)
> +         if (CPU_FEATURES_ARCH_P (cpu_features, AVX512F_Usable)
> +             && cpu_features->max_cpuid >= 0xd)
>             {
>               unsigned int eax;
>
> --
> 2.13.3
>

I am checking it in today and will backport it to 2.26/2.25/2.24
branches next week.

-- 
H.J.
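
For readers following the hunk above, the change in selection logic can be sketched in plain C. This is only an illustration under stated assumptions, not the glibc code: struct cpu_inputs, enum resolver and both pick_resolver_* functions are hypothetical names, the boolean xgetbv_ecx1 stands in for the CPUID check that the real code performs inside the braces, and modelling the two feature bits as a single either/or choice is an assumption about how they are consumed later. It covers only the branch shown in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the relevant cpu_features state;
   this is not the real glibc structure layout.  */
struct cpu_inputs
{
  bool avx512f_usable;     /* models AVX512F_Usable from the patch */
  unsigned int max_cpuid;  /* highest basic CPUID leaf */
  bool xgetbv_ecx1;        /* CPU reports that XGETBV accepts ECX == 1 */
};

enum resolver { RESOLVER_SLOW, RESOLVER_OPT };

/* Before the patch: pick the XGETBV-based resolver whenever
   XGETBV (ECX == 1) is available.  */
enum resolver
pick_resolver_before (const struct cpu_inputs *in)
{
  assert (in != NULL);
  if (in->max_cpuid >= 0xd && in->xgetbv_ecx1)
    return RESOLVER_OPT;
  return RESOLVER_SLOW;
}

/* After the patch: additionally require usable AVX512F, so AVX-only
   machines stay on the cheaper register-test resolver.  */
enum resolver
pick_resolver_after (const struct cpu_inputs *in)
{
  assert (in != NULL);
  if (in->avx512f_usable && in->max_cpuid >= 0xd && in->xgetbv_ecx1)
    return RESOLVER_OPT;
  return RESOLVER_SLOW;
}
```

Under this model, an AVX-only machine with XGETBV (ECX == 1) selected the opt resolver before the patch and the slow resolver after it, while a machine with usable AVX512F selects the opt resolver in both cases.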