* [PATCH] x86: Don't set Prefer_No_AVX512 for processors with AVX512 and AVX-VNNI
From: H.J. Lu @ 2021-12-06 3:23 UTC
To: libc-alpha

Don't set Prefer_No_AVX512 on processors with AVX512 and AVX-VNNI since
they won't lower CPU frequency when ZMM load and store instructions are
used.
---
 sysdeps/x86/cpu-features.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/sysdeps/x86/cpu-features.c b/sysdeps/x86/cpu-features.c
index be2498b2e7..311ade1f26 100644
--- a/sysdeps/x86/cpu-features.c
+++ b/sysdeps/x86/cpu-features.c
@@ -538,8 +538,11 @@ init_cpu_features (struct cpu_features *cpu_features)
             |= bit_arch_Prefer_No_VZEROUPPER;
           else
             {
-              cpu_features->preferred[index_arch_Prefer_No_AVX512]
-                |= bit_arch_Prefer_No_AVX512;
+              /* Processors with AVX512 and AVX-VNNI won't lower CPU frequency
+                 when ZMM load and store instructions are used.  */
+              if (!CPU_FEATURES_CPU_P (cpu_features, AVX_VNNI))
+                cpu_features->preferred[index_arch_Prefer_No_AVX512]
+                  |= bit_arch_Prefer_No_AVX512;
 
               /* Avoid RTM abort triggered by VZEROUPPER inside a
                  transactionally executing RTM region.  */
--
2.33.1

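For readers who want to try the same check outside glibc, here is a minimal sketch of the decision the patch makes in the branch shown in the diff. It is not glibc code: cpu_has_avx_vnni and prefer_no_avx512 are hypothetical names, and the sketch assumes GCC's or Clang's <cpuid.h> and the CPUID enumeration of AVX-VNNI at leaf 7, subleaf 1, EAX bit 4.

```c
/* Sketch only, not glibc code.  cpu_has_avx_vnni and prefer_no_avx512
   are hypothetical helper names introduced for illustration.  */
#include <stdbool.h>

#if defined (__x86_64__) || defined (__i386__)
# include <cpuid.h>   /* GCC/Clang helper header, assumed available.  */
#endif

/* Return true if CPUID.(EAX=7,ECX=1):EAX bit 4 (AVX-VNNI) is set.  */
static bool
cpu_has_avx_vnni (void)
{
#if defined (__x86_64__) || defined (__i386__)
  unsigned int eax, ebx, ecx, edx;

  /* Subleaf 0 of leaf 7 reports the highest valid subleaf in EAX.  */
  if (!__get_cpuid_count (7, 0, &eax, &ebx, &ecx, &edx) || eax < 1)
    return false;
  __get_cpuid_count (7, 1, &eax, &ebx, &ecx, &edx);
  return ((eax >> 4) & 1) != 0;
#else
  return false;   /* Not an x86 build: nothing to detect.  */
#endif
}

/* Model of the patched branch only: the preference is set unless
   AVX-VNNI is present.  The surrounding conditions in
   init_cpu_features are not modelled here.  */
static bool
prefer_no_avx512 (bool has_avx_vnni)
{
  return !has_avx_vnni;
}
```

Calling prefer_no_avx512 (cpu_has_avx_vnni ()) gives the value the patched branch would pick on the running machine, under the assumptions above.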
* Re: [PATCH] x86: Don't set Prefer_No_AVX512 for processors with AVX512 and AVX-VNNI
From: Noah Goldstein @ 2021-12-07 7:47 UTC
To: H.J. Lu; +Cc: GNU C Library

On Sun, Dec 5, 2021 at 9:23 PM H.J. Lu via Libc-alpha
<libc-alpha@sourceware.org> wrote:
>
> Don't set Prefer_No_AVX512 on processors with AVX512 and AVX-VNNI since
> they won't lower CPU frequency when ZMM load and store instructions are
> used.
> ---
>  sysdeps/x86/cpu-features.c | 7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/sysdeps/x86/cpu-features.c b/sysdeps/x86/cpu-features.c
> index be2498b2e7..311ade1f26 100644
> --- a/sysdeps/x86/cpu-features.c
> +++ b/sysdeps/x86/cpu-features.c
> @@ -538,8 +538,11 @@ init_cpu_features (struct cpu_features *cpu_features)
>              |= bit_arch_Prefer_No_VZEROUPPER;
>            else
>              {
> -              cpu_features->preferred[index_arch_Prefer_No_AVX512]
> -                |= bit_arch_Prefer_No_AVX512;
> +              /* Processors with AVX512 and AVX-VNNI won't lower CPU frequency
> +                 when ZMM load and store instructions are used.  */
> +              if (!CPU_FEATURES_CPU_P (cpu_features, AVX_VNNI))
> +                cpu_features->preferred[index_arch_Prefer_No_AVX512]
> +                  |= bit_arch_Prefer_No_AVX512;
>
>                /* Avoid RTM abort triggered by VZEROUPPER inside a
>                   transactionally executing RTM region.  */
> --
> 2.33.1
>

Should we also do Rocket Lake?
According to Travis Downs at least downclocking isn't an issue there either:
https://travisdowns.github.io/blog/2020/08/19/icl-avx512-freq.html#rocket-lake

* Re: [PATCH] x86: Don't set Prefer_No_AVX512 for processors with AVX512 and AVX-VNNI
From: H.J. Lu @ 2021-12-07 12:53 UTC
To: Noah Goldstein, Thiago Macieira, Arjan van de Ven; +Cc: GNU C Library

On Mon, Dec 6, 2021 at 11:47 PM Noah Goldstein <goldstein.w.n@gmail.com> wrote:
>
> On Sun, Dec 5, 2021 at 9:23 PM H.J. Lu via Libc-alpha
> <libc-alpha@sourceware.org> wrote:
> >
> > Don't set Prefer_No_AVX512 on processors with AVX512 and AVX-VNNI since
> > they won't lower CPU frequency when ZMM load and store instructions are
> > used.
> > ---
> >  sysdeps/x86/cpu-features.c | 7 +++++--
> >  1 file changed, 5 insertions(+), 2 deletions(-)
> >
> > diff --git a/sysdeps/x86/cpu-features.c b/sysdeps/x86/cpu-features.c
> > index be2498b2e7..311ade1f26 100644
> > --- a/sysdeps/x86/cpu-features.c
> > +++ b/sysdeps/x86/cpu-features.c
> > @@ -538,8 +538,11 @@ init_cpu_features (struct cpu_features *cpu_features)
> >              |= bit_arch_Prefer_No_VZEROUPPER;
> >            else
> >              {
> > -              cpu_features->preferred[index_arch_Prefer_No_AVX512]
> > -                |= bit_arch_Prefer_No_AVX512;
> > +              /* Processors with AVX512 and AVX-VNNI won't lower CPU frequency
> > +                 when ZMM load and store instructions are used.  */
> > +              if (!CPU_FEATURES_CPU_P (cpu_features, AVX_VNNI))
> > +                cpu_features->preferred[index_arch_Prefer_No_AVX512]
> > +                  |= bit_arch_Prefer_No_AVX512;
> >
> >                /* Avoid RTM abort triggered by VZEROUPPER inside a
> >                   transactionally executing RTM region.  */
> > --
> > 2.33.1
> >
>
> Should we also do Rocket Lake?
> According to Travis Downs at least downclocking isn't an issue there either:
> https://travisdowns.github.io/blog/2020/08/19/icl-avx512-freq.html#rocket-lake

Thiago, Arjan,

Is this true that Rocket Lake can use ZMM load/store?

--
H.J.

* Re: [PATCH] x86: Don't set Prefer_No_AVX512 for processors with AVX512 and AVX-VNNI
From: Arjan van de Ven @ 2021-12-07 13:17 UTC
To: H.J. Lu, Noah Goldstein, Thiago Macieira; +Cc: GNU C Library

On 12/7/2021 4:53 AM, H.J. Lu wrote:
>> Should we also do Rocket Lake?
>> According to Travis Downs at least downclocking isn't an issue there either:
>> https://travisdowns.github.io/blog/2020/08/19/icl-avx512-freq.html#rocket-lake
>
> Thiago, Arjan,
>
> Is this true that Rocket Lake can use ZMM load/store?
>

I have no specific data myself about rocket lake... but data is data...
so I'm all for trying it, but other than looking at cpuid's model number
I wouldn't know of an easy way to detect RKL vs ICL or others

* Re: [PATCH] x86: Don't set Prefer_No_AVX512 for processors with AVX512 and AVX-VNNI
From: H.J. Lu @ 2021-12-07 13:34 UTC
To: Arjan van de Ven, Hongyu Wang, liuhongt
Cc: Noah Goldstein, Thiago Macieira, GNU C Library

On Tue, Dec 7, 2021 at 5:18 AM Arjan van de Ven <arjan@linux.intel.com> wrote:
>
> On 12/7/2021 4:53 AM, H.J. Lu wrote:
> >> Should we also do Rocket Lake?
> >> According to Travis Downs at least downclocking isn't an issue there either:
> >> https://travisdowns.github.io/blog/2020/08/19/icl-avx512-freq.html#rocket-lake
> >
> > Thiago, Arjan,
> >
> > Is this true that Rocket Lake can use ZMM load/store?
> >
>
>
> I have no specific data myself about rocket lake... but data is data...
> so I'm all for trying it, but other than looking at cpuid's model number

Hongtao, Hongyu, can you find a Rocket Lake to test?

> I wouldn't know of an easy way to detect RKL vs ICL or others

In GCC, RKL ISAs are ICL ISAs without SGX.

--
H.J.

* Re: [PATCH] x86: Don't set Prefer_No_AVX512 for processors with AVX512 and AVX-VNNI
From: Florian Weimer @ 2021-12-07 14:05 UTC
To: H.J. Lu via Libc-alpha
Cc: Arjan van de Ven, Hongyu Wang, liuhongt, H.J. Lu, Thiago Macieira

* H. J. Lu via Libc-alpha:

> Hongtao, Hongyu, can you find a Rocket Lake to test?

I've found a lab machine with an i7-11700 CPU.  Is there something I
could test for you?

(This could be non-production silicon, though.)

Thanks,
Florian

* Re: [PATCH] x86: Don't set Prefer_No_AVX512 for processors with AVX512 and AVX-VNNI
From: H.J. Lu @ 2021-12-07 14:15 UTC
To: Florian Weimer
Cc: H.J. Lu via Libc-alpha, Arjan van de Ven, Hongyu Wang, liuhongt, Thiago Macieira

On Tue, Dec 7, 2021 at 6:05 AM Florian Weimer <fweimer@redhat.com> wrote:
>
> * H. J. Lu via Libc-alpha:
>
> > Hongtao, Hongyu, can you find a Rocket Lake to test?
>
> I've found a lab machine with an i7-11700 CPU.  Is there something I
> could test for you?

You can enable AVX512 in glibc with:

$ export GLIBC_TUNABLES=glibc.cpu.hwcaps=-Prefer_No_AVX512

While bootstrapping GCC with -j8, track CPU frequency with turbostat.  If
there is no CPU frequency drop and build time is lower than without
GLIBC_TUNABLES, we can enable AVX512.

> (This could be non-production silicon, though.)
>

The frequency behavior of non-production silicon can be different.

--
H.J.

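As a rough companion to the turbostat step above, the sketch below samples the kernel's frequency estimate for one CPU from sysfs. It is not a substitute for turbostat: it assumes a Linux system that exposes the cpufreq scaling_cur_freq file, and read_cur_freq_khz is a hypothetical helper name.

```c
/* Sketch only.  Assumes Linux with the cpufreq sysfs interface;
   read_cur_freq_khz is a hypothetical helper, not part of glibc or
   turbostat.  */
#include <stdio.h>

/* Return the kernel's current frequency estimate for CPU in kHz, or
   -1 if the sysfs file is missing or cannot be parsed.  */
static long
read_cur_freq_khz (int cpu)
{
  char path[128];
  long khz = -1;

  snprintf (path, sizeof path,
            "/sys/devices/system/cpu/cpu%d/cpufreq/scaling_cur_freq", cpu);

  FILE *f = fopen (path, "r");
  if (f == NULL)
    return -1;
  if (fscanf (f, "%ld", &khz) != 1)
    khz = -1;
  fclose (f);
  return khz;
}
```

Sampling this value periodically during both builds (with and without GLIBC_TUNABLES set) gives a coarse view of the frequency; turbostat remains the more precise tool for the comparison described above.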
* Re: [PATCH] x86: Don't set Prefer_No_AVX512 for processors with AVX512 and AVX-VNNI
From: Florian Weimer @ 2021-12-07 15:47 UTC
To: H.J. Lu via Libc-alpha
Cc: H.J. Lu, Arjan van de Ven, liuhongt, Thiago Macieira, Hongyu Wang

* H. J. Lu via Libc-alpha:

> On Tue, Dec 7, 2021 at 6:05 AM Florian Weimer <fweimer@redhat.com> wrote:
>>
>> * H. J. Lu via Libc-alpha:
>>
>> > Hongtao, Hongyu, can you find a Rocket Lake to test?
>>
>> I've found a lab machine with an i7-11700 CPU.  Is there something I
>> could test for you?
>
> You can enable AVX512 in glibc with:
>
> $ export GLIBC_TUNABLES=glibc.cpu.hwcaps=-Prefer_No_AVX512
>
> While bootstrapping GCC with -j8, track CPU frequency with turbostat.  If
> there is no CPU frequency drop and build time is lower than without
> GLIBC_TUNABLES, we can enable AVX512.
>
>> (This could be non-production silicon, though.)
>>
>
> The frequency behavior of non-production silicon can be different.

With that caveat, it seems that frequencies drop further with
GLIBC_TUNABLES set as above, and the build is also a little bit slower
(5m31s vs 5m23s; the AVX-512 build was run first, and the system was a
little bit warmer for the second run).

Would it make sense to run more extensive tests, or should we wait for
someone with production silicon to show up?

Thanks,
Florian

* Re: [PATCH] x86: Don't set Prefer_No_AVX512 for processors with AVX512 and AVX-VNNI
From: H.J. Lu @ 2021-12-07 15:52 UTC
To: Florian Weimer
Cc: H.J. Lu via Libc-alpha, Arjan van de Ven, liuhongt, Thiago Macieira, Hongyu Wang

On Tue, Dec 7, 2021 at 7:48 AM Florian Weimer <fweimer@redhat.com> wrote:
>
> * H. J. Lu via Libc-alpha:
>
> > On Tue, Dec 7, 2021 at 6:05 AM Florian Weimer <fweimer@redhat.com> wrote:
> >>
> >> * H. J. Lu via Libc-alpha:
> >>
> >> > Hongtao, Hongyu, can you find a Rocket Lake to test?
> >>
> >> I've found a lab machine with an i7-11700 CPU.  Is there something I
> >> could test for you?
> >
> > You can enable AVX512 in glibc with:
> >
> > $ export GLIBC_TUNABLES=glibc.cpu.hwcaps=-Prefer_No_AVX512
> >
> > While bootstrapping GCC with -j8, track CPU frequency with turbostat.  If
> > there is no CPU frequency drop and build time is lower than without
> > GLIBC_TUNABLES, we can enable AVX512.
> >
> >> (This could be non-production silicon, though.)
> >>
> >
> > The frequency behavior of non-production silicon can be different.
>
> With that caveat, it seems that frequencies drop further with
> GLIBC_TUNABLES set as above, and the build is also a little bit slower
> (5m31s vs 5m23s; the AVX-512 build was run first, and the system was a
> little bit warmer for the second run).
>
> Would it make sense to run more extensive tests, or should we wait for
> someone with production silicon to show up?

GCC is a heavy user of memcpy/memset, which is a good proxy of
ZMM load/store impact on CPU frequency.  We need to run the same
test on a production Rocket Lake.

--
H.J.

* Re: [PATCH] x86: Don't set Prefer_No_AVX512 for processors with AVX512 and AVX-VNNI
From: Thiago Macieira @ 2021-12-07 16:22 UTC
To: Florian Weimer, H.J. Lu
Cc: H.J. Lu via Libc-alpha, Arjan van de Ven, liuhongt, Hongyu Wang

On Tuesday, 7 December 2021 07:52:44 PST H.J. Lu wrote:
> > Would it make sense to run more extensive tests, or should we wait for
> > someone with production silicon to show up?
>
> GCC is a heavy user of memcpy/memset, which is a good proxy of
> ZMM load/store impact on CPU frequency.  We need to run the same
> test on a production Rocket Lake.

Can someone run the same test on an Ice Lake? That will also answer whether we
should enable the same thing for ICL / ICX. RKL is a Cypress Cove, so I'd
expect it to have the same performance numbers as ICL's Sunny Cove.

The data I have says that, in theory, we should not see a frequency drop for
512-bit memcpy / memset on ICL or TGL, but I haven't got experimental data
confirming that. And I can't really run the benchmark test on a laptop with
very poor thermal dissipation (freq drops to 1500 MHz all on its own).

If a good ICL has the drop, then I'd assume RKL will too.

--
Thiago Macieira - thiago.macieira (AT) intel.com
  Software Architect - Intel DPG Cloud Engineering

* Re: [PATCH] x86: Don't set Prefer_No_AVX512 for processors with AVX512 and AVX-VNNI
From: Noah Goldstein @ 2021-12-07 19:32 UTC
To: H.J. Lu
Cc: Florian Weimer, Arjan van de Ven, liuhongt, Hongyu Wang, H.J. Lu via Libc-alpha, Thiago Macieira

On Tue, Dec 7, 2021 at 9:53 AM H.J. Lu via Libc-alpha
<libc-alpha@sourceware.org> wrote:
>
> On Tue, Dec 7, 2021 at 7:48 AM Florian Weimer <fweimer@redhat.com> wrote:
> >
> > * H. J. Lu via Libc-alpha:
> >
> > > On Tue, Dec 7, 2021 at 6:05 AM Florian Weimer <fweimer@redhat.com> wrote:
> > >>
> > >> * H. J. Lu via Libc-alpha:
> > >>
> > >> > Hongtao, Hongyu, can you find a Rocket Lake to test?
> > >>
> > >> I've found a lab machine with an i7-11700 CPU.  Is there something I
> > >> could test for you?
> > >
> > > You can enable AVX512 in glibc with:
> > >
> > > $ export GLIBC_TUNABLES=glibc.cpu.hwcaps=-Prefer_No_AVX512
> > >
> > > While bootstrapping GCC with -j8, track CPU frequency with turbostat.  If
> > > there is no CPU frequency drop and build time is lower than without
> > > GLIBC_TUNABLES, we can enable AVX512.
> > >
> > >> (This could be non-production silicon, though.)
> > >>
> > >
> > > The frequency behavior of non-production silicon can be different.
> >
> > With that caveat, it seems that frequencies drop further with
> > GLIBC_TUNABLES set as above, and the build is also a little bit slower
> > (5m31s vs 5m23s; the AVX-512 build was run first, and the system was a
> > little bit warmer for the second run).
> >
> > Would it make sense to run more extensive tests, or should we wait for
> > someone with production silicon to show up?
>
> GCC is a heavy user of memcpy/memset, which is a good proxy of
> ZMM load/store impact on CPU frequency.  We need to run the same
> test on a production Rocket Lake.

I would think a microbenchmark would be better for determining if
Rocket Lake actually has throttling.

Testing the full -j8 GCC build will add a bunch of frequency "noise"
due to thermal throttling.

>
> --
> H.J.

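A minimal sketch of the kind of microbenchmark suggested above: it times a tight memcpy loop so the run can be repeated with and without GLIBC_TUNABLES set. Nothing here comes from the thread: memcpy_gib_per_sec is a hypothetical helper, the buffer size and iteration count are arbitrary, and whether ZMM loads and stores are actually used depends on which memcpy variant glibc selects at run time. The empty asm statement is a GCC/Clang extension used only as a compiler barrier.

```c
/* Sketch only.  memcpy_gib_per_sec is a hypothetical helper; sizes and
   iteration counts are arbitrary choices, not values from the thread.  */
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Copy BYTES bytes ITERS times.  Return throughput in GiB per second
   of CPU time, 0.0 if the run was too short to measure, or -1.0 if
   allocation failed.  */
static double
memcpy_gib_per_sec (size_t bytes, int iters)
{
  unsigned char *src = malloc (bytes);
  unsigned char *dst = malloc (bytes);
  if (src == NULL || dst == NULL)
    {
      free (src);
      free (dst);
      return -1.0;
    }
  memset (src, 0x5a, bytes);

  clock_t start = clock ();
  for (int i = 0; i < iters; i++)
    {
      memcpy (dst, src, bytes);
      /* Compiler barrier (GCC/Clang extension): treat DST as read so
         the copy cannot be optimized away.  */
      __asm__ volatile ("" : : "r" (dst) : "memory");
    }
  clock_t end = clock ();

  free (src);
  free (dst);

  double secs = (double) (end - start) / CLOCKS_PER_SEC;
  if (secs <= 0.0)
    return 0.0;
  double gib = (double) bytes * iters / (1024.0 * 1024.0 * 1024.0);
  return gib / secs;
}
```

Throughput alone does not show the clock frequency, so a run like this would still need a frequency reading alongside it; a short single-threaded loop is less exposed to the thermal effects of a full parallel build, though not immune to them.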
* Re: [PATCH] x86: Don't set Prefer_No_AVX512 for processors with AVX512 and AVX-VNNI
From: Sunil Pandey @ 2022-04-23 1:51 UTC
To: Noah Goldstein, libc-stable
Cc: H.J. Lu, Florian Weimer, H.J. Lu via Libc-alpha, Hongyu Wang, Thiago Macieira, liuhongt, Arjan van de Ven

On Tue, Dec 7, 2021 at 11:33 AM Noah Goldstein via Libc-alpha
<libc-alpha@sourceware.org> wrote:
>
> On Tue, Dec 7, 2021 at 9:53 AM H.J. Lu via Libc-alpha
> <libc-alpha@sourceware.org> wrote:
> >
> > On Tue, Dec 7, 2021 at 7:48 AM Florian Weimer <fweimer@redhat.com> wrote:
> > >
> > > * H. J. Lu via Libc-alpha:
> > >
> > > > On Tue, Dec 7, 2021 at 6:05 AM Florian Weimer <fweimer@redhat.com> wrote:
> > > >>
> > > >> * H. J. Lu via Libc-alpha:
> > > >>
> > > >> > Hongtao, Hongyu, can you find a Rocket Lake to test?
> > > >>
> > > >> I've found a lab machine with an i7-11700 CPU.  Is there something I
> > > >> could test for you?
> > > >
> > > > You can enable AVX512 in glibc with:
> > > >
> > > > $ export GLIBC_TUNABLES=glibc.cpu.hwcaps=-Prefer_No_AVX512
> > > >
> > > > While bootstrapping GCC with -j8, track CPU frequency with turbostat.  If
> > > > there is no CPU frequency drop and build time is lower than without
> > > > GLIBC_TUNABLES, we can enable AVX512.
> > > >
> > > >> (This could be non-production silicon, though.)
> > > >>
> > > >
> > > > The frequency behavior of non-production silicon can be different.
> > >
> > > With that caveat, it seems that frequencies drop further with
> > > GLIBC_TUNABLES set as above, and the build is also a little bit slower
> > > (5m31s vs 5m23s; the AVX-512 build was run first, and the system was a
> > > little bit warmer for the second run).
> > >
> > > Would it make sense to run more extensive tests, or should we wait for
> > > someone with production silicon to show up?
> >
> > GCC is a heavy user of memcpy/memset, which is a good proxy of
> > ZMM load/store impact on CPU frequency.  We need to run the same
> > test on a production Rocket Lake.
>
> I would think a microbenchmark would be better for determining if
> Rocket Lake actually has throttling.
>
> Testing the full -j8 GCC build will add a bunch of frequency "noise"
> due to thermal throttling.
>
> >
> > --
> > H.J.

I would like to backport this patch to release branches.
Any comments or objections?

--Sunil
