public inbox for libc-alpha@sourceware.org
* [PATCH] x86: Don't set Prefer_No_AVX512 for processors with AVX512 and AVX-VNNI
@ 2021-12-06  3:23 H.J. Lu
  2021-12-07  7:47 ` Noah Goldstein
  0 siblings, 1 reply; 12+ messages in thread
From: H.J. Lu @ 2021-12-06  3:23 UTC (permalink / raw)
  To: libc-alpha

Don't set Prefer_No_AVX512 on processors with AVX512 and AVX-VNNI since
they won't lower CPU frequency when ZMM load and store instructions are
used.
---
 sysdeps/x86/cpu-features.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/sysdeps/x86/cpu-features.c b/sysdeps/x86/cpu-features.c
index be2498b2e7..311ade1f26 100644
--- a/sysdeps/x86/cpu-features.c
+++ b/sysdeps/x86/cpu-features.c
@@ -538,8 +538,11 @@ init_cpu_features (struct cpu_features *cpu_features)
 	  |= bit_arch_Prefer_No_VZEROUPPER;
       else
 	{
-	  cpu_features->preferred[index_arch_Prefer_No_AVX512]
-	    |= bit_arch_Prefer_No_AVX512;
+	  /* Processors with AVX512 and AVX-VNNI won't lower CPU frequency
+	     when ZMM load and store instructions are used.  */
+	  if (!CPU_FEATURES_CPU_P (cpu_features, AVX_VNNI))
+	    cpu_features->preferred[index_arch_Prefer_No_AVX512]
+	      |= bit_arch_Prefer_No_AVX512;
 
 	  /* Avoid RTM abort triggered by VZEROUPPER inside a
 	     transactionally executing RTM region.  */
-- 
2.33.1
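
For readers who want to try the check outside glibc, here is a minimal sketch of the same decision using raw CPUID bits. It is not glibc code. The helper names (cpuid_bit, should_prefer_no_avx512, query_prefer_no_avx512) are made up for illustration, and the avx512f argument is an assumption of this sketch, since the enclosing condition is not visible in the hunk above. The bit positions come from the Intel SDM: AVX512F is CPUID.(EAX=7,ECX=0):EBX bit 16 and AVX-VNNI is CPUID.(EAX=7,ECX=1):EAX bit 4. The sketch does not check whether the OS has enabled the ZMM state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/* Return whether bit BIT (0..31) is set in a CPUID output register.  */
bool
cpuid_bit (uint32_t reg, unsigned int bit)
{
  assert (bit < 32);
  return (reg >> bit) & 1u;
}

/* Hypothetical helper mirroring the patched condition: avoid AVX512
   only when AVX512 is present and AVX-VNNI is not.  */
bool
should_prefer_no_avx512 (bool avx512f, bool avx_vnni)
{
  return avx512f && !avx_vnni;
}

/* Query the running CPU.  Returns false on non-x86 targets.  */
bool
query_prefer_no_avx512 (void)
{
#if defined(__x86_64__) || defined(__i386__)
  unsigned int eax, ebx, ecx, edx;
  bool avx512f = false, avx_vnni = false;

  if (__get_cpuid_count (7, 0, &eax, &ebx, &ecx, &edx))
    {
      unsigned int max_subleaf = eax;
      avx512f = cpuid_bit (ebx, 16);
      /* Subleaf 1 is only meaningful if subleaf 0 reports it.  */
      if (max_subleaf >= 1
          && __get_cpuid_count (7, 1, &eax, &ebx, &ecx, &edx))
        avx_vnni = cpuid_bit (eax, 4);
    }
  return should_prefer_no_avx512 (avx512f, avx_vnni);
#else
  return false;
#endif
}
```

Calling query_prefer_no_avx512 () from a small main and printing the result is enough to see which way a given machine would go under this sketch.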



* Re: [PATCH] x86: Don't set Prefer_No_AVX512 for processors with AVX512 and AVX-VNNI
  2021-12-06  3:23 [PATCH] x86: Don't set Prefer_No_AVX512 for processors with AVX512 and AVX-VNNI H.J. Lu
@ 2021-12-07  7:47 ` Noah Goldstein
  2021-12-07 12:53   ` H.J. Lu
  0 siblings, 1 reply; 12+ messages in thread
From: Noah Goldstein @ 2021-12-07  7:47 UTC (permalink / raw)
  To: H.J. Lu; +Cc: GNU C Library

On Sun, Dec 5, 2021 at 9:23 PM H.J. Lu via Libc-alpha
<libc-alpha@sourceware.org> wrote:
>
> Don't set Prefer_No_AVX512 on processors with AVX512 and AVX-VNNI since
> they won't lower CPU frequency when ZMM load and store instructions are
> used.
> ---
>  sysdeps/x86/cpu-features.c | 7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/sysdeps/x86/cpu-features.c b/sysdeps/x86/cpu-features.c
> index be2498b2e7..311ade1f26 100644
> --- a/sysdeps/x86/cpu-features.c
> +++ b/sysdeps/x86/cpu-features.c
> @@ -538,8 +538,11 @@ init_cpu_features (struct cpu_features *cpu_features)
>           |= bit_arch_Prefer_No_VZEROUPPER;
>        else
>         {
> -         cpu_features->preferred[index_arch_Prefer_No_AVX512]
> -           |= bit_arch_Prefer_No_AVX512;
> +         /* Processors with AVX512 and AVX-VNNI won't lower CPU frequency
> +            when ZMM load and store instructions are used.  */
> +         if (!CPU_FEATURES_CPU_P (cpu_features, AVX_VNNI))
> +           cpu_features->preferred[index_arch_Prefer_No_AVX512]
> +             |= bit_arch_Prefer_No_AVX512;
>
>           /* Avoid RTM abort triggered by VZEROUPPER inside a
>              transactionally executing RTM region.  */
> --
> 2.33.1
>

Should we also do Rocket Lake?
According to Travis Downs, at least, downclocking isn't an issue there either:
https://travisdowns.github.io/blog/2020/08/19/icl-avx512-freq.html#rocket-lake


* Re: [PATCH] x86: Don't set Prefer_No_AVX512 for processors with AVX512 and AVX-VNNI
  2021-12-07  7:47 ` Noah Goldstein
@ 2021-12-07 12:53   ` H.J. Lu
  2021-12-07 13:17     ` Arjan van de Ven
  0 siblings, 1 reply; 12+ messages in thread
From: H.J. Lu @ 2021-12-07 12:53 UTC (permalink / raw)
  To: Noah Goldstein, Thiago Macieira, Arjan van de Ven; +Cc: GNU C Library

On Mon, Dec 6, 2021 at 11:47 PM Noah Goldstein <goldstein.w.n@gmail.com> wrote:
>
> On Sun, Dec 5, 2021 at 9:23 PM H.J. Lu via Libc-alpha
> <libc-alpha@sourceware.org> wrote:
> >
> > Don't set Prefer_No_AVX512 on processors with AVX512 and AVX-VNNI since
> > they won't lower CPU frequency when ZMM load and store instructions are
> > used.
> > ---
> >  sysdeps/x86/cpu-features.c | 7 +++++--
> >  1 file changed, 5 insertions(+), 2 deletions(-)
> >
> > diff --git a/sysdeps/x86/cpu-features.c b/sysdeps/x86/cpu-features.c
> > index be2498b2e7..311ade1f26 100644
> > --- a/sysdeps/x86/cpu-features.c
> > +++ b/sysdeps/x86/cpu-features.c
> > @@ -538,8 +538,11 @@ init_cpu_features (struct cpu_features *cpu_features)
> >           |= bit_arch_Prefer_No_VZEROUPPER;
> >        else
> >         {
> > -         cpu_features->preferred[index_arch_Prefer_No_AVX512]
> > -           |= bit_arch_Prefer_No_AVX512;
> > +         /* Processors with AVX512 and AVX-VNNI won't lower CPU frequency
> > +            when ZMM load and store instructions are used.  */
> > +         if (!CPU_FEATURES_CPU_P (cpu_features, AVX_VNNI))
> > +           cpu_features->preferred[index_arch_Prefer_No_AVX512]
> > +             |= bit_arch_Prefer_No_AVX512;
> >
> >           /* Avoid RTM abort triggered by VZEROUPPER inside a
> >              transactionally executing RTM region.  */
> > --
> > 2.33.1
> >
>
> Should we also do Rocket Lake?
> According to Travis Downs, at least, downclocking isn't an issue there either:
> https://travisdowns.github.io/blog/2020/08/19/icl-avx512-freq.html#rocket-lake

Thiago, Arjan,

Is it true that Rocket Lake can use ZMM load/store?

-- 
H.J.


* Re: [PATCH] x86: Don't set Prefer_No_AVX512 for processors with AVX512 and AVX-VNNI
  2021-12-07 12:53   ` H.J. Lu
@ 2021-12-07 13:17     ` Arjan van de Ven
  2021-12-07 13:34       ` H.J. Lu
  0 siblings, 1 reply; 12+ messages in thread
From: Arjan van de Ven @ 2021-12-07 13:17 UTC (permalink / raw)
  To: H.J. Lu, Noah Goldstein, Thiago Macieira; +Cc: GNU C Library

On 12/7/2021 4:53 AM, H.J. Lu wrote:
>> Should we also do Rocket Lake?
>> According to Travis Downs, at least, downclocking isn't an issue there either:
>> https://travisdowns.github.io/blog/2020/08/19/icl-avx512-freq.html#rocket-lake
> 
> Thiago, Arjan,
> 
> Is it true that Rocket Lake can use ZMM load/store?
> 


I have no specific data myself about Rocket Lake... but data is data,
so I'm all for trying it.  Other than looking at CPUID's model number,
though, I wouldn't know of an easy way to distinguish RKL from ICL or others.
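
To make the model-number approach concrete, here is a minimal sketch of decoding the family/model/stepping fields from CPUID leaf 1 EAX, following the display-family and display-model rules in the Intel SDM. The struct and function names are made up for illustration. The Rocket Lake model value 0xa7 is an assumption taken from the Linux kernel's intel-family.h (INTEL_FAM6_ROCKETLAKE), not from this thread, and the sketch does not check the vendor string.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct cpu_signature
{
  unsigned int family;
  unsigned int model;
  unsigned int stepping;
};

/* Assumption: family 6 model number used for Rocket Lake, as listed
   in the Linux kernel's intel-family.h.  */
#define MODEL_ROCKETLAKE 0xa7

/* Decode CPUID leaf 1 EAX into display family, model and stepping.  */
void
decode_signature (uint32_t leaf1_eax, struct cpu_signature *out)
{
  assert (out != NULL);

  unsigned int base_family = (leaf1_eax >> 8) & 0xf;
  unsigned int base_model = (leaf1_eax >> 4) & 0xf;
  unsigned int ext_family = (leaf1_eax >> 20) & 0xff;
  unsigned int ext_model = (leaf1_eax >> 16) & 0xf;

  out->stepping = leaf1_eax & 0xf;
  /* The extended family is only added when the base family is 0xf.  */
  out->family = base_family == 0xf ? base_family + ext_family : base_family;
  /* The extended model is only used for base family 0x6 or 0xf.  */
  out->model = (base_family == 0x6 || base_family == 0xf)
               ? (ext_model << 4) | base_model
               : base_model;
}

/* Return nonzero if the signature decodes to family 6, model 0xa7.  */
int
is_rocket_lake (uint32_t leaf1_eax)
{
  struct cpu_signature sig;
  decode_signature (leaf1_eax, &sig);
  return sig.family == 0x6 && sig.model == MODEL_ROCKETLAKE;
}
```

For example, a leaf 1 EAX value of 0x000a0671 decodes to family 6, model 0xa7, stepping 1, so is_rocket_lake returns nonzero for it.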


* Re: [PATCH] x86: Don't set Prefer_No_AVX512 for processors with AVX512 and AVX-VNNI
  2021-12-07 13:17     ` Arjan van de Ven
@ 2021-12-07 13:34       ` H.J. Lu
  2021-12-07 14:05         ` Florian Weimer
  0 siblings, 1 reply; 12+ messages in thread
From: H.J. Lu @ 2021-12-07 13:34 UTC (permalink / raw)
  To: Arjan van de Ven, Hongyu Wang, liuhongt
  Cc: Noah Goldstein, Thiago Macieira, GNU C Library

On Tue, Dec 7, 2021 at 5:18 AM Arjan van de Ven <arjan@linux.intel.com> wrote:
>
> On 12/7/2021 4:53 AM, H.J. Lu wrote:
> >> Should we also do Rocket Lake?
> >> According to Travis Downs, at least, downclocking isn't an issue there either:
> >> https://travisdowns.github.io/blog/2020/08/19/icl-avx512-freq.html#rocket-lake
> >
> > Thiago, Arjan,
> >
> > Is it true that Rocket Lake can use ZMM load/store?
> >
>
>
> I have no specific data myself about rocket lake... but data is data...
> so I'm all for trying it, but other than looking at cpuid's model number

Hongtao, Hongyu,  can you find a Rocket Lake to test?

> I wouldn't know of an easy way to detect RKL vs ICL or others

In GCC, RKL ISAs are ICL ISAs without SGX.


-- 
H.J.


* Re: [PATCH] x86: Don't set Prefer_No_AVX512 for processors with AVX512 and AVX-VNNI
  2021-12-07 13:34       ` H.J. Lu
@ 2021-12-07 14:05         ` Florian Weimer
  2021-12-07 14:15           ` H.J. Lu
  0 siblings, 1 reply; 12+ messages in thread
From: Florian Weimer @ 2021-12-07 14:05 UTC (permalink / raw)
  To: H.J. Lu via Libc-alpha
  Cc: Arjan van de Ven, Hongyu Wang, liuhongt, H.J. Lu, Thiago Macieira

* H. J. Lu via Libc-alpha:

> Hongtao, Hongyu,  can you find a Rocket Lake to test?

I've found a lab machine with an i7-11700 CPU.  Is there something I
could test for you?

(This could be non-production silicon, though.)

Thanks,
Florian



* Re: [PATCH] x86: Don't set Prefer_No_AVX512 for processors with AVX512 and AVX-VNNI
  2021-12-07 14:05         ` Florian Weimer
@ 2021-12-07 14:15           ` H.J. Lu
  2021-12-07 15:47             ` Florian Weimer
  0 siblings, 1 reply; 12+ messages in thread
From: H.J. Lu @ 2021-12-07 14:15 UTC (permalink / raw)
  To: Florian Weimer
  Cc: H.J. Lu via Libc-alpha, Arjan van de Ven, Hongyu Wang, liuhongt,
	Thiago Macieira

On Tue, Dec 7, 2021 at 6:05 AM Florian Weimer <fweimer@redhat.com> wrote:
>
> * H. J. Lu via Libc-alpha:
>
> > Hongtao, Hongyu,  can you find a Rocket Lake to test?
>
> I've found a lab machine with an i7-11700 CPU.  Is there something I
> could test for you?

You can enable AVX512 in glibc with:

$ export GLIBC_TUNABLES=glibc.cpu.hwcaps=-Prefer_No_AVX512

While bootstrapping GCC with -j8, track CPU frequency with turbostat.  If
there is no CPU frequency drop and the build time is shorter than it is
without GLIBC_TUNABLES, we can enable AVX512.
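
If the comparison is driven from a test harness rather than a shell, the same setting can be applied per child process. This is a minimal sketch, assuming a POSIX system. The function name run_with_avx512_preferred is made up for illustration; only the tunable string comes from the export line above. Since glibc reads tunables at program startup, the variable has to be set before exec.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Tunable string taken from the export line above.  */
#define HWCAPS_TUNABLE "glibc.cpu.hwcaps=-Prefer_No_AVX512"

/* Run ARGV[0] (looked up in PATH) with GLIBC_TUNABLES set.
   Returns the child's exit status, or -1 if it could not be run to
   a normal exit.  */
int
run_with_avx512_preferred (char *const argv[])
{
  assert (argv != NULL && argv[0] != NULL);

  pid_t pid = fork ();
  if (pid < 0)
    return -1;

  if (pid == 0)
    {
      /* Child: set the variable before exec so the new program's
         startup code sees it.  */
      if (setenv ("GLIBC_TUNABLES", HWCAPS_TUNABLE, 1) != 0)
        _exit (127);
      execvp (argv[0], argv);
      _exit (127);   /* Only reached if exec failed.  */
    }

  int status;
  if (waitpid (pid, &status, 0) < 0)
    return -1;
  return WIFEXITED (status) ? WEXITSTATUS (status) : -1;
}
```

Running the same command once through this helper and once without it gives the two runs to compare. Frequency still has to be observed separately, for example with turbostat as suggested above.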

> (This could be non-production silicon, though.)
>

The frequency behavior of non-production silicon can be different.

-- 
H.J.


* Re: [PATCH] x86: Don't set Prefer_No_AVX512 for processors with AVX512 and AVX-VNNI
  2021-12-07 14:15           ` H.J. Lu
@ 2021-12-07 15:47             ` Florian Weimer
  2021-12-07 15:52               ` H.J. Lu
  0 siblings, 1 reply; 12+ messages in thread
From: Florian Weimer @ 2021-12-07 15:47 UTC (permalink / raw)
  To: H.J. Lu via Libc-alpha
  Cc: H.J. Lu, Arjan van de Ven, liuhongt, Thiago Macieira, Hongyu Wang

* H. J. Lu via Libc-alpha:

> On Tue, Dec 7, 2021 at 6:05 AM Florian Weimer <fweimer@redhat.com> wrote:
>>
>> * H. J. Lu via Libc-alpha:
>>
>> > Hongtao, Hongyu,  can you find a Rocket Lake to test?
>>
>> I've found a lab machine with an i7-11700 CPU.  Is there something I
>> could test for you?
>
> You can enable AVX512 in glibc with:
>
> $ export GLIBC_TUNABLES=glibc.cpu.hwcaps=-Prefer_No_AVX512
>
> While bootstrapping GCC with -j8, track CPU frequency with turbostat.  If
> there is no CPU frequency drop and build time is less comparing against
> without GLIBC_TUNABLES, we can enable AVX512.
>
>> (This could be non-production silicon, though.)
>>
>
> The frequency behavior of non-production silicon can be different.

With that caveat, it seems that frequencies drop further with
GLIBC_TUNABLES set as above, and the build is also a little bit slower
(5m31s vs 5m23s; the AVX-512 build was run first, and the system was a
little bit warmer for the second run).

Would it make sense to run more extensive tests, or should we wait for
someone with production silicon to show up?

Thanks,
Florian



* Re: [PATCH] x86: Don't set Prefer_No_AVX512 for processors with AVX512 and AVX-VNNI
  2021-12-07 15:47             ` Florian Weimer
@ 2021-12-07 15:52               ` H.J. Lu
  2021-12-07 16:22                 ` Thiago Macieira
  2021-12-07 19:32                 ` Noah Goldstein
  0 siblings, 2 replies; 12+ messages in thread
From: H.J. Lu @ 2021-12-07 15:52 UTC (permalink / raw)
  To: Florian Weimer
  Cc: H.J. Lu via Libc-alpha, Arjan van de Ven, liuhongt,
	Thiago Macieira, Hongyu Wang

On Tue, Dec 7, 2021 at 7:48 AM Florian Weimer <fweimer@redhat.com> wrote:
>
> * H. J. Lu via Libc-alpha:
>
> > On Tue, Dec 7, 2021 at 6:05 AM Florian Weimer <fweimer@redhat.com> wrote:
> >>
> >> * H. J. Lu via Libc-alpha:
> >>
> >> > Hongtao, Hongyu,  can you find a Rocket Lake to test?
> >>
> >> I've found a lab machine with an i7-11700 CPU.  Is there something I
> >> could test for you?
> >
> > You can enable AVX512 in glibc with:
> >
> > $ export GLIBC_TUNABLES=glibc.cpu.hwcaps=-Prefer_No_AVX512
> >
> > While bootstrapping GCC with -j8, track CPU frequency with turbostat.  If
> > there is no CPU frequency drop and build time is less comparing against
> > without GLIBC_TUNABLES, we can enable AVX512.
> >
> >> (This could be non-production silicon, though.)
> >>
> >
> > The frequency behavior of non-production silicon can be different.
>
> With that caveat, it seems that frequencies drop further with
> GLIBC_TUNABLES set as above, and the build is also a little bit slower
> (5m31s vs 5m23s, the AVX-512 build was run first, and the system was a
> little bit warmer for the second run).
>
> Would it make sense to run more extensive tests, or should we wait for
> someone with production silicon to show up?

GCC is a heavy user of memcpy/memset, which makes it a good proxy for
the impact of ZMM load/store on CPU frequency.  We need to run the same
test on a production Rocket Lake.

-- 
H.J.


* Re: [PATCH] x86: Don't set Prefer_No_AVX512 for processors with AVX512 and AVX-VNNI
  2021-12-07 15:52               ` H.J. Lu
@ 2021-12-07 16:22                 ` Thiago Macieira
  2021-12-07 19:32                 ` Noah Goldstein
  1 sibling, 0 replies; 12+ messages in thread
From: Thiago Macieira @ 2021-12-07 16:22 UTC (permalink / raw)
  To: Florian Weimer, H.J. Lu
  Cc: H.J. Lu via Libc-alpha, Arjan van de Ven, liuhongt, Hongyu Wang

On Tuesday, 7 December 2021 07:52:44 PST H.J. Lu wrote:
> > Would it make sense to run more extensive tests, or should we wait for
> > someone with production silicon to show up?
> 
> GCC is a heavy user of memcpy/memset, which is a good proxy of
> ZMM load/store impact on CPU frequency.   We need to run the same
> test on a production Rocket Lake.

Can someone run the same test on an Ice Lake? That will also answer whether we 
should enable the same thing for ICL / ICX.

RKL is a Cypress Cove, so I'd expect it to have the same performance numbers 
as ICL's Sunny Cove. The data I have says that, in theory, we should not see a 
frequency drop for 512-bit memcpy / memset on ICL or TGL, but I haven't got 
experimental data confirming that. And I can't really run the benchmark test 
on a laptop with very poor thermal dissipation (freq drops to 1500 MHz all on 
its own).

If a good ICL has the drop, then I'd assume RKL will too.

-- 
Thiago Macieira - thiago.macieira (AT) intel.com
  Software Architect - Intel DPG Cloud Engineering





* Re: [PATCH] x86: Don't set Prefer_No_AVX512 for processors with AVX512 and AVX-VNNI
  2021-12-07 15:52               ` H.J. Lu
  2021-12-07 16:22                 ` Thiago Macieira
@ 2021-12-07 19:32                 ` Noah Goldstein
  2022-04-23  1:51                   ` Sunil Pandey
  1 sibling, 1 reply; 12+ messages in thread
From: Noah Goldstein @ 2021-12-07 19:32 UTC (permalink / raw)
  To: H.J. Lu
  Cc: Florian Weimer, Arjan van de Ven, liuhongt, Hongyu Wang,
	H.J. Lu via Libc-alpha, Thiago Macieira

On Tue, Dec 7, 2021 at 9:53 AM H.J. Lu via Libc-alpha
<libc-alpha@sourceware.org> wrote:
>
> On Tue, Dec 7, 2021 at 7:48 AM Florian Weimer <fweimer@redhat.com> wrote:
> >
> > * H. J. Lu via Libc-alpha:
> >
> > > On Tue, Dec 7, 2021 at 6:05 AM Florian Weimer <fweimer@redhat.com> wrote:
> > >>
> > >> * H. J. Lu via Libc-alpha:
> > >>
> > >> > Hongtao, Hongyu,  can you find a Rocket Lake to test?
> > >>
> > >> I've found a lab machine with an i7-11700 CPU.  Is there something I
> > >> could test for you?
> > >
> > > You can enable AVX512 in glibc with:
> > >
> > > $ export GLIBC_TUNABLES=glibc.cpu.hwcaps=-Prefer_No_AVX512
> > >
> > > While bootstrapping GCC with -j8, track CPU frequency with turbostat.  If
> > > there is no CPU frequency drop and build time is less comparing against
> > > without GLIBC_TUNABLES, we can enable AVX512.
> > >
> > >> (This could be non-production silicon, though.)
> > >>
> > >
> > > The frequency behavior of non-production silicon can be different.
> >
> > With that caveat, it seems that frequencies drop further with
> > GLIBC_TUNABLES set as above, and the build is also a little bit slower
> > (5m31s vs 5m23s, the AVX-512 build was run first, and the system was a
> > little bit warmer for the second run).
> >
> > Would it make sense to run more extensive tests, or should we wait for
> > someone with production silicon to show up?
>
> GCC is a heavy user of memcpy/memset, which is a good proxy of
> ZMM load/store impact on CPU frequency.   We need to run the same
> test on a production Rocket Lake.

I would think a microbenchmark would be better for determining whether
Rocket Lake actually has throttling.

Testing the full -j8 GCC build will add a bunch of frequency "noise"
due to thermal throttling.
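
As a rough idea of what such a microbenchmark could look like, here is a minimal sketch that times repeated memset + memcpy passes through the C library. The function name and the buffer/iteration parameters are made up for illustration, and this is not an existing glibc benchmark. It only reports throughput; frequency would still need to be sampled alongside it (e.g. with turbostat), and the run would be repeated with and without GLIBC_TUNABLES.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Time ITERS memset + memcpy passes over SIZE-byte buffers.
   Returns bytes processed per second, or 0.0 if allocation failed
   or the elapsed time was not measurable.  */
double
memcpy_throughput (size_t size, unsigned int iters)
{
  assert (size > 0 && iters > 0);

  unsigned char *src = malloc (size);
  unsigned char *dst = malloc (size);
  if (src == NULL || dst == NULL)
    {
      free (src);
      free (dst);
      return 0.0;
    }

  struct timespec t0, t1;
  volatile unsigned char sink = 0;

  clock_gettime (CLOCK_MONOTONIC, &t0);
  for (unsigned int i = 0; i < iters; i++)
    {
      memset (src, (int) (i & 0xff), size);
      memcpy (dst, src, size);
      /* Read one byte back so the copies stay observable.  */
      sink ^= dst[i % size];
    }
  clock_gettime (CLOCK_MONOTONIC, &t1);

  free (src);
  free (dst);

  double secs = (double) (t1.tv_sec - t0.tv_sec)
                + (double) (t1.tv_nsec - t0.tv_nsec) / 1e9;
  /* Each pass writes SIZE bytes with memset and copies SIZE bytes.  */
  return secs > 0.0 ? (2.0 * (double) size * (double) iters) / secs : 0.0;
}
```

Keeping each run short, and pinning it to a single core, should reduce the thermal noise compared with a full parallel build, though that is an expectation rather than something measured here.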

>
> --
> H.J.


* Re: [PATCH] x86: Don't set Prefer_No_AVX512 for processors with AVX512 and AVX-VNNI
  2021-12-07 19:32                 ` Noah Goldstein
@ 2022-04-23  1:51                   ` Sunil Pandey
  0 siblings, 0 replies; 12+ messages in thread
From: Sunil Pandey @ 2022-04-23  1:51 UTC (permalink / raw)
  To: Noah Goldstein, libc-stable
  Cc: H.J. Lu, Florian Weimer, H.J. Lu via Libc-alpha, Hongyu Wang,
	Thiago Macieira, liuhongt, Arjan van de Ven

On Tue, Dec 7, 2021 at 11:33 AM Noah Goldstein via Libc-alpha
<libc-alpha@sourceware.org> wrote:
>
> On Tue, Dec 7, 2021 at 9:53 AM H.J. Lu via Libc-alpha
> <libc-alpha@sourceware.org> wrote:
> >
> > On Tue, Dec 7, 2021 at 7:48 AM Florian Weimer <fweimer@redhat.com> wrote:
> > >
> > > * H. J. Lu via Libc-alpha:
> > >
> > > > On Tue, Dec 7, 2021 at 6:05 AM Florian Weimer <fweimer@redhat.com> wrote:
> > > >>
> > > >> * H. J. Lu via Libc-alpha:
> > > >>
> > > >> > Hongtao, Hongyu,  can you find a Rocket Lake to test?
> > > >>
> > > >> I've found a lab machine with an i7-11700 CPU.  Is there something I
> > > >> could test for you?
> > > >
> > > > You can enable AVX512 in glibc with:
> > > >
> > > > $ export GLIBC_TUNABLES=glibc.cpu.hwcaps=-Prefer_No_AVX512
> > > >
> > > > While bootstrapping GCC with -j8, track CPU frequency with turbostat.  If
> > > > there is no CPU frequency drop and build time is less comparing against
> > > > without GLIBC_TUNABLES, we can enable AVX512.
> > > >
> > > >> (This could be non-production silicon, though.)
> > > >>
> > > >
> > > > The frequency behavior of non-production silicon can be different.
> > >
> > > With that caveat, it seems that frequencies drop further with
> > > GLIBC_TUNABLES set as above, and the build is also a little bit slower
> > > (5m31s vs 5m23s, the AVX-512 build was run first, and the system was a
> > > little bit warmer for the second run).
> > >
> > > Would it make sense to run more extensive tests, or should we wait for
> > > someone with production silicon to show up?
> >
> > GCC is a heavy user of memcpy/memset, which is a good proxy of
> > ZMM load/store impact on CPU frequency.   We need to run the same
> > test on a production Rocket Lake.
>
> I would think a microbenchmark would be better for determining if
> rocketlake actually has throttling.
>
> Testing the full j8 GCC build will add a bunch of frequency "noise"
> due to thermal throttling.
>
> >
> > --
> > H.J.

I would like to backport this patch to release branches.
Any comments or objections?

--Sunil


