public inbox for libc-alpha@sourceware.org
* Intel microcode update and glibc HLE
@ 2014-09-26 18:09 Siddhesh Poyarekar
  2014-09-26 18:19 ` Adhemerval Zanella
  2014-09-26 18:28 ` Josh Boyer
  0 siblings, 2 replies; 12+ messages in thread
From: Siddhesh Poyarekar @ 2014-09-26 18:09 UTC (permalink / raw)
  To: libc-alpha; +Cc: carlos

Hi,

The microcode_ctl package recently updated itself to include the
latest Intel microcode update that disables HLE.  This is resulting in
unbootable Haswell systems on Fedora[1] and elsewhere.  The problem
seems to be that the microcode update is applied a bit late during
every boot.  Due to this, the kernel has stale CPU capabilities and
systemd sees HLE enabled before the microcode update is applied.
Later, HLE is disabled and the next pthread operation in systemd dies
with a SIGILL.

The ideal solution for this would probably be to apply the microcode
update as early as possible during boot, but things get complicated
with suspend-resume or hibernate, so it's not very simple.  In Fedora
we're now rebuilding glibc with elision disabled again since keeping
it enabled is useless and is currently harmful.

Siddhesh

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1146967

* Re: Intel microcode update and glibc HLE
  2014-09-26 18:09 Intel microcode update and glibc HLE Siddhesh Poyarekar
@ 2014-09-26 18:19 ` Adhemerval Zanella
  2014-09-26 18:57   ` Siddhesh Poyarekar
  2014-09-26 18:28 ` Josh Boyer
  1 sibling, 1 reply; 12+ messages in thread
From: Adhemerval Zanella @ 2014-09-26 18:19 UTC (permalink / raw)
  To: libc-alpha

On 26-09-2014 15:09, Siddhesh Poyarekar wrote:
> Hi,
>
> The microcode_ctl package recently updated itself to include the
> latest Intel microcode update that disables HLE.  This is resulting in
> unbootable Haswell systems on Fedora[1] and elsewhere.  The problem
> seems to be that the microcode update is applied a bit late during
> every boot.  Due to this, the kernel has stale CPU capabilities and
> systemd sees HLE enabled before the microcode update is applied.
> Later, HLE is disabled and the next pthread operation in systemd dies
> with a SIGILL.
>
> The ideal solution for this would probably be to apply the microcode
> update as early as possible during boot, but things get complicated
> with suspend-resume or hibernate, so it's not very simple.  In Fedora
> we're now rebuilding glibc with elision disabled again since keeping
> it enabled is useless and is currently harmful.
>
> Siddhesh
>
> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1146967

I am about to ping about my patches to add support for TLE on POWER8, and
this recent issue with HLE on Intel gave me the idea of creating
a patch to enable TLE conditionally using an environment variable.  The
idea is to build glibc with TLE enabled, add the environment variable,
and let each architecture act based on its value.

I know this may not be the best solution in the long term if we aim to enable
TLE by default (we will need to either deprecate the env. var or change its
semantics), but my idea is to give developers a way to evaluate whether
TLE is safe and whether it indeed gives the workload a performance boost.
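
The environment-variable gate described above could look roughly like the
following.  This is only a minimal sketch: the variable name
EXAMPLE_LOCK_ELISION, the accepted values, and both function names are
assumptions made for illustration and are not an existing glibc interface.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical variable name, chosen for this sketch only.  */
#define ELISION_ENV_VAR "EXAMPLE_LOCK_ELISION"

/* Interpret a raw environment value: only "1" or "yes" opt in.
   An unset variable (NULL) or any other value leaves elision off.  */
bool
elision_value_enables (const char *value)
{
  return value != NULL
         && (strcmp (value, "1") == 0 || strcmp (value, "yes") == 0);
}

/* Elision is attempted only when the hardware supports it AND the
   user asked for it through the environment.  */
bool
elision_enabled (bool hw_supported)
{
  return hw_supported && elision_value_enables (getenv (ELISION_ENV_VAR));
}
```

With this shape the library is always built with the elision code present,
and the decision is made once at startup, so a developer can compare runs
with and without the variable set.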

Comments?

* Re: Intel microcode update and glibc HLE
  2014-09-26 18:09 Intel microcode update and glibc HLE Siddhesh Poyarekar
  2014-09-26 18:19 ` Adhemerval Zanella
@ 2014-09-26 18:28 ` Josh Boyer
  2014-09-26 18:54   ` Siddhesh Poyarekar
  1 sibling, 1 reply; 12+ messages in thread
From: Josh Boyer @ 2014-09-26 18:28 UTC (permalink / raw)
  To: Siddhesh Poyarekar; +Cc: libc-alpha, Carlos O'Donell

On Fri, Sep 26, 2014 at 2:09 PM, Siddhesh Poyarekar <siddhesh@redhat.com> wrote:
> Hi,
>
> The microcode_ctl package recently updated itself to include the
> latest Intel microcode update that disables HLE.  This is resulting in
> unbootable Haswell systems on Fedora[1] and elsewhere.  The problem
> seems to be that the microcode update is applied a bit late during
> every boot.  Due to this, the kernel has stale CPU capabilities and
> systemd sees HLE enabled before the microcode update is applied.
> Later, HLE is disabled and the next pthread operation in systemd dies
> with a SIGILL.

I was under the impression that the glibc implementation was looking
at the cpuid registers directly for the HLE/RTM support.  If it was
looking at cpuflags then the kernel could have probably hidden this
with a quirk.

> The ideal solution for this would probably be to apply the microcode
> update as early as possible during boot, but things get complicated
> with suspend-resume or hibernate, so it's not very simple.  In Fedora
> we're now rebuilding glibc with elision disabled again since keeping
> it enabled is useless and is currently harmful.

I'll be updating the Fedora kernel and dracut to do early microcode
loading today.  Kyle McMartin looked at the suspend/hibernate-resume
paths and things should work fine with early microcode loading.

Disabling this in glibc on x86 seems fine since nothing will be
capable of using it until newer CPUs are released, but it wasn't
strictly necessary.

josh

* Re: Intel microcode update and glibc HLE
  2014-09-26 18:28 ` Josh Boyer
@ 2014-09-26 18:54   ` Siddhesh Poyarekar
  2014-09-27  5:10     ` Carlos O'Donell
  0 siblings, 1 reply; 12+ messages in thread
From: Siddhesh Poyarekar @ 2014-09-26 18:54 UTC (permalink / raw)
  To: Josh Boyer; +Cc: libc-alpha, Carlos O'Donell

On Fri, Sep 26, 2014 at 02:28:13PM -0400, Josh Boyer wrote:
> I was under the impression that the glibc implementation was looking
> at the cpuid registers directly for the HLE/RTM support.  If it was
> looking at cpuflags then the kernel could have probably hidden this
> with a quirk.

It looks at the cpuid registers directly.
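
For reference, a direct CPUID query for these features can be sketched as
below.  This is not glibc's own code: it assumes a GCC or Clang toolchain
that provides __get_cpuid_count in <cpuid.h>, and the struct and function
names are invented for the example.  HLE and RTM are reported in EBX of
CPUID leaf 7, subleaf 0, as bits 4 and 11 respectively.

```c
#include <stdbool.h>
#if defined(__x86_64__) || defined(__i386__)
# include <cpuid.h>
#endif

/* Names are hypothetical, chosen for this sketch.  */
struct tsx_features
{
  bool hle;
  bool rtm;
};

/* Decode EBX from CPUID leaf 7, subleaf 0: bit 4 is HLE, bit 11 is RTM.  */
struct tsx_features
decode_leaf7_ebx (unsigned int ebx)
{
  struct tsx_features f;
  f.hle = (ebx >> 4) & 1;
  f.rtm = (ebx >> 11) & 1;
  return f;
}

/* Ask the CPU directly.  The answer is a snapshot: a result cached
   before a microcode update is applied can be stale afterwards.  */
struct tsx_features
query_tsx (void)
{
  struct tsx_features none = { false, false };
#if defined(__x86_64__) || defined(__i386__)
  unsigned int eax, ebx, ecx, edx;
  if (__get_cpuid_count (7, 0, &eax, &ebx, &ecx, &edx))
    return decode_leaf7_ebx (ebx);
#endif
  return none;
}
```

Because the query bypasses the kernel's cpuflags, a kernel-side quirk cannot
hide the bits from a caller that reads them this way.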

> I'll be updating the Fedora kernel and dracut to do early microcode
> loading today.  Kyle McMartin looked at the suspend/hibernate-resume
> paths and things should work fine with early microcode loading.

That's great.

> Disabling this in glibc on x86 seems fine since nothing will be
> capable of using it until newer CPUs are released, but it wasn't
> strictly necessary.

I wasn't aware of the plan to do the early microcode loading, which is
why we went with disabling HLE.  It's really just a flip of a
configure flag and rebuild, so we can even revert it and test once the
kernel bits are in.

Siddhesh

* Re: Intel microcode update and glibc HLE
  2014-09-26 18:19 ` Adhemerval Zanella
@ 2014-09-26 18:57   ` Siddhesh Poyarekar
  0 siblings, 0 replies; 12+ messages in thread
From: Siddhesh Poyarekar @ 2014-09-26 18:57 UTC (permalink / raw)
  To: Adhemerval Zanella; +Cc: libc-alpha

On Fri, Sep 26, 2014 at 03:19:34PM -0300, Adhemerval Zanella wrote:
> I am about to ping about my patches to add support for TLE on POWER8, and
> this recent issue with HLE on Intel gave me the idea of creating
> a patch to enable TLE conditionally using an environment variable.  The
> idea is to build glibc with TLE enabled, add the environment variable,
> and let each architecture act based on its value.
> 
> I know this may not be the best solution in the long term if we aim to enable
> TLE by default (we will need to either deprecate the env. var or change its
> semantics), but my idea is to give developers a way to evaluate whether
> TLE is safe and whether it indeed gives the workload a performance boost.

We've thrown around the idea of using tunables (yes, that abstract
vapourware that we've mentioned before and never written code for;
some day...) for things like elision, so your idea matches that
in principle.

Siddhesh

* Re: Intel microcode update and glibc HLE
  2014-09-26 18:54   ` Siddhesh Poyarekar
@ 2014-09-27  5:10     ` Carlos O'Donell
  2014-09-27  5:11       ` Carlos O'Donell
                         ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Carlos O'Donell @ 2014-09-27  5:10 UTC (permalink / raw)
  To: Siddhesh Poyarekar, Josh Boyer, Adhemerval Zanella
  Cc: libc-alpha, Kyle McMartin

On 09/26/2014 02:54 PM, Siddhesh Poyarekar wrote:
>> I'll be updating the Fedora kernel and dracut to do early microcode
>> loading today.  Kyle McMartin looked at the suspend/hibernate-resume
>> paths and things should work fine with early microcode loading.
> 
> That's great.

I agree, that is great news.
 
>> Disabling this in glibc on x86 seems fine since nothing will be
>> capable of using it until newer CPUs are released, but it wasn't
>> strictly necessary.
> 
> I wasn't aware of the plan to do the early microcode loading, which is
> why we went with disabling HLE.  It's really just a flip of a
> configure flag and rebuild, so we can even revert it and test once the
> kernel bits are in.

It turns out to be more than just a configure flag.

Andi's recent rwlock changes use TSX unconditionally.

I've raised this issue with Andi to see if we can put it all under the
same configure flag.

Similarly for ppc64 and s390 I think I'll make the flag do this:

--enable-lock-elision=yes (enable for all machines)
--enable-lock-elision=x86_64,ppc64,s390x (enable for these machines)
--enable-lock-elision-no (disable for all machines)

Cheers,
Carlos.

* Re: Intel microcode update and glibc HLE
  2014-09-27  5:10     ` Carlos O'Donell
@ 2014-09-27  5:11       ` Carlos O'Donell
  2014-09-27  5:14       ` Siddhesh Poyarekar
  2014-09-27  7:37       ` Andreas Schwab
  2 siblings, 0 replies; 12+ messages in thread
From: Carlos O'Donell @ 2014-09-27  5:11 UTC (permalink / raw)
  To: Siddhesh Poyarekar, Josh Boyer, Adhemerval Zanella
  Cc: libc-alpha, Kyle McMartin

On 09/27/2014 01:10 AM, Carlos O'Donell wrote:
> Andi's recent rwlock changes use TSX unconditionally.

To be clear I meant "use TSX unconditionally if present."

c.

* Re: Intel microcode update and glibc HLE
  2014-09-27  5:10     ` Carlos O'Donell
  2014-09-27  5:11       ` Carlos O'Donell
@ 2014-09-27  5:14       ` Siddhesh Poyarekar
  2014-09-27  5:26         ` Roland McGrath
  2014-09-27  5:39         ` Carlos O'Donell
  2014-09-27  7:37       ` Andreas Schwab
  2 siblings, 2 replies; 12+ messages in thread
From: Siddhesh Poyarekar @ 2014-09-27  5:14 UTC (permalink / raw)
  To: Carlos O'Donell
  Cc: Siddhesh Poyarekar, Josh Boyer, Adhemerval Zanella, libc-alpha,
	Kyle McMartin

On 27 September 2014 10:40, Carlos O'Donell <carlos@redhat.com> wrote:
> It turns out to be more than just a configure flag.
>
> Andi's recent rwlock changes use TSX unconditionally.

I would reckon that it's wrong to do so. Not using
--enable-lock-elision should disable all elision code.

> I've raised this issue with Andi to see if we can put it all under the
> same configure flag.
>
> Similarly for ppc64 and s390 I think I'll make the flag do this:
>
> --enable-lock-elision=yes (enable for all machines)
> --enable-lock-elision=x86_64,ppc64,s390x (enable for these machines)

Why do you need this?  Wouldn't it be sufficient to do this in
distribution spec files?

Siddhesh
-- 
http://siddhesh.in

* Re: Intel microcode update and glibc HLE
  2014-09-27  5:14       ` Siddhesh Poyarekar
@ 2014-09-27  5:26         ` Roland McGrath
  2014-09-27  5:39         ` Carlos O'Donell
  1 sibling, 0 replies; 12+ messages in thread
From: Roland McGrath @ 2014-09-27  5:26 UTC (permalink / raw)
  To: Siddhesh Poyarekar
  Cc: Carlos O'Donell, Siddhesh Poyarekar, Josh Boyer,
	Adhemerval Zanella, libc-alpha, Kyle McMartin

> On 27 September 2014 10:40, Carlos O'Donell <carlos@redhat.com> wrote:
> > It turns out to be more than just a configure flag.
> >
> > Andi's recent rwlock changes use TSX unconditionally.
> 
> I would reckon that it's wrong to do so. Not using
> --enable-lock-elision should disable all elision code.

Emphatically agreed.

> > I've raised this issue with Andi to see if we can put it all under the
> > same configure flag.
> >
> > Similarly for ppc64 and s390 I think I'll make the flag do this:
> >
> > --enable-lock-elision=yes (enable for all machines)
> > --enable-lock-elision=x86_64,ppc64,s390x (enable for these machines)
> 
> Why do you need this?  Wouldn't it be sufficient to do this in
> distribution spec files?

I agree.  There's no precedent for that sort of value for enable options.
It's just saving people from conditionally choosing which configure arguments
they actually want.  They have scripting of various sorts for that.

* Re: Intel microcode update and glibc HLE
  2014-09-27  5:14       ` Siddhesh Poyarekar
  2014-09-27  5:26         ` Roland McGrath
@ 2014-09-27  5:39         ` Carlos O'Donell
  1 sibling, 0 replies; 12+ messages in thread
From: Carlos O'Donell @ 2014-09-27  5:39 UTC (permalink / raw)
  To: Siddhesh Poyarekar
  Cc: Siddhesh Poyarekar, Josh Boyer, Adhemerval Zanella, libc-alpha,
	Kyle McMartin

On 09/27/2014 01:14 AM, Siddhesh Poyarekar wrote:
> On 27 September 2014 10:40, Carlos O'Donell <carlos@redhat.com> wrote:
>> It turns out to be more than just a configure flag.
>>
>> Andi's recent rwlock changes use TSX unconditionally.
> 
> I would reckon that it's wrong to do so. Not using
> --enable-lock-elision should disable all elision code.

I agree.

>> Similarly for ppc64 and s390 I think I'll make the flag do this:
>>
>> --enable-lock-elision=yes (enable for all machines)
>> --enable-lock-elision=x86_64,ppc64,s390x (enable for these machines)
> 
> Why do you need this?  Wouldn't it be sufficient to do this in
> distribution spec files?

We don't need it. We can indeed push it into the distribution
to handle there. However, the remaining problem is that upstream
will want a sensible default per-machine. I guess we can look at
refactoring this such that the machines can influence the default.

For example, as the Intel elision code is tested more, it might
be enabled by default sooner, while the ppc64 and s390x code
would not.
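
One way to picture "machines influence the default" is a lookup that is
consulted only when the builder gave no explicit choice.  The sketch below
is an assumption-laden illustration: the table contents, the enum, and the
function name are invented here, and the per-machine values are placeholders
rather than defaults anyone has agreed on.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Illustrative values only: assumptions made for this sketch,
   not defaults that glibc has adopted.  */
struct machine_default
{
  const char *machine;
  bool enabled;
};

static const struct machine_default machine_defaults[] =
{
  { "x86_64", true },
  { "powerpc64", false },
  { "s390x", false },
};

/* What the builder asked for: nothing, --enable-lock-elision,
   or --disable-lock-elision.  */
enum elision_choice { ELISION_UNSET, ELISION_ENABLE, ELISION_DISABLE };

/* An explicit choice always wins; otherwise fall back to the
   per-machine default, and to "off" for unknown machines.  */
bool
resolve_elision (enum elision_choice choice, const char *machine)
{
  if (choice == ELISION_ENABLE)
    return true;
  if (choice == ELISION_DISABLE)
    return false;

  size_t n = sizeof machine_defaults / sizeof machine_defaults[0];
  for (size_t i = 0; i < n; i++)
    if (strcmp (machine, machine_defaults[i].machine) == 0)
      return machine_defaults[i].enabled;
  return false;
}
```

This keeps the configure option a plain yes/no switch, as requested earlier
in the thread, while still letting each port carry its own default.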

Cheers,
Carlos.

* Re: Intel microcode update and glibc HLE
  2014-09-27  5:10     ` Carlos O'Donell
  2014-09-27  5:11       ` Carlos O'Donell
  2014-09-27  5:14       ` Siddhesh Poyarekar
@ 2014-09-27  7:37       ` Andreas Schwab
  2014-09-27 16:26         ` Carlos O'Donell
  2 siblings, 1 reply; 12+ messages in thread
From: Andreas Schwab @ 2014-09-27  7:37 UTC (permalink / raw)
  To: Carlos O'Donell
  Cc: Siddhesh Poyarekar, Josh Boyer, Adhemerval Zanella, libc-alpha,
	Kyle McMartin

"Carlos O'Donell" <carlos@redhat.com> writes:

> --enable-lock-elision-no (disable for all machines)

--disable-lock-elision

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."

* Re: Intel microcode update and glibc HLE
  2014-09-27  7:37       ` Andreas Schwab
@ 2014-09-27 16:26         ` Carlos O'Donell
  0 siblings, 0 replies; 12+ messages in thread
From: Carlos O'Donell @ 2014-09-27 16:26 UTC (permalink / raw)
  To: Andreas Schwab
  Cc: Siddhesh Poyarekar, Josh Boyer, Adhemerval Zanella, libc-alpha,
	Kyle McMartin

On 09/27/2014 03:37 AM, Andreas Schwab wrote:
> "Carlos O'Donell" <carlos@redhat.com> writes:
> 
>> --enable-lock-elision-no (disable for all machines)
> 
> --disable-lock-elision

Sure :-)

Cheers,
Carlos.
