* tunables vs osxsave vs checkpointing vs _dl_runtime_resolve @ 2021-03-18 17:18 DJ Delorie 2021-03-18 17:29 ` H.J. Lu ` (2 more replies) 0 siblings, 3 replies; 13+ messages in thread From: DJ Delorie @ 2021-03-18 17:18 UTC (permalink / raw) To: libc-alpha In response to this customer bug... https://bugzilla.redhat.com/show_bug.cgi?id=1937515 I spent some time digging into this code, and was able to reproduce it using criu (checkpoint/restore in userspace). In a nutshell: if you create a task on a machine WITH xsave (or xsavec), and migrate it (somehow) to a machine WITHOUT xsave (or xsavec), any further DSO calls will fail because we've already chosen an xsave/xsavec resolver. This, of course, is guaranteed to fail, and cannot be fixed. With criu I had to override the checks with a command line option just to prove my point. However, if you *know* you might do this, there should be a way to use tunables to avoid xsave/xsavec - with the usual caveats about "YMMV" - so that a process could be migrated across such CPUs without fault. Our tunables almost provide this. IMHO the tunables logic should interact with cpu features like this: 1. Read the CPU features available 2. Apply tunables 3. Make secondary decisions based on the results The code that handles xsave breaks these rules; the save size for extended registers is computed before tunables are applied, so disabling xsave, xsavec, or osxsave in tunables has no affect. *After* tunables, we use the save size to determine if xsave/xsavec are enabled! It looks like just moving the save_size logic in update_usable() (cpu_features.c) to after the tunables check in init_cpu_features() should "solve" this problem, allowing tunables to determine if xsave/xsavec are used in dl_runtime_resolve. However, the code is complex and hairy and there's a good chance for gotchas in there. Comments? ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: tunables vs osxsave vs checkpointing vs _dl_runtime_resolve 2021-03-18 17:18 tunables vs osxsave vs checkpointing vs _dl_runtime_resolve DJ Delorie @ 2021-03-18 17:29 ` H.J. Lu 2021-03-18 17:45 ` DJ Delorie 2021-03-18 17:32 ` Florian Weimer 2021-03-19 16:43 ` Szabolcs Nagy 2 siblings, 1 reply; 13+ messages in thread From: H.J. Lu @ 2021-03-18 17:29 UTC (permalink / raw) To: DJ Delorie; +Cc: GNU C Library On Thu, Mar 18, 2021 at 10:18 AM DJ Delorie via Libc-alpha <libc-alpha@sourceware.org> wrote: > > > In response to this customer bug... > > https://bugzilla.redhat.com/show_bug.cgi?id=1937515 > > I spent some time digging into this code, and was able to reproduce it > using criu (checkpoint/restore in userspace). In a nutshell: if you > create a task on a machine WITH xsave (or xsavec), and migrate it > (somehow) to a machine WITHOUT xsave (or xsavec), any further DSO > calls will fail because we've already chosen an xsave/xsavec resolver. > > This, of course, is guaranteed to fail, and cannot be fixed. With > criu I had to override the checks with a command line option just to > prove my point. > > However, if you *know* you might do this, there should be a way to use > tunables to avoid xsave/xsavec - with the usual caveats about "YMMV" - > so that a process could be migrated across such CPUs without fault. > > Our tunables almost provide this. > > IMHO the tunables logic should interact with cpu features like this: > > 1. Read the CPU features available > 2. Apply tunables > 3. Make secondary decisions based on the results > > The code that handles xsave breaks these rules; the save size for > extended registers is computed before tunables are applied, so > disabling xsave, xsavec, or osxsave in tunables has no affect. > *After* tunables, we use the save size to determine if xsave/xsavec > are enabled! > > It looks like just moving the save_size logic in update_usable() > (cpu_features.c) to after the tunables check in init_cpu_features() > should "solve" this problem, allowing tunables to determine if > xsave/xsavec are used in dl_runtime_resolve. However, the code is > complex and hairy and there's a good chance for gotchas in there. > > Comments? > Please open a glibc bug and CC me. -- H.J. ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: tunables vs osxsave vs checkpointing vs _dl_runtime_resolve 2021-03-18 17:29 ` H.J. Lu @ 2021-03-18 17:45 ` DJ Delorie 2021-03-18 20:39 ` H.J. Lu 0 siblings, 1 reply; 13+ messages in thread From: DJ Delorie @ 2021-03-18 17:45 UTC (permalink / raw) To: H.J. Lu; +Cc: libc-alpha "H.J. Lu" <hjl.tools@gmail.com> writes: > Please open a glibc bug and CC me. https://sourceware.org/bugzilla/show_bug.cgi?id=27605 Thanks! ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: tunables vs osxsave vs checkpointing vs _dl_runtime_resolve 2021-03-18 17:45 ` DJ Delorie @ 2021-03-18 20:39 ` H.J. Lu 2021-03-19 3:40 ` DJ Delorie 0 siblings, 1 reply; 13+ messages in thread From: H.J. Lu @ 2021-03-18 20:39 UTC (permalink / raw) To: DJ Delorie; +Cc: GNU C Library On Thu, Mar 18, 2021 at 10:45 AM DJ Delorie <dj@redhat.com> wrote: > > > "H.J. Lu" <hjl.tools@gmail.com> writes: > > Please open a glibc bug and CC me. > > https://sourceware.org/bugzilla/show_bug.cgi?id=27605 > GLIBC_TUNABLES=glibc.cpu.hwcaps=-XSAVEC can be used to control it: (gdb) r --direct Starting program: /export/build/gnu/tools-build/glibc-cet-gitlab/build-x86_64-linux/elf/tst-x86_64-1 --direct [Inferior 1 (process 1499934) exited normally] (gdb) b _dl_runtime_resolve_xsave Breakpoint 1 at 0x7ffff7fe19d0: file ../sysdeps/x86_64/dl-trampoline.h, line 67. (gdb) b _dl_runtime_resolve_xsavec Breakpoint 2 at 0x7ffff7fe1aa0: file ../sysdeps/x86_64/dl-trampoline.h, line 67. (gdb) r Starting program: /export/build/gnu/tools-build/glibc-cet-gitlab/build-x86_64-linux/elf/tst-x86_64-1 --direct Breakpoint 2, _dl_runtime_resolve_xsavec () at ../sysdeps/x86_64/dl-trampoline.h:67 67 _CET_ENDBR (gdb) set env GLIBC_TUNABLES=glibc.cpu.hwcaps=-XSAVEC (gdb) r The program being debugged has been started already. Start it from the beginning? (y or n) y Starting program: /export/build/gnu/tools-build/glibc-cet-gitlab/build-x86_64-linux/elf/tst-x86_64-1 --direct Breakpoint 1, _dl_runtime_resolve_xsave () at ../sysdeps/x86_64/dl-trampoline.h:67 67 _CET_ENDBR (gdb) -- H.J. ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: tunables vs osxsave vs checkpointing vs _dl_runtime_resolve 2021-03-18 20:39 ` H.J. Lu @ 2021-03-19 3:40 ` DJ Delorie 2021-03-19 4:16 ` H.J. Lu 0 siblings, 1 reply; 13+ messages in thread From: DJ Delorie @ 2021-03-19 3:40 UTC (permalink / raw) To: H.J. Lu; +Cc: libc-alpha But... (gdb) set env GLIBC_TUNABLES=glibc.cpu.hwcaps=-OSXSAVE (gdb) r The program being debugged has been started already. Start it from the beginning? (y or n) y Starting program: /home/dj/criu-test Breakpoint 2, _dl_runtime_resolve_xsavec () at ../sysdeps/x86_64/dl-trampoline.h:67 67 _CET_ENDBR ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: tunables vs osxsave vs checkpointing vs _dl_runtime_resolve 2021-03-19 3:40 ` DJ Delorie @ 2021-03-19 4:16 ` H.J. Lu 2021-03-19 4:35 ` DJ Delorie 0 siblings, 1 reply; 13+ messages in thread From: H.J. Lu @ 2021-03-19 4:16 UTC (permalink / raw) To: DJ Delorie; +Cc: GNU C Library [-- Attachment #1: Type: text/plain, Size: 424 bytes --] On Thu, Mar 18, 2021 at 8:40 PM DJ Delorie <dj@redhat.com> wrote: > > > But... > > (gdb) set env GLIBC_TUNABLES=glibc.cpu.hwcaps=-OSXSAVE > (gdb) r > The program being debugged has been started already. > Start it from the beginning? (y or n) y > Starting program: /home/dj/criu-test > > Breakpoint 2, _dl_runtime_resolve_xsavec () at ../sysdeps/x86_64/dl-trampoline.h:67 > 67 _CET_ENDBR > Try this. -- H.J. [-- Attachment #2: cpu-features.diff --] [-- Type: text/x-patch, Size: 2447 bytes --] diff --git a/sysdeps/x86/cpu-features.c b/sysdeps/x86/cpu-features.c index d7248cbb45..6262b03cd4 100644 --- a/sysdeps/x86/cpu-features.c +++ b/sysdeps/x86/cpu-features.c @@ -40,7 +40,7 @@ extern void TUNABLE_CALLBACK (set_x86_shstk) (tunable_val_t *) #endif static void -update_usable (struct cpu_features *cpu_features) +copy_usable (struct cpu_features *cpu_features) { /* Copy the cpuid bits to usable bits for CPU featuress whose usability in user space can be detected without additonal OS support. */ @@ -98,9 +98,13 @@ update_usable (struct cpu_features *cpu_features) CPU_FEATURE_SET_USABLE (cpu_features, FSRS); CPU_FEATURE_SET_USABLE (cpu_features, FSRCS); CPU_FEATURE_SET_USABLE (cpu_features, PTWRITE); +} +static void +update_usable (struct cpu_features *cpu_features) +{ /* Can we call xgetbv? */ - if (CPU_FEATURES_CPU_P (cpu_features, OSXSAVE)) + if (CPU_FEATURE_USABLE_P (cpu_features, OSXSAVE)) { unsigned int xcrlow; unsigned int xcrhigh; @@ -420,8 +424,6 @@ init_cpu_features (struct cpu_features *cpu_features) get_extended_indices (cpu_features); - update_usable (cpu_features); - if (family == 0x06) { model += extended_model; @@ -547,8 +549,6 @@ init_cpu_features (struct cpu_features *cpu_features) get_extended_indices (cpu_features); - update_usable (cpu_features); - ecx = cpu_features->features[CPUID_INDEX_1].cpuid.ecx; if (CPU_FEATURE_USABLE_P (cpu_features, AVX)) @@ -586,8 +586,6 @@ init_cpu_features (struct cpu_features *cpu_features) get_extended_indices (cpu_features); - update_usable (cpu_features); - model += extended_model; if (family == 0x6) { @@ -630,7 +628,6 @@ init_cpu_features (struct cpu_features *cpu_features) { kind = arch_kind_other; get_common_indices (cpu_features, NULL, NULL, NULL, NULL); - update_usable (cpu_features); } /* Support i586 if CX8 is available. */ @@ -652,6 +649,8 @@ no_cpuid: dl_init_cacheinfo (cpu_features); + copy_usable (cpu_features); + #if HAVE_TUNABLES TUNABLE_GET (hwcaps, tunable_val_t *, TUNABLE_CALLBACK (set_hwcaps)); #elif defined SHARED @@ -661,6 +660,8 @@ no_cpuid: GLRO(dl_hwcap_mask) = HWCAP_IMPORTANT; #endif + update_usable (cpu_features); + #ifdef __x86_64__ GLRO(dl_hwcap) = HWCAP_X86_64; if (cpu_features->basic.kind == arch_kind_intel) ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: tunables vs osxsave vs checkpointing vs _dl_runtime_resolve 2021-03-19 4:16 ` H.J. Lu @ 2021-03-19 4:35 ` DJ Delorie 0 siblings, 0 replies; 13+ messages in thread From: DJ Delorie @ 2021-03-19 4:35 UTC (permalink / raw) To: H.J. Lu; +Cc: libc-alpha "H.J. Lu" <hjl.tools@gmail.com> writes: > Try this. close... fixed osxsave, broke xsavec (gdb) set env GLIBC_TUNABLES=glibc.cpu.hwcaps=-OSXSAVE (gdb) r The program being debugged has been started already. Start it from the beginning? (y or n) y Starting program: ... /home/dj/criu-test --direct Breakpoint 4, _dl_runtime_resolve_fxsave () at ../sysdeps/x86_64/dl-trampoline.h:72 72 pushq %rbx # push subtracts stack by 8. (gdb) set env GLIBC_TUNABLES=glibc.cpu.hwcaps=-XSAVE (gdb) r The program being debugged has been started already. Start it from the beginning? (y or n) y Starting program: ... /home/dj/criu-test --direct Breakpoint 3, _dl_runtime_resolve_xsavec () at ../sysdeps/x86_64/dl-trampoline.h:72 72 pushq %rbx # push subtracts stack by 8. (gdb) set env GLIBC_TUNABLES=glibc.cpu.hwcaps=-XSAVEC (gdb) r The program being debugged has been started already. Start it from the beginning? (y or n) y Starting program: ... /home/dj/criu-test --direct Breakpoint 3, _dl_runtime_resolve_xsavec () at ../sysdeps/x86_64/dl-trampoline.h:72 72 pushq %rbx # push subtracts stack by 8. (gdb) ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: tunables vs osxsave vs checkpointing vs _dl_runtime_resolve 2021-03-18 17:18 tunables vs osxsave vs checkpointing vs _dl_runtime_resolve DJ Delorie 2021-03-18 17:29 ` H.J. Lu @ 2021-03-18 17:32 ` Florian Weimer 2021-03-18 17:47 ` DJ Delorie 2021-03-19 16:43 ` Szabolcs Nagy 2 siblings, 1 reply; 13+ messages in thread From: Florian Weimer @ 2021-03-18 17:32 UTC (permalink / raw) To: DJ Delorie via Libc-alpha * DJ Delorie via Libc-alpha: > In response to this customer bug... > > https://bugzilla.redhat.com/show_bug.cgi?id=1937515 > > I spent some time digging into this code, and was able to reproduce it > using criu (checkpoint/restore in userspace). In a nutshell: if you > create a task on a machine WITH xsave (or xsavec), and migrate it > (somehow) to a machine WITHOUT xsave (or xsavec), any further DSO > calls will fail because we've already chosen an xsave/xsavec resolver. What happens if you migrate to a machine that has XSAVE/XSAVEC, but requires more storage? I expect that this is the case that people actually care about these days. 8-) Thanks, Florian ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: tunables vs osxsave vs checkpointing vs _dl_runtime_resolve 2021-03-18 17:32 ` Florian Weimer @ 2021-03-18 17:47 ` DJ Delorie 2021-03-18 17:57 ` Florian Weimer 0 siblings, 1 reply; 13+ messages in thread From: DJ Delorie @ 2021-03-18 17:47 UTC (permalink / raw) To: Florian Weimer; +Cc: libc-alpha Florian Weimer <fweimer@redhat.com> writes: > What happens if you migrate to a machine that has XSAVE/XSAVEC, but > requires more storage? We'd need a new tunable to always use the maximum size, or some specified size, I suppose. But there are limits to the miracles we can perform, and "size" is not the root problem - the layout changes too! > I expect that this is the case that people actually care about these > days. 8-) In this case, they just want to disable xsave completely, so it's as portable as possible. ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: tunables vs osxsave vs checkpointing vs _dl_runtime_resolve 2021-03-18 17:47 ` DJ Delorie @ 2021-03-18 17:57 ` Florian Weimer 2021-03-18 18:19 ` DJ Delorie 0 siblings, 1 reply; 13+ messages in thread From: Florian Weimer @ 2021-03-18 17:57 UTC (permalink / raw) To: DJ Delorie; +Cc: libc-alpha * DJ Delorie: > Florian Weimer <fweimer@redhat.com> writes: >> What happens if you migrate to a machine that has XSAVE/XSAVEC, but >> requires more storage? > > We'd need a new tunable to always use the maximum size, or some > specified size, I suppose. But there are limits to the miracles we can > perform, and "size" is not the root problem - the layout changes too! The layout matters only if the process is captured at a very unlucky time. So hopefully they can just avoid that. I think the main issue with the size change is that it prevents restore consistently, even in the non-race case. For XSAVE, it would be possible to query the size each time using CPUID. >> I expect that this is the case that people actually care about these >> days. 8-) > > In this case, they just want to disable xsave completely, so it's as > portable as possible. But I don't think it actually is. Disabling XSAVE won't play nicely with the AVX2 and later calling conventions. Thanks, Florian ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: tunables vs osxsave vs checkpointing vs _dl_runtime_resolve 2021-03-18 17:57 ` Florian Weimer @ 2021-03-18 18:19 ` DJ Delorie 0 siblings, 0 replies; 13+ messages in thread From: DJ Delorie @ 2021-03-18 18:19 UTC (permalink / raw) To: Florian Weimer; +Cc: libc-alpha Florian Weimer <fweimer@redhat.com> writes: > The layout matters only if the process is captured at a very unlucky > time. Unlucky? Ha! I *plan* on that kind of bad luck! Call sleep(10). Checkpoint during that call. Crash on return. >> In this case, they just want to disable xsave completely, so it's as >> portable as possible. > > But I don't think it actually is. Disabling XSAVE won't play nicely > with the AVX2 and later calling conventions. That's the YMMV caveat - they have to ensure that they're not using extended registers *at all*. I don't recommend it. It's bad practice. But we still have a tunable that's ignored, for people who want it anyway. ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: tunables vs osxsave vs checkpointing vs _dl_runtime_resolve 2021-03-18 17:18 tunables vs osxsave vs checkpointing vs _dl_runtime_resolve DJ Delorie 2021-03-18 17:29 ` H.J. Lu 2021-03-18 17:32 ` Florian Weimer @ 2021-03-19 16:43 ` Szabolcs Nagy 2021-03-19 21:12 ` DJ Delorie 2 siblings, 1 reply; 13+ messages in thread From: Szabolcs Nagy @ 2021-03-19 16:43 UTC (permalink / raw) To: DJ Delorie; +Cc: libc-alpha The 03/18/2021 13:18, DJ Delorie via Libc-alpha wrote: > > In response to this customer bug... > > https://bugzilla.redhat.com/show_bug.cgi?id=1937515 > > I spent some time digging into this code, and was able to reproduce it > using criu (checkpoint/restore in userspace). In a nutshell: if you > create a task on a machine WITH xsave (or xsavec), and migrate it > (somehow) to a machine WITHOUT xsave (or xsavec), any further DSO > calls will fail because we've already chosen an xsave/xsavec resolver. > > This, of course, is guaranteed to fail, and cannot be fixed. With > criu I had to override the checks with a command line option just to > prove my point. > > However, if you *know* you might do this, there should be a way to use > tunables to avoid xsave/xsavec - with the usual caveats about "YMMV" - > so that a process could be migrated across such CPUs without fault. > > Our tunables almost provide this. so are we supposed to handle migrations to machines with different arch extensions? cpu_features based decisions can break across different machines and there is no reliable (future proof) way to request baseline arch features. on aarch64 the closest is glibc.cpu.name=generic tunable but it only affects the cpuid (which fixes most libc ifunc selection logic, but e.g. not if the cpu has bti or not, there is glibc.cpu.hwcap_mask but not hacap2 mask so bti cannot be turned off in glibc if the hw has it) ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: tunables vs osxsave vs checkpointing vs _dl_runtime_resolve 2021-03-19 16:43 ` Szabolcs Nagy @ 2021-03-19 21:12 ` DJ Delorie 0 siblings, 0 replies; 13+ messages in thread From: DJ Delorie @ 2021-03-19 21:12 UTC (permalink / raw) To: Szabolcs Nagy; +Cc: libc-alpha Szabolcs Nagy <szabolcs.nagy@arm.com> writes: > so are we supposed to handle migrations to machines with > different arch extensions? No, that's not the intent. There's a lot of "caveat user" in what our customer wants, and if the x86 tunables didn't already *almost* do what they wanted, we would have just said no. I mean, x86 has code to do what our customer wants, and has code to let tunables control it, it just didn't actually work. So in this case, it's more of a bug fix than a new feature. > cpu_features based decisions can break across different > machines and there is no reliable (future proof) way to > request baseline arch features. That's a separate problem, which we've solved with ifuncs, but others solve with multilibs. X86 chooses to let tunables override ifunc choices, and exposes CPU features, setting an expectation. If other arches don't do that, so be it. ^ permalink raw reply [flat|nested] 13+ messages in thread
end of thread, other threads:[~2021-03-19 21:12 UTC | newest] Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2021-03-18 17:18 tunables vs osxsave vs checkpointing vs _dl_runtime_resolve DJ Delorie 2021-03-18 17:29 ` H.J. Lu 2021-03-18 17:45 ` DJ Delorie 2021-03-18 20:39 ` H.J. Lu 2021-03-19 3:40 ` DJ Delorie 2021-03-19 4:16 ` H.J. Lu 2021-03-19 4:35 ` DJ Delorie 2021-03-18 17:32 ` Florian Weimer 2021-03-18 17:47 ` DJ Delorie 2021-03-18 17:57 ` Florian Weimer 2021-03-18 18:19 ` DJ Delorie 2021-03-19 16:43 ` Szabolcs Nagy 2021-03-19 21:12 ` DJ Delorie
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).