public inbox for gdb-patches@sourceware.org
* [RFC PATCH 0/4] Handle variable XSAVE layouts
@ 2022-03-16 19:46 John Baldwin
  2022-03-16 19:46 ` [RFC PATCH 1/4] x86: Add an xsave_offsets structure to handle " John Baldwin
                   ` (4 more replies)
  0 siblings, 5 replies; 10+ messages in thread
From: John Baldwin @ 2022-03-16 19:46 UTC (permalink / raw)
  To: gdb-patches

This is a first attempt at resolving the issue with XSAVE I described
previously.  There are more details in the commit logs, but here I will
describe some caveats of the current prototype:

- It is probably terrible performance-wise to be reading the offsets
  from the target every time collect/supply_xsave is called.  I'd
  actually much prefer to store these (along with the total XSAVE area
  size) in the tdep.  The issue is that you can have gdbarches with the
  same tdesc that use different layouts (e.g. if you open a core dump
  from an Intel CPU on a host with an AMD CPU, the two CPUs could have
  identical XCR0 masks, but the layout in the core dump wouldn't match
  the layout of a live process).  Perhaps, though, if I fetch the
  offsets from the target in i386_gdbarch_init, I can iterate over the
  existing arches looking for one with a matching layout.

- The cpuid function I added in patch 3 isn't FreeBSD-specific at all
  (and would work on i386).  I could add it to x86-nat.c instead
  easily enough.  Even if OSes start providing a new ptrace op to
  fetch this info, we should probably still ship a cpuid-based variant
  as a fallback?

- The collect size I used in patch 3 for the XSAVE register set
  isn't really correct.  Really if I had the "real" XSAVE register
  set size available in the tdep (see point 1) I would not set
  REGSET_VARIABLE_SIZE and instead use tdep->sizeof_xsave for both
  sizes.

- I have no idea how gdbserver is impacted.  So far I haven't found
  tables similar to i387-tdep.c's in gdbserver.  (It's also harder
  for me to test currently as I haven't yet added FreeBSD support to
  gdbserver).

- I haven't added any support for fetching the offsets on Linux (the
  only other OS that supports XSAVE state currently).  I am waiting
  to have the design a bit more finalized before doing that.

John Baldwin (4):
  x86: Add an xsave_offsets structure to handle variable XSAVE layouts.
  core: Support fetching TARGET_OBJECT_X86_XSAVE_OFFSETS from
    architectures.
  Update x86 FreeBSD architectures to support XSAVE offsets.
  Support XSAVE layouts for the current host in the FreeBSD/amd64
    target.

 gdb/amd64-fbsd-nat.c      |  73 +++++
 gdb/amd64-fbsd-tdep.c     |   8 +-
 gdb/corelow.c             |  22 ++
 gdb/gdbarch-components.py |  13 +
 gdb/gdbarch-gen.h         |  10 +
 gdb/gdbarch.c             |  32 +++
 gdb/i386-fbsd-tdep.c      |  33 ++-
 gdb/i386-fbsd-tdep.h      |   6 +
 gdb/i387-tdep.c           | 592 +++++++++++++++++++++++++-------------
 gdb/i387-tdep.h           |  22 ++
 gdb/target.h              |   2 +
 11 files changed, 607 insertions(+), 206 deletions(-)

-- 
2.34.1



* [RFC PATCH 1/4] x86: Add an xsave_offsets structure to handle variable XSAVE layouts.
  2022-03-16 19:46 [RFC PATCH 0/4] Handle variable XSAVE layouts John Baldwin
@ 2022-03-16 19:46 ` John Baldwin
  2022-03-16 19:46 ` [RFC PATCH 2/4] core: Support fetching TARGET_OBJECT_X86_XSAVE_OFFSETS from architectures John Baldwin
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 10+ messages in thread
From: John Baldwin @ 2022-03-16 19:46 UTC (permalink / raw)
  To: gdb-patches

The standard layout of the XSAVE extended state area consists of three
primary regions.  The first 512 bytes (legacy region) match the layout
of the FXSAVE instruction, including floating point registers, MMX
registers, and SSE registers.  The next 64 bytes (XSAVE header) have a
fixed layout.  The final region (extended region) contains zero or
more optional state components; one example is the upper 128 bits of
the YMM registers for AVX.

These optional state components generally have an
architecturally-fixed size, but they are not assigned architectural
offsets in the extended region.  Instead, processors provide
additional CPUID leaves describing the size and offset of each
component in the "standard" layout for a given CPU.  (There is also a
"compacted" format which uses an alternate layout, but existing OSes
currently export the "standard" layout when exporting XSAVE data via
ptrace() and core dumps.)
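
For concreteness, the "standard" layout query can be modeled offline:
CPUID.(EAX=0DH,ECX=i) reports state component i's size in EAX and its
standard-layout offset in EBX.  The sketch below is a hypothetical
stand-in (it does not execute cpuid): the table is populated with the
Intel offsets that i387-tdep.c currently hardcodes, and
xsave_component_offset is an illustrative helper, not GDB code.

```c
#include <assert.h>
#include <stdint.h>

/* Model of CPUID.(EAX=0DH,ECX=i) output for one XSAVE state
   component: size in bytes (EAX) and offset in the "standard" layout
   (EBX).  The sample values are the Intel offsets that GDB's
   i387-tdep.c tables currently hardcode.  */
struct xsave_component { uint32_t size; uint32_t offset; };

const struct xsave_component intel_components[] =
{
  [2] = { 16 * 16, 576 },	/* AVX: upper halves of %ymm0-%ymm15.  */
  [3] = { 4 * 16, 960 },	/* MPX: %bnd0-%bnd3.  */
  [4] = { 64, 1024 },		/* MPX: bndcfg and bndstatus.  */
  [5] = { 64, 1088 },		/* AVX512: opmask registers %k0-%k7.  */
  [6] = { 16 * 32, 1152 },	/* AVX512: upper halves of %zmm0-%zmm15.  */
  [7] = { 16 * 64, 1664 },	/* AVX512: full %zmm16-%zmm31.  */
  [9] = { 8, 2688 },		/* PKRU.  */
};

/* Return the standard-layout offset of component I, or -1 if the
   component is absent from the table.  */
int32_t
xsave_component_offset (int i)
{
  int n = sizeof intel_components / sizeof intel_components[0];
  if (i < 2 || i >= n || intel_components[i].size == 0)
    return -1;
  return intel_components[i].offset;
}
```

Note that the offsets are consistent with the sizes (e.g. PKRU ends at
2688 + 8 = 2696, the full Intel XSAVE area size), which is what lets a
fallback recognize a layout from its total size alone.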

To date, GDB has assumed the layout used on current Intel processors
for state components in the extended region, hardcoding those offsets
in the tables in i387-tdep.c.  However, this fails on recent AMD
processors, which use a different layout.  Specifically, AMD Ryzen 9
processors do not leave space for the MPX register set in between the
AVX and AVX512 register sets.
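
As a sketch of why the hardcoded tables break (using the component
sizes and offsets this patch's fallback encodes; amd_offset and
amd_total_size are hypothetical helpers, not GDB code): on the Ryzen 9
layout the enabled components are packed back-to-back after the XSAVE
header, so every AVX512 and PKRU offset lands lower than the Intel
value i387-tdep.c assumes.

```c
#include <assert.h>
#include <stdint.h>

/* Sizes (in bytes) of the extended-region components present on a
   Ryzen 9 with PKRU, in layout order: AVX (upper %ymm0-%ymm15),
   opmask (%k0-%k7), upper %zmm0-%zmm15, full %zmm16-%zmm31, PKRU.
   MPX is absent, so no space is reserved for it.  */
const uint32_t amd_sizes[] = { 16 * 16, 8 * 8, 16 * 32, 16 * 64, 8 };

/* Offset of component COMPONENT: the extended region starts at byte
   576 and, on this layout, components are packed back-to-back.  */
uint32_t
amd_offset (int component)
{
  uint32_t off = 576;
  for (int i = 0; i < component; i++)
    off += amd_sizes[i];
  return off;
}

/* Total XSAVE area size implied by the packing above.  */
uint32_t
amd_total_size (void)
{
  return amd_offset (5);
}
```

The upper halves of %zmm0-%zmm15 thus sit at 896 instead of Intel's
1152, and the total area is 2440 bytes instead of 2696, so reading
with the Intel offsets returns garbage.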

To rectify this, add an xsave_offsets structure which contains the
offset of each known optional state component.  i387_collect_xsave and
i387_supply_xsave fetch this structure from the current target as a
TARGET_OBJECT_X86_XSAVE_OFFSETS object.  All of the tables describing
the offsets of individual registers for XSAVE state components now hold
relative offsets rather than absolute offsets.  Some tables (those for
MPX registers and ZMMH registers) had to be split into separate tables
as they held entries that spanned multiple state components.

Current OSes do not export these offsets via either ptrace or core
dumps, so provide an i387_set_xsave_offsets helper function that sets
the offsets based on known combinations of XCR0 masks and total state
sizes.  This can be used as a fallback when individual offsets are not
available.
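
A simplified sketch of that fallback, showing only the two
PKRU-capable layouts from this patch: the field names mirror the
patch's struct xsave_offsets, but set_xsave_offsets is an illustrative
reduction (the patch's HAS_PKRU feature test is collapsed to a bare
XCR0 bit check here).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define X86_XSTATE_PKRU_BIT (1ULL << 9)	/* PKRU bit in XCR0.  */

struct xsave_offsets
{
  int avx_offset, bndregs_offset, bndcfg_offset;
  int avx512_k_offset, avx512_zmm_h_offset, avx512_zmm_offset;
  int pkru_offset;
};

/* Pick a known layout from the XCR0 mask and the total XSAVE area
   size.  Return 1 on success, 0 if the layout is unrecognized.  A
   component absent from the layout gets offset -1.  */
int
set_xsave_offsets (uint64_t xcr0, size_t size, struct xsave_offsets *out)
{
  if ((xcr0 & X86_XSTATE_PKRU_BIT) && size == 2696)
    /* Intel: MPX space present between AVX and the AVX512 state.  */
    *out = (struct xsave_offsets) { 576, 960, 1024, 1088, 1152, 1664, 2688 };
  else if ((xcr0 & X86_XSTATE_PKRU_BIT) && size == 2440)
    /* AMD Ryzen 9: no MPX hole, everything packed lower.  */
    *out = (struct xsave_offsets) { 576, -1, -1, 832, 896, 1408, 2432 };
  else
    return 0;			/* Unknown layout.  */
  return 1;
}
```

The real function in the patch handles more combinations (AVX512
without PKRU, MPX-only, AVX-only) with the same size-keyed dispatch.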

Note that no targets yet provide support for reading
TARGET_OBJECT_X86_XSAVE_OFFSETS.  These will be added to existing
targets which support fetching XSAVE state in subsequent commits.
---
 gdb/i387-tdep.c | 592 +++++++++++++++++++++++++++++++-----------------
 gdb/i387-tdep.h |  22 ++
 gdb/target.h    |   2 +
 3 files changed, 414 insertions(+), 202 deletions(-)

diff --git a/gdb/i387-tdep.c b/gdb/i387-tdep.c
index 2f0b6509457..3e4c6378def 100644
--- a/gdb/i387-tdep.c
+++ b/gdb/i387-tdep.c
@@ -738,89 +738,104 @@ i387_collect_fxsave (const struct regcache *regcache, int regnum, void *fxsave)
 
 static int xsave_avxh_offset[] =
 {
-  576 + 0 * 16,		/* Upper 128bit of %ymm0 through ...  */
-  576 + 1 * 16,
-  576 + 2 * 16,
-  576 + 3 * 16,
-  576 + 4 * 16,
-  576 + 5 * 16,
-  576 + 6 * 16,
-  576 + 7 * 16,
-  576 + 8 * 16,
-  576 + 9 * 16,
-  576 + 10 * 16,
-  576 + 11 * 16,
-  576 + 12 * 16,
-  576 + 13 * 16,
-  576 + 14 * 16,
-  576 + 15 * 16		/* Upper 128bit of ... %ymm15 (128 bits each).  */
+  0 * 16,		/* Upper 128bit of %ymm0 through ...  */
+  1 * 16,
+  2 * 16,
+  3 * 16,
+  4 * 16,
+  5 * 16,
+  6 * 16,
+  7 * 16,
+  8 * 16,
+  9 * 16,
+  10 * 16,
+  11 * 16,
+  12 * 16,
+  13 * 16,
+  14 * 16,
+  15 * 16		/* Upper 128bit of ... %ymm15 (128 bits each).  */
 };
 
-#define XSAVE_AVXH_ADDR(tdep, xsave, regnum) \
-  (xsave + xsave_avxh_offset[regnum - I387_YMM0H_REGNUM (tdep)])
+#define XSAVE_AVXH_ADDR(tdep, xsave, offsets, regnum)		\
+  (xsave + (offsets)->avx_offset				\
+   + xsave_avxh_offset[regnum - I387_YMM0H_REGNUM (tdep)])
 
 /* At xsave_ymm_avx512_offset[REGNUM] you'll find the offset to the location in
-   the upper 128bit of ZMM register data structure used by the "xsave"
+   the second 128bits of ZMM register data structure used by the "xsave"
    instruction where GDB register REGNUM is stored.  */
 
 static int xsave_ymm_avx512_offset[] =
 {
   /* HI16_ZMM_area + 16 bytes + regnum* 64 bytes.  */
-  1664 + 16 + 0 * 64,		/* %ymm16 through...  */
-  1664 + 16 + 1 * 64,
-  1664 + 16 + 2 * 64,
-  1664 + 16 + 3 * 64,
-  1664 + 16 + 4 * 64,
-  1664 + 16 + 5 * 64,
-  1664 + 16 + 6 * 64,
-  1664 + 16 + 7 * 64,
-  1664 + 16 + 8 * 64,
-  1664 + 16 + 9 * 64,
-  1664 + 16 + 10 * 64,
-  1664 + 16 + 11 * 64,
-  1664 + 16 + 12 * 64,
-  1664 + 16 + 13 * 64,
-  1664 + 16 + 14 * 64,
-  1664 + 16 + 15 * 64		/* ...  %ymm31 (128 bits each).  */
+  16 + 0 * 64,		/* %ymm16 through...  */
+  16 + 1 * 64,
+  16 + 2 * 64,
+  16 + 3 * 64,
+  16 + 4 * 64,
+  16 + 5 * 64,
+  16 + 6 * 64,
+  16 + 7 * 64,
+  16 + 8 * 64,
+  16 + 9 * 64,
+  16 + 10 * 64,
+  16 + 11 * 64,
+  16 + 12 * 64,
+  16 + 13 * 64,
+  16 + 14 * 64,
+  16 + 15 * 64		/* ...  %ymm31 (128 bits each).  */
 };
 
-#define XSAVE_YMM_AVX512_ADDR(tdep, xsave, regnum) \
-  (xsave + xsave_ymm_avx512_offset[regnum - I387_YMM16H_REGNUM (tdep)])
+#define XSAVE_YMM_AVX512_ADDR(tdep, xsave, offsets, regnum)		\
+  (xsave + (offsets)->avx512_zmm_offset					\
+   + xsave_ymm_avx512_offset[regnum - I387_YMM16H_REGNUM (tdep)])
+
+/* At xsave_xmm_avx512_offset[REGNUM] you'll find the offset to the location in
+   the first 128bits of ZMM register data structure used by the "xsave"
+   instruction where GDB register REGNUM is stored.  */
 
 static int xsave_xmm_avx512_offset[] =
 {
-  1664 + 0 * 64,		/* %ymm16 through...  */
-  1664 + 1 * 64,
-  1664 + 2 * 64,
-  1664 + 3 * 64,
-  1664 + 4 * 64,
-  1664 + 5 * 64,
-  1664 + 6 * 64,
-  1664 + 7 * 64,
-  1664 + 8 * 64,
-  1664 + 9 * 64,
-  1664 + 10 * 64,
-  1664 + 11 * 64,
-  1664 + 12 * 64,
-  1664 + 13 * 64,
-  1664 + 14 * 64,
-  1664 + 15 * 64		/* ...  %ymm31 (128 bits each).  */
+  0 * 64,			/* %xmm16 through...  */
+  1 * 64,
+  2 * 64,
+  3 * 64,
+  4 * 64,
+  5 * 64,
+  6 * 64,
+  7 * 64,
+  8 * 64,
+  9 * 64,
+  10 * 64,
+  11 * 64,
+  12 * 64,
+  13 * 64,
+  14 * 64,
+  15 * 64			/* ...  %xmm31 (128 bits each).  */
 };
 
-#define XSAVE_XMM_AVX512_ADDR(tdep, xsave, regnum) \
-  (xsave + xsave_xmm_avx512_offset[regnum - I387_XMM16_REGNUM (tdep)])
+#define XSAVE_XMM_AVX512_ADDR(tdep, xsave, offsets, regnum)		\
+  (xsave + (offsets)->avx512_zmm_offset					\
+   + xsave_xmm_avx512_offset[regnum - I387_XMM16_REGNUM (tdep)])
 
-static int xsave_mpx_offset[] = {
-  960 + 0 * 16,			/* bnd0r...bnd3r registers.  */
-  960 + 1 * 16,
-  960 + 2 * 16,
-  960 + 3 * 16,
-  1024 + 0 * 8,			/* bndcfg ... bndstatus.  */
-  1024 + 1 * 8,
+static int xsave_bndregs_offset[] = {
+  0 * 16,			/* bnd0r...bnd3r registers.  */
+  1 * 16,
+  2 * 16,
+  3 * 16
 };
 
-#define XSAVE_MPX_ADDR(tdep, xsave, regnum) \
-  (xsave + xsave_mpx_offset[regnum - I387_BND0R_REGNUM (tdep)])
+#define XSAVE_BNDREGS_ADDR(tdep, xsave, offsets, regnum)	\
+  (xsave + (offsets)->bndregs_offset				\
+   + xsave_bndregs_offset[regnum - I387_BND0R_REGNUM (tdep)])
+
+static int xsave_bndcfg_offset[] = {
+  0 * 8,			/* bndcfg ... bndstatus.  */
+  1 * 8,
+};
+
+#define XSAVE_BNDCFG_ADDR(tdep, xsave, offsets, regnum)		\
+  (xsave + (offsets)->bndcfg_offset				\
+   + xsave_bndcfg_offset[regnum - I387_BNDCFGU_REGNUM (tdep)])
 
   /* At xsave_avx512__h_offset[REGNUM] you find the offset to the location
    of the AVX512 opmask register data structure used by the "xsave"
@@ -828,61 +843,78 @@ static int xsave_mpx_offset[] = {
 
 static int xsave_avx512_k_offset[] =
 {
-  1088 + 0 * 8,			/* %k0 through...  */
-  1088 + 1 * 8,
-  1088 + 2 * 8,
-  1088 + 3 * 8,
-  1088 + 4 * 8,
-  1088 + 5 * 8,
-  1088 + 6 * 8,
-  1088 + 7 * 8     		/* %k7 (64 bits each).  */
+  0 * 8,			/* %k0 through...  */
+  1 * 8,
+  2 * 8,
+  3 * 8,
+  4 * 8,
+  5 * 8,
+  6 * 8,
+  7 * 8				/* %k7 (64 bits each).	*/
 };
 
-#define XSAVE_AVX512_K_ADDR(tdep, xsave, regnum) \
-  (xsave + xsave_avx512_k_offset[regnum - I387_K0_REGNUM (tdep)])
+#define XSAVE_AVX512_K_ADDR(tdep, xsave, offsets, regnum)	\
+  (xsave + (offsets)->avx512_k_offset				\
+   + xsave_avx512_k_offset[regnum - I387_K0_REGNUM (tdep)])
 
-/* At xsave_avx512_zmm_h_offset[REGNUM] you find the offset to the location in
-   the upper 256bit of AVX512 ZMMH register data structure used by the "xsave"
-   instruction where GDB register REGNUM is stored.  */
 
-static int xsave_avx512_zmm_h_offset[] =
+/* At xsave_avx512_zmm0_h_offset[REGNUM] you find the offset to the
+   location in the upper 256bit of AVX512 ZMM0-15H register data
+   structure used by the "xsave" instruction where GDB register REGNUM
+   is stored.  */
+
+/* At xsave_avx512_zmm16_h_offset[REGNUM] you find the offset to the
+   location in the upper 256bit of AVX512 ZMMH register data structure
+   used by the "xsave" instruction where GDB register REGNUM is
+   stored.  */
+
+static int xsave_avx512_zmm0_h_offset[] =
+{
+  0 * 32,		/* Upper 256bit of %zmmh0 through...  */
+  1 * 32,
+  2 * 32,
+  3 * 32,
+  4 * 32,
+  5 * 32,
+  6 * 32,
+  7 * 32,
+  8 * 32,
+  9 * 32,
+  10 * 32,
+  11 * 32,
+  12 * 32,
+  13 * 32,
+  14 * 32,
+  15 * 32		/* Upper 256bit of...  %zmmh15 (256 bits each).  */
+};
+
+#define XSAVE_AVX512_ZMM0_H_ADDR(tdep, xsave, offsets, regnum)		\
+  (xsave + (offsets)->avx512_zmm_h_offset				\
+   + xsave_avx512_zmm0_h_offset[regnum - I387_ZMM0H_REGNUM (tdep)])
+
+static int xsave_avx512_zmm16_h_offset[] =
 {
-  1152 + 0 * 32,
-  1152 + 1 * 32,	/* Upper 256bit of %zmmh0 through...  */
-  1152 + 2 * 32,
-  1152 + 3 * 32,
-  1152 + 4 * 32,
-  1152 + 5 * 32,
-  1152 + 6 * 32,
-  1152 + 7 * 32,
-  1152 + 8 * 32,
-  1152 + 9 * 32,
-  1152 + 10 * 32,
-  1152 + 11 * 32,
-  1152 + 12 * 32,
-  1152 + 13 * 32,
-  1152 + 14 * 32,
-  1152 + 15 * 32,	/* Upper 256bit of...  %zmmh15 (256 bits each).  */
-  1664 + 32 + 0 * 64,   /* Upper 256bit of...  %zmmh16 (256 bits each).  */
-  1664 + 32 + 1 * 64,
-  1664 + 32 + 2 * 64,
-  1664 + 32 + 3 * 64,
-  1664 + 32 + 4 * 64,
-  1664 + 32 + 5 * 64,
-  1664 + 32 + 6 * 64,
-  1664 + 32 + 7 * 64,
-  1664 + 32 + 8 * 64,
-  1664 + 32 + 9 * 64,
-  1664 + 32 + 10 * 64,
-  1664 + 32 + 11 * 64,
-  1664 + 32 + 12 * 64,
-  1664 + 32 + 13 * 64,
-  1664 + 32 + 14 * 64,
-  1664 + 32 + 15 * 64   /* Upper 256bit of... %zmmh31 (256 bits each).  */
+  32 + 0 * 64,		/* Upper 256bit of...  %zmmh16 (256 bits each).  */
+  32 + 1 * 64,
+  32 + 2 * 64,
+  32 + 3 * 64,
+  32 + 4 * 64,
+  32 + 5 * 64,
+  32 + 6 * 64,
+  32 + 7 * 64,
+  32 + 8 * 64,
+  32 + 9 * 64,
+  32 + 10 * 64,
+  32 + 11 * 64,
+  32 + 12 * 64,
+  32 + 13 * 64,
+  32 + 14 * 64,
+  32 + 15 * 64		/* Upper 256bit of... %zmmh31 (256 bits each).  */
 };
 
-#define XSAVE_AVX512_ZMM_H_ADDR(tdep, xsave, regnum) \
-  (xsave + xsave_avx512_zmm_h_offset[regnum - I387_ZMM0H_REGNUM (tdep)])
+#define XSAVE_AVX512_ZMM16_H_ADDR(tdep, xsave, offsets, regnum)		\
+  (xsave + (offsets)->avx512_zmm_offset					\
+   + xsave_avx512_zmm16_h_offset[regnum - I387_ZMM16H_REGNUM (tdep)])
 
 /* At xsave_pkeys_offset[REGNUM] you find the offset to the location
    of the PKRU register data structure used by the "xsave"
@@ -890,14 +922,93 @@ static int xsave_avx512_zmm_h_offset[] =
 
 static int xsave_pkeys_offset[] =
 {
-2688 + 0 * 8		/* %pkru (64 bits in XSTATE, 32-bit actually used by
+  0 * 8			/* %pkru (64 bits in XSTATE, 32-bit actually used by
 			   instructions and applications).  */
 };
 
-#define XSAVE_PKEYS_ADDR(tdep, xsave, regnum) \
-  (xsave + xsave_pkeys_offset[regnum - I387_PKRU_REGNUM (tdep)])
+#define XSAVE_PKEYS_ADDR(tdep, xsave, offsets, regnum)		\
+  (xsave + (offsets)->pkru_offset				\
+   + xsave_pkeys_offset[regnum - I387_PKRU_REGNUM (tdep)])
 
 
+/* See i387-tdep.h.  */
+
+void
+i387_set_xsave_offsets (uint64_t xcr0, size_t xsave_size,
+			struct xsave_offsets *offsets)
+{
+  if (HAS_PKRU(xcr0) && xsave_size == 2696)
+    {
+      offsets->avx_offset = 576;
+      offsets->bndregs_offset = 960;
+      offsets->bndcfg_offset = 1024;
+      offsets->avx512_k_offset = 1088;
+      offsets->avx512_zmm_h_offset = 1152;
+      offsets->avx512_zmm_offset = 1664;
+      offsets->pkru_offset = 2688;
+    }
+  else if (HAS_PKRU(xcr0) && xsave_size == 2440)
+    {
+      offsets->avx_offset = 576;
+      offsets->bndregs_offset = -1;
+      offsets->bndcfg_offset = -1;
+      offsets->avx512_k_offset = 832;
+      offsets->avx512_zmm_h_offset = 896;
+      offsets->avx512_zmm_offset = 1408;
+      offsets->pkru_offset = 2432;
+    }
+  else if (HAS_AVX512(xcr0) && xsave_size == 2688)
+    {
+      offsets->avx_offset = 576;
+      offsets->bndregs_offset = 960;
+      offsets->bndcfg_offset = 1024;
+      offsets->avx512_k_offset = 1088;
+      offsets->avx512_zmm_h_offset = 1152;
+      offsets->avx512_zmm_offset = 1664;
+      offsets->pkru_offset = -1;
+    }
+  else if (HAS_MPX(xcr0) && xsave_size == 1088)
+    {
+      offsets->avx_offset = 576;
+      offsets->bndregs_offset = 960;
+      offsets->bndcfg_offset = 1024;
+      offsets->avx512_k_offset = -1;
+      offsets->avx512_zmm_h_offset = -1;
+      offsets->avx512_zmm_offset = -1;
+      offsets->pkru_offset = -1;
+    }
+  else if (HAS_AVX(xcr0) && xsave_size == 832)
+    {
+      offsets->avx_offset = 576;
+      offsets->bndregs_offset = -1;
+      offsets->bndcfg_offset = -1;
+      offsets->avx512_k_offset = -1;
+      offsets->avx512_zmm_h_offset = -1;
+      offsets->avx512_zmm_offset = -1;
+      offsets->pkru_offset = -1;
+    }
+  else
+    {
+      offsets->bndregs_offset = -1;
+      offsets->bndcfg_offset = -1;
+      offsets->avx512_k_offset = -1;
+      offsets->avx512_zmm_h_offset = -1;
+      offsets->avx512_zmm_offset = -1;
+      offsets->pkru_offset = -1;
+    }
+}
+
+/* Fetch the XSAVE offsets for the current target.  */
+
+static void
+i387_fetch_xsave_offsets (struct xsave_offsets *offsets)
+{
+  LONGEST len = target_read (current_inferior ()->top_target (),
+			     TARGET_OBJECT_X86_XSAVE_OFFSETS, nullptr,
+			     (gdb_byte *) offsets, 0, sizeof (*offsets));
+  gdb_assert (len == sizeof (*offsets));
+}
+
 /* Extract from XSAVE a bitset of the features that are available on the
    target, but which have not yet been enabled.  */
 
@@ -924,6 +1035,7 @@ void
 i387_supply_xsave (struct regcache *regcache, int regnum,
 		   const void *xsave)
 {
+  struct xsave_offsets offsets;
   struct gdbarch *gdbarch = regcache->arch ();
   enum bfd_endian byte_order = gdbarch_byte_order (gdbarch);
   i386_gdbarch_tdep *tdep = (i386_gdbarch_tdep *) gdbarch_tdep (gdbarch);
@@ -943,28 +1055,35 @@ i387_supply_xsave (struct regcache *regcache, int regnum,
       x87 = 0x1,
       sse = 0x2,
       avxh = 0x4,
-      mpx  = 0x8,
-      avx512_k = 0x10,
-      avx512_zmm_h = 0x20,
-      avx512_ymmh_avx512 = 0x40,
-      avx512_xmm_avx512 = 0x80,
-      pkeys = 0x100,
-      all = x87 | sse | avxh | mpx | avx512_k | avx512_zmm_h
-	    | avx512_ymmh_avx512 | avx512_xmm_avx512 | pkeys
+      bndregs = 0x8,
+      bndcfg = 0x10,
+      avx512_k = 0x20,
+      avx512_zmm0_h = 0x40,
+      avx512_zmm16_h = 0x80,
+      avx512_ymmh_avx512 = 0x100,
+      avx512_xmm_avx512 = 0x200,
+      pkeys = 0x400,
+      all = x87 | sse | avxh | bndregs | bndcfg | avx512_k | avx512_zmm0_h
+	    | avx512_zmm16_h | avx512_ymmh_avx512 | avx512_xmm_avx512 | pkeys
     } regclass;
 
   gdb_assert (regs != NULL);
   gdb_assert (tdep->st0_regnum >= I386_ST0_REGNUM);
   gdb_assert (tdep->num_xmm_regs > 0);
 
+  i387_fetch_xsave_offsets (&offsets);
+
   if (regnum == -1)
     regclass = all;
   else if (regnum >= I387_PKRU_REGNUM (tdep)
 	   && regnum < I387_PKEYSEND_REGNUM (tdep))
     regclass = pkeys;
   else if (regnum >= I387_ZMM0H_REGNUM (tdep)
+	   && regnum < I387_ZMM16H_REGNUM (tdep))
+    regclass = avx512_zmm0_h;
+  else if (regnum >= I387_ZMM16H_REGNUM (tdep)
 	   && regnum < I387_ZMMENDH_REGNUM (tdep))
-    regclass = avx512_zmm_h;
+    regclass = avx512_zmm16_h;
   else if (regnum >= I387_K0_REGNUM (tdep)
 	   && regnum < I387_KEND_REGNUM (tdep))
     regclass = avx512_k;
@@ -978,8 +1097,11 @@ i387_supply_xsave (struct regcache *regcache, int regnum,
 	   && regnum < I387_YMMENDH_REGNUM (tdep))
     regclass = avxh;
   else if (regnum >= I387_BND0R_REGNUM (tdep)
+	   && regnum < I387_BNDCFGU_REGNUM (tdep))
+    regclass = bndregs;
+  else if (regnum >= I387_BNDCFGU_REGNUM (tdep)
 	   && regnum < I387_MPXEND_REGNUM (tdep))
-    regclass = mpx;
+    regclass = bndcfg;
   else if (regnum >= I387_XMM0_REGNUM (tdep)
 	   && regnum < I387_MXCSR_REGNUM (tdep))
     regclass = sse;
@@ -1009,23 +1131,34 @@ i387_supply_xsave (struct regcache *regcache, int regnum,
       if ((clear_bv & X86_XSTATE_PKRU))
 	regcache->raw_supply (regnum, zero);
       else
-	regcache->raw_supply (regnum, XSAVE_PKEYS_ADDR (tdep, regs, regnum));
+	regcache->raw_supply (regnum, XSAVE_PKEYS_ADDR (tdep, regs, &offsets,
+							regnum));
       return;
 
-    case avx512_zmm_h:
-      if ((clear_bv & (regnum < zmm_endlo_regnum ? X86_XSTATE_ZMM_H
-						 : X86_XSTATE_ZMM)))
+    case avx512_zmm0_h:
+      if ((clear_bv & X86_XSTATE_ZMM_H))
 	regcache->raw_supply (regnum, zero);
       else
 	regcache->raw_supply (regnum,
-			      XSAVE_AVX512_ZMM_H_ADDR (tdep, regs, regnum));
+			      XSAVE_AVX512_ZMM0_H_ADDR (tdep, regs, &offsets,
+							regnum));
+      return;
+
+    case avx512_zmm16_h:
+      if ((clear_bv & X86_XSTATE_ZMM))
+	regcache->raw_supply (regnum, zero);
+      else
+	regcache->raw_supply (regnum,
+			      XSAVE_AVX512_ZMM16_H_ADDR (tdep, regs, &offsets,
+							 regnum));
       return;
 
     case avx512_k:
       if ((clear_bv & X86_XSTATE_K))
 	regcache->raw_supply (regnum, zero);
       else
-	regcache->raw_supply (regnum, XSAVE_AVX512_K_ADDR (tdep, regs, regnum));
+	regcache->raw_supply (regnum, XSAVE_AVX512_K_ADDR (tdep, regs, &offsets,
+							   regnum));
       return;
 
     case avx512_ymmh_avx512:
@@ -1033,7 +1166,8 @@ i387_supply_xsave (struct regcache *regcache, int regnum,
 	regcache->raw_supply (regnum, zero);
       else
 	regcache->raw_supply (regnum,
-			      XSAVE_YMM_AVX512_ADDR (tdep, regs, regnum));
+			      XSAVE_YMM_AVX512_ADDR (tdep, regs, &offsets,
+						     regnum));
       return;
 
     case avx512_xmm_avx512:
@@ -1041,21 +1175,32 @@ i387_supply_xsave (struct regcache *regcache, int regnum,
 	regcache->raw_supply (regnum, zero);
       else
 	regcache->raw_supply (regnum,
-			      XSAVE_XMM_AVX512_ADDR (tdep, regs, regnum));
+			      XSAVE_XMM_AVX512_ADDR (tdep, regs, &offsets,
+						     regnum));
       return;
 
     case avxh:
       if ((clear_bv & X86_XSTATE_AVX))
 	regcache->raw_supply (regnum, zero);
       else
-	regcache->raw_supply (regnum, XSAVE_AVXH_ADDR (tdep, regs, regnum));
+	regcache->raw_supply (regnum, XSAVE_AVXH_ADDR (tdep, regs, &offsets,
+						       regnum));
       return;
 
-    case mpx:
+    case bndcfg:
+      if ((clear_bv & X86_XSTATE_BNDCFG))
+	regcache->raw_supply (regnum, zero);
+      else
+	regcache->raw_supply (regnum, XSAVE_BNDCFG_ADDR (tdep, regs, &offsets,
+							 regnum));
+      return;
+
+    case bndregs:
       if ((clear_bv & X86_XSTATE_BNDREGS))
 	regcache->raw_supply (regnum, zero);
       else
-	regcache->raw_supply (regnum, XSAVE_MPX_ADDR (tdep, regs, regnum));
+	regcache->raw_supply (regnum, XSAVE_BNDREGS_ADDR (tdep, regs, &offsets,
+							  regnum));
       return;
 
     case sse:
@@ -1088,7 +1233,8 @@ i387_supply_xsave (struct regcache *regcache, int regnum,
 	      for (i = I387_PKRU_REGNUM (tdep);
 		   i < I387_PKEYSEND_REGNUM (tdep);
 		   i++)
-		regcache->raw_supply (i, XSAVE_PKEYS_ADDR (tdep, regs, i));
+		regcache->raw_supply (i, XSAVE_PKEYS_ADDR (tdep, regs, &offsets,
+							   i));
 	    }
 	}
 
@@ -1104,7 +1250,8 @@ i387_supply_xsave (struct regcache *regcache, int regnum,
 	    {
 	      for (i = I387_ZMM0H_REGNUM (tdep); i < zmm_endlo_regnum; i++)
 		regcache->raw_supply (i,
-				      XSAVE_AVX512_ZMM_H_ADDR (tdep, regs, i));
+				      XSAVE_AVX512_ZMM0_H_ADDR (tdep, regs,
+								&offsets, i));
 	    }
 	}
 
@@ -1123,7 +1270,8 @@ i387_supply_xsave (struct regcache *regcache, int regnum,
 	      for (i = I387_K0_REGNUM (tdep);
 		   i < I387_KEND_REGNUM (tdep);
 		   i++)
-		regcache->raw_supply (i, XSAVE_AVX512_K_ADDR (tdep, regs, i));
+		regcache->raw_supply (i, XSAVE_AVX512_K_ADDR (tdep, regs,
+							      &offsets, i));
 	    }
 	}
 
@@ -1132,7 +1280,8 @@ i387_supply_xsave (struct regcache *regcache, int regnum,
 	{
 	  if ((clear_bv & X86_XSTATE_ZMM))
 	    {
-	      for (i = zmm_endlo_regnum; i < I387_ZMMENDH_REGNUM (tdep); i++)
+	      for (i = I387_ZMM16H_REGNUM (tdep);
+		   i < I387_ZMMENDH_REGNUM (tdep); i++)
 		regcache->raw_supply (i, zero);
 	      for (i = I387_YMM16H_REGNUM (tdep);
 		   i < I387_YMMH_AVX512_END_REGNUM (tdep);
@@ -1145,17 +1294,21 @@ i387_supply_xsave (struct regcache *regcache, int regnum,
 	    }
 	  else
 	    {
-	      for (i = zmm_endlo_regnum; i < I387_ZMMENDH_REGNUM (tdep); i++)
+	      for (i = I387_ZMM16H_REGNUM (tdep);
+		   i < I387_ZMMENDH_REGNUM (tdep); i++)
 		regcache->raw_supply (i,
-				      XSAVE_AVX512_ZMM_H_ADDR (tdep, regs, i));
+				      XSAVE_AVX512_ZMM16_H_ADDR (tdep, regs,
+								 &offsets, i));
 	      for (i = I387_YMM16H_REGNUM (tdep);
 		   i < I387_YMMH_AVX512_END_REGNUM (tdep);
 		   i++)
-		regcache->raw_supply (i, XSAVE_YMM_AVX512_ADDR (tdep, regs, i));
+		regcache->raw_supply (i, XSAVE_YMM_AVX512_ADDR (tdep, regs,
+								&offsets, i));
 	      for (i = I387_XMM16_REGNUM (tdep);
 		   i < I387_XMM_AVX512_END_REGNUM (tdep);
 		   i++)
-		regcache->raw_supply (i, XSAVE_XMM_AVX512_ADDR (tdep, regs, i));
+		regcache->raw_supply (i, XSAVE_XMM_AVX512_ADDR (tdep, regs,
+								&offsets, i));
 	    }
 	}
       /* Handle the upper YMM registers.  */
@@ -1173,7 +1326,8 @@ i387_supply_xsave (struct regcache *regcache, int regnum,
 	      for (i = I387_YMM0H_REGNUM (tdep);
 		   i < I387_YMMENDH_REGNUM (tdep);
 		   i++)
-		regcache->raw_supply (i, XSAVE_AVXH_ADDR (tdep, regs, i));
+		regcache->raw_supply (i, XSAVE_AVXH_ADDR (tdep, regs,
+							  &offsets, i));
 	    }
 	}
 
@@ -1190,7 +1344,8 @@ i387_supply_xsave (struct regcache *regcache, int regnum,
 	    {
 	      for (i = I387_BND0R_REGNUM (tdep);
 		   i < I387_BNDCFGU_REGNUM (tdep); i++)
-		regcache->raw_supply (i, XSAVE_MPX_ADDR (tdep, regs, i));
+		regcache->raw_supply (i, XSAVE_BNDREGS_ADDR (tdep, regs,
+							     &offsets, i));
 	    }
 	}
 
@@ -1207,7 +1362,8 @@ i387_supply_xsave (struct regcache *regcache, int regnum,
 	    {
 	      for (i = I387_BNDCFGU_REGNUM (tdep);
 		   i < I387_MPXEND_REGNUM (tdep); i++)
-		regcache->raw_supply (i, XSAVE_MPX_ADDR (tdep, regs, i));
+		regcache->raw_supply (i, XSAVE_BNDCFG_ADDR (tdep, regs,
+							    &offsets, i));
 	    }
 	}
 
@@ -1347,6 +1503,7 @@ void
 i387_collect_xsave (const struct regcache *regcache, int regnum,
 		    void *xsave, int gcore)
 {
+  struct xsave_offsets offsets;
   struct gdbarch *gdbarch = regcache->arch ();
   enum bfd_endian byte_order = gdbarch_byte_order (gdbarch);
   i386_gdbarch_tdep *tdep = (i386_gdbarch_tdep *) gdbarch_tdep (gdbarch);
@@ -1363,27 +1520,34 @@ i387_collect_xsave (const struct regcache *regcache, int regnum,
       x87 = 0x2,
       sse = 0x4,
       avxh = 0x8,
-      mpx  = 0x10,
-      avx512_k = 0x20,
-      avx512_zmm_h = 0x40,
-      avx512_ymmh_avx512 = 0x80,
-      avx512_xmm_avx512 = 0x100,
-      pkeys = 0x200,
-      all = x87 | sse | avxh | mpx | avx512_k | avx512_zmm_h
-	    | avx512_ymmh_avx512 | avx512_xmm_avx512 | pkeys
+      bndregs = 0x10,
+      bndcfg = 0x20,
+      avx512_k = 0x40,
+      avx512_zmm0_h = 0x80,
+      avx512_zmm16_h = 0x100,
+      avx512_ymmh_avx512 = 0x200,
+      avx512_xmm_avx512 = 0x400,
+      pkeys = 0x800,
+      all = x87 | sse | avxh | bndregs | bndcfg | avx512_k | avx512_zmm0_h
+	    | avx512_zmm16_h | avx512_ymmh_avx512 | avx512_xmm_avx512 | pkeys
     } regclass;
 
   gdb_assert (tdep->st0_regnum >= I386_ST0_REGNUM);
   gdb_assert (tdep->num_xmm_regs > 0);
 
+  i387_fetch_xsave_offsets (&offsets);
+
   if (regnum == -1)
     regclass = all;
   else if (regnum >= I387_PKRU_REGNUM (tdep)
 	   && regnum < I387_PKEYSEND_REGNUM (tdep))
     regclass = pkeys;
   else if (regnum >= I387_ZMM0H_REGNUM (tdep)
+	   && regnum < I387_ZMM16H_REGNUM (tdep))
+    regclass = avx512_zmm0_h;
+  else if (regnum >= I387_ZMM16H_REGNUM (tdep)
 	   && regnum < I387_ZMMENDH_REGNUM (tdep))
-    regclass = avx512_zmm_h;
+    regclass = avx512_zmm16_h;
   else if (regnum >= I387_K0_REGNUM (tdep)
 	   && regnum < I387_KEND_REGNUM (tdep))
     regclass = avx512_k;
@@ -1397,8 +1561,11 @@ i387_collect_xsave (const struct regcache *regcache, int regnum,
 	   && regnum < I387_YMMENDH_REGNUM (tdep))
     regclass = avxh;
   else if (regnum >= I387_BND0R_REGNUM (tdep)
+	   && regnum < I387_BNDCFGU_REGNUM (tdep))
+    regclass = bndregs;
+  else if (regnum >= I387_BNDCFGU_REGNUM (tdep)
 	   && regnum < I387_MPXEND_REGNUM (tdep))
-    regclass = mpx;
+    regclass = bndcfg;
   else if (regnum >= I387_XMM0_REGNUM (tdep)
 	   && regnum < I387_MXCSR_REGNUM (tdep))
     regclass = sse;
@@ -1445,43 +1612,44 @@ i387_collect_xsave (const struct regcache *regcache, int regnum,
       if ((clear_bv & X86_XSTATE_PKRU))
 	for (i = I387_PKRU_REGNUM (tdep);
 	     i < I387_PKEYSEND_REGNUM (tdep); i++)
-	  memset (XSAVE_PKEYS_ADDR (tdep, regs, i), 0, 4);
+	  memset (XSAVE_PKEYS_ADDR (tdep, regs, &offsets, i), 0, 4);
 
       if ((clear_bv & X86_XSTATE_BNDREGS))
 	for (i = I387_BND0R_REGNUM (tdep);
 	     i < I387_BNDCFGU_REGNUM (tdep); i++)
-	  memset (XSAVE_MPX_ADDR (tdep, regs, i), 0, 16);
+	  memset (XSAVE_BNDREGS_ADDR (tdep, regs, &offsets, i), 0, 16);
 
       if ((clear_bv & X86_XSTATE_BNDCFG))
 	for (i = I387_BNDCFGU_REGNUM (tdep);
 	     i < I387_MPXEND_REGNUM (tdep); i++)
-	  memset (XSAVE_MPX_ADDR (tdep, regs, i), 0, 8);
+	  memset (XSAVE_BNDCFG_ADDR (tdep, regs, &offsets, i), 0, 8);
 
       if ((clear_bv & X86_XSTATE_ZMM_H))
 	for (i = I387_ZMM0H_REGNUM (tdep); i < zmm_endlo_regnum; i++)
-	  memset (XSAVE_AVX512_ZMM_H_ADDR (tdep, regs, i), 0, 32);
+	  memset (XSAVE_AVX512_ZMM0_H_ADDR (tdep, regs, &offsets, i), 0, 32);
 
       if ((clear_bv & X86_XSTATE_K))
 	for (i = I387_K0_REGNUM (tdep);
 	     i < I387_KEND_REGNUM (tdep); i++)
-	  memset (XSAVE_AVX512_K_ADDR (tdep, regs, i), 0, 8);
+	  memset (XSAVE_AVX512_K_ADDR (tdep, regs, &offsets, i), 0, 8);
 
       if ((clear_bv & X86_XSTATE_ZMM))
 	{
-	  for (i = zmm_endlo_regnum; i < I387_ZMMENDH_REGNUM (tdep); i++)
-	    memset (XSAVE_AVX512_ZMM_H_ADDR (tdep, regs, i), 0, 32);
+	  for (i = I387_ZMM16H_REGNUM (tdep); i < I387_ZMMENDH_REGNUM (tdep);
+	       i++)
+	    memset (XSAVE_AVX512_ZMM16_H_ADDR (tdep, regs, &offsets, i), 0, 32);
 	  for (i = I387_YMM16H_REGNUM (tdep);
 	       i < I387_YMMH_AVX512_END_REGNUM (tdep); i++)
-	    memset (XSAVE_YMM_AVX512_ADDR (tdep, regs, i), 0, 16);
+	    memset (XSAVE_YMM_AVX512_ADDR (tdep, regs, &offsets, i), 0, 16);
 	  for (i = I387_XMM16_REGNUM (tdep);
 	       i < I387_XMM_AVX512_END_REGNUM (tdep); i++)
-	    memset (XSAVE_XMM_AVX512_ADDR (tdep, regs, i), 0, 16);
+	    memset (XSAVE_XMM_AVX512_ADDR (tdep, regs, &offsets, i), 0, 16);
 	}
 
       if ((clear_bv & X86_XSTATE_AVX))
 	for (i = I387_YMM0H_REGNUM (tdep);
 	     i < I387_YMMENDH_REGNUM (tdep); i++)
-	  memset (XSAVE_AVXH_ADDR (tdep, regs, i), 0, 16);
+	  memset (XSAVE_AVXH_ADDR (tdep, regs, &offsets, i), 0, 16);
 
       if ((clear_bv & X86_XSTATE_SSE))
 	for (i = I387_XMM0_REGNUM (tdep);
@@ -1523,7 +1691,7 @@ i387_collect_xsave (const struct regcache *regcache, int regnum,
 	     i < I387_PKEYSEND_REGNUM (tdep); i++)
 	  {
 	    regcache->raw_collect (i, raw);
-	    p = XSAVE_PKEYS_ADDR (tdep, regs, i);
+	    p = XSAVE_PKEYS_ADDR (tdep, regs, &offsets, i);
 	    if (memcmp (raw, p, 4) != 0)
 	      {
 		xstate_bv |= X86_XSTATE_PKRU;
@@ -1532,15 +1700,27 @@ i387_collect_xsave (const struct regcache *regcache, int regnum,
 	  }
 
       /* Check if any ZMMH registers are changed.  */
-      if ((tdep->xcr0 & (X86_XSTATE_ZMM_H | X86_XSTATE_ZMM)))
-	for (i = I387_ZMM0H_REGNUM (tdep);
+      if ((tdep->xcr0 & X86_XSTATE_ZMM))
+	for (i = I387_ZMM16H_REGNUM (tdep);
 	     i < I387_ZMMENDH_REGNUM (tdep); i++)
 	  {
 	    regcache->raw_collect (i, raw);
-	    p = XSAVE_AVX512_ZMM_H_ADDR (tdep, regs, i);
+	    p = XSAVE_AVX512_ZMM16_H_ADDR (tdep, regs, &offsets, i);
 	    if (memcmp (raw, p, 32) != 0)
 	      {
-		xstate_bv |= (X86_XSTATE_ZMM_H | X86_XSTATE_ZMM);
+		xstate_bv |= X86_XSTATE_ZMM;
+		memcpy (p, raw, 32);
+	      }
+	  }
+
+      if ((tdep->xcr0 & X86_XSTATE_ZMM_H))
+	for (i = I387_ZMM0H_REGNUM (tdep); i < zmm_endlo_regnum; i++)
+	  {
+	    regcache->raw_collect (i, raw);
+	    p = XSAVE_AVX512_ZMM0_H_ADDR (tdep, regs, &offsets, i);
+	    if (memcmp (raw, p, 32) != 0)
+	      {
+		xstate_bv |= X86_XSTATE_ZMM_H;
 		memcpy (p, raw, 32);
 	      }
 	  }
@@ -1551,7 +1731,7 @@ i387_collect_xsave (const struct regcache *regcache, int regnum,
 	     i < I387_KEND_REGNUM (tdep); i++)
 	  {
 	    regcache->raw_collect (i, raw);
-	    p = XSAVE_AVX512_K_ADDR (tdep, regs, i);
+	    p = XSAVE_AVX512_K_ADDR (tdep, regs, &offsets, i);
 	    if (memcmp (raw, p, 8) != 0)
 	      {
 		xstate_bv |= X86_XSTATE_K;
@@ -1566,7 +1746,7 @@ i387_collect_xsave (const struct regcache *regcache, int regnum,
 	       i < I387_YMMH_AVX512_END_REGNUM (tdep); i++)
 	    {
 	      regcache->raw_collect (i, raw);
-	      p = XSAVE_YMM_AVX512_ADDR (tdep, regs, i);
+	      p = XSAVE_YMM_AVX512_ADDR (tdep, regs, &offsets, i);
 	      if (memcmp (raw, p, 16) != 0)
 		{
 		  xstate_bv |= X86_XSTATE_ZMM;
@@ -1577,7 +1757,7 @@ i387_collect_xsave (const struct regcache *regcache, int regnum,
 	       i < I387_XMM_AVX512_END_REGNUM (tdep); i++)
 	    {
 	      regcache->raw_collect (i, raw);
-	      p = XSAVE_XMM_AVX512_ADDR (tdep, regs, i);
+	      p = XSAVE_XMM_AVX512_ADDR (tdep, regs, &offsets, i);
 	      if (memcmp (raw, p, 16) != 0)
 		{
 		  xstate_bv |= X86_XSTATE_ZMM;
@@ -1592,7 +1772,7 @@ i387_collect_xsave (const struct regcache *regcache, int regnum,
 	     i < I387_BNDCFGU_REGNUM (tdep); i++)
 	  {
 	    regcache->raw_collect (i, raw);
-	    p = XSAVE_MPX_ADDR (tdep, regs, i);
+	    p = XSAVE_BNDREGS_ADDR (tdep, regs, &offsets, i);
 	    if (memcmp (raw, p, 16))
 	      {
 		xstate_bv |= X86_XSTATE_BNDREGS;
@@ -1606,7 +1786,7 @@ i387_collect_xsave (const struct regcache *regcache, int regnum,
 	     i < I387_MPXEND_REGNUM (tdep); i++)
 	  {
 	    regcache->raw_collect (i, raw);
-	    p = XSAVE_MPX_ADDR (tdep, regs, i);
+	    p = XSAVE_BNDCFG_ADDR (tdep, regs, &offsets, i);
 	    if (memcmp (raw, p, 8))
 	      {
 		xstate_bv |= X86_XSTATE_BNDCFG;
@@ -1620,7 +1800,7 @@ i387_collect_xsave (const struct regcache *regcache, int regnum,
 	     i < I387_YMMENDH_REGNUM (tdep); i++)
 	  {
 	    regcache->raw_collect (i, raw);
-	    p = XSAVE_AVXH_ADDR (tdep, regs, i);
+	    p = XSAVE_AVXH_ADDR (tdep, regs, &offsets, i);
 	    if (memcmp (raw, p, 16))
 	      {
 		xstate_bv |= X86_XSTATE_AVX;
@@ -1688,7 +1868,7 @@ i387_collect_xsave (const struct regcache *regcache, int regnum,
 
 	case pkeys:
 	  /* This is a PKEYS register.  */
-	  p = XSAVE_PKEYS_ADDR (tdep, regs, regnum);
+	  p = XSAVE_PKEYS_ADDR (tdep, regs, &offsets, regnum);
 	  if (memcmp (raw, p, 4) != 0)
 	    {
 	      xstate_bv |= X86_XSTATE_PKRU;
@@ -1696,18 +1876,29 @@ i387_collect_xsave (const struct regcache *regcache, int regnum,
 	    }
 	  break;
 
-	case avx512_zmm_h:
-	  /* This is a ZMM register.  */
-	  p = XSAVE_AVX512_ZMM_H_ADDR (tdep, regs, regnum);
+	case avx512_zmm16_h:
+	  /* This is a ZMM16-31 register.  */
+	  p = XSAVE_AVX512_ZMM16_H_ADDR (tdep, regs, &offsets, regnum);
 	  if (memcmp (raw, p, 32) != 0)
 	    {
-	      xstate_bv |= (X86_XSTATE_ZMM_H | X86_XSTATE_ZMM);
+	      xstate_bv |= X86_XSTATE_ZMM;
 	      memcpy (p, raw, 32);
 	    }
 	  break;
+
+	case avx512_zmm0_h:
+	  /* This is a ZMM0-15 register.  */
+	  p = XSAVE_AVX512_ZMM0_H_ADDR (tdep, regs, &offsets, regnum);
+	  if (memcmp (raw, p, 32) != 0)
+	    {
+	      xstate_bv |= X86_XSTATE_ZMM_H;
+	      memcpy (p, raw, 32);
+	    }
+	  break;
+
 	case avx512_k:
 	  /* This is a AVX512 mask register.  */
-	  p = XSAVE_AVX512_K_ADDR (tdep, regs, regnum);
+	  p = XSAVE_AVX512_K_ADDR (tdep, regs, &offsets, regnum);
 	  if (memcmp (raw, p, 8) != 0)
 	    {
 	      xstate_bv |= X86_XSTATE_K;
@@ -1717,7 +1908,7 @@ i387_collect_xsave (const struct regcache *regcache, int regnum,
 
 	case avx512_ymmh_avx512:
 	  /* This is an upper YMM16-31 register.  */
-	  p = XSAVE_YMM_AVX512_ADDR (tdep, regs, regnum);
+	  p = XSAVE_YMM_AVX512_ADDR (tdep, regs, &offsets, regnum);
 	  if (memcmp (raw, p, 16) != 0)
 	    {
 	      xstate_bv |= X86_XSTATE_ZMM;
@@ -1727,7 +1918,7 @@ i387_collect_xsave (const struct regcache *regcache, int regnum,
 
 	case avx512_xmm_avx512:
 	  /* This is an upper XMM16-31 register.  */
-	  p = XSAVE_XMM_AVX512_ADDR (tdep, regs, regnum);
+	  p = XSAVE_XMM_AVX512_ADDR (tdep, regs, &offsets, regnum);
 	  if (memcmp (raw, p, 16) != 0)
 	    {
 	      xstate_bv |= X86_XSTATE_ZMM;
@@ -1737,7 +1928,7 @@ i387_collect_xsave (const struct regcache *regcache, int regnum,
 
 	case avxh:
 	  /* This is an upper YMM register.  */
-	  p = XSAVE_AVXH_ADDR (tdep, regs, regnum);
+	  p = XSAVE_AVXH_ADDR (tdep, regs, &offsets, regnum);
 	  if (memcmp (raw, p, 16))
 	    {
 	      xstate_bv |= X86_XSTATE_AVX;
@@ -1745,25 +1936,22 @@ i387_collect_xsave (const struct regcache *regcache, int regnum,
 	    }
 	  break;
 
-	case mpx:
-	  if (regnum < I387_BNDCFGU_REGNUM (tdep))
+	case bndregs:
+	  regcache->raw_collect (regnum, raw);
+	  p = XSAVE_BNDREGS_ADDR (tdep, regs, &offsets, regnum);
+	  if (memcmp (raw, p, 16))
 	    {
-	      regcache->raw_collect (regnum, raw);
-	      p = XSAVE_MPX_ADDR (tdep, regs, regnum);
-	      if (memcmp (raw, p, 16))
-		{
-		  xstate_bv |= X86_XSTATE_BNDREGS;
-		  memcpy (p, raw, 16);
-		}
-	    }
-	  else
-	    {
-	      p = XSAVE_MPX_ADDR (tdep, regs, regnum);
-	      xstate_bv |= X86_XSTATE_BNDCFG;
-	      memcpy (p, raw, 8);
+	      xstate_bv |= X86_XSTATE_BNDREGS;
+	      memcpy (p, raw, 16);
 	    }
 	  break;
 
+	case bndcfg:
+	  p = XSAVE_BNDCFG_ADDR (tdep, regs, &offsets, regnum);
+	  xstate_bv |= X86_XSTATE_BNDCFG;
+	  memcpy (p, raw, 8);
+	  break;
+
 	case sse:
 	  /* This is an SSE register.  */
 	  p = FXSAVE_ADDR (tdep, regs, regnum);
diff --git a/gdb/i387-tdep.h b/gdb/i387-tdep.h
index 698ff2ee206..bda7e2603d1 100644
--- a/gdb/i387-tdep.h
+++ b/gdb/i387-tdep.h
@@ -50,6 +50,7 @@ struct ui_file;
 #define I387_K0_REGNUM(tdep) ((tdep)->k0_regnum)
 #define I387_NUM_ZMMH_REGS(tdep) ((tdep)->num_zmm_regs)
 #define I387_ZMM0H_REGNUM(tdep) ((tdep)->zmm0h_regnum)
+#define I387_ZMM16H_REGNUM(tdep) ((tdep)->zmm0h_regnum + 16)
 #define I387_NUM_YMM_AVX512_REGS(tdep) ((tdep)->num_ymm_avx512_regs)
 #define I387_YMM16H_REGNUM(tdep) ((tdep)->ymm16h_regnum)
 
@@ -84,6 +85,21 @@ struct ui_file;
 #define I387_PKEYSEND_REGNUM(tdep) \
   (I387_PKRU_REGNUM (tdep) + I387_NUM_PKEYS_REGS)
 
+
+/* Offsets of register states in XSAVE area extended region.  Set to
+   -1 to indicate the absence of the associated registers.  */
+
+struct xsave_offsets
+{
+  int avx_offset = 0;
+  int bndregs_offset = 0;
+  int bndcfg_offset = 0;
+  int avx512_k_offset = 0;
+  int avx512_zmm_h_offset = 0;
+  int avx512_zmm_offset = 0;
+  int pkru_offset = 0;
+};
+
 /* Print out the i387 floating point state.  */
 
 extern void i387_print_float_info (struct gdbarch *gdbarch,
@@ -138,6 +154,12 @@ extern void i387_collect_fsave (const struct regcache *regcache, int regnum,
 extern void i387_supply_fxsave (struct regcache *regcache, int regnum,
 				const void *fxsave);
 
+/* Select an XSAVE layout based on the XCR0 bitmask and total XSAVE
+   extended state size.  */
+
+extern void i387_set_xsave_offsets (uint64_t xcr0, size_t xsave_size,
+				    struct xsave_offsets *offsets);
+
 /* Similar to i387_supply_fxsave, but use XSAVE extended state.  */
 
 extern void i387_supply_xsave (struct regcache *regcache, int regnum,
diff --git a/gdb/target.h b/gdb/target.h
index 4cc79df05b4..c4bd9bdf3d8 100644
--- a/gdb/target.h
+++ b/gdb/target.h
@@ -204,6 +204,8 @@ enum target_object
   TARGET_OBJECT_FREEBSD_VMMAP,
   /* FreeBSD process strings.  */
   TARGET_OBJECT_FREEBSD_PS_STRINGS,
+  /* x86 XSAVE area extended region offsets.  */
+  TARGET_OBJECT_X86_XSAVE_OFFSETS,
   /* Possible future objects: TARGET_OBJECT_FILE, ...  */
 };
 
-- 
2.34.1


^ permalink raw reply	[flat|nested] 10+ messages in thread
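
Patch 1 above only declares i387_set_xsave_offsets; its body is not shown in this excerpt.  A rough, self-contained sketch of the idea — matching a known (XCR0, total-size) pair against hardcoded layout tables — could look like the following.  All offsets, sizes, feature-bit subset, and the fallback behavior here are invented for illustration and are not the values from the actual patch:

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

/* Illustrative subset of the xsave_offsets structure from patch 1;
   -1 marks an absent region.  */
struct xsave_offsets
{
  int avx_offset;
  int bndregs_offset;
  int pkru_offset;
};

/* XCR0 feature bits (bit numbers follow the Intel SDM).  */
#define XSTATE_AVX   (1ULL << 2)
#define XSTATE_PKRU  (1ULL << 9)

/* One known layout per (xcr0, size) combination.  The offsets below
   are examples only.  */
struct known_layout
{
  uint64_t xcr0;
  size_t size;
  struct xsave_offsets offsets;
};

static const struct known_layout layouts[] = {
  /* An Intel-style layout with gaps for disabled components.  */
  { XSTATE_AVX | XSTATE_PKRU, 2696, { 576, -1, 2688 } },
  /* An AMD-style layout: same XCR0, but packed into a smaller area.  */
  { XSTATE_AVX | XSTATE_PKRU, 896,  { 576, -1, 832 } },
};

/* Pick the layout matching XCR0 and SIZE; fall back to marking every
   region absent when nothing matches.  */
static void
set_xsave_offsets (uint64_t xcr0, size_t size, struct xsave_offsets *out)
{
  for (size_t i = 0; i < sizeof layouts / sizeof layouts[0]; i++)
    if (layouts[i].xcr0 == xcr0 && layouts[i].size == size)
      {
	*out = layouts[i].offsets;
	return;
      }
  out->avx_offset = out->bndregs_offset = out->pkru_offset = -1;
}
```

This is why the same XCR0 mask alone cannot select a layout: the two table entries above differ only in total size, which is the distinguishing datum the series threads through to the supply/collect code.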

* [RFC PATCH 2/4] core: Support fetching TARGET_OBJECT_X86_XSAVE_OFFSETS from architectures.
  2022-03-16 19:46 [RFC PATCH 0/4] Handle variable XSAVE layouts John Baldwin
  2022-03-16 19:46 ` [RFC PATCH 1/4] x86: Add an xsave_offsets structure to handle " John Baldwin
@ 2022-03-16 19:46 ` John Baldwin
  2022-03-16 19:46 ` [RFC PATCH 3/4] Update x86 FreeBSD architectures to support XSAVE offsets John Baldwin
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 10+ messages in thread
From: John Baldwin @ 2022-03-16 19:46 UTC (permalink / raw)
  To: gdb-patches

Add gdbarch_core_xfer_x86_xsave_offsets to fetch the x86 XSAVE offsets
structure from a core file.
---
 gdb/corelow.c             | 22 ++++++++++++++++++++++
 gdb/gdbarch-components.py | 13 +++++++++++++
 gdb/gdbarch-gen.h         | 10 ++++++++++
 gdb/gdbarch.c             | 32 ++++++++++++++++++++++++++++++++
 4 files changed, 77 insertions(+)

diff --git a/gdb/corelow.c b/gdb/corelow.c
index f7f2bd3f318..c945d47df29 100644
--- a/gdb/corelow.c
+++ b/gdb/corelow.c
@@ -987,6 +987,28 @@ core_target::xfer_partial (enum target_object object, const char *annex,
 	}
       return TARGET_XFER_E_IO;
 
+    case TARGET_OBJECT_X86_XSAVE_OFFSETS:
+      if (readbuf)
+	{
+	  if (m_core_gdbarch != nullptr
+	      && gdbarch_core_xfer_x86_xsave_offsets_p (m_core_gdbarch))
+	    {
+	      LONGEST l = gdbarch_core_xfer_x86_xsave_offsets  (m_core_gdbarch,
+								readbuf, offset,
+								len);
+
+	      if (l >= 0)
+		{
+		  *xfered_len = l;
+		  if (l == 0)
+		    return TARGET_XFER_EOF;
+		  else
+		    return TARGET_XFER_OK;
+		}
+	    }
+	}
+      return TARGET_XFER_E_IO;
+
     default:
     fallthrough:
       return this->beneath ()->xfer_partial (object, annex, readbuf,
diff --git a/gdb/gdbarch-components.py b/gdb/gdbarch-components.py
index 4987f8a5fef..9c88433f02f 100644
--- a/gdb/gdbarch-components.py
+++ b/gdb/gdbarch-components.py
@@ -1585,6 +1585,19 @@ of bytes read (zero indicates EOF, a negative value indicates failure).
     invalid=True,
 )
 
+Method(
+    comment="""
+Read offset OFFSET of TARGET_OBJECT_X86_XSAVE_OFFSETS from core file
+into buffer READBUF with length LEN.  Return the number of bytes read
+(zero indicates EOF, a negative value indicates failure).
+""",
+    type="LONGEST",
+    name="core_xfer_x86_xsave_offsets",
+    params=[("gdb_byte *", "readbuf"), ("ULONGEST", "offset"), ("ULONGEST", "len")],
+    predicate=True,
+    invalid=True,
+)
+
 Value(
     comment="""
 BFD target to use when generating a core file.
diff --git a/gdb/gdbarch-gen.h b/gdb/gdbarch-gen.h
index b7beb73d36d..9dca81b9c85 100644
--- a/gdb/gdbarch-gen.h
+++ b/gdb/gdbarch-gen.h
@@ -923,6 +923,16 @@ typedef LONGEST (gdbarch_core_xfer_siginfo_ftype) (struct gdbarch *gdbarch, gdb_
 extern LONGEST gdbarch_core_xfer_siginfo (struct gdbarch *gdbarch, gdb_byte *readbuf, ULONGEST offset, ULONGEST len);
 extern void set_gdbarch_core_xfer_siginfo (struct gdbarch *gdbarch, gdbarch_core_xfer_siginfo_ftype *core_xfer_siginfo);
 
+/* Read offset OFFSET of TARGET_OBJECT_X86_XSAVE_OFFSETS from core file
+   into buffer READBUF with length LEN.  Return the number of bytes read
+   (zero indicates EOF, a negative value indicates failure). */
+
+extern bool gdbarch_core_xfer_x86_xsave_offsets_p (struct gdbarch *gdbarch);
+
+typedef LONGEST (gdbarch_core_xfer_x86_xsave_offsets_ftype) (struct gdbarch *gdbarch, gdb_byte *readbuf, ULONGEST offset, ULONGEST len);
+extern LONGEST gdbarch_core_xfer_x86_xsave_offsets (struct gdbarch *gdbarch, gdb_byte *readbuf, ULONGEST offset, ULONGEST len);
+extern void set_gdbarch_core_xfer_x86_xsave_offsets (struct gdbarch *gdbarch, gdbarch_core_xfer_x86_xsave_offsets_ftype *core_xfer_x86_xsave_offsets);
+
 /* BFD target to use when generating a core file. */
 
 extern bool gdbarch_gcore_bfd_target_p (struct gdbarch *gdbarch);
diff --git a/gdb/gdbarch.c b/gdb/gdbarch.c
index 55dd602b27f..7d2d8dbdb4b 100644
--- a/gdb/gdbarch.c
+++ b/gdb/gdbarch.c
@@ -176,6 +176,7 @@ struct gdbarch
   gdbarch_core_pid_to_str_ftype *core_pid_to_str;
   gdbarch_core_thread_name_ftype *core_thread_name;
   gdbarch_core_xfer_siginfo_ftype *core_xfer_siginfo;
+  gdbarch_core_xfer_x86_xsave_offsets_ftype *core_xfer_x86_xsave_offsets;
   const char * gcore_bfd_target;
   int vtable_function_descriptors;
   int vbit_in_delta;
@@ -527,6 +528,7 @@ verify_gdbarch (struct gdbarch *gdbarch)
   /* Skip verify of core_pid_to_str, has predicate.  */
   /* Skip verify of core_thread_name, has predicate.  */
   /* Skip verify of core_xfer_siginfo, has predicate.  */
+  /* Skip verify of core_xfer_x86_xsave_offsets, has predicate.  */
   /* Skip verify of gcore_bfd_target, has predicate.  */
   /* Skip verify of vtable_function_descriptors, invalid_p == 0 */
   /* Skip verify of vbit_in_delta, invalid_p == 0 */
@@ -1122,6 +1124,12 @@ gdbarch_dump (struct gdbarch *gdbarch, struct ui_file *file)
   fprintf_filtered (file,
                       "gdbarch_dump: core_xfer_siginfo = <%s>\n",
                       host_address_to_string (gdbarch->core_xfer_siginfo));
+  fprintf_filtered (file,
+                      "gdbarch_dump: gdbarch_core_xfer_x86_xsave_offsets_p() = %d\n",
+                      gdbarch_core_xfer_x86_xsave_offsets_p (gdbarch));
+  fprintf_filtered (file,
+                      "gdbarch_dump: core_xfer_x86_xsave_offsets = <%s>\n",
+                      host_address_to_string (gdbarch->core_xfer_x86_xsave_offsets));
   fprintf_filtered (file,
                       "gdbarch_dump: gdbarch_gcore_bfd_target_p() = %d\n",
                       gdbarch_gcore_bfd_target_p (gdbarch));
@@ -3865,6 +3873,30 @@ set_gdbarch_core_xfer_siginfo (struct gdbarch *gdbarch,
   gdbarch->core_xfer_siginfo = core_xfer_siginfo;
 }
 
+bool
+gdbarch_core_xfer_x86_xsave_offsets_p (struct gdbarch *gdbarch)
+{
+  gdb_assert (gdbarch != NULL);
+  return gdbarch->core_xfer_x86_xsave_offsets != NULL;
+}
+
+LONGEST
+gdbarch_core_xfer_x86_xsave_offsets (struct gdbarch *gdbarch, gdb_byte *readbuf, ULONGEST offset, ULONGEST len)
+{
+  gdb_assert (gdbarch != NULL);
+  gdb_assert (gdbarch->core_xfer_x86_xsave_offsets != NULL);
+  if (gdbarch_debug >= 2)
+    fprintf_unfiltered (gdb_stdlog, "gdbarch_core_xfer_x86_xsave_offsets called\n");
+  return gdbarch->core_xfer_x86_xsave_offsets (gdbarch, readbuf, offset, len);
+}
+
+void
+set_gdbarch_core_xfer_x86_xsave_offsets (struct gdbarch *gdbarch,
+                                         gdbarch_core_xfer_x86_xsave_offsets_ftype core_xfer_x86_xsave_offsets)
+{
+  gdbarch->core_xfer_x86_xsave_offsets = core_xfer_x86_xsave_offsets;
+}
+
 bool
 gdbarch_gcore_bfd_target_p (struct gdbarch *gdbarch)
 {
-- 
2.34.1


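
Both the core target in patch 2 and the native target later in this series use the same clamped partial-read pattern: reject an offset past the object, clamp the length, copy, and report EOF when zero bytes remain.  Stripped of GDB's types, the pattern looks like this (the function name and return convention are illustrative):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy up to LEN bytes of SRC (SRC_SIZE bytes long) starting at OFFSET
   into READBUF.  Returns the number of bytes copied, 0 at EOF, or -1
   when OFFSET lies beyond the object.  */
static long
xfer_partial_read (char *readbuf, const char *src, size_t src_size,
		   size_t offset, size_t len)
{
  if (offset > src_size)
    return -1;
  if (offset + len > src_size)
    len = src_size - offset;
  memcpy (readbuf, src + offset, len);
  return (long) len;
}
```

Note that an offset exactly equal to the object size is not an error: the clamp reduces the length to zero, which the caller maps to TARGET_XFER_EOF.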

* [RFC PATCH 3/4] Update x86 FreeBSD architectures to support XSAVE offsets.
  2022-03-16 19:46 [RFC PATCH 0/4] Handle variable XSAVE layouts John Baldwin
  2022-03-16 19:46 ` [RFC PATCH 1/4] x86: Add an xsave_offsets structure to handle " John Baldwin
  2022-03-16 19:46 ` [RFC PATCH 2/4] core: Support fetching TARGET_OBJECT_X86_XSAVE_OFFSETS from architectures John Baldwin
@ 2022-03-16 19:46 ` John Baldwin
  2022-03-16 19:46 ` [RFC PATCH 4/4] Support XSAVE layouts for the current host in the FreeBSD/amd64 target John Baldwin
  2022-03-17 13:17 ` [RFC PATCH 0/4] Handle variable XSAVE layouts Willgerodt, Felix
  4 siblings, 0 replies; 10+ messages in thread
From: John Baldwin @ 2022-03-16 19:46 UTC (permalink / raw)
  To: gdb-patches

Add an implementation of gdbarch_core_xfer_x86_xsave_offsets which
uses the size of an existing NT_X86_XSTATE core dump note to determine
the offsets via i387_set_xsave_offsets.

The XSAVE register set is now also marked as variable size.  This is
not quite correct; in particular, the maximum ("collect") size still
assumes the Intel layout.
---
 gdb/amd64-fbsd-tdep.c |  8 ++++++--
 gdb/i386-fbsd-tdep.c  | 33 +++++++++++++++++++++++++++++++--
 gdb/i386-fbsd-tdep.h  |  6 ++++++
 3 files changed, 43 insertions(+), 4 deletions(-)

diff --git a/gdb/amd64-fbsd-tdep.c b/gdb/amd64-fbsd-tdep.c
index da5c297902d..19a86d40203 100644
--- a/gdb/amd64-fbsd-tdep.c
+++ b/gdb/amd64-fbsd-tdep.c
@@ -28,6 +28,7 @@
 
 #include "amd64-tdep.h"
 #include "amd64-fbsd-tdep.h"
+#include "i387-tdep.h"
 #include "fbsd-tdep.h"
 #include "solib-svr4.h"
 #include "inferior.h"
@@ -236,7 +237,8 @@ static const struct regset amd64fbsd_xstateregset =
   {
     NULL,
     amd64fbsd_supply_xstateregset,
-    amd64fbsd_collect_xstateregset
+    amd64fbsd_collect_xstateregset,
+    REGSET_VARIABLE_SIZE
   };
 
 /* Iterate over core file register note sections.  */
@@ -253,7 +255,7 @@ amd64fbsd_iterate_over_regset_sections (struct gdbarch *gdbarch,
       &amd64_fbsd_gregset, NULL, cb_data);
   cb (".reg2", tdep->sizeof_fpregset, tdep->sizeof_fpregset, &amd64_fpregset,
       NULL, cb_data);
-  cb (".reg-xstate", X86_XSTATE_SIZE (tdep->xcr0), X86_XSTATE_SIZE (tdep->xcr0),
+  cb (".reg-xstate", X86_XSTATE_SSE_SIZE, X86_XSTATE_SIZE (tdep->xcr0),
       &amd64fbsd_xstateregset, "XSAVE extended state", cb_data);
 }
 
@@ -295,6 +297,8 @@ amd64fbsd_init_abi (struct gdbarch_info info, struct gdbarch *gdbarch)
   tramp_frame_prepend_unwinder (gdbarch, &amd64_fbsd_sigframe);
 
   tdep->xsave_xcr0_offset = I386_FBSD_XSAVE_XCR0_OFFSET;
+  set_gdbarch_core_xfer_x86_xsave_offsets
+    (gdbarch, i386fbsd_core_xfer_x86_xsave_offsets);
 
   /* Iterate over core file register note sections.  */
   set_gdbarch_iterate_over_regset_sections
diff --git a/gdb/i386-fbsd-tdep.c b/gdb/i386-fbsd-tdep.c
index 16ffd576323..577eed06321 100644
--- a/gdb/i386-fbsd-tdep.c
+++ b/gdb/i386-fbsd-tdep.c
@@ -18,6 +18,7 @@
    along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
 
 #include "defs.h"
+#include "gdbcore.h"
 #include "osabi.h"
 #include "regcache.h"
 #include "regset.h"
@@ -263,6 +264,31 @@ i386fbsd_core_read_xcr0 (bfd *abfd)
   return xcr0;
 }
 
+/* Implement the core_xfer_x86_xsave_offsets gdbarch method.  */
+
+LONGEST
+i386fbsd_core_xfer_x86_xsave_offsets (struct gdbarch *gdbarch,
+				      gdb_byte *readbuf, ULONGEST offset,
+				      ULONGEST len)
+{
+  asection *xstate = bfd_get_section_by_name (core_bfd, ".reg-xstate");
+  if (xstate == nullptr)
+    return -1;
+
+  if (offset > sizeof (struct xsave_offsets))
+    return -1;
+
+  i386_gdbarch_tdep *tdep = (i386_gdbarch_tdep *) gdbarch_tdep (gdbarch);
+  struct xsave_offsets xsave_offsets;
+  i387_set_xsave_offsets (tdep->xcr0, bfd_section_size (xstate), &xsave_offsets);
+
+  if (offset + len > sizeof (struct xsave_offsets))
+    len = sizeof (struct xsave_offsets) - offset;
+
+  memcpy (readbuf, ((gdb_byte *) &xsave_offsets) + offset, len);
+  return len;
+}
+
 /* Implement the core_read_description gdbarch method.  */
 
 static const struct target_desc *
@@ -299,7 +325,8 @@ static const struct regset i386fbsd_xstateregset =
   {
     NULL,
     i386fbsd_supply_xstateregset,
-    i386fbsd_collect_xstateregset
+    i386fbsd_collect_xstateregset,
+    REGSET_VARIABLE_SIZE
   };
 
 /* Iterate over core file register note sections.  */
@@ -318,7 +345,7 @@ i386fbsd_iterate_over_regset_sections (struct gdbarch *gdbarch,
       NULL, cb_data);
 
   if (tdep->xcr0 & X86_XSTATE_AVX)
-    cb (".reg-xstate", X86_XSTATE_SIZE (tdep->xcr0),
+    cb (".reg-xstate", X86_XSTATE_SSE_SIZE,
 	X86_XSTATE_SIZE (tdep->xcr0), &i386fbsd_xstateregset,
 	"XSAVE extended state", cb_data);
 }
@@ -372,6 +399,8 @@ i386fbsd_init_abi (struct gdbarch_info info, struct gdbarch *gdbarch)
   i386_elf_init_abi (info, gdbarch);
 
   tdep->xsave_xcr0_offset = I386_FBSD_XSAVE_XCR0_OFFSET;
+  set_gdbarch_core_xfer_x86_xsave_offsets
+    (gdbarch, i386fbsd_core_xfer_x86_xsave_offsets);
 
   /* Iterate over core file register note sections.  */
   set_gdbarch_iterate_over_regset_sections
diff --git a/gdb/i386-fbsd-tdep.h b/gdb/i386-fbsd-tdep.h
index 76f4c20f657..ed8e8285f88 100644
--- a/gdb/i386-fbsd-tdep.h
+++ b/gdb/i386-fbsd-tdep.h
@@ -25,6 +25,12 @@
 /* Get XSAVE extended state xcr0 from core dump.  */
 extern uint64_t i386fbsd_core_read_xcr0 (bfd *abfd);
 
+/* Implement the core_xfer_x86_xsave_offsets gdbarch method.  */
+extern LONGEST i386fbsd_core_xfer_x86_xsave_offsets (struct gdbarch *gdbarch,
+						     gdb_byte *readbuf,
+						     ULONGEST offset,
+						     ULONGEST len);
+
 /* The format of the XSAVE extended area is determined by hardware.
    Cores store the XSAVE extended area in a NT_X86_XSTATE note that
    matches the layout on Linux.  */
-- 
2.34.1



* [RFC PATCH 4/4] Support XSAVE layouts for the current host in the FreeBSD/amd64 target.
  2022-03-16 19:46 [RFC PATCH 0/4] Handle variable XSAVE layouts John Baldwin
                   ` (2 preceding siblings ...)
  2022-03-16 19:46 ` [RFC PATCH 3/4] Update x86 FreeBSD architectures to support XSAVE offsets John Baldwin
@ 2022-03-16 19:46 ` John Baldwin
  2022-03-17 13:17 ` [RFC PATCH 0/4] Handle variable XSAVE layouts Willgerodt, Felix
  4 siblings, 0 replies; 10+ messages in thread
From: John Baldwin @ 2022-03-16 19:46 UTC (permalink / raw)
  To: gdb-patches

Use the CPUID instruction to fetch the offsets of supported state
components.
---
 gdb/amd64-fbsd-nat.c | 73 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 73 insertions(+)

diff --git a/gdb/amd64-fbsd-nat.c b/gdb/amd64-fbsd-nat.c
index 98a1af03a66..f749a022611 100644
--- a/gdb/amd64-fbsd-nat.c
+++ b/gdb/amd64-fbsd-nat.c
@@ -28,11 +28,15 @@
 #include <sys/sysctl.h>
 #include <sys/user.h>
 #include <machine/reg.h>
+#ifdef PT_GETXSTATE_INFO
+#include <cpuid.h>
+#endif
 
 #include "fbsd-nat.h"
 #include "amd64-tdep.h"
 #include "amd64-fbsd-tdep.h"
 #include "amd64-nat.h"
+#include "i387-tdep.h"
 #include "x86-nat.h"
 #include "gdbsupport/x86-xstate.h"
 #include "x86-bsd-nat.h"
@@ -41,6 +45,15 @@ class amd64_fbsd_nat_target final
   : public x86bsd_nat_target<fbsd_nat_target>
 {
 public:
+#ifdef PT_GETXSTATE_INFO
+  enum target_xfer_status xfer_partial (enum target_object object,
+					const char *annex,
+					gdb_byte *readbuf,
+					const gdb_byte *writebuf,
+					ULONGEST offset, ULONGEST len,
+					ULONGEST *xfered_len) override;
+#endif
+
   void fetch_registers (struct regcache *, int) override;
   void store_registers (struct regcache *, int) override;
 
@@ -55,6 +68,37 @@ static amd64_fbsd_nat_target the_amd64_fbsd_nat_target;
 
 #ifdef PT_GETXSTATE_INFO
 static size_t xsave_len;
+static struct xsave_offsets xsave_offsets;
+
+/* Implement the "xfer_partial" target_ops method.  */
+
+enum target_xfer_status
+amd64_fbsd_nat_target::xfer_partial (enum target_object object,
+				     const char *annex, gdb_byte *readbuf,
+				     const gdb_byte *writebuf,
+				     ULONGEST offset, ULONGEST len,
+				     ULONGEST *xfered_len)
+{
+  switch (object)
+    {
+    case TARGET_OBJECT_X86_XSAVE_OFFSETS:
+      if (xsave_len == 0)
+	return TARGET_XFER_E_IO;
+
+      if (offset > sizeof (xsave_offsets))
+	return TARGET_XFER_E_IO;
+
+      if (offset + len > sizeof (xsave_offsets))
+	len = sizeof (xsave_offsets) - offset;
+
+      memcpy (readbuf, ((gdb_byte *) &xsave_offsets) + offset, len);
+      *xfered_len = len;
+      return len == 0 ? TARGET_XFER_EOF : TARGET_XFER_OK;
+    default:
+      return fbsd_nat_target::xfer_partial (object, annex, readbuf, writebuf,
+					    offset, len, xfered_len);
+    }
+}
 #endif
 
 /* This is a layout of the amd64 'struct reg' but with i386
@@ -304,6 +348,34 @@ amd64fbsd_supply_pcb (struct regcache *regcache, struct pcb *pcb)
 }
 \f
 
+#ifdef PT_GETXSTATE_INFO
+/* Fetch the offset of a specific XSAVE extended region.  */
+static int
+xsave_leaf_offset (uint64_t xcr0, int leaf)
+{
+  uint32_t eax, ebx, ecx, edx;
+
+  if ((xcr0 & (1ULL << leaf)) == 0)
+    return -1;
+
+  __cpuid_count(0xd, leaf, eax, ebx, ecx, edx);
+  return ebx;
+}
+
+/* Fetch the offsets of the XSAVE extended regions on the running host.  */
+static void
+probe_xsave_layout (uint64_t xcr0)
+{
+  xsave_offsets.avx_offset = xsave_leaf_offset(xcr0, 2);
+  xsave_offsets.bndregs_offset = xsave_leaf_offset(xcr0, 3);
+  xsave_offsets.bndcfg_offset = xsave_leaf_offset(xcr0, 4);
+  xsave_offsets.avx512_k_offset = xsave_leaf_offset(xcr0, 5);
+  xsave_offsets.avx512_zmm_h_offset = xsave_leaf_offset(xcr0, 6);
+  xsave_offsets.avx512_zmm_offset = xsave_leaf_offset(xcr0, 7);
+  xsave_offsets.pkru_offset = xsave_leaf_offset(xcr0, 9);
+}
+#endif
+
 /* Implement the read_description method.  */
 
 const struct target_desc *
@@ -330,6 +402,7 @@ amd64_fbsd_nat_target::read_description ()
 	{
 	  xsave_len = info.xsave_len;
 	  xcr0 = info.xsave_mask;
+	  probe_xsave_layout (xcr0);
 	}
       xsave_probed = 1;
     }
-- 
2.34.1


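
probe_xsave_layout in patch 4 walks CPUID leaf 0xD sub-leaves, taking EBX as the byte offset of each enabled component.  The control flow can be sketched with the CPUID call stubbed out; the stub's values are made up, and real code would execute __cpuid_count as the patch does:

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for __cpuid_count (0xd, leaf, ...): returns the byte offset
   (EBX) of the given state component.  The values are invented for
   this sketch; on real hardware this would execute the CPUID
   instruction.  */
static uint32_t
cpuid_xsave_offset (int leaf)
{
  static const uint32_t fake_ebx[] = { 0, 0, 576, 960, 1024, 1088, 1152 };
  return fake_ebx[leaf];
}

/* Return the offset of extended region LEAF, or -1 if XCR0 says the
   component is not enabled -- mirroring xsave_leaf_offset in patch 4.  */
static int
xsave_leaf_offset (uint64_t xcr0, int leaf)
{
  if ((xcr0 & (1ULL << leaf)) == 0)
    return -1;
  return (int) cpuid_xsave_offset (leaf);
}
```

The XCR0 guard matters because CPUID reports sub-leaf data even for components the OS has not enabled; checking the mask first keeps disabled regions marked absent.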

* RE: [RFC PATCH 0/4] Handle variable XSAVE layouts
  2022-03-16 19:46 [RFC PATCH 0/4] Handle variable XSAVE layouts John Baldwin
                   ` (3 preceding siblings ...)
  2022-03-16 19:46 ` [RFC PATCH 4/4] Support XSAVE layouts for the current host in the FreeBSD/amd64 target John Baldwin
@ 2022-03-17 13:17 ` Willgerodt, Felix
  2022-03-17 16:20   ` John Baldwin
  4 siblings, 1 reply; 10+ messages in thread
From: Willgerodt, Felix @ 2022-03-17 13:17 UTC (permalink / raw)
  To: John Baldwin, gdb-patches

Hi John,

We looked at this recently while working on AMX support (I hope I can post
that soon), and decided to leave it hardcoded for now due to the missing
information for corefiles. We weren't aware of the AMD behaviour.

You can find an old prototype we discussed internally here:
https://github.com/intel/gdb/tree/experimental/xsave_offsets 
It is not working for corefiles, which is why we put it on hold for now.
But it might be interesting to see. I just rebased it quickly from an old
state, so there might be problems with the patches.

I would like to see the offset info in corefiles in the long run, as you
mentioned in your earlier mail. To me your approach of hardcoding
known combinations seems hard to maintain. But obviously making it
a bit better right now, if there are no conflicting combinations.

I responded a bit more below.

> -----Original Message-----
> From: Gdb-patches <gdb-patches-
> bounces+felix.willgerodt=intel.com@sourceware.org> On Behalf Of John
> Baldwin
> Sent: Mittwoch, 16. März 2022 20:46
> To: gdb-patches@sourceware.org
> Subject: [RFC PATCH 0/4] Handle variable XSAVE layouts
> 
> This is a first attempt at resolving the issue with XSAVE I described
> previously.  There are more details in the commit logs, but here I think
> will describe some caveats about the current prototype:
> 
> - It is probably terrible performance-wise to be reading the offsets
>   from the target every time collect/supply_xsave is called.  I'd
>   actually much prefer to store these (along with the total XSAVE area
>   size) in the tdep.  The issue is that you can have gdbarches with the
>   same tdesc that use different layouts (e.g. if you open a core dump
>   from an Intel CPU on a host with an AMD CPU, the two CPUs could have
>   identical XCR0 masks, but the layout in the core dump wouldn't match
>   the layout of a live process).  Perhaps if I can fetch the offsets
>   from the target in i386_gdbarch_init though I can iterate over
>   matching arches looking for a match.

I don't quite understand why storing them in tdep wouldn't work.
We get XCR0 from the coredump, not from the CPU analysing
the coredump. For live targets we would query CPUID on GDB/gdbserver.
I don't see how this would clash in your example, but maybe I missed
something in your patches.

> - The cpuid function I added in patch 3 isn't FreeBSD-specific at all
>   (and would work on i386).  I could add it to x86-nat.c instead
>   easily enough.  Even if OS's start providing a new ptrace op
>   to fetch this info we probably should ship a cpuid-based variant as
>   a fallback?

We will also need this in gdbserver, so maybe gdbsupport or nat/ is
the better place. I personally don't see a new ptrace op coming,
as ptrace won't help us with corefiles either. But that is just my guess.

> - The collect size I used in patch 3 for the XSAVE register set
>   isn't really correct.  Really if I had the "real" XSAVE register
>   set size available in the tdep (see point 1) I would not set
>   REGSET_VARIABLE_SIZE and instead use tdep->sizeof_xsave for both
>   sizes.
> 
> - I have no idea how gdbserver is impacted.  So far I haven't really
>   found similar tables to i387-tdep.c in gdbserver.  (It's also harder
>   for me to test currently as I haven't yet added FreeBSD support to
>   gdbserver).
> 

The gdbserver part is in i387-fp.cc. It is quite similar to i386-tdep.c.
There is a struct i387_xsave, which also assumes a fixed layout.

> - I haven't added any support for fetching the offsets on Linux (the
>   only other OS that supports XSAVE state currently).  I am waiting
>   to have the design a bit more finalized before doing that.
> 
> John Baldwin (4):
>   x86: Add an xsave_offsets structure to handle variable XSAVE layouts.
>   core: Support fetching TARGET_OBJECT_X86_XSAVE_OFFSETS from
>     architectures.
>   Update x86 FreeBSD architectures to support XSAVE offsets.
>   Support XSAVE layouts for the current host in the FreeBSD/amd64
>     target.
> 
>  gdb/amd64-fbsd-nat.c      |  73 +++++
>  gdb/amd64-fbsd-tdep.c     |   8 +-
>  gdb/corelow.c             |  22 ++
>  gdb/gdbarch-components.py |  13 +
>  gdb/gdbarch-gen.h         |  10 +
>  gdb/gdbarch.c             |  32 +++
>  gdb/i386-fbsd-tdep.c      |  33 ++-
>  gdb/i386-fbsd-tdep.h      |   6 +
>  gdb/i387-tdep.c           | 592 +++++++++++++++++++++++++-------------
>  gdb/i387-tdep.h           |  22 ++
>  gdb/target.h              |   2 +
>  11 files changed, 607 insertions(+), 206 deletions(-)
> 
> --
> 2.34.1

Intel Deutschland GmbH
Registered Address: Am Campeon 10, 85579 Neubiberg, Germany
Tel: +49 89 99 8853-0, www.intel.de <http://www.intel.de>
Managing Directors: Christin Eisenschmid, Sharon Heck, Tiffany Doon Silva  
Chairperson of the Supervisory Board: Nicole Lau
Registered Office: Munich
Commercial Register: Amtsgericht Muenchen HRB 186928


* Re: [RFC PATCH 0/4] Handle variable XSAVE layouts
  2022-03-17 13:17 ` [RFC PATCH 0/4] Handle variable XSAVE layouts Willgerodt, Felix
@ 2022-03-17 16:20   ` John Baldwin
  2022-03-17 18:03     ` John Baldwin
  0 siblings, 1 reply; 10+ messages in thread
From: John Baldwin @ 2022-03-17 16:20 UTC (permalink / raw)
  To: Willgerodt, Felix, gdb-patches

On 3/17/22 6:17 AM, Willgerodt, Felix wrote:
> Hi John,
> 
> We looked at this recently while working on AMX support (I hope I can post
> that soon), and decided to leave it hardcoded for now due to the missing
> information for corefiles. We weren't aware of the AMD behaviour.
> 
> You can find an old prototype we discussed internally here:
> https://github.com/intel/gdb/tree/experimental/xsave_offsets
> It is not working for corefiles, which is why we put it on hold for now.
> But it might be interesting to see. I just rebased it quickly from an old
> state, so there might be problems with the patches.

Yes, I like many aspects of this such as moving code to gdbsupport.
I do think it is perhaps more future-proof/cleaner to go ahead and
assume arbitrary offsets for each region (e.g. not assume a single
relative offset for the 3 AVX-512 regions), which my patch 2 does,
but there are certainly many similarities.
  
> I would like to see the offset info in corefiles in the long run, as you
> mentioned in your earlier mail. To me your approach of hardcoding
> known combinations seems hard to maintain. But obviously making it
> a bit better right now, if there are no conflicting combinations.

So I view hardcoding combinations as a fallback.  I would much prefer
adding a new NT_X86_XSAVE_LAYOUT or the like.  It could be fetched for
live processes using PT_GETREGSET if desired (though using cpuid directly
which both of our patch sets do for native is fine).  The fallback would
exist to handle older core dumps without the new note.  For example, if
we had the note, then the new function in patch 3 would use that note if
it exists and only fall back to calling i387_set_xsave_offsets() when the
note isn't present.

I'm happy to come up with a scheme for the proposed NT_X86_XSAVE_LAYOUT.
A simple layout would be for it to contain an array of the
x86_extended_feature type from your structures, but it might be better
to store "raw" CPUID results in case there are future needs for other
bits (like the alignment if we ever wanted to support the compact
format in userland for some reason).  To that end, perhaps a
structure like:

struct {
    uint32_t id;
    uint32_t size;    /* eax */
    uint32_t offset;  /* ebx */
    uint32_t ecx;
    uint32_t edx;
};

Where the note is just an array of those?  The total size of the state
can be inferred from the size of the NT_X86_XSTATE note.

> I responded a bit more below.
> 
>> -----Original Message-----
>> From: Gdb-patches <gdb-patches-
>> bounces+felix.willgerodt=intel.com@sourceware.org> On Behalf Of John
>> Baldwin
>> Sent: Mittwoch, 16. März 2022 20:46
>> To: gdb-patches@sourceware.org
>> Subject: [RFC PATCH 0/4] Handle variable XSAVE layouts
>>
>> This is a first attempt at resolving the issue with XSAVE I described
>> previously.  There are more details in the commit logs, but here I think
>> will describe some caveats about the current prototype:
>>
>> - It is probably terrible performance-wise to be reading the offsets
>>    from the target every time collect/supply_xsave is called.  I'd
>>    actually much prefer to store these (along with the total XSAVE area
>>    size) in the tdep.  The issue is that you can have gdbarches with the
>>    same tdesc that use different layouts (e.g. if you open a core dump
>>    from an Intel CPU on a host with an AMD CPU, the two CPUs could have
>>    identical XCR0 masks, but the layout in the core dump wouldn't match
>>    the layout of a live process).  Perhaps if I can fetch the offsets
>>    from the target in i386_gdbarch_init though I can iterate over
>>    matching arches looking for a match.
> 
> I don't quite understand why storing them in tdep wouldn't work.
> We get XCR0 from the coredump, not from the CPU analysing
> the coredump. For live targets we would query CPUID on GDB/gdbserver.
> I don't see how this would clash in your example, but maybe I missed
> something in your patches.

The problem is that two tdep's with the same XCR0 value currently
have an identical tdesc and thus share the same 'struct gdbarch'.
However, an Intel CPU with XCR0 of 0x207 uses a different layout
than an AMD CPU with an XCR0 of 0x207.  We would thus need separate
gdbarches for those.  I think though I can make that work if I fetch
TARGET_OBJECT_X86_XSAVE_OFFSETS in i386_gdbarch_init() before this
loop:

   /* If there is already a candidate, use it.  */
   arches = gdbarch_list_lookup_by_info (arches, &info);
   if (arches != NULL)
     return arches->gdbarch;

And instead only return an existing gdbarch if it has the same XSAVE
layout.  For example, RISC-V does the following logic to handle
differences in gdbarches that aren't fully handled by the tdesc:

   /* Find a candidate among the list of pre-declared architectures.  */
   for (arches = gdbarch_list_lookup_by_info (arches, &info);
        arches != NULL;
        arches = gdbarch_list_lookup_by_info (arches->next, &info))
     {
       /* Check that the feature set of the ARCHES matches the feature set
	 we are looking for.  If it doesn't then we can't reuse this
	 gdbarch.  */
       riscv_gdbarch_tdep *other_tdep
	= (riscv_gdbarch_tdep *) gdbarch_tdep (arches->gdbarch);

       if (other_tdep->isa_features != features
	  || other_tdep->abi_features != abi_features)
	continue;

       break;
     }

   if (arches != NULL)
     return arches->gdbarch;

I think it would also be handy in this case to extend the xsave_offsets
structure to include the total size that can be used in the collect/supply
callbacks.

>> - The cpuid function I added in patch 3 isn't FreeBSD-specific at all
>>    (and would work on i386).  I could add it to x86-nat.c instead
>>    easily enough.  Even if OS's start providing a new ptrace op
>>    to fetch this info we probably should ship a cpuid-based variant as
>>    a fallback?
> 
> We will also need this in gdbserver, so maybe gdbsupport or nat/ is
> the better place. I personally don't see a new ptrace op coming,
> as ptrace won't help us with corefiles either. But that is just my guess.

I would only see the ptrace op being one that comes "for free" via
PT_GETREGSET to fetch the new core dump note.  However, I'm also happy
to just use cpuid always for native targets.

>> - The collect size I used in patch 3 for the XSAVE register set
>>    isn't really correct.  Really if I had the "real" XSAVE register
>>    set size available in the tdep (see point 1) I would not set
>>    REGSET_VARIABLE_SIZE and instead use tdep->sizeof_xsave for both
>>    sizes.
>>
>> - I have no idea how gdbserver is impacted.  So far I haven't really
>>    found similar tables to i387-tdep.c in gdbserver.  (It's also harder
>>    for me to test currently as I haven't yet added FreeBSD support to
>>    gdbserver).
>>
> 
> The gdbserver part is in i387-fp.cc. It is quite similar to i386-tdep.c.
> There is a struct i387_xsave, which also assumes a fixed layout.

Ok, I had read that, but the structure wasn't obvious to me.

>> - I haven't added any support for fetching the offsets on Linux (the
>>    only other OS that supports XSAVE state currently).  I am waiting
>>    to have the design a bit more finalized before doing that.
>>
>> John Baldwin (4):
>>    x86: Add an xsave_offsets structure to handle variable XSAVE layouts.
>>    core: Support fetching TARGET_OBJECT_X86_XSAVE_OFFSETS from
>>      architectures.
>>    Update x86 FreeBSD architectures to support XSAVE offsets.
>>    Support XSAVE layouts for the current host in the FreeBSD/amd64
>>      target.
>>
>>   gdb/amd64-fbsd-nat.c      |  73 +++++
>>   gdb/amd64-fbsd-tdep.c     |   8 +-
>>   gdb/corelow.c             |  22 ++
>>   gdb/gdbarch-components.py |  13 +
>>   gdb/gdbarch-gen.h         |  10 +
>>   gdb/gdbarch.c             |  32 +++
>>   gdb/i386-fbsd-tdep.c      |  33 ++-
>>   gdb/i386-fbsd-tdep.h      |   6 +
>>   gdb/i387-tdep.c           | 592 +++++++++++++++++++++++++-------------
>>   gdb/i387-tdep.h           |  22 ++
>>   gdb/target.h              |   2 +
>>   11 files changed, 607 insertions(+), 206 deletions(-)
>>
>> --
>> 2.34.1
> 


-- 
John Baldwin

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC PATCH 0/4] Handle variable XSAVE layouts
  2022-03-17 16:20   ` John Baldwin
@ 2022-03-17 18:03     ` John Baldwin
  2022-03-18 13:49       ` Willgerodt, Felix
  0 siblings, 1 reply; 10+ messages in thread
From: John Baldwin @ 2022-03-17 18:03 UTC (permalink / raw)
  To: gdb-patches

On 3/17/22 9:20 AM, John Baldwin wrote:
> On 3/17/22 6:17 AM, Willgerodt, Felix wrote:
>>> This is a first attempt at resolving the issue with XSAVE I described
>>> previously.  There are more details in the commit logs, but here I think
>>> will describe some caveats about the current prototype:
>>>
>>> - It is probably terrible performance-wise to be reading the offsets
>>>     from the target every time collect/supply_xsave is called.  I'd
>>>     actually much prefer to store these (along with the total XSAVE area
>>>     size) in the tdep.  The issue is that you can have gdbarches with the
>>>     same tdesc that use different layouts (e.g. if you open a core dump
>>>     from an Intel CPU on a host with an AMD CPU, the two CPUs could have
>>>     identical XCR0 masks, but the layout in the core dump wouldn't match
>>>     the layout of a live process).  Perhaps if I can fetch the offsets
>>>     from the target in i386_gdbarch_init though I can iterate over
>>>     matching arches looking for a match.
>>
>> I don't quite understand why storing them in tdep wouldn't work.
>> We get XCR0 from the coredump, not from the CPU analysing
>> the coredump. For live targets we would query CPUID on GDB/gdbserver.
>> I don't see how this would clash in your example, but maybe I missed
>> something in your patches.
> 
> The problem is that two tdep's with the same XCR0 value currently
> have an identical tdesc and thus share the same 'struct gdbarch'.
> However, an Intel CPU with XCR0 of 0x207 uses a different layout
> than an AMD CPU with an XCR0 of 0x207.  We would thus need separate
> gdbarches for those.  I think though I can make that work if I fetch
> TARGET_OBJECT_X86_XSAVE_OFFSETS in i386_gdbarch_init() before this
> loop:
> 
>     /* If there is already a candidate, use it.  */
>     arches = gdbarch_list_lookup_by_info (arches, &info);
>     if (arches != NULL)
>       return arches->gdbarch;
> 
> And instead only return an existing gdbarch if it has the same XSAVE
> layout.  For example, RISC-V does the following logic to handle
> differences in gdbarches that aren't fully handled by the tdesc:
> 
>     /* Find a candidate among the list of pre-declared architectures.  */
>     for (arches = gdbarch_list_lookup_by_info (arches, &info);
>          arches != NULL;
>          arches = gdbarch_list_lookup_by_info (arches->next, &info))
>       {
>         /* Check that the feature set of the ARCHES matches the feature set
> 	 we are looking for.  If it doesn't then we can't reuse this
> 	 gdbarch.  */
>         riscv_gdbarch_tdep *other_tdep
> 	= (riscv_gdbarch_tdep *) gdbarch_tdep (arches->gdbarch);
> 
>         if (other_tdep->isa_features != features
> 	  || other_tdep->abi_features != abi_features)
> 	continue;
> 
>         break;
>       }
> 
>     if (arches != NULL)
>       return arches->gdbarch;
> 
> I think it would also be handy in this case to extend the xsave_offsets
> structure to include the total size that can be used in the collect/supply
> callbacks.

I have made these changes and it does appear to work.  I do think the approach
of a new TARGET_OBJECT will work (and it will support adding a new
NT_X86_XSAVE_LAYOUT or the like in the future to better support core dumps).
If you agree with that part of the design (and storing it in the tdep, etc.)
I can try to merge in parts of your patchset (in particular, moving some
things to gdbsupport and similar gdbserver patches) or I'm happy to let you
drive.  I will send a V2 with the changes to store the layout in the tdep.

-- 
John Baldwin

^ permalink raw reply	[flat|nested] 10+ messages in thread

* RE: [RFC PATCH 0/4] Handle variable XSAVE layouts
  2022-03-17 18:03     ` John Baldwin
@ 2022-03-18 13:49       ` Willgerodt, Felix
  2022-03-18 17:27         ` John Baldwin
  0 siblings, 1 reply; 10+ messages in thread
From: Willgerodt, Felix @ 2022-03-18 13:49 UTC (permalink / raw)
  To: John Baldwin, gdb-patches

Hi John,

See comments inline.

> -----Original Message-----
> From: Gdb-patches <gdb-patches-
> bounces+felix.willgerodt=intel.com@sourceware.org> On Behalf Of John
> Baldwin
> Sent: Donnerstag, 17. März 2022 19:03
> To: gdb-patches@sourceware.org
> Subject: Re: [RFC PATCH 0/4] Handle variable XSAVE layouts
> 
> On 3/17/22 9:20 AM, John Baldwin wrote:
> > On 3/17/22 6:17 AM, Willgerodt, Felix wrote:
> >>> This is a first attempt at resolving the issue with XSAVE I described
> >>> previously.  There are more details in the commit logs, but here I think
> >>> will describe some caveats about the current prototype:
> >>>
> >>> - It is probably terrible performance-wise to be reading the offsets
> >>>     from the target every time collect/supply_xsave is called.  I'd
> >>>     actually much prefer to store these (along with the total XSAVE area
> >>>     size) in the tdep.  The issue is that you can have gdbarches with the
> >>>     same tdesc that use different layouts (e.g. if you open a core dump
> >>>     from an Intel CPU on a host with an AMD CPU, the two CPUs could
> have
> >>>     identical XCR0 masks, but the layout in the core dump wouldn't match
> >>>     the layout of a live process).  Perhaps if I can fetch the offsets
> >>>     from the target in i386_gdbarch_init though I can iterate over
> >>>     matching arches looking for a match.
> >>
> >> I don't quite understand why storing them in tdep wouldn't work.
> >> We get XCR0 from the coredump, not from the CPU analysing
> >> the coredump. For live targets we would query CPUID on GDB/gdbserver.
> >> I don't see how this would clash in your example, but maybe I missed
> >> something in your patches.
> >
> > The problem is that two tdep's with the same XCR0 value currently
> > have an identical tdesc and thus share the same 'struct gdbarch'.
> > However, an Intel CPU with XCR0 of 0x207 uses a different layout
> > than an AMD CPU with an XCR0 of 0x207.  We would thus need separate
> > gdbarches for those. 

Just out of curiosity: if we didn't implement i387_set_xsave_layout(),
and instead read that info from CPUID and the corefile (once that note
exists), would we still need this?

> > I think though I can make that work if I fetch
> > TARGET_OBJECT_X86_XSAVE_OFFSETS in i386_gdbarch_init() before this
> > loop:
> >
> >     /* If there is already a candidate, use it.  */
> >     arches = gdbarch_list_lookup_by_info (arches, &info);
> >     if (arches != NULL)
> >       return arches->gdbarch;
> >
> > And instead only return an existing gdbarch if it has the same XSAVE
> > layout.  For example, RISC-V does the following logic to handle
> > differences in gdbarches that aren't fully handled by the tdesc:
> >
> >     /* Find a candidate among the list of pre-declared architectures.  */
> >     for (arches = gdbarch_list_lookup_by_info (arches, &info);
> >          arches != NULL;
> >          arches = gdbarch_list_lookup_by_info (arches->next, &info))
> >       {
> >         /* Check that the feature set of the ARCHES matches the feature set
> > 	 we are looking for.  If it doesn't then we can't reuse this
> > 	 gdbarch.  */
> >         riscv_gdbarch_tdep *other_tdep
> > 	= (riscv_gdbarch_tdep *) gdbarch_tdep (arches->gdbarch);
> >
> >         if (other_tdep->isa_features != features
> > 	  || other_tdep->abi_features != abi_features)
> > 	continue;
> >
> >         break;
> >       }
> >
> >     if (arches != NULL)
> >       return arches->gdbarch;
> >
> > I think it would also be handy in this case to extend the xsave_offsets
> > structure to include the total size that can be used in the collect/supply
> > callbacks.
> 
> I have made these changes and it does appear to work.  I do think the
> approach
> of a new TARGET_OBJECT will work (and it will support adding a new
> NT_X86_XSAVE_LAYOUT or the like in the future to better support core
> dumps).
> If you agree with that part of the design (and storing it in the tdep, etc.)
> I can try to merge in parts of your patchset (in particular, moving some
> things to gdbsupport and similar gdbserver patches) or I'm happy to let you
> drive.  I will send a V2 with the changes to store the layout in the tdep.
> 
> --
> John Baldwin


Please feel free to take (and adjust) any code or idea from our patchset
that you like. I just posted it trying to be helpful.


I must admit I am not so sure about your approach. Yes it helps
now. But assume in the future there is a 64-byte component Y and a 64-byte
component X. What happens if one CPU drops X and another drops Y?
We could have the same XCR0 and same XSAVE size and no way to 
distinguish the layouts.
To me there is no real alternative to getting everything from CPUID.
I personally would at least like to see CPUID implemented for live
processes, and a huge comment that this is a last-resort fallback
for corefiles, that could fail. Until we have the corefile stuff figured out.

That said, I am no maintainer and not the right person for deciding.
This is just my point of view.


I looked at the Intel Software Developer's Manual a bit. There is a
compacted XSAVE format and a standard format. In GDB we never check
which one is used and assume the standard format, afaik.
(I think we have even dropped that information by the time we are in
i387-tdep.c.)

SDM:
"Standard format. Each state component i (i ≥ 2) is located at the byte
offset from the base address of the XSAVE area enumerated in
CPUID.(EAX=0DH,ECX=i):EBX. (CPUID.(EAX=0DH,ECX=i):EAX enumerates
The number of bytes required for state component I"

MPX is i = 3 (and 4).

That said, MPX is marked as deprecated and I don't know if that
has any influence.

Judging by your XCR0 values in your earlier email, you are not in
compacted mode, right? Could you check the CPUID leaves for MPX?
I wonder what they report.

Felix

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC PATCH 0/4] Handle variable XSAVE layouts
  2022-03-18 13:49       ` Willgerodt, Felix
@ 2022-03-18 17:27         ` John Baldwin
  0 siblings, 0 replies; 10+ messages in thread
From: John Baldwin @ 2022-03-18 17:27 UTC (permalink / raw)
  To: Willgerodt, Felix, gdb-patches

On 3/18/22 6:49 AM, Willgerodt, Felix wrote:
> Hi John,
> 
> See comments inline.
> 
>> -----Original Message-----
>> From: Gdb-patches <gdb-patches-
>> bounces+felix.willgerodt=intel.com@sourceware.org> On Behalf Of John
>> Baldwin
>> Sent: Donnerstag, 17. März 2022 19:03
>> To: gdb-patches@sourceware.org
>> Subject: Re: [RFC PATCH 0/4] Handle variable XSAVE layouts
>>
>> On 3/17/22 9:20 AM, John Baldwin wrote:
>>> On 3/17/22 6:17 AM, Willgerodt, Felix wrote:
>>>>> This is a first attempt at resolving the issue with XSAVE I described
>>>>> previously.  There are more details in the commit logs, but here I think
>>>>> will describe some caveats about the current prototype:
>>>>>
>>>>> - It is probably terrible performance-wise to be reading the offsets
>>>>>      from the target every time collect/supply_xsave is called.  I'd
>>>>>      actually much prefer to store these (along with the total XSAVE area
>>>>>      size) in the tdep.  The issue is that you can have gdbarches with the
>>>>>      same tdesc that use different layouts (e.g. if you open a core dump
>>>>>      from an Intel CPU on a host with an AMD CPU, the two CPUs could
>> have
>>>>>      identical XCR0 masks, but the layout in the core dump wouldn't match
>>>>>      the layout of a live process).  Perhaps if I can fetch the offsets
>>>>>      from the target in i386_gdbarch_init though I can iterate over
>>>>>      matching arches looking for a match.
>>>>
>>>> I don't quite understand why storing them in tdep wouldn't work.
>>>> We get XCR0 from the coredump, not from the CPU analysing
>>>> the coredump. For live targets we would query CPUID on GDB/gdbserver.
>>>> I don't see how this would clash in your example, but maybe I missed
>>>> something in your patches.
>>>
>>> The problem is that two tdep's with the same XCR0 value currently
>>> have an identical tdesc and thus share the same 'struct gdbarch'.
>>> However, an Intel CPU with XCR0 of 0x207 uses a different layout
>>> than an AMD CPU with an XCR0 of 0x207.  We would thus need separate
>>> gdbarches for those.
> 
> Just out of curiosity: If we wouldn't implement i387_set_xsave_layout(),
> and read that info from CPUID and the corefile (once that note exists),
> would we still need this?

Once corefile support exists, this i387_set_xsave_layout() function won't
be used.  To be clear, in the current patches (which don't yet include all
of the Linux-specific changes), CPUID is used for native targets.  This
function is only used for coredumps, and only in OS-specific gdbarch
method to read the XSAVE layout from a core dump.  Once OS's add a new
core dump note to describe the XSAVE layout, those OS-specific gdbarch
methods can read that core dump note instead and will only call
i387_set_xsave_layout() for older core dumps without the note.

I'm happy to work on the core dump note and I can implement it in
FreeBSD easily.  I think it would be nice to agree on a common format that
is also used in Linux, but I'm not a Linux developer and would need someone
else to work on implementing the core dump note in Linux.

>>> I think though I can make that work if I fetch
>>> TARGET_OBJECT_X86_XSAVE_OFFSETS in i386_gdbarch_init() before this
>>> loop:
>>>
>>>      /* If there is already a candidate, use it.  */
>>>      arches = gdbarch_list_lookup_by_info (arches, &info);
>>>      if (arches != NULL)
>>>        return arches->gdbarch;
>>>
>>> And instead only return an existing gdbarch if it has the same XSAVE
>>> layout.  For example, RISC-V does the following logic to handle
>>> differences in gdbarches that aren't fully handled by the tdesc:
>>>
>>>      /* Find a candidate among the list of pre-declared architectures.  */
>>>      for (arches = gdbarch_list_lookup_by_info (arches, &info);
>>>           arches != NULL;
>>>           arches = gdbarch_list_lookup_by_info (arches->next, &info))
>>>        {
>>>          /* Check that the feature set of the ARCHES matches the feature set
>>> 	 we are looking for.  If it doesn't then we can't reuse this
>>> 	 gdbarch.  */
>>>          riscv_gdbarch_tdep *other_tdep
>>> 	= (riscv_gdbarch_tdep *) gdbarch_tdep (arches->gdbarch);
>>>
>>>          if (other_tdep->isa_features != features
>>> 	  || other_tdep->abi_features != abi_features)
>>> 	continue;
>>>
>>>          break;
>>>        }
>>>
>>>      if (arches != NULL)
>>>        return arches->gdbarch;
>>>
>>> I think it would also be handy in this case to extend the xsave_offsets
>>> structure to include the total size that can be used in the collect/supply
>>> callbacks.
>>
>> I have made these changes and it does appear to work.  I do think the
>> approach
>> of a new TARGET_OBJECT will work (and it will support adding a new
>> NT_X86_XSAVE_LAYOUT or the like in the future to better support core
>> dumps).
>> If you agree with that part of the design (and storing it in the tdep, etc.)
>> I can try to merge in parts of your patchset (in particular, moving some
>> things to gdbsupport and similar gdbserver patches) or I'm happy to let you
>> drive.  I will send a V2 with the changes to store the layout in the tdep.
>>
>> --
>> John Baldwin
> 
> 
> Please feel free to take (and adjust) any code or idea from our patchset
> that you like. I just posted it trying to be helpful.
> 
> 
> I must admit I am not so sure about your approach. Yes it helps
> now. But assume in the future there is a 64-byte component Y and a 64-byte
> component X. What happens if one CPU drops X and another drops Y?
> We could have the same XCR0 and same XSAVE size and no way to
> distinguish the layouts.
> To me there is no real alternative to getting everything from CPUID.
> I personally would at least like to see CPUID implemented for live
> processes, and a huge comment that this is a last-resort fallback
> for corefiles, that could fail. Until we have the corefile stuff figured out.

I have tried to make that apparent in the commit log in the V2 of the series
I posted (and have split the commits up a bit further so that it is clear
that the intention is to only use the static layouts as a fallback for
core dumps, not in native targets).

> That said, I am no maintainer and not the right person for deciding.
> This is just my point of view.

I am not a global maintainer either FWIW.

> I looked at the Intel software development manual a bit. There is a
> compacted xsave format and a standard format. In GDB we
> never check which one is used and assume standard format, afaik.
> (I think we have even dropped that information when we are in
> i387-tdep.c.)

Yes, GDB currently assumes the standard format.  FreeBSD doesn't currently
make use of the compacted format, and Linux is careful to convert state
saved in the compact format to the standard format before exporting the
XSAVE state either via ptrace() or via core dump notes.

That said, there may come a day when OS's may want to export data in
the compacted format.  In that case we may want to compute a different
set of offsets for each state component.  That is the reason I'm inclined
to make the core dump note store all of the information for each active
leaf (all 4 registers from cpuid) so that they can be used to compute
the layout of the compacted format (rather than just storing the
ID, size, and standard offset).
  
> Judging by your XCR0 values in your earlier email, you are not in
> compacted mode, right? Could you check the CPUID leaves for MPX?
> I wonder what they report.

The AMD CPU in this case doesn't implement MPX so bits 3 and 4 are always
zero in XCR0.  Per the SDM for CPUID, the cpuid leaves return a size and
offset of 0 for those leaves.  This CPU also doesn't support AVX512, so
it reports 0's for those three leaves as well.

-- 
John Baldwin

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2022-03-18 17:27 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-03-16 19:46 [RFC PATCH 0/4] Handle variable XSAVE layouts John Baldwin
2022-03-16 19:46 ` [RFC PATCH 1/4] x86: Add an xsave_offsets structure to handle " John Baldwin
2022-03-16 19:46 ` [RFC PATCH 2/4] core: Support fetching TARGET_OBJECT_X86_XSAVE_OFFSETS from architectures John Baldwin
2022-03-16 19:46 ` [RFC PATCH 3/4] Update x86 FreeBSD architectures to support XSAVE offsets John Baldwin
2022-03-16 19:46 ` [RFC PATCH 4/4] Support XSAVE layouts for the current host in the FreeBSD/amd64 target John Baldwin
2022-03-17 13:17 ` [RFC PATCH 0/4] Handle variable XSAVE layouts Willgerodt, Felix
2022-03-17 16:20   ` John Baldwin
2022-03-17 18:03     ` John Baldwin
2022-03-18 13:49       ` Willgerodt, Felix
2022-03-18 17:27         ` John Baldwin
