From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-yb1-xb2e.google.com (mail-yb1-xb2e.google.com [IPv6:2607:f8b0:4864:20::b2e]) by sourceware.org (Postfix) with ESMTPS id 87641384514A for ; Fri, 5 Apr 2024 12:58:19 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 87641384514A Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=gmail.com ARC-Filter: OpenARC Filter v1.0.0 sourceware.org 87641384514A Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=2607:f8b0:4864:20::b2e ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1712321905; cv=none; b=iOFfTU+OmevhUFE/N60h8gzLKuP6pFqh+V4ygFazXc7S9pWA2cr3A6V3cN6KaQXN5etrW4jfJJY8t0yCQxC8VaMiqQNMZ5J+laVAjS6xA7yuLdWvBSPleWXDbZlmAKha6AfNKRcfbz3a0US9lZ3XluFBv+GT4hzoNKWCq5ap4/I= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1712321905; c=relaxed/simple; bh=zVWNPGTDg3z/LwiI9SztidqPw62mKYVMeQpUWvyajgc=; h=DKIM-Signature:MIME-Version:From:Date:Message-ID:Subject:To; b=Bq7x8M4RBSbK/lBHccmoDsy0PsfU5FE30K5OxZhPELFxG1hxBLXWafLrUqBIz7A78kpUg4mnuvkjK9kYXk1tJingIoAZrArVdfRM4i6BH+dgn2WOLFvGFM16rX1DsQn6kNYGPvpEvGBsefxSS3mJInCINh2m1j+AnN5jJGmp5jU= ARC-Authentication-Results: i=1; server2.sourceware.org Received: by mail-yb1-xb2e.google.com with SMTP id 3f1490d57ef6-dc6d9a8815fso2376944276.3 for ; Fri, 05 Apr 2024 05:58:19 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1712321899; x=1712926699; darn=sourceware.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=9IshauEA2v6dUiCnYniCykUO/jtVd3ist5GQu5gOmlg=; b=j0piQzffBx+3RA7qDHLmw/2/t0leiSTufvroI1CIXQZ+r8F7BpBVg2zZbn7j7y9cAp xO8l+I6jQy1YGrp3LbjUjDehdf4/1ShHKbvGC5Fx0atdgVWsEn77SE12yQZEJReVUecC U+wNVVwi6RJBhuRXS08aKl8J2tcsPsmcsv+UhLVJMi/0Tzj9KIyMp3gg0XMJja1XjxwK FS4y2csqzutU0T9X7fI3y9QI1ExhTo1StKNYq9M8gYk6ykLOAakHsUFEVbthfZ9eWzMf qsOrrNNCPfoKJ4Qdtqt5Eu/DGP5lgwJos/buR9xg9upEYd5KpjPixwKyFnK6yF95f2mK /xBw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1712321899; x=1712926699; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=9IshauEA2v6dUiCnYniCykUO/jtVd3ist5GQu5gOmlg=; b=MjkC7FDqkzBkfcNskqSxlv0nNswL0LI9j87hHIOgGwRv0dKopy3GCIBU8mi6xXM2pi cnpg9ssZ5IWoaBDc//tfj7cjrA1hleOjd2f1Jn7ZQ+thc0rBU6W1sJDT9ZKiG95/bfUg 8w5eng+v1GSCC/gXmXzKHpZDPQDmDLFB1H3zDg9DhKlF1WcDOKjnSOOWt4AuaeKlMv0a fGhOOZWGpVORBdyxmR3z46hhAtrFBFZY3Tgcr08DnvtaNRBxdmjP66CAVDMDuElqkZZl 46IcNCorPaWVChFvbwYj65vSM12LfHa2CaaLwwExAduHJCQhSeJ7tZYPRMf8LosMM+Xk VlHQ== X-Gm-Message-State: AOJu0Yw4dCVeOYw4u7ibexMYflV8Ge9jirE32o2hI60H+tjrX0VD1QJF gge8QY4BomWwHuzvnoDrjEDYUOfgGtV09qgTRs5KXe0mJy+C5DPJ/1dUKvUKSfwK47+k3Z1//lZ XqbUlulJlSkvk0pW++5KvQ7+Og+oi8lSr X-Google-Smtp-Source: AGHT+IHFUYebLJKwmztLRFJ3tgGpU+m53N3DQx4oFgbvVMMMj+zzT7fCTnNHY/m7iDLixbHY13mOdI+AWJTYyaM4OGg= X-Received: by 2002:a25:9c42:0:b0:dcf:30dc:127c with SMTP id x2-20020a259c42000000b00dcf30dc127cmr1023426ybo.18.1712321898620; Fri, 05 Apr 2024 05:58:18 -0700 (PDT) MIME-Version: 1.0 References: <58f5f6cd41d2593f9a040dc4c190103d967f0787.1712312063.git.fweimer@redhat.com> In-Reply-To: <58f5f6cd41d2593f9a040dc4c190103d967f0787.1712312063.git.fweimer@redhat.com> From: "H.J. Lu" Date: Fri, 5 Apr 2024 05:57:42 -0700 Message-ID: Subject: Re: [PATCH v3 2/3] x86: Add generic CPUID data dumper to ld.so --list-diagnostics To: Florian Weimer Cc: libc-alpha@sourceware.org Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Spam-Status: No, score=-3019.3 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,FREEMAIL_FROM,GIT_PATCH_0,KAM_SHORT,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: On Fri, Apr 5, 2024 at 3:17=E2=80=AFAM Florian Weimer = wrote: > > This is surprisingly difficult to implement if the goal is to produce > reasonably sized output. With the current approaches to output > compression (suppressing zeros and repeated results between CPUs, > folding ranges of identical subleaves, dealing with the %ecx > reflection issue), the output is less than 600 KiB even for systems > with 256 logical CPUs. > --- > manual/dynlink.texi | 85 ++++++- > sysdeps/x86/dl-diagnostics-cpu.c | 384 +++++++++++++++++++++++++++++++ > 2 files changed, 468 insertions(+), 1 deletion(-) > > diff --git a/manual/dynlink.texi b/manual/dynlink.texi > index 06a6c15533..f2f2341818 100644 > --- a/manual/dynlink.texi > +++ b/manual/dynlink.texi > @@ -228,7 +228,90 @@ reported by the @code{uname} function. @xref{Platfo= rm Type}. > @item x86.cpu_features.@dots{} > These items are specific to the i386 and x86-64 architectures. They > reflect supported CPU features and information on cache geometry, mostly > -collected using the @code{CPUID} instruction. > +collected using the CPUID instruction. > + > +@item x86.processor[@var{index}].@dots{} > +These are additional items for the i386 and x86-64 architectures, as > +described below. They mostly contain raw data from the CPUID > +instruction. The probes are performed for each active CPU for the > +@code{ld.so} process, and data for different probed CPUs receives a > +uniqe @var{index} value. Some CPUID data is expected to differ from CPU > +core to CPU core. In some cases, CPUs are not correctly initialized and > +indicate the presence of different feature sets. > + > +@item x86.processor[@var{index}].requested=3D@var{kernel-cpu} > +The kernel is told to run the subsequent probing on the CPU numbered > +@var{kernel-cpu}. The values @var{kernel-cpu} and @var{index} can be > +distinct if there are gaps in the process CPU affinity mask. This line > +is not included if CPU affinity mask information is not available. > + > +@item x86.processor[@var{index}].observed=3D@var{kernel-cpu} > +This line reports the kernel CPU number @var{kernel-cpu} on which the > +probing code initially ran. If the CPU number cannot be obtained, > +this line is not printed. > + > +@item x86.processor[@var{index}].observed_node=3D@var{node} > +This reports the observed NUMA node number, as reported by the > +@code{getcpu} system call. If this information cannot be obtained, this > +line is not printed. > + > +@item x86.processor[@var{index}].cpuid_leaves=3D@var{count} > +This line indicates that @var{count} distinct CPUID leaves were > +encountered. (This reflects internal @code{ld.so} storage space, it > +does not directly correspond to @code{CPUID} enumeration ranges.) > + > +@item x86.processor[@var{index}].ecx_limit=3D@var{value} > +The CPUID data extraction code uses a brute-force approach to enumerate > +subleaves (see the @samp{.subleaf_eax} lines below). The last > +@code{%rcx} value used in a CPUID query on this probed CPU was > +@var{value}. > + > +@item x86.processor[@var{index}].cpuid.eax[@var{query_eax}].eax=3D@var{e= ax} > +@itemx x86.processor[@var{index}].cpuid.eax[@var{query_eax}].ebx=3D@var{= ebx} > +@itemx x86.processor[@var{index}].cpuid.eax[@var{query_eax}].ecx=3D@var{= ecx} > +@itemx x86.processor[@var{index}].cpuid.eax[@var{query_eax}].edx=3D@var{= edx} > +These lines report the register contents after executing the CPUID > +instruction with @samp{%rax =3D=3D @var{query_eax}} and @samp{%rcx =3D= =3D 0} (a > +@dfn{leaf}). For the first probed CPU (with a zero @var{index}), only > +leaves with non-zero register contents are reported. For subsequent > +CPUs, only leaves whose register contents differs from the previously > +probed CPUs (with @var{index} one less) are reported. > + > +Basic and extended leaves are reported using the same syntax. This > +means there is a large jump in @var{query_eax} for the first reported > +extended leaf. > + > +@item x86.processor[@var{index}].cpuid.subleaf_eax[@var{query_eax}].ecx[= @var{query_ecx}].eax=3D@var{eax} > +@itemx x86.processor[@var{index}].cpuid.subleaf_eax[@var{query_eax}].ecx= [@var{query_ecx}].ebx=3D@var{ebx} > +@itemx x86.processor[@var{index}].cpuid.subleaf_eax[@var{query_eax}].ecx= [@var{query_ecx}].ecx=3D@var{ecx} > +@itemx x86.processor[@var{index}].cpuid.subleaf_eax[@var{query_eax}].ecx= [@var{query_ecx}].edx=3D@var{edx} > +This is similar to the leaves above, but for a @dfn{subleaf}. For > +subleaves, the CPUID instruction is executed with @samp{%rax =3D=3D > +@var{query_eax}} and @samp{%rcx =3D=3D @var{query_ecx}}, so the result > +depends on both register values. The same rules about filtering zero > +and identical results apply. > + > +@item x86.processor[@var{index}].cpuid.subleaf_eax[@var{query_eax}].ecx[= @var{query_ecx}].until_ecx=3D@var{ecx_limit} > +Some CPUID results are the same regardless the @var{query_ecx} value. > +If this situation is detected, a line with the @samp{.until_ecx} > +selector ins included, and this indicates that the CPUID register > +contents is the same for @code{%rcx} values between @var{query_ecx} > +and @var{ecx_limit} (inclusive). > + > +@item x86.processor[@var{index}].cpuid.subleaf_eax[@var{query_eax}].ecx[= @var{query_ecx}].ecx_query_mask=3D0xff > +This line indicates that in an @samp{.until_ecx} range, the CPUID > +instruction preserved the lowested 8 bits of the input @code{%rcx} in > +the output @code{%rcx} registers. Otherwise, the subleaves in the range > +have identical values. This special treatment is necessary to report > +compact range information in case such copying occurs (because the > +subleaves would otherwise be all different). > + > +@item x86.processor[@var{index}].xgetbv.ecx[@var{query_ecx}]=3D@var{resu= lt} > +This line shows the 64-bit @var{result} value in the @code{%rdx:%rax} > +register pair after executing the XGETBV instruction with @code{%rcx} > +set to @var{query_ecx}. Zero values and values matching the previously > +probed CPU are omitted. Nothing is printed if the system does not > +support the XGETBV instruction. > @end table > > @node Dynamic Linker Introspection > diff --git a/sysdeps/x86/dl-diagnostics-cpu.c b/sysdeps/x86/dl-diagnostic= s-cpu.c > index c76ea3be16..ceafde9481 100644 > --- a/sysdeps/x86/dl-diagnostics-cpu.c > +++ b/sysdeps/x86/dl-diagnostics-cpu.c > @@ -17,7 +17,18 @@ > . */ > > #include > + > +#include > +#include > +#include > +#include > #include > +#include > +#include > +#include > + > +/* The generic CPUID dumping code. */ > +static void _dl_diagnostics_cpuid (void); > > static void > print_cpu_features_value (const char *label, uint64_t value) > @@ -120,4 +131,377 @@ _dl_diagnostics_cpu (void) > + sizeof (cpu_features->cachesize_non_temporal_divisor) > =3D=3D sizeof (*cpu_features), > "last cpu_features field has been printed"); > + > + _dl_diagnostics_cpuid (); > +} > + > +/* The following code implements a generic CPUID dumper that tries to > + gather CPUID data without knowing about CPUID implementation > + details. */ > + > +/* Register arguments to CPUID. Multiple ECX subleaf values yielding > + the same result are combined, to shorten the output. Both > + identical matches (EAX to EDX are the same) and matches where EAX, > + EBX, EDX, and ECX are equal except in the lower byte, which must > + match the query ECX value. The latter is needed to compress ranges > + on CPUs which preserve the lowest byte in ECX if an unknown leaf is > + queried. */ > +struct cpuid_query > +{ > + unsigned int eax; > + unsigned ecx_first; > + unsigned ecx_last; > + bool ecx_preserves_query_byte; > +}; > + > +/* Single integer value that can be used for sorting/ordering > + comparisons. Uses Q->eax and Q->ecx_first only because ecx_last is > + always greater than the previous ecx_first value and less than the > + subsequent one. */ > +static inline unsigned long long int > +cpuid_query_combined (struct cpuid_query *q) > +{ > + /* ecx can be -1 (that is, ~0U). If this happens, this the only ecx > + value for this eax value, so the ordering does not matter. */ > + return ((unsigned long long int) q->eax << 32) | (unsigned int) q->ecx= _first; > +}; > + > +/* Used for differential reporting of zero/non-zero values. */ > +static const struct cpuid_registers cpuid_registers_zero; > + > +/* Register arguments to CPUID paired with the results that came back. = */ > +struct cpuid_query_result > +{ > + struct cpuid_query q; > + struct cpuid_registers r; > +}; > + > +/* During a first enumeration pass, we try to collect data for > + cpuid_initial_subleaf_limit subleaves per leaf/EAX value. If we run > + out of space, we try once more with applying the lower limit. */ > +enum { cpuid_main_leaf_limit =3D 128 }; > +enum { cpuid_initial_subleaf_limit =3D 512 }; > +enum { cpuid_subleaf_limit =3D 32 }; > + > +/* Offset of the extended leaf area. */ > +enum {cpuid_extended_leaf_offset =3D 0x80000000 }; > + > +/* Collected CPUID data. Everything is stored in a statically sized > + array that is sized so that the second pass will collect some data > + for all leaves, after the limit is applied. On the second pass, > + ecx_limit is set to cpuid_subleaf_limit. */ > +struct cpuid_collected_data > +{ > + unsigned int used; > + unsigned int ecx_limit; > + uint64_t xgetbv_ecx_0; > + struct cpuid_query_result qr[cpuid_main_leaf_limit > + * 2 * cpuid_subleaf_limit]; > +}; > + > +/* Fill in the result of a CPUID query. Returns true if there is > + room, false if nothing could be stored. */ > +static bool > +_dl_diagnostics_cpuid_store (struct cpuid_collected_data *ccd, > + unsigned eax, int ecx) > +{ > + if (ccd->used >=3D array_length (ccd->qr)) > + return false; > + > + /* Tentatively fill in the next value. */ > + __cpuid_count (eax, ecx, > + ccd->qr[ccd->used].r.eax, > + ccd->qr[ccd->used].r.ebx, > + ccd->qr[ccd->used].r.ecx, > + ccd->qr[ccd->used].r.edx); > + > + /* If the ECX subleaf is next subleaf after the previous one (for > + the same leaf), and the values are the same, merge the result > + with the already-stored one. Do this before skipping zero > + leaves, which avoids artifiacts for ECX =3D=3D 256 queries. */ > + if (ccd->used > 0 > + && ccd->qr[ccd->used - 1].q.eax =3D=3D eax > + && ccd->qr[ccd->used - 1].q.ecx_last + 1 =3D=3D ecx) > + { > + /* Exact match of the previous result. Ignore the value of > + ecx_preserves_query_byte if this is a singleton range so far > + because we can treat ECX as fixed if the same value repeats. *= / > + if ((!ccd->qr[ccd->used - 1].q.ecx_preserves_query_byte > + || (ccd->qr[ccd->used - 1].q.ecx_first > + =3D=3D ccd->qr[ccd->used - 1].q.ecx_last)) > + && memcmp (&ccd->qr[ccd->used - 1].r, &ccd->qr[ccd->used].r, > + sizeof (ccd->qr[ccd->used].r)) =3D=3D 0) > + { > + ccd->qr[ccd->used - 1].q.ecx_last =3D ecx; > + /* ECX is now fixed because the same value has been observed > + twice, even if we had a low-byte match before. */ > + ccd->qr[ccd->used - 1].q.ecx_preserves_query_byte =3D false; > + return true; > + } > + /* Match except for the low byte in ECX, which must match the > + incoming ECX value. */ > + if (ccd->qr[ccd->used - 1].q.ecx_preserves_query_byte > + && (ecx & 0xff) =3D=3D (ccd->qr[ccd->used].r.ecx & 0xff) > + && ccd->qr[ccd->used].r.eax =3D=3D ccd->qr[ccd->used - 1].r.ea= x > + && ccd->qr[ccd->used].r.ebx =3D=3D ccd->qr[ccd->used - 1].r.eb= x > + && ((ccd->qr[ccd->used].r.ecx & 0xffffff00) > + =3D=3D (ccd->qr[ccd->used - 1].r.ecx & 0xffffff00)) > + && ccd->qr[ccd->used].r.edx =3D=3D ccd->qr[ccd->used - 1].r.ed= x) > + { > + ccd->qr[ccd->used - 1].q.ecx_last =3D ecx; > + return true; > + } > + } > + > + /* Do not store zero results. All-zero values usually mean that the > + subleaf is unsupported. */ > + if (ccd->qr[ccd->used].r.eax =3D=3D 0 > + && ccd->qr[ccd->used].r.ebx =3D=3D 0 > + && ccd->qr[ccd->used].r.ecx =3D=3D 0 > + && ccd->qr[ccd->used].r.edx =3D=3D 0) > + return true; > + > + /* The result needs to be stored. Fill in the query parameters and > + consume the storage. */ > + ccd->qr[ccd->used].q.eax =3D eax; > + ccd->qr[ccd->used].q.ecx_first =3D ecx; > + ccd->qr[ccd->used].q.ecx_last =3D ecx; > + ccd->qr[ccd->used].q.ecx_preserves_query_byte > + =3D (ecx & 0xff) =3D=3D (ccd->qr[ccd->used].r.ecx & 0xff); > + ++ccd->used; > + return true; > +} > + > +/* Collected CPUID data into *CCD. If LIMIT, apply per-leaf limits to > + avoid exceeding the pre-allocated space. Return true if all data > + could be stored, false if the retrying without a limit is > + requested. */ > +static bool > +_dl_diagnostics_cpuid_collect_1 (struct cpuid_collected_data *ccd, bool = limit) > +{ > + ccd->used =3D 0; > + ccd->ecx_limit > + =3D (limit ? cpuid_subleaf_limit : cpuid_initial_subleaf_limit) - 1; > + _dl_diagnostics_cpuid_store (ccd, 0x00, 0x00); > + if (ccd->used =3D=3D 0) > + /* CPUID reported all 0. Should not happen. */ > + return true; > + unsigned int maximum_leaf =3D ccd->qr[0x00].r.eax; > + if (limit && maximum_leaf >=3D cpuid_main_leaf_limit) > + maximum_leaf =3D cpuid_main_leaf_limit - 1; > + > + for (unsigned int eax =3D 1; eax <=3D maximum_leaf; ++eax) > + { > + for (unsigned int ecx =3D 0; ecx <=3D ccd->ecx_limit; ++ecx) > + if (!_dl_diagnostics_cpuid_store (ccd, eax, ecx)) > + return false; > + } > + > + if (!_dl_diagnostics_cpuid_store (ccd, cpuid_extended_leaf_offset, 0x0= 0)) > + return false; > + maximum_leaf =3D ccd->qr[ccd->used - 1].r.eax; > + if (maximum_leaf < cpuid_extended_leaf_offset) > + /* No extended CPUID information. */ > + return true; > + if (limit > + && maximum_leaf - cpuid_extended_leaf_offset >=3D cpuid_main_leaf_= limit) > + maximum_leaf =3D cpuid_extended_leaf_offset + cpuid_main_leaf_limit = - 1; > + for (unsigned int eax =3D cpuid_extended_leaf_offset + 1; > + eax <=3D maximum_leaf; ++eax) > + { > + for (unsigned int ecx =3D 0; ecx <=3D ccd->ecx_limit; ++ecx) > + if (!_dl_diagnostics_cpuid_store (ccd, eax, ecx)) > + return false; > + } > + return true; > +} > + > +/* Call _dl_diagnostics_cpuid_collect_1 twice if necessary, the > + second time with the limit applied. */ > +static void > +_dl_diagnostics_cpuid_collect (struct cpuid_collected_data *ccd) > +{ > + if (!_dl_diagnostics_cpuid_collect_1 (ccd, false)) > + _dl_diagnostics_cpuid_collect_1 (ccd, true); > + > + /* Re-use the result of the official feature probing here. */ > + const struct cpu_features *cpu_features =3D __get_cpu_features (); > + if (CPU_FEATURES_CPU_P (cpu_features, OSXSAVE)) > + { > + unsigned int xcrlow; > + unsigned int xcrhigh; > + asm ("xgetbv" : "=3Da" (xcrlow), "=3Dd" (xcrhigh) : "c" (0)); > + ccd->xgetbv_ecx_0 =3D ((uint64_t) xcrhigh << 32) + xcrlow; > + } > + else > + ccd->xgetbv_ecx_0 =3D 0; > +} > + > +/* Print a CPUID register value (passed as REG_VALUE) if it differs > + from the expected REG_REFERENCE value. PROCESSOR_INDEX is the > + process sequence number (always starting at zero; not a kernel ID). = */ > +static void > +_dl_diagnostics_cpuid_print_reg (unsigned int processor_index, > + const struct cpuid_query *q, > + const char *reg_label, unsigned int reg= _value, > + bool subleaf) > +{ > + if (subleaf) > + _dl_printf ("x86.processor[0x%x].cpuid.subleaf_eax[0x%x]" > + ".ecx[0x%x].%s=3D0x%x\n", > + processor_index, q->eax, q->ecx_first, reg_label, reg_va= lue); > + else > + _dl_printf ("x86.processor[0x%x].cpuid.eax[0x%x].%s=3D0x%x\n", > + processor_index, q->eax, reg_label, reg_value); > +} > + > +/* Print CPUID result values in *RESULT for the query in > + CCD->qr[CCD_IDX]. PROCESSOR_INDEX is the process sequence number > + (always starting at zero; not a kernel ID). */ > +static void > +_dl_diagnostics_cpuid_print_query (unsigned int processor_index, > + struct cpuid_collected_data *ccd, > + unsigned int ccd_idx, > + const struct cpuid_registers *result) > +{ > + /* Treat this as a value if subleaves if ecx isn't zero (maybe > + within the [ecx_fist, ecx_last] range), or if eax matches its > + neighbors. If the range is [0, ecx_limit], then the subleaves > + are not distinct (independently of ecx_preserves_query_byte), > + so do not report them separately. */ > + struct cpuid_query *q =3D &ccd->qr[ccd_idx].q; > + bool subleaf =3D (q->ecx_first > 0 > + || (q->ecx_first !=3D q->ecx_last > + && !(q->ecx_first =3D=3D 0 && q->ecx_last =3D=3D c= cd->ecx_limit)) > + || (ccd_idx > 0 && q->eax =3D=3D ccd->qr[ccd_idx - 1].= q.eax) > + || (ccd_idx + 1 < ccd->used > + && q->eax =3D=3D ccd->qr[ccd_idx + 1].q.eax)); > + _dl_diagnostics_cpuid_print_reg (processor_index, q, "eax", result->ea= x, > + subleaf); > + _dl_diagnostics_cpuid_print_reg (processor_index, q, "ebx", result->eb= x, > + subleaf); > + _dl_diagnostics_cpuid_print_reg (processor_index, q, "ecx", result->ec= x, > + subleaf); > + _dl_diagnostics_cpuid_print_reg (processor_index, q, "edx", result->ed= x, > + subleaf); > + > + if (subleaf && q->ecx_first !=3D q->ecx_last) > + { > + _dl_printf ("x86.processor[0x%x].cpuid.subleaf_eax[0x%x]" > + ".ecx[0x%x].until_ecx=3D0x%x\n", > + processor_index, q->eax, q->ecx_first, q->ecx_last); > + if (q->ecx_preserves_query_byte) > + _dl_printf ("x86.processor[0x%x].cpuid.subleaf_eax[0x%x]" > + ".ecx[0x%x].ecx_query_mask=3D0xff\n", > + processor_index, q->eax, q->ecx_first); > + } > +} > + > +/* Perform differential reporting of the data in *CURRENT against > + *BASE. REQUESTED_CPU is the kernel CPU ID the thread was > + configured to run on, or -1 if no configuration was possible. > + PROCESSOR_INDEX is the process sequence number (always starting at > + zero; not a kernel ID). */ > +static void > +_dl_diagnostics_cpuid_report (struct dl_iterate_cpu *dci, > + struct cpuid_collected_data *current, > + struct cpuid_collected_data *base) > +{ > + if (dci->requested_cpu >=3D 0) > + _dl_printf ("x86.processor[0x%x].requested=3D0x%x\n", > + dci->processor_index, dci->requested_cpu); > + if (dci->actual_cpu >=3D 0) > + _dl_printf ("x86.processor[0x%x].observed=3D0x%x\n", > + dci->processor_index, dci->actual_cpu); > + if (dci->actual_node >=3D 0) > + _dl_printf ("x86.processor[0x%x].observed_node=3D0x%x\n", > + dci->processor_index, dci->actual_node); > + > + _dl_printf ("x86.processor[0x%x].cpuid_leaves=3D0x%x\n", > + dci->processor_index, current->used); > + _dl_printf ("x86.processor[0x%x].ecx_limit=3D0x%x\n", > + dci->processor_index, current->ecx_limit); > + > + unsigned int base_idx =3D 0; > + for (unsigned int current_idx =3D 0; current_idx < current->used; > + ++current_idx) > + { > + /* Report missing data on the current CPU as 0. */ > + unsigned long long int current_query > + =3D cpuid_query_combined (¤t->qr[current_idx].q); > + while (base_idx < base->used > + && cpuid_query_combined (&base->qr[base_idx].q) < current_q= uery) > + { > + _dl_diagnostics_cpuid_print_query (dci->processor_index, > + base, base_idx, > + &cpuid_registers_zero); > + ++base_idx; > + } > + > + if (base_idx < base->used > + && cpuid_query_combined (&base->qr[base_idx].q) =3D=3D current= _query) > + { > + _Static_assert (sizeof (struct cpuid_registers) =3D=3D 4 * 4, > + "no padding in struct cpuid_registers"); > + if (current->qr[current_idx].q.ecx_last > + !=3D base->qr[base_idx].q.ecx_last > + || memcmp (¤t->qr[current_idx].r, > + &base->qr[base_idx].r, > + sizeof (struct cpuid_registers)) !=3D 0) > + /* The ECX range or the values have changed. Show the > + new values. */ > + _dl_diagnostics_cpuid_print_query (dci->processor_index, > + current, current_idx, > + ¤t->qr[current_idx]= .r); > + ++base_idx; > + } > + else > + /* Data is absent in the base reference. Report the new data. = */ > + _dl_diagnostics_cpuid_print_query (dci->processor_index, > + current, current_idx, > + ¤t->qr[current_idx].r); > + } > + > + if (current->xgetbv_ecx_0 !=3D base->xgetbv_ecx_0) > + { > + /* Re-use the 64-bit printing routine. */ > + _dl_printf ("x86.processor[0x%x].", dci->processor_index); > + _dl_diagnostics_print_labeled_value ("xgetbv.ecx[0x0]", > + current->xgetbv_ecx_0); > + } > +} > + > +static void > +_dl_diagnostics_cpuid (void) > +{ > +#if !HAS_CPUID > + /* CPUID is not supported, so there is nothing to dump. */ > + if (__get_cpuid_max (0, 0) =3D=3D 0) > + return; > +#endif > + > + struct dl_iterate_cpu dic; > + _dl_iterate_cpu_init (&dic); > + > + /* Two copies of the data are used. Data is written to the index > + (dic.processor_index & 1). The previous version against which the > + data dump is reported is at index !(processor_index & 1). */ > + struct cpuid_collected_data ccd[2]; > + > + /* The initial data is presumed to be all zero. Zero results are > + not recorded. */ > + ccd[1].used =3D 0; > + ccd[1].xgetbv_ecx_0 =3D 0; > + > + /* Run the CPUID probing on a specific CPU. There are expected > + differences for encoding core IDs and topology information in > + CPUID output, but some firmware/kernel bugs also may result in > + asymmetric data across CPUs in some cases. */ > + while (_dl_iterate_cpu_next (&dic)) > + { > + _dl_diagnostics_cpuid_collect (&ccd[dic.processor_index & 1]); > + _dl_diagnostics_cpuid_report > + (&dic, &ccd[dic.processor_index & 1], > + &ccd[!(dic.processor_index & 1)]); > + } > } > -- > 2.44.0 > > LGTM. Reviewed-by: H.J. Lu Thanks. --=20 H.J.