public inbox for libc-alpha@sourceware.org
 help / color / mirror / Atom feed
* RFC: Add <sys/platform/x86.h> to glibc 2.29
@ 2018-07-29 16:16 H.J. Lu
  2018-07-31 13:32 ` Carlos O'Donell
  2018-08-17 14:50 ` Florian Weimer
  0 siblings, 2 replies; 12+ messages in thread
From: H.J. Lu @ 2018-07-29 16:16 UTC (permalink / raw)
  To: GNU C Library

I am proposing to add <sys/platform/x86.h> to glibc 2.29 to provide
an API to access x86 specific platform features:

enum
  {
    /* The integer bit array index of architecture feature bits.  */
    FEATURE_INDEX_1 = 0,
    FEATURE_INDEX_2,
    /* The current maximum size of the feature integer bit array.  */
    FEATURE_INDEX_MAX
  };

enum
  {
    COMMON_CPUID_INDEX_1 = 0,
    COMMON_CPUID_INDEX_7,
    COMMON_CPUID_INDEX_80000001,
    COMMON_CPUID_INDEX_D_ECX_1,
    COMMON_CPUID_INDEX_80000007,
    COMMON_CPUID_INDEX_80000008,
    /* Keep the following line at the end.  */
    COMMON_CPUID_INDEX_MAX
  };

enum cpu_features_kind
  {
    arch_kind_unknown = 0,
    arch_kind_intel,
    arch_kind_amd,
    arch_kind_other
  };

struct cpuid_registers
  {
    unsigned int eax;
    unsigned int ebx;
    unsigned int ecx;
    unsigned int edx;
  };

extern enum cpu_features_kind __x86_get_cpu_kind (void)
     __attribute__ ((const));

extern const struct cpuid_registers *__x86_get_cpuid_registers
     (unsigned int) __attribute__ ((const));

extern unsigned int __x86_get_arch_feature (unsigned int)
     __attribute__ ((const));

extern unsigned long long __x86_tsc_to_ns (unsigned long long);
extern unsigned long long __x86_ns_to_tsc (unsigned long long);

/* HAS_* evaluates to true if we may use the feature at runtime.  */
#define HAS_CPU_FEATURE(name) \
   ((__x86_get_cpuid_registers (index_cpu_##name)->reg_##name \
     & (bit_cpu_##name)) != 0)
#define HAS_ARCH_FEATURE(name) \
   ((__x86_get_arch_feature (index_arch_##name) & (bit_arch_##name)) != 0)

Users can use <sys/platform/x86.h> to detect if a CPU feature is
supported at run-time with

HAS_CPU_FEATURE (AVX)

which returns true if glibc detects that AVX is available.  Before AVX
can be used, he/she should check

HAS_ARCH_FEATURE (AVX_Usable)

which returns true if glibc detects that AVX is available and usable.

extern unsigned long long __x86_tsc_to_ns (unsigned long long);
extern unsigned long long __x86_ns_to_tsc (unsigned long long);

can be used to convert between TSCs and nanoseconds if

HAS_ARCH_FEATURE (TSC_To_NS_Usable)

returns true.

Backward binary compatibility is provided by:

1. __x86_get_cpu_kind will get a new version if a new CPU kind is added
to cpu_features_kind.
2. __x86_get_cpuid_registers returns a pointer to a cpuid_registers
with all zeros if cpuid index >= COMMON_CPUID_INDEX_MAX.
3. __x86_get_arch_feature returns 0 if architecture index >=
FEATURE_INDEX_MAX.

This means:

1. Applications linked against the new __x86_get_cpu_kind will only
run with the newer libc.so.
2. Applications linked against the new __x86_get_cpuid_registers and
__x86_get_arch_feature will run with the older libc.so.  But new
CPU and architecture features won't be detected by the older libc.so.

This program:

https://github.com/hjl-tools/glibc/blob/hjl/platform/master/sysdeps/x86/tst-x86-platform-1.c

may display:

[hjl@gnu-efi-2 build-x86_64-linux]$ ./elf/tst-x86-platform-1
Vendor: Intel
CPU features:
  SSE3
  PCLMULQDQ
  DTES64
  MONITOR
  DS_CPL
  VMX
  EST
  TM2
  SSSE3
  SDBG
  FMA
  CMPXCHG16B
  XTPRUPDCTRL
  PDCM
  PCID
  SSE4_1
  SSE4_2
  X2APIC
  MOVBE
  POPCOUNT
  TSC_DEADLINE
  AES
  XSAVE
  OSXSAVE
  AVX
  F16C
  RDRAND
  FPU
  VME
  DE
  PSE
  TSC
  MSR
  PAE
  MCE
  CX8
  APIC
  SEP
  MTRR
  PGE
  MCA
  CMOV
  PAT
  PSE_36
  DS
  ACPI
  MMX
  FXSR
  SSE
  SSE2
  SS
  HTT
  TM
  PBE
  FSGSBASE
  TSC_ADJUST
  SGX
  BMI1
  AVX2
  BMI2
  ERMS
  INVPCID
  MPX
  RDSEED
  ADX
  SMAP
  TRACE
  LAHF64_SAHF64
  LZCNT
  PREFETCHW
  SYSCALL_SYSRET
  NX
  PAGE1GB
  RDTSCP
  LM
  XSAVEOPT
  XSAVEC
  XGETBV_ECX_1
  XSAVES
  INVARIANT_TSC
Architecture features:
  AVX_Usable
  AVX2_Usable
  FMA_Usable
  XSAVEC_Usable
  TSC_To_NS_Usable
[hjl@gnu-efi-2 build-x86_64-linux]$

Any comments?


H.J.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: RFC: Add <sys/platform/x86.h> to glibc 2.29
  2018-07-29 16:16 RFC: Add <sys/platform/x86.h> to glibc 2.29 H.J. Lu
@ 2018-07-31 13:32 ` Carlos O'Donell
  2018-07-31 14:49   ` H.J. Lu
  2018-08-17 14:50 ` Florian Weimer
  1 sibling, 1 reply; 12+ messages in thread
From: Carlos O'Donell @ 2018-07-31 13:32 UTC (permalink / raw)
  To: H.J. Lu, GNU C Library

On 07/29/2018 12:16 PM, H.J. Lu wrote:
> I am proposing to add <sys/platform/x86.h> to glibc 2.29 to provide
> an API to access x86 specific platform features:
> 
> enum
>   {
>     /* The integer bit array index of architecture feature bits.  */
>     FEATURE_INDEX_1 = 0,
>     FEATURE_INDEX_2,
>     /* The current maximum size of the feature integer bit array.  */
>     FEATURE_INDEX_MAX
>   };
> 
> enum
>   {
>     COMMON_CPUID_INDEX_1 = 0,
>     COMMON_CPUID_INDEX_7,
>     COMMON_CPUID_INDEX_80000001,
>     COMMON_CPUID_INDEX_D_ECX_1,
>     COMMON_CPUID_INDEX_80000007,
>     COMMON_CPUID_INDEX_80000008,
>     /* Keep the following line at the end.  */
>     COMMON_CPUID_INDEX_MAX
>   };
> 
> enum cpu_features_kind
>   {
>     arch_kind_unknown = 0,
>     arch_kind_intel,
>     arch_kind_amd,
>     arch_kind_other
>   };
> 
> struct cpuid_registers
>   {
>     unsigned int eax;
>     unsigned int ebx;
>     unsigned int ecx;
>     unsigned int edx;
>   };
> 
> extern enum cpu_features_kind __x86_get_cpu_kind (void)
>      __attribute__ ((const));
> 
> extern const struct cpuid_registers *__x86_get_cpuid_registers
>      (unsigned int) __attribute__ ((const));
> 
> extern unsigned int __x86_get_arch_feature (unsigned int)
>      __attribute__ ((const));
> 
> extern unsigned long long __x86_tsc_to_ns (unsigned long long);
> extern unsigned long long __x86_ns_to_tsc (unsigned long long);
> 
> /* HAS_* evaluates to true if we may use the feature at runtime.  */
> #define HAS_CPU_FEATURE(name) \
>    ((__x86_get_cpuid_registers (index_cpu_##name)->reg_##name \
>      & (bit_cpu_##name)) != 0)

I don't like these as macros.

I would rather an inline function that does all the work within the function
and returns a result of the appropriate type e.g. boolean.

> #define HAS_ARCH_FEATURE(name) \
>    ((__x86_get_arch_feature (index_arch_##name) & (bit_arch_##name)) != 0)
> 

Likewise.

> Users can use <sys/platform/x86.h> to detect if a CPU feature is
> supported at run-time with
> 
> HAS_CPU_FEATURE (AVX)

Likewise.

> which returns true if glibc detects that AVX is available.  Before AVX
> can be used, he/she should check
> 
> HAS_ARCH_FEATURE (AVX_Usable)

Likewise.

These macros like this are harder to debug than static inline functions.

> which returns true if glibc detects that AVX is available and usable.
> 
> extern unsigned long long __x86_tsc_to_ns (unsigned long long);
> extern unsigned long long __x86_ns_to_tsc (unsigned long long);
> 
> can be used to convert between TSCs and nanoseconds if
> 
> HAS_ARCH_FEATURE (TSC_To_NS_Usable)
> 
> returns true.
> 
> Backward binary compatibility is provided by:
> 
> 1. __x86_get_cpu_kind will get a new version if a new CPU kind is added
> to cpu_features_kind.

This function takes void and returns the kind of CPU as an enum value. OK.

> 2. __x86_get_cpuid_registers returns a pointer to a cpuid_registers
> with all zeros if cpuid index >= COMMON_CPUID_INDEX_MAX.

Could you expand on this in more detail please.

> 3. __x86_get_arch_feature returns 0 if architecture index >=
> FEATURE_INDEX_MAX.

Likewise.

> This means:
> 
> 1. Applications linked against the new __x86_get_cpu_kind will only
> run with the newer libc.so.

How? Symbol versioning?

> 2. Applications linked against the new __x86_get_cpuid_registers and
> __x86_get_arch_feature will run with the older libc.so.  But new
> CPU and architecture features won't be detected by the older libc.so.

The caller must attempt to copy or inspect all the data, but it's structure
size may be larger, and the old glibc structure may be smaller, how do you
prevent the newer program from reading out of bounds when run in an older
glibc? This question is only relevant for __x86_get_cpuid_registers which
returns a pointer to a fixed size structure. The structure can grow, but
the caller needs to know exactly how big the returned structure is to avoid
reading outside of bounds.

> This program:
> 
> https://github.com/hjl-tools/glibc/blob/hjl/platform/master/sysdeps/x86/tst-x86-platform-1.c
> 
> may display:
> 
> [hjl@gnu-efi-2 build-x86_64-linux]$ ./elf/tst-x86-platform-1
> Vendor: Intel
> CPU features:
>   SSE3
>   PCLMULQDQ
>   DTES64
>   MONITOR
>   DS_CPL
>   VMX
>   EST
>   TM2
>   SSSE3
>   SDBG
>   FMA
>   CMPXCHG16B
>   XTPRUPDCTRL
>   PDCM
>   PCID
>   SSE4_1
>   SSE4_2
>   X2APIC
>   MOVBE
>   POPCOUNT
>   TSC_DEADLINE
>   AES
>   XSAVE
>   OSXSAVE
>   AVX
>   F16C
>   RDRAND
>   FPU
>   VME
>   DE
>   PSE
>   TSC
>   MSR
>   PAE
>   MCE
>   CX8
>   APIC
>   SEP
>   MTRR
>   PGE
>   MCA
>   CMOV
>   PAT
>   PSE_36
>   DS
>   ACPI
>   MMX
>   FXSR
>   SSE
>   SSE2
>   SS
>   HTT
>   TM
>   PBE
>   FSGSBASE
>   TSC_ADJUST
>   SGX
>   BMI1
>   AVX2
>   BMI2
>   ERMS
>   INVPCID
>   MPX
>   RDSEED
>   ADX
>   SMAP
>   TRACE
>   LAHF64_SAHF64
>   LZCNT
>   PREFETCHW
>   SYSCALL_SYSRET
>   NX
>   PAGE1GB
>   RDTSCP
>   LM
>   XSAVEOPT
>   XSAVEC
>   XGETBV_ECX_1
>   XSAVES
>   INVARIANT_TSC
> Architecture features:
>   AVX_Usable
>   AVX2_Usable
>   FMA_Usable
>   XSAVEC_Usable
>   TSC_To_NS_Usable
> [hjl@gnu-efi-2 build-x86_64-linux]$
> 
> Any comments?

The data you want to share seems reasonable, but I expect we'll need
some iteration on the interface design.
 
> H.J.
> 


-- 
Cheers,
Carlos.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: RFC: Add <sys/platform/x86.h> to glibc 2.29
  2018-07-31 13:32 ` Carlos O'Donell
@ 2018-07-31 14:49   ` H.J. Lu
  2018-07-31 15:06     ` Carlos O'Donell
  0 siblings, 1 reply; 12+ messages in thread
From: H.J. Lu @ 2018-07-31 14:49 UTC (permalink / raw)
  To: Carlos O'Donell; +Cc: GNU C Library

On Tue, Jul 31, 2018 at 6:32 AM, Carlos O'Donell <carlos@redhat.com> wrote:
> On 07/29/2018 12:16 PM, H.J. Lu wrote:
>> I am proposing to add <sys/platform/x86.h> to glibc 2.29 to provide
>> an API to access x86 specific platform features:
>>
>> enum
>>   {
>>     /* The integer bit array index of architecture feature bits.  */
>>     FEATURE_INDEX_1 = 0,
>>     FEATURE_INDEX_2,
>>     /* The current maximum size of the feature integer bit array.  */
>>     FEATURE_INDEX_MAX
>>   };
>>
>> enum
>>   {
>>     COMMON_CPUID_INDEX_1 = 0,
>>     COMMON_CPUID_INDEX_7,
>>     COMMON_CPUID_INDEX_80000001,
>>     COMMON_CPUID_INDEX_D_ECX_1,
>>     COMMON_CPUID_INDEX_80000007,
>>     COMMON_CPUID_INDEX_80000008,
>>     /* Keep the following line at the end.  */
>>     COMMON_CPUID_INDEX_MAX
>>   };
>>
>> enum cpu_features_kind
>>   {
>>     arch_kind_unknown = 0,
>>     arch_kind_intel,
>>     arch_kind_amd,
>>     arch_kind_other
>>   };
>>
>> struct cpuid_registers
>>   {
>>     unsigned int eax;
>>     unsigned int ebx;
>>     unsigned int ecx;
>>     unsigned int edx;
>>   };
>>
>> extern enum cpu_features_kind __x86_get_cpu_kind (void)
>>      __attribute__ ((const));
>>
>> extern const struct cpuid_registers *__x86_get_cpuid_registers
>>      (unsigned int) __attribute__ ((const));
>>
>> extern unsigned int __x86_get_arch_feature (unsigned int)
>>      __attribute__ ((const));
>>
>> extern unsigned long long __x86_tsc_to_ns (unsigned long long);
>> extern unsigned long long __x86_ns_to_tsc (unsigned long long);
>>
>> /* HAS_* evaluates to true if we may use the feature at runtime.  */
>> #define HAS_CPU_FEATURE(name) \
>>    ((__x86_get_cpuid_registers (index_cpu_##name)->reg_##name \
>>      & (bit_cpu_##name)) != 0)
>
> I don't like these as macros.
>
> I would rather an inline function that does all the work within the function
> and returns a result of the appropriate type e.g. boolean.

CPUID instruction takes EAX and optionally ECX as input,  It writes to
EAX, EBX, ECD and EDX.  Each CPU feature is determined by CPUID
with specific input values and a bit in EAX, EBX, ECX or EDX.   Take
AVX2 as example:

#define bit_cpu_AVX2 (1u << 5)
#define index_cpu_AVX2 COMMON_CPUID_INDEX_7
#define reg_AVX2 ebx

It is determined by the bit 5 of EBX from CPUID with EAX == 7.
HAS_CPU_FEATURE (AVX2) is expanded to

   ((__x86_get_cpuid_registers (index_cpu_AVX2)->reg_AVX2 \
     & (bit_cpu_AVX2)) != 0)

which is equivalent to

  ((__x86_get_cpuid_registers (COMMON_CPUID_INDEX_7)->ebx \
     & ((1u << 5))) != 0)

It is perfect for macro, not so convenient for inline function.

>> #define HAS_ARCH_FEATURE(name) \
>>    ((__x86_get_arch_feature (index_arch_##name) & (bit_arch_##name)) != 0)
>>
>
> Likewise.
>
>> Users can use <sys/platform/x86.h> to detect if a CPU feature is
>> supported at run-time with
>>
>> HAS_CPU_FEATURE (AVX)
>
> Likewise.
>
>> which returns true if glibc detects that AVX is available.  Before AVX
>> can be used, he/she should check
>>
>> HAS_ARCH_FEATURE (AVX_Usable)
>
> Likewise.
>
> These macros like this are harder to debug than static inline functions.
>
>> which returns true if glibc detects that AVX is available and usable.
>>
>> extern unsigned long long __x86_tsc_to_ns (unsigned long long);
>> extern unsigned long long __x86_ns_to_tsc (unsigned long long);
>>
>> can be used to convert between TSCs and nanoseconds if
>>
>> HAS_ARCH_FEATURE (TSC_To_NS_Usable)
>>
>> returns true.
>>
>> Backward binary compatibility is provided by:
>>
>> 1. __x86_get_cpu_kind will get a new version if a new CPU kind is added
>> to cpu_features_kind.
>
> This function takes void and returns the kind of CPU as an enum value. OK.
>
>> 2. __x86_get_cpuid_registers returns a pointer to a cpuid_registers
>> with all zeros if cpuid index >= COMMON_CPUID_INDEX_MAX.
>
> Could you expand on this in more detail please.

Internally, we have

struct cpuid_registers cpuid[COMMON_CPUID_INDEX_MAX];

_x86_get_cpuid_registers is defined as

const struct cpuid_registers *
__x86_get_cpuid_registers (unsigned int i)
{
  const static struct cpuid_registers zero_cpuid_registers = { 0 };
  if (i >= COMMON_CPUID_INDEX_MAX)
    return &zero_cpuid_registers;
  return &GLRO(dl_x86_cpu_features).cpuid[i];
}

>> 3. __x86_get_arch_feature returns 0 if architecture index >=
>> FEATURE_INDEX_MAX.
>
> Likewise.
>
>> This means:
>>
>> 1. Applications linked against the new __x86_get_cpu_kind will only
>> run with the newer libc.so.
>
> How? Symbol versioning?

Yes.

>> 2. Applications linked against the new __x86_get_cpuid_registers and
>> __x86_get_arch_feature will run with the older libc.so.  But new
>> CPU and architecture features won't be detected by the older libc.so.
>
> The caller must attempt to copy or inspect all the data, but it's structure
> size may be larger, and the old glibc structure may be smaller, how do you
> prevent the newer program from reading out of bounds when run in an older
> glibc? This question is only relevant for __x86_get_cpuid_registers which
> returns a pointer to a fixed size structure. The structure can grow, but
> the caller needs to know exactly how big the returned structure is to avoid
> reading outside of bounds.

There is no ABI issue here.   __x86_get_cpuid_registers returns a pointer to
an element of struct cpuid_registers cpuid[COMMON_CPUID_INDEX_MAX],
whose size is fixed as

struct cpuid_registers
  {
    unsigned int eax;
    unsigned int ebx;
    unsigned int ecx;
    unsigned int edx;
  };

which is the cached output of CPUID.  As shown above if the input index
>= COMMON_CPUID_INDEX_MAX. we return a pointer to

const static struct cpuid_registers zero_cpuid_registers = { 0 };


-- 
H.J.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: RFC: Add <sys/platform/x86.h> to glibc 2.29
  2018-07-31 14:49   ` H.J. Lu
@ 2018-07-31 15:06     ` Carlos O'Donell
  2018-07-31 16:17       ` H.J. Lu
  0 siblings, 1 reply; 12+ messages in thread
From: Carlos O'Donell @ 2018-07-31 15:06 UTC (permalink / raw)
  To: H.J. Lu; +Cc: GNU C Library

On 07/31/2018 10:48 AM, H.J. Lu wrote:
> On Tue, Jul 31, 2018 at 6:32 AM, Carlos O'Donell <carlos@redhat.com> wrote:
>> On 07/29/2018 12:16 PM, H.J. Lu wrote:
>>> I am proposing to add <sys/platform/x86.h> to glibc 2.29 to provide
>>> an API to access x86 specific platform features:
>>>
>>> enum
>>>   {
>>>     /* The integer bit array index of architecture feature bits.  */
>>>     FEATURE_INDEX_1 = 0,
>>>     FEATURE_INDEX_2,
>>>     /* The current maximum size of the feature integer bit array.  */
>>>     FEATURE_INDEX_MAX
>>>   };
>>>
>>> enum
>>>   {
>>>     COMMON_CPUID_INDEX_1 = 0,
>>>     COMMON_CPUID_INDEX_7,
>>>     COMMON_CPUID_INDEX_80000001,
>>>     COMMON_CPUID_INDEX_D_ECX_1,
>>>     COMMON_CPUID_INDEX_80000007,
>>>     COMMON_CPUID_INDEX_80000008,
>>>     /* Keep the following line at the end.  */
>>>     COMMON_CPUID_INDEX_MAX
>>>   };
>>>
>>> enum cpu_features_kind
>>>   {
>>>     arch_kind_unknown = 0,
>>>     arch_kind_intel,
>>>     arch_kind_amd,
>>>     arch_kind_other
>>>   };
>>>
>>> struct cpuid_registers
>>>   {
>>>     unsigned int eax;
>>>     unsigned int ebx;
>>>     unsigned int ecx;
>>>     unsigned int edx;
>>>   };
>>>
>>> extern enum cpu_features_kind __x86_get_cpu_kind (void)
>>>      __attribute__ ((const));
>>>
>>> extern const struct cpuid_registers *__x86_get_cpuid_registers
>>>      (unsigned int) __attribute__ ((const));
>>>
>>> extern unsigned int __x86_get_arch_feature (unsigned int)
>>>      __attribute__ ((const));
>>>
>>> extern unsigned long long __x86_tsc_to_ns (unsigned long long);
>>> extern unsigned long long __x86_ns_to_tsc (unsigned long long);
>>>
>>> /* HAS_* evaluates to true if we may use the feature at runtime.  */
>>> #define HAS_CPU_FEATURE(name) \
>>>    ((__x86_get_cpuid_registers (index_cpu_##name)->reg_##name \
>>>      & (bit_cpu_##name)) != 0)
>>
>> I don't like these as macros.
>>
>> I would rather an inline function that does all the work within the function
>> and returns a result of the appropriate type e.g. boolean.
> 
> CPUID instruction takes EAX and optionally ECX as input,  It writes to
> EAX, EBX, ECD and EDX.  Each CPU feature is determined by CPUID
> with specific input values and a bit in EAX, EBX, ECX or EDX.   Take
> AVX2 as example:
> 
> #define bit_cpu_AVX2 (1u << 5)
> #define index_cpu_AVX2 COMMON_CPUID_INDEX_7
> #define reg_AVX2 ebx
> 
> It is determined by the bit 5 of EBX from CPUID with EAX == 7.
> HAS_CPU_FEATURE (AVX2) is expanded to
> 
>    ((__x86_get_cpuid_registers (index_cpu_AVX2)->reg_AVX2 \
>      & (bit_cpu_AVX2)) != 0)
> 
> which is equivalent to
> 
>   ((__x86_get_cpuid_registers (COMMON_CPUID_INDEX_7)->ebx \
>      & ((1u << 5))) != 0)
> 
> It is perfect for macro, not so convenient for inline function.

It's not perfect.

It's difficult to debug.

You are using the c pre-processor here to evaluate the input argument
into another 2 or 3 constant values, this is the kind of propagation
of values that the compiler can do too.

I'd like to see an attempt at a functional definition of this rather
than a macro, and the best argument against a static inline function
is performance. If the macro generates much better code then that's
a real reason to use it.

>>> #define HAS_ARCH_FEATURE(name) \
>>>    ((__x86_get_arch_feature (index_arch_##name) & (bit_arch_##name)) != 0)
>>>
>>
>> Likewise.
>>
>>> Users can use <sys/platform/x86.h> to detect if a CPU feature is
>>> supported at run-time with
>>>
>>> HAS_CPU_FEATURE (AVX)
>>
>> Likewise.
>>
>>> which returns true if glibc detects that AVX is available.  Before AVX
>>> can be used, he/she should check
>>>
>>> HAS_ARCH_FEATURE (AVX_Usable)
>>
>> Likewise.
>>
>> These macros like this are harder to debug than static inline functions.
>>
>>> which returns true if glibc detects that AVX is available and usable.
>>>
>>> extern unsigned long long __x86_tsc_to_ns (unsigned long long);
>>> extern unsigned long long __x86_ns_to_tsc (unsigned long long);
>>>
>>> can be used to convert between TSCs and nanoseconds if
>>>
>>> HAS_ARCH_FEATURE (TSC_To_NS_Usable)
>>>
>>> returns true.
>>>
>>> Backward binary compatibility is provided by:
>>>
>>> 1. __x86_get_cpu_kind will get a new version if a new CPU kind is added
>>> to cpu_features_kind.
>>
>> This function takes void and returns the kind of CPU as an enum value. OK.
>>
>>> 2. __x86_get_cpuid_registers returns a pointer to a cpuid_registers
>>> with all zeros if cpuid index >= COMMON_CPUID_INDEX_MAX.
>>
>> Could you expand on this in more detail please.
> 
> Internally, we have
> 
> struct cpuid_registers cpuid[COMMON_CPUID_INDEX_MAX];
> 
> _x86_get_cpuid_registers is defined as
> 
> const struct cpuid_registers *
> __x86_get_cpuid_registers (unsigned int i)
> {
>   const static struct cpuid_registers zero_cpuid_registers = { 0 };
>   if (i >= COMMON_CPUID_INDEX_MAX)
>     return &zero_cpuid_registers;
>   return &GLRO(dl_x86_cpu_features).cpuid[i];
> }
> 

I don't like returning all zero for an error condition, we should have
a distinct error and return value..

Are we certain that an all zero cpuid register return is an invalid
result for cpuid in all cases or could it be confused with a valid
cpuid result but with the options in question disabled?

>>> 3. __x86_get_arch_feature returns 0 if architecture index >=
>>> FEATURE_INDEX_MAX.
>>
>> Likewise.
>>
>>> This means:
>>>
>>> 1. Applications linked against the new __x86_get_cpu_kind will only
>>> run with the newer libc.so.
>>
>> How? Symbol versioning?
> 
> Yes.

OK.

>>> 2. Applications linked against the new __x86_get_cpuid_registers and
>>> __x86_get_arch_feature will run with the older libc.so.  But new
>>> CPU and architecture features won't be detected by the older libc.so.
>>
>> The caller must attempt to copy or inspect all the data, but it's structure
>> size may be larger, and the old glibc structure may be smaller, how do you
>> prevent the newer program from reading out of bounds when run in an older
>> glibc? This question is only relevant for __x86_get_cpuid_registers which
>> returns a pointer to a fixed size structure. The structure can grow, but
>> the caller needs to know exactly how big the returned structure is to avoid
>> reading outside of bounds.
> 
> There is no ABI issue here.   __x86_get_cpuid_registers returns a pointer to
> an element of struct cpuid_registers cpuid[COMMON_CPUID_INDEX_MAX],
> whose size is fixed as
> 
> struct cpuid_registers
>   {
>     unsigned int eax;
>     unsigned int ebx;
>     unsigned int ecx;
>     unsigned int edx;
>   };

Ah! Right, it's the cached cpuid return result so it's following the ISA rules
for the registers it touches. I expect this can't change at this point, and if
we got a *new* cpuid instruction we'd just make a new function to implement that.

> which is the cached output of CPUID.  As shown above if the input index
>> = COMMON_CPUID_INDEX_MAX. we return a pointer to
> 
> const static struct cpuid_registers zero_cpuid_registers = { 0 };


-- 
Cheers,
Carlos.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: RFC: Add <sys/platform/x86.h> to glibc 2.29
  2018-07-31 15:06     ` Carlos O'Donell
@ 2018-07-31 16:17       ` H.J. Lu
  0 siblings, 0 replies; 12+ messages in thread
From: H.J. Lu @ 2018-07-31 16:17 UTC (permalink / raw)
  To: Carlos O'Donell; +Cc: GNU C Library

On Tue, Jul 31, 2018 at 8:06 AM, Carlos O'Donell <carlos@redhat.com> wrote:
> On 07/31/2018 10:48 AM, H.J. Lu wrote:
>> On Tue, Jul 31, 2018 at 6:32 AM, Carlos O'Donell <carlos@redhat.com> wrote:
>>> On 07/29/2018 12:16 PM, H.J. Lu wrote:
>>>> I am proposing to add <sys/platform/x86.h> to glibc 2.29 to provide
>>>> an API to access x86 specific platform features:
>>>>
>>>> enum
>>>>   {
>>>>     /* The integer bit array index of architecture feature bits.  */
>>>>     FEATURE_INDEX_1 = 0,
>>>>     FEATURE_INDEX_2,
>>>>     /* The current maximum size of the feature integer bit array.  */
>>>>     FEATURE_INDEX_MAX
>>>>   };
>>>>
>>>> enum
>>>>   {
>>>>     COMMON_CPUID_INDEX_1 = 0,
>>>>     COMMON_CPUID_INDEX_7,
>>>>     COMMON_CPUID_INDEX_80000001,
>>>>     COMMON_CPUID_INDEX_D_ECX_1,
>>>>     COMMON_CPUID_INDEX_80000007,
>>>>     COMMON_CPUID_INDEX_80000008,
>>>>     /* Keep the following line at the end.  */
>>>>     COMMON_CPUID_INDEX_MAX
>>>>   };
>>>>
>>>> enum cpu_features_kind
>>>>   {
>>>>     arch_kind_unknown = 0,
>>>>     arch_kind_intel,
>>>>     arch_kind_amd,
>>>>     arch_kind_other
>>>>   };
>>>>
>>>> struct cpuid_registers
>>>>   {
>>>>     unsigned int eax;
>>>>     unsigned int ebx;
>>>>     unsigned int ecx;
>>>>     unsigned int edx;
>>>>   };
>>>>
>>>> extern enum cpu_features_kind __x86_get_cpu_kind (void)
>>>>      __attribute__ ((const));
>>>>
>>>> extern const struct cpuid_registers *__x86_get_cpuid_registers
>>>>      (unsigned int) __attribute__ ((const));
>>>>
>>>> extern unsigned int __x86_get_arch_feature (unsigned int)
>>>>      __attribute__ ((const));
>>>>
>>>> extern unsigned long long __x86_tsc_to_ns (unsigned long long);
>>>> extern unsigned long long __x86_ns_to_tsc (unsigned long long);
>>>>
>>>> /* HAS_* evaluates to true if we may use the feature at runtime.  */
>>>> #define HAS_CPU_FEATURE(name) \
>>>>    ((__x86_get_cpuid_registers (index_cpu_##name)->reg_##name \
>>>>      & (bit_cpu_##name)) != 0)
>>>
>>> I don't like these as macros.
>>>
>>> I would rather an inline function that does all the work within the function
>>> and returns a result of the appropriate type e.g. boolean.
>>
>> CPUID instruction takes EAX and optionally ECX as input,  It writes to
>> EAX, EBX, ECD and EDX.  Each CPU feature is determined by CPUID
>> with specific input values and a bit in EAX, EBX, ECX or EDX.   Take
>> AVX2 as example:
>>
>> #define bit_cpu_AVX2 (1u << 5)
>> #define index_cpu_AVX2 COMMON_CPUID_INDEX_7
>> #define reg_AVX2 ebx
>>
>> It is determined by the bit 5 of EBX from CPUID with EAX == 7.
>> HAS_CPU_FEATURE (AVX2) is expanded to
>>
>>    ((__x86_get_cpuid_registers (index_cpu_AVX2)->reg_AVX2 \
>>      & (bit_cpu_AVX2)) != 0)
>>
>> which is equivalent to
>>
>>   ((__x86_get_cpuid_registers (COMMON_CPUID_INDEX_7)->ebx \
>>      & ((1u << 5))) != 0)
>>
>> It is perfect for macro, not so convenient for inline function.
>
> It's not perfect.
>
> It's difficult to debug.

sysdeps/x86/tst-get-cpu-features.c should have 100% coverage of all
features.  Glibc users can just use them.

> You are using the c pre-processor here to evaluate the input argument
> into another 2 or 3 constant values, this is the kind of propagation
> of values that the compiler can do too.

Not easily with inline function in a public header file.

> I'd like to see an attempt at a functional definition of this rather
> than a macro, and the best argument against a static inline function
> is performance. If the macro generates much better code then that's
> a real reason to use it.

In this case, macro generates very good code and very easy to
maintain.

>>>> #define HAS_ARCH_FEATURE(name) \
>>>>    ((__x86_get_arch_feature (index_arch_##name) & (bit_arch_##name)) != 0)
>>>>
>>>
>>> Likewise.
>>>
>>>> Users can use <sys/platform/x86.h> to detect if a CPU feature is
>>>> supported at run-time with
>>>>
>>>> HAS_CPU_FEATURE (AVX)
>>>
>>> Likewise.
>>>
>>>> which returns true if glibc detects that AVX is available.  Before AVX
>>>> can be used, he/she should check
>>>>
>>>> HAS_ARCH_FEATURE (AVX_Usable)
>>>
>>> Likewise.
>>>
>>> These macros like this are harder to debug than static inline functions.
>>>
>>>> which returns true if glibc detects that AVX is available and usable.
>>>>
>>>> extern unsigned long long __x86_tsc_to_ns (unsigned long long);
>>>> extern unsigned long long __x86_ns_to_tsc (unsigned long long);
>>>>
>>>> can be used to convert between TSCs and nanoseconds if
>>>>
>>>> HAS_ARCH_FEATURE (TSC_To_NS_Usable)
>>>>
>>>> returns true.
>>>>
>>>> Backward binary compatibility is provided by:
>>>>
>>>> 1. __x86_get_cpu_kind will get a new version if a new CPU kind is added
>>>> to cpu_features_kind.
>>>
>>> This function takes void and returns the kind of CPU as an enum value. OK.
>>>
>>>> 2. __x86_get_cpuid_registers returns a pointer to a cpuid_registers
>>>> with all zeros if cpuid index >= COMMON_CPUID_INDEX_MAX.
>>>
>>> Could you expand on this in more detail please.
>>
>> Internally, we have
>>
>> struct cpuid_registers cpuid[COMMON_CPUID_INDEX_MAX];
>>
>> _x86_get_cpuid_registers is defined as
>>
>> const struct cpuid_registers *
>> __x86_get_cpuid_registers (unsigned int i)
>> {
>>   const static struct cpuid_registers zero_cpuid_registers = { 0 };
>>   if (i >= COMMON_CPUID_INDEX_MAX)
>>     return &zero_cpuid_registers;
>>   return &GLRO(dl_x86_cpu_features).cpuid[i];
>> }
>>
>
> I don't like returning all zero for an error condition, we should have
> a distinct error and return value..

It isn't an error.  It simply means that this glibc doesn't support
this CPUID input.  It is OK to return 0 in EAX, EBX, ECX and EDX
and HAS_CPU_FEATURE (XXX) evaluates to false.

> Are we certain that an all zero cpuid register return is an invalid
> result for cpuid in all cases or could it be confused with a valid
> cpuid result but with the options in question disabled?

HAS_CPU_FEATURE (XXX) evaluates to true if glibc detects that
XXX is supported.  It is OK for HAS_CPU_FEATURE (XXX) to be
false if glibc doesn't know XXX on processors which has XXX.

BTW, there are many reserved bits in EAX, EBX, ECX and EDX,
which may be changed to new CPU features in future processors.
It is easy to add these features to x86.h with macro without recompiling
glibc.

>>>> 3. __x86_get_arch_feature returns 0 if architecture index >=
>>>> FEATURE_INDEX_MAX.
>>>
>>> Likewise.
>>>
>>>> This means:
>>>>
>>>> 1. Applications linked against the new __x86_get_cpu_kind will only
>>>> run with the newer libc.so.
>>>
>>> How? Symbol versioning?
>>
>> Yes.
>
> OK.
>
>>>> 2. Applications linked against the new __x86_get_cpuid_registers and
>>>> __x86_get_arch_feature will run with the older libc.so.  But new
>>>> CPU and architecture features won't be detected by the older libc.so.
>>>
>>> The caller must attempt to copy or inspect all the data, but it's structure
>>> size may be larger, and the old glibc structure may be smaller, how do you
>>> prevent the newer program from reading out of bounds when run in an older
>>> glibc? This question is only relevant for __x86_get_cpuid_registers which
>>> returns a pointer to a fixed size structure. The structure can grow, but
>>> the caller needs to know exactly how big the returned structure is to avoid
>>> reading outside of bounds.
>>
>> There is no ABI issue here.   __x86_get_cpuid_registers returns a pointer to
>> an element of struct cpuid_registers cpuid[COMMON_CPUID_INDEX_MAX],
>> whose size is fixed as
>>
>> struct cpuid_registers
>>   {
>>     unsigned int eax;
>>     unsigned int ebx;
>>     unsigned int ecx;
>>     unsigned int edx;
>>   };
>
> Ah! Right, it's the cached cpuid return result so it's following the ISA rules
> for the registers it touches. I expect this can't change at this point, and if
> we got a *new* cpuid instruction we'd just make a new function to implement that.

True.

>> which is the cached output of CPUID.  As shown above if the input index
>>> = COMMON_CPUID_INDEX_MAX. we return a pointer to
>>
>> const static struct cpuid_registers zero_cpuid_registers = { 0 };
>
>
> --
> Cheers,
> Carlos.



-- 
H.J.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: RFC: Add <sys/platform/x86.h> to glibc 2.29
  2018-07-29 16:16 RFC: Add <sys/platform/x86.h> to glibc 2.29 H.J. Lu
  2018-07-31 13:32 ` Carlos O'Donell
@ 2018-08-17 14:50 ` Florian Weimer
  2018-08-17 15:46   ` H.J. Lu
  1 sibling, 1 reply; 12+ messages in thread
From: Florian Weimer @ 2018-08-17 14:50 UTC (permalink / raw)
  To: H.J. Lu, GNU C Library

On 07/29/2018 06:16 PM, H.J. Lu wrote:
> extern enum cpu_features_kind __x86_get_cpu_kind (void)
>       __attribute__ ((const));
> 
> extern const struct cpuid_registers *__x86_get_cpuid_registers
>       (unsigned int) __attribute__ ((const));
> 
> extern unsigned int __x86_get_arch_feature (unsigned int)
>       __attribute__ ((const));

Are these functions supposed to be usable in IFUNC resolvers?

Shouldn't these identifiers be in the public (non-internal) namespace?

Thanks,
Florian

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: RFC: Add <sys/platform/x86.h> to glibc 2.29
  2018-08-17 14:50 ` Florian Weimer
@ 2018-08-17 15:46   ` H.J. Lu
  2018-08-17 16:19     ` Florian Weimer
  0 siblings, 1 reply; 12+ messages in thread
From: H.J. Lu @ 2018-08-17 15:46 UTC (permalink / raw)
  To: Florian Weimer; +Cc: GNU C Library

On Fri, Aug 17, 2018 at 7:50 AM, Florian Weimer <fweimer@redhat.com> wrote:
> On 07/29/2018 06:16 PM, H.J. Lu wrote:
>>
>> extern enum cpu_features_kind __x86_get_cpu_kind (void)
>>       __attribute__ ((const));
>>
>> extern const struct cpuid_registers *__x86_get_cpuid_registers
>>       (unsigned int) __attribute__ ((const));
>>
>> extern unsigned int __x86_get_arch_feature (unsigned int)
>>       __attribute__ ((const));
>
>
> Are these functions supposed to be usable in IFUNC resolvers?

They have the same limitation as

/* Used from outside of glibc to get access to the CPU features
   structure.  */
extern const struct cpu_features *__get_cpu_features (void)
     __attribute__ ((const));

> Shouldn't these identifiers be in the public (non-internal) namespace?
>

Do we have precedents for such arch specific public functions?

-- 
H.J.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: RFC: Add <sys/platform/x86.h> to glibc 2.29
  2018-08-17 15:46   ` H.J. Lu
@ 2018-08-17 16:19     ` Florian Weimer
  2018-08-17 16:44       ` H.J. Lu
  0 siblings, 1 reply; 12+ messages in thread
From: Florian Weimer @ 2018-08-17 16:19 UTC (permalink / raw)
  To: H.J. Lu; +Cc: GNU C Library

On 08/17/2018 05:45 PM, H.J. Lu wrote:
> On Fri, Aug 17, 2018 at 7:50 AM, Florian Weimer <fweimer@redhat.com> wrote:
>> On 07/29/2018 06:16 PM, H.J. Lu wrote:
>>>
>>> extern enum cpu_features_kind __x86_get_cpu_kind (void)
>>>        __attribute__ ((const));
>>>
>>> extern const struct cpuid_registers *__x86_get_cpuid_registers
>>>        (unsigned int) __attribute__ ((const));
>>>
>>> extern unsigned int __x86_get_arch_feature (unsigned int)
>>>        __attribute__ ((const));
>>
>>
>> Are these functions supposed to be usable in IFUNC resolvers?
> 
> They have the same limitation as
> 
> /* Used from outside of glibc to get access to the CPU features
>     structure.  */
> extern const struct cpu_features *__get_cpu_features (void)
>       __attribute__ ((const));

Sorry, this does not answer my question.

>> Shouldn't these identifiers be in the public (non-internal) namespace?
>>
> 
> Do we have precedents for such arch specific public functions?

There is modify_ldt.

Thanks,
Florian

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: RFC: Add <sys/platform/x86.h> to glibc 2.29
  2018-08-17 16:19     ` Florian Weimer
@ 2018-08-17 16:44       ` H.J. Lu
  2018-08-17 19:11         ` Florian Weimer
  2018-08-20 11:29         ` Florian Weimer
  0 siblings, 2 replies; 12+ messages in thread
From: H.J. Lu @ 2018-08-17 16:44 UTC (permalink / raw)
  To: Florian Weimer; +Cc: GNU C Library

On Fri, Aug 17, 2018 at 9:19 AM, Florian Weimer <fweimer@redhat.com> wrote:
> On 08/17/2018 05:45 PM, H.J. Lu wrote:
>>
>> On Fri, Aug 17, 2018 at 7:50 AM, Florian Weimer <fweimer@redhat.com>
>> wrote:
>>>
>>> On 07/29/2018 06:16 PM, H.J. Lu wrote:
>>>>
>>>>
>>>> extern enum cpu_features_kind __x86_get_cpu_kind (void)
>>>>        __attribute__ ((const));
>>>>
>>>> extern const struct cpuid_registers *__x86_get_cpuid_registers
>>>>        (unsigned int) __attribute__ ((const));
>>>>
>>>> extern unsigned int __x86_get_arch_feature (unsigned int)
>>>>        __attribute__ ((const));
>>>
>>>
>>>
>>> Are these functions supposed to be usable in IFUNC resolvers?
>>
>>
>> They have the same limitation as
>>
>> /* Used from outside of glibc to get access to the CPU features
>>     structure.  */
>> extern const struct cpu_features *__get_cpu_features (void)
>>       __attribute__ ((const));
>
>
> Sorry, this does not answer my question.

Just like __get_cpu_features, they can be used in IFUNC resolvers.

>>> Shouldn't these identifiers be in the public (non-internal) namespace?
>>>
>>
>> Do we have precedents for such arch specific public functions?
>
>
> There is modify_ldt.
>

Which header file has its prototype?

-- 
H.J.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: RFC: Add <sys/platform/x86.h> to glibc 2.29
  2018-08-17 16:44       ` H.J. Lu
@ 2018-08-17 19:11         ` Florian Weimer
  2018-08-20 11:29         ` Florian Weimer
  1 sibling, 0 replies; 12+ messages in thread
From: Florian Weimer @ 2018-08-17 19:11 UTC (permalink / raw)
  To: H.J. Lu; +Cc: GNU C Library

On 08/17/2018 06:43 PM, H.J. Lu wrote:
>>>> Are these functions supposed to be usable in IFUNC resolvers?
>>>
>>> They have the same limitation as
>>>
>>> /* Used from outside of glibc to get access to the CPU features
>>>      structure.  */
>>> extern const struct cpu_features *__get_cpu_features (void)
>>>        __attribute__ ((const));
>>
>> Sorry, this does not answer my question.

> Just like __get_cpu_features, they can be used in IFUNC resolvers.

Then we need something like this:

   <https://sourceware.org/ml/libc-alpha/2018-06/msg00077.html>

Thanks,
Florian

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: RFC: Add <sys/platform/x86.h> to glibc 2.29
  2018-08-17 16:44       ` H.J. Lu
  2018-08-17 19:11         ` Florian Weimer
@ 2018-08-20 11:29         ` Florian Weimer
  2018-08-20 20:55           ` H.J. Lu
  1 sibling, 1 reply; 12+ messages in thread
From: Florian Weimer @ 2018-08-20 11:29 UTC (permalink / raw)
  To: H.J. Lu; +Cc: GNU C Library

On 08/17/2018 06:43 PM, H.J. Lu wrote:
>>>> Shouldn't these identifiers be in the public (non-internal) namespace?
>>>>
>>> Do we have precedents for such arch specific public functions?
>>
>> There is modify_ldt.
>>
> Which header file has its prototype?

vm86 is an example for an arch-specific function with a prototype in an 
installed header file.

Thanks,
Florian

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: RFC: Add <sys/platform/x86.h> to glibc 2.29
  2018-08-20 11:29         ` Florian Weimer
@ 2018-08-20 20:55           ` H.J. Lu
  0 siblings, 0 replies; 12+ messages in thread
From: H.J. Lu @ 2018-08-20 20:55 UTC (permalink / raw)
  To: Florian Weimer; +Cc: GNU C Library

On Mon, Aug 20, 2018 at 4:29 AM, Florian Weimer <fweimer@redhat.com> wrote:
> On 08/17/2018 06:43 PM, H.J. Lu wrote:
>>>>>
>>>>> Shouldn't these identifiers be in the public (non-internal) namespace?
>>>>>
>>>> Do we have precedents for such arch specific public functions?
>>>
>>>
>>> There is modify_ldt.
>>>
>> Which header file has its prototype?
>
>
> vm86 is an example for an arch-specific function with a prototype in an
> installed header file.

The updated patch is posted at

https://sourceware.org/ml/libc-alpha/2018-08/msg00435.html

-- 
H.J.

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2018-08-20 20:55 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-07-29 16:16 RFC: Add <sys/platform/x86.h> to glibc 2.29 H.J. Lu
2018-07-31 13:32 ` Carlos O'Donell
2018-07-31 14:49   ` H.J. Lu
2018-07-31 15:06     ` Carlos O'Donell
2018-07-31 16:17       ` H.J. Lu
2018-08-17 14:50 ` Florian Weimer
2018-08-17 15:46   ` H.J. Lu
2018-08-17 16:19     ` Florian Weimer
2018-08-17 16:44       ` H.J. Lu
2018-08-17 19:11         ` Florian Weimer
2018-08-20 11:29         ` Florian Weimer
2018-08-20 20:55           ` H.J. Lu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).