public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH] x86: Compile CPUID functions with -mgeneral-regs-only
@ 2021-06-24 12:12 H.J. Lu
  2021-06-24 12:35 ` Richard Biener
  2021-06-24 16:12 ` Uros Bizjak
  0 siblings, 2 replies; 13+ messages in thread
From: H.J. Lu @ 2021-06-24 12:12 UTC (permalink / raw)
  To: gcc-patches; +Cc: Uros Bizjak, Hongtao Liu

CPUID functions are used to detect CPU features.  If vector ISAs
are enabled, compiler is free to use them in these functions.  Add
__attribute__ ((target("general-regs-only"))) to CPUID functions
to avoid vector instructions.

gcc/

	PR target/101185
	* config/i386/cpuid.h (__get_cpuid_max): Add
	__attribute__ ((target("general-regs-only"))).
	(__get_cpuid): Likewise.
	(__get_cpuid_count): Likewise.
	(__cpuidex): Likewise.

gcc/testsuite/

	PR target/101185
	* gcc.target/i386/avx512-check.h (check_osxsave): Add
	__attribute__ ((target("general-regs-only"))).
	(main): Likewise.
---
 gcc/config/i386/cpuid.h                      | 4 ++++
 gcc/testsuite/gcc.target/i386/avx512-check.h | 2 ++
 2 files changed, 6 insertions(+)

diff --git a/gcc/config/i386/cpuid.h b/gcc/config/i386/cpuid.h
index aebc17c6827..74881ee91e5 100644
--- a/gcc/config/i386/cpuid.h
+++ b/gcc/config/i386/cpuid.h
@@ -243,6 +243,7 @@
    pointer is non-null, then first four bytes of the signature
    (as found in ebx register) are returned in location pointed by sig.  */
 
+__attribute__ ((target("general-regs-only")))
 static __inline unsigned int
 __get_cpuid_max (unsigned int __ext, unsigned int *__sig)
 {
@@ -298,6 +299,7 @@ __get_cpuid_max (unsigned int __ext, unsigned int *__sig)
    supported and returns 1 for valid cpuid information or 0 for
    unsupported cpuid leaf.  All pointers are required to be non-null.  */
 
+__attribute__ ((target("general-regs-only")))
 static __inline int
 __get_cpuid (unsigned int __leaf,
 	     unsigned int *__eax, unsigned int *__ebx,
@@ -315,6 +317,7 @@ __get_cpuid (unsigned int __leaf,
 
 /* Same as above, but sub-leaf can be specified.  */
 
+__attribute__ ((target("general-regs-only")))
 static __inline int
 __get_cpuid_count (unsigned int __leaf, unsigned int __subleaf,
 		   unsigned int *__eax, unsigned int *__ebx,
@@ -330,6 +333,7 @@ __get_cpuid_count (unsigned int __leaf, unsigned int __subleaf,
   return 1;
 }
 
+__attribute__ ((target("general-regs-only")))
 static __inline void
 __cpuidex (int __cpuid_info[4], int __leaf, int __subleaf)
 {
diff --git a/gcc/testsuite/gcc.target/i386/avx512-check.h b/gcc/testsuite/gcc.target/i386/avx512-check.h
index 0a377dba1d5..406faf8fe03 100644
--- a/gcc/testsuite/gcc.target/i386/avx512-check.h
+++ b/gcc/testsuite/gcc.target/i386/avx512-check.h
@@ -25,6 +25,7 @@ do_test (void)
 }
 #endif
 
+__attribute__ ((target("general-regs-only")))
 static int
 check_osxsave (void)
 {
@@ -34,6 +35,7 @@ check_osxsave (void)
   return (ecx & bit_OSXSAVE) != 0;
 }
 
+__attribute__ ((target("general-regs-only")))
 int
 main ()
 {
-- 
2.31.1


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] x86: Compile CPUID functions with -mgeneral-regs-only
  2021-06-24 12:12 [PATCH] x86: Compile CPUID functions with -mgeneral-regs-only H.J. Lu
@ 2021-06-24 12:35 ` Richard Biener
  2021-06-24 12:41   ` H.J. Lu
  2021-06-24 16:12 ` Uros Bizjak
  1 sibling, 1 reply; 13+ messages in thread
From: Richard Biener @ 2021-06-24 12:35 UTC (permalink / raw)
  To: H.J. Lu; +Cc: GCC Patches, Hongtao Liu

On Thu, Jun 24, 2021 at 2:13 PM H.J. Lu via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> CPUID functions are used to detect CPU features.  If vector ISAs
> are enabled, compiler is free to use them in these functions.  Add
> __attribute__ ((target("general-regs-only"))) to CPUID functions
> to avoid vector instructions.

But there are GPR instructions not in x86_64, so shouldn't
we use target("march=x86_64") or so?  Note doing either will
of course prevent inlining of those "inlines".

So I'm not sure how much of a fix this is ... the error will almost
always be visible in the caller as well.

> gcc/
>
>         PR target/101185
>         * config/i386/cpuid.h (__get_cpuid_max): Add
>         __attribute__ ((target("general-regs-only"))).
>         (__get_cpuid): Likewise.
>         (__get_cpuid_count): Likewise.
>         (__cpuidex): Likewise.
>
> gcc/testsuite/
>
>         PR target/101185
>         * gcc.target/i386/avx512-check.h (check_osxsave): Add
>         __attribute__ ((target("general-regs-only"))).
>         (main): Likewise.
> ---
>  gcc/config/i386/cpuid.h                      | 4 ++++
>  gcc/testsuite/gcc.target/i386/avx512-check.h | 2 ++
>  2 files changed, 6 insertions(+)
>
> diff --git a/gcc/config/i386/cpuid.h b/gcc/config/i386/cpuid.h
> index aebc17c6827..74881ee91e5 100644
> --- a/gcc/config/i386/cpuid.h
> +++ b/gcc/config/i386/cpuid.h
> @@ -243,6 +243,7 @@
>     pointer is non-null, then first four bytes of the signature
>     (as found in ebx register) are returned in location pointed by sig.  */
>
> +__attribute__ ((target("general-regs-only")))
>  static __inline unsigned int
>  __get_cpuid_max (unsigned int __ext, unsigned int *__sig)
>  {
> @@ -298,6 +299,7 @@ __get_cpuid_max (unsigned int __ext, unsigned int *__sig)
>     supported and returns 1 for valid cpuid information or 0 for
>     unsupported cpuid leaf.  All pointers are required to be non-null.  */
>
> +__attribute__ ((target("general-regs-only")))
>  static __inline int
>  __get_cpuid (unsigned int __leaf,
>              unsigned int *__eax, unsigned int *__ebx,
> @@ -315,6 +317,7 @@ __get_cpuid (unsigned int __leaf,
>
>  /* Same as above, but sub-leaf can be specified.  */
>
> +__attribute__ ((target("general-regs-only")))
>  static __inline int
>  __get_cpuid_count (unsigned int __leaf, unsigned int __subleaf,
>                    unsigned int *__eax, unsigned int *__ebx,
> @@ -330,6 +333,7 @@ __get_cpuid_count (unsigned int __leaf, unsigned int __subleaf,
>    return 1;
>  }
>
> +__attribute__ ((target("general-regs-only")))
>  static __inline void
>  __cpuidex (int __cpuid_info[4], int __leaf, int __subleaf)
>  {
> diff --git a/gcc/testsuite/gcc.target/i386/avx512-check.h b/gcc/testsuite/gcc.target/i386/avx512-check.h
> index 0a377dba1d5..406faf8fe03 100644
> --- a/gcc/testsuite/gcc.target/i386/avx512-check.h
> +++ b/gcc/testsuite/gcc.target/i386/avx512-check.h
> @@ -25,6 +25,7 @@ do_test (void)
>  }
>  #endif
>
> +__attribute__ ((target("general-regs-only")))
>  static int
>  check_osxsave (void)
>  {
> @@ -34,6 +35,7 @@ check_osxsave (void)
>    return (ecx & bit_OSXSAVE) != 0;
>  }
>
> +__attribute__ ((target("general-regs-only")))
>  int
>  main ()
>  {
> --
> 2.31.1
>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] x86: Compile CPUID functions with -mgeneral-regs-only
  2021-06-24 12:35 ` Richard Biener
@ 2021-06-24 12:41   ` H.J. Lu
  2021-06-24 12:47     ` Richard Biener
  0 siblings, 1 reply; 13+ messages in thread
From: H.J. Lu @ 2021-06-24 12:41 UTC (permalink / raw)
  To: Richard Biener; +Cc: GCC Patches, Hongtao Liu

On Thu, Jun 24, 2021 at 5:35 AM Richard Biener
<richard.guenther@gmail.com> wrote:
>
> On Thu, Jun 24, 2021 at 2:13 PM H.J. Lu via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
> >
> > CPUID functions are used to detect CPU features.  If vector ISAs
> > are enabled, compiler is free to use them in these functions.  Add
> > __attribute__ ((target("general-regs-only"))) to CPUID functions
> > to avoid vector instructions.
>
> But there are GPR instructions not in x86_64, so shouldn't
> we use target("march=x86_64") or so?  Note doing either will
> of course prevent inlining of those "inlines".

Does -march=x86_64, which enables CMOV and other GPR
ISAs,  work for -m32?

> So I'm not sure how much of a fix this is ... the error will almost
> always be visible in the caller as well.

I think _attribute__ ((target("general-regs-only"))) is a step
forward.

> > gcc/
> >
> >         PR target/101185
> >         * config/i386/cpuid.h (__get_cpuid_max): Add
> >         __attribute__ ((target("general-regs-only"))).
> >         (__get_cpuid): Likewise.
> >         (__get_cpuid_count): Likewise.
> >         (__cpuidex): Likewise.
> >
> > gcc/testsuite/
> >
> >         PR target/101185
> >         * gcc.target/i386/avx512-check.h (check_osxsave): Add
> >         __attribute__ ((target("general-regs-only"))).
> >         (main): Likewise.
> > ---
> >  gcc/config/i386/cpuid.h                      | 4 ++++
> >  gcc/testsuite/gcc.target/i386/avx512-check.h | 2 ++
> >  2 files changed, 6 insertions(+)
> >
> > diff --git a/gcc/config/i386/cpuid.h b/gcc/config/i386/cpuid.h
> > index aebc17c6827..74881ee91e5 100644
> > --- a/gcc/config/i386/cpuid.h
> > +++ b/gcc/config/i386/cpuid.h
> > @@ -243,6 +243,7 @@
> >     pointer is non-null, then first four bytes of the signature
> >     (as found in ebx register) are returned in location pointed by sig.  */
> >
> > +__attribute__ ((target("general-regs-only")))
> >  static __inline unsigned int
> >  __get_cpuid_max (unsigned int __ext, unsigned int *__sig)
> >  {
> > @@ -298,6 +299,7 @@ __get_cpuid_max (unsigned int __ext, unsigned int *__sig)
> >     supported and returns 1 for valid cpuid information or 0 for
> >     unsupported cpuid leaf.  All pointers are required to be non-null.  */
> >
> > +__attribute__ ((target("general-regs-only")))
> >  static __inline int
> >  __get_cpuid (unsigned int __leaf,
> >              unsigned int *__eax, unsigned int *__ebx,
> > @@ -315,6 +317,7 @@ __get_cpuid (unsigned int __leaf,
> >
> >  /* Same as above, but sub-leaf can be specified.  */
> >
> > +__attribute__ ((target("general-regs-only")))
> >  static __inline int
> >  __get_cpuid_count (unsigned int __leaf, unsigned int __subleaf,
> >                    unsigned int *__eax, unsigned int *__ebx,
> > @@ -330,6 +333,7 @@ __get_cpuid_count (unsigned int __leaf, unsigned int __subleaf,
> >    return 1;
> >  }
> >
> > +__attribute__ ((target("general-regs-only")))
> >  static __inline void
> >  __cpuidex (int __cpuid_info[4], int __leaf, int __subleaf)
> >  {
> > diff --git a/gcc/testsuite/gcc.target/i386/avx512-check.h b/gcc/testsuite/gcc.target/i386/avx512-check.h
> > index 0a377dba1d5..406faf8fe03 100644
> > --- a/gcc/testsuite/gcc.target/i386/avx512-check.h
> > +++ b/gcc/testsuite/gcc.target/i386/avx512-check.h
> > @@ -25,6 +25,7 @@ do_test (void)
> >  }
> >  #endif
> >
> > +__attribute__ ((target("general-regs-only")))
> >  static int
> >  check_osxsave (void)
> >  {
> > @@ -34,6 +35,7 @@ check_osxsave (void)
> >    return (ecx & bit_OSXSAVE) != 0;
> >  }
> >
> > +__attribute__ ((target("general-regs-only")))
> >  int
> >  main ()
> >  {
> > --
> > 2.31.1
> >



-- 
H.J.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] x86: Compile CPUID functions with -mgeneral-regs-only
  2021-06-24 12:41   ` H.J. Lu
@ 2021-06-24 12:47     ` Richard Biener
  2021-06-24 13:00       ` H.J. Lu
  0 siblings, 1 reply; 13+ messages in thread
From: Richard Biener @ 2021-06-24 12:47 UTC (permalink / raw)
  To: H.J. Lu; +Cc: GCC Patches, Hongtao Liu

On Thu, Jun 24, 2021 at 2:42 PM H.J. Lu <hjl.tools@gmail.com> wrote:
>
> On Thu, Jun 24, 2021 at 5:35 AM Richard Biener
> <richard.guenther@gmail.com> wrote:
> >
> > On Thu, Jun 24, 2021 at 2:13 PM H.J. Lu via Gcc-patches
> > <gcc-patches@gcc.gnu.org> wrote:
> > >
> > > CPUID functions are used to detect CPU features.  If vector ISAs
> > > are enabled, compiler is free to use them in these functions.  Add
> > > __attribute__ ((target("general-regs-only"))) to CPUID functions
> > > to avoid vector instructions.
> >
> > But there are GPR instructions not in x86_64, so shouldn't
> > we use target("march=x86_64") or so?  Note doing either will
> > of course prevent inlining of those "inlines".
>
> Does -march=x86_64, which enables CMOV and other GPR
> ISAs,  work for -m32?

I don't think so.  I'm also not sure whether -march=xyz in a
target attribute overrides -mavx512f on the command-line ;)

> > So I'm not sure how much of a fix this is ... the error will almost
> > always be visible in the caller as well.
>
> I think _attribute__ ((target("general-regs-only"))) is a step
> forward.

That I agree to, but then the cpuid code is likely written the
way it is to allow inlining.  But code using CPUID should best compile
functions under the check with additional target attribute
(or in a separate TU) rather than compiling everything with
extra -mXYZ and trying to "disable" things in the dispatching
code (and the code leading to it!).

Richard.

> > > gcc/
> > >
> > >         PR target/101185
> > >         * config/i386/cpuid.h (__get_cpuid_max): Add
> > >         __attribute__ ((target("general-regs-only"))).
> > >         (__get_cpuid): Likewise.
> > >         (__get_cpuid_count): Likewise.
> > >         (__cpuidex): Likewise.
> > >
> > > gcc/testsuite/
> > >
> > >         PR target/101185
> > >         * gcc.target/i386/avx512-check.h (check_osxsave): Add
> > >         __attribute__ ((target("general-regs-only"))).
> > >         (main): Likewise.
> > > ---
> > >  gcc/config/i386/cpuid.h                      | 4 ++++
> > >  gcc/testsuite/gcc.target/i386/avx512-check.h | 2 ++
> > >  2 files changed, 6 insertions(+)
> > >
> > > diff --git a/gcc/config/i386/cpuid.h b/gcc/config/i386/cpuid.h
> > > index aebc17c6827..74881ee91e5 100644
> > > --- a/gcc/config/i386/cpuid.h
> > > +++ b/gcc/config/i386/cpuid.h
> > > @@ -243,6 +243,7 @@
> > >     pointer is non-null, then first four bytes of the signature
> > >     (as found in ebx register) are returned in location pointed by sig.  */
> > >
> > > +__attribute__ ((target("general-regs-only")))
> > >  static __inline unsigned int
> > >  __get_cpuid_max (unsigned int __ext, unsigned int *__sig)
> > >  {
> > > @@ -298,6 +299,7 @@ __get_cpuid_max (unsigned int __ext, unsigned int *__sig)
> > >     supported and returns 1 for valid cpuid information or 0 for
> > >     unsupported cpuid leaf.  All pointers are required to be non-null.  */
> > >
> > > +__attribute__ ((target("general-regs-only")))
> > >  static __inline int
> > >  __get_cpuid (unsigned int __leaf,
> > >              unsigned int *__eax, unsigned int *__ebx,
> > > @@ -315,6 +317,7 @@ __get_cpuid (unsigned int __leaf,
> > >
> > >  /* Same as above, but sub-leaf can be specified.  */
> > >
> > > +__attribute__ ((target("general-regs-only")))
> > >  static __inline int
> > >  __get_cpuid_count (unsigned int __leaf, unsigned int __subleaf,
> > >                    unsigned int *__eax, unsigned int *__ebx,
> > > @@ -330,6 +333,7 @@ __get_cpuid_count (unsigned int __leaf, unsigned int __subleaf,
> > >    return 1;
> > >  }
> > >
> > > +__attribute__ ((target("general-regs-only")))
> > >  static __inline void
> > >  __cpuidex (int __cpuid_info[4], int __leaf, int __subleaf)
> > >  {
> > > diff --git a/gcc/testsuite/gcc.target/i386/avx512-check.h b/gcc/testsuite/gcc.target/i386/avx512-check.h
> > > index 0a377dba1d5..406faf8fe03 100644
> > > --- a/gcc/testsuite/gcc.target/i386/avx512-check.h
> > > +++ b/gcc/testsuite/gcc.target/i386/avx512-check.h
> > > @@ -25,6 +25,7 @@ do_test (void)
> > >  }
> > >  #endif
> > >
> > > +__attribute__ ((target("general-regs-only")))
> > >  static int
> > >  check_osxsave (void)
> > >  {
> > > @@ -34,6 +35,7 @@ check_osxsave (void)
> > >    return (ecx & bit_OSXSAVE) != 0;
> > >  }
> > >
> > > +__attribute__ ((target("general-regs-only")))
> > >  int
> > >  main ()
> > >  {
> > > --
> > > 2.31.1
> > >
>
>
>
> --
> H.J.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] x86: Compile CPUID functions with -mgeneral-regs-only
  2021-06-24 12:47     ` Richard Biener
@ 2021-06-24 13:00       ` H.J. Lu
  0 siblings, 0 replies; 13+ messages in thread
From: H.J. Lu @ 2021-06-24 13:00 UTC (permalink / raw)
  To: Richard Biener; +Cc: GCC Patches, Hongtao Liu

On Thu, Jun 24, 2021 at 5:47 AM Richard Biener
<richard.guenther@gmail.com> wrote:
>
> On Thu, Jun 24, 2021 at 2:42 PM H.J. Lu <hjl.tools@gmail.com> wrote:
> >
> > On Thu, Jun 24, 2021 at 5:35 AM Richard Biener
> > <richard.guenther@gmail.com> wrote:
> > >
> > > On Thu, Jun 24, 2021 at 2:13 PM H.J. Lu via Gcc-patches
> > > <gcc-patches@gcc.gnu.org> wrote:
> > > >
> > > > CPUID functions are used to detect CPU features.  If vector ISAs
> > > > are enabled, compiler is free to use them in these functions.  Add
> > > > __attribute__ ((target("general-regs-only"))) to CPUID functions
> > > > to avoid vector instructions.
> > >
> > > But there are GPR instructions not in x86_64, so shouldn't
> > > we use target("march=x86_64") or so?  Note doing either will
> > > of course prevent inlining of those "inlines".
> >
> > Does -march=x86_64, which enables CMOV and other GPR
> > ISAs,  work for -m32?
>
> I don't think so.  I'm also not sure whether -march=xyz in a
> target attribute overrides -mavx512f on the command-line ;)
>
> > > So I'm not sure how much of a fix this is ... the error will almost
> > > always be visible in the caller as well.
> >
> > I think _attribute__ ((target("general-regs-only"))) is a step
> > forward.
>
> That I agree to, but then the cpuid code is likely written the
> way it is to allow inlining.  But code using CPUID should best compile
> functions under the check with additional target attribute
> (or in a separate TU) rather than compiling everything with
> extra -mXYZ and trying to "disable" things in the dispatching
> code (and the code leading to it!).

CPUID checks in GCC tests should be compiled with noinline, ...
plus minimum ISAs allowed.

> Richard.
>
> > > > gcc/
> > > >
> > > >         PR target/101185
> > > >         * config/i386/cpuid.h (__get_cpuid_max): Add
> > > >         __attribute__ ((target("general-regs-only"))).
> > > >         (__get_cpuid): Likewise.
> > > >         (__get_cpuid_count): Likewise.
> > > >         (__cpuidex): Likewise.
> > > >
> > > > gcc/testsuite/
> > > >
> > > >         PR target/101185
> > > >         * gcc.target/i386/avx512-check.h (check_osxsave): Add
> > > >         __attribute__ ((target("general-regs-only"))).
> > > >         (main): Likewise.
> > > > ---
> > > >  gcc/config/i386/cpuid.h                      | 4 ++++
> > > >  gcc/testsuite/gcc.target/i386/avx512-check.h | 2 ++
> > > >  2 files changed, 6 insertions(+)
> > > >
> > > > diff --git a/gcc/config/i386/cpuid.h b/gcc/config/i386/cpuid.h
> > > > index aebc17c6827..74881ee91e5 100644
> > > > --- a/gcc/config/i386/cpuid.h
> > > > +++ b/gcc/config/i386/cpuid.h
> > > > @@ -243,6 +243,7 @@
> > > >     pointer is non-null, then first four bytes of the signature
> > > >     (as found in ebx register) are returned in location pointed by sig.  */
> > > >
> > > > +__attribute__ ((target("general-regs-only")))
> > > >  static __inline unsigned int
> > > >  __get_cpuid_max (unsigned int __ext, unsigned int *__sig)
> > > >  {
> > > > @@ -298,6 +299,7 @@ __get_cpuid_max (unsigned int __ext, unsigned int *__sig)
> > > >     supported and returns 1 for valid cpuid information or 0 for
> > > >     unsupported cpuid leaf.  All pointers are required to be non-null.  */
> > > >
> > > > +__attribute__ ((target("general-regs-only")))
> > > >  static __inline int
> > > >  __get_cpuid (unsigned int __leaf,
> > > >              unsigned int *__eax, unsigned int *__ebx,
> > > > @@ -315,6 +317,7 @@ __get_cpuid (unsigned int __leaf,
> > > >
> > > >  /* Same as above, but sub-leaf can be specified.  */
> > > >
> > > > +__attribute__ ((target("general-regs-only")))
> > > >  static __inline int
> > > >  __get_cpuid_count (unsigned int __leaf, unsigned int __subleaf,
> > > >                    unsigned int *__eax, unsigned int *__ebx,
> > > > @@ -330,6 +333,7 @@ __get_cpuid_count (unsigned int __leaf, unsigned int __subleaf,
> > > >    return 1;
> > > >  }
> > > >
> > > > +__attribute__ ((target("general-regs-only")))
> > > >  static __inline void
> > > >  __cpuidex (int __cpuid_info[4], int __leaf, int __subleaf)
> > > >  {
> > > > diff --git a/gcc/testsuite/gcc.target/i386/avx512-check.h b/gcc/testsuite/gcc.target/i386/avx512-check.h
> > > > index 0a377dba1d5..406faf8fe03 100644
> > > > --- a/gcc/testsuite/gcc.target/i386/avx512-check.h
> > > > +++ b/gcc/testsuite/gcc.target/i386/avx512-check.h
> > > > @@ -25,6 +25,7 @@ do_test (void)
> > > >  }
> > > >  #endif
> > > >
> > > > +__attribute__ ((target("general-regs-only")))
> > > >  static int
> > > >  check_osxsave (void)
> > > >  {
> > > > @@ -34,6 +35,7 @@ check_osxsave (void)
> > > >    return (ecx & bit_OSXSAVE) != 0;
> > > >  }
> > > >
> > > > +__attribute__ ((target("general-regs-only")))
> > > >  int
> > > >  main ()
> > > >  {
> > > > --
> > > > 2.31.1
> > > >
> >
> >
> >
> > --
> > H.J.



-- 
H.J.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] x86: Compile CPUID functions with -mgeneral-regs-only
  2021-06-24 12:12 [PATCH] x86: Compile CPUID functions with -mgeneral-regs-only H.J. Lu
  2021-06-24 12:35 ` Richard Biener
@ 2021-06-24 16:12 ` Uros Bizjak
  2021-06-24 18:00   ` H.J. Lu
  2021-06-25  2:56   ` Hongtao Liu
  1 sibling, 2 replies; 13+ messages in thread
From: Uros Bizjak @ 2021-06-24 16:12 UTC (permalink / raw)
  To: H.J. Lu; +Cc: gcc-patches, Hongtao Liu

On Thu, Jun 24, 2021 at 2:12 PM H.J. Lu <hjl.tools@gmail.com> wrote:
>
> CPUID functions are used to detect CPU features.  If vector ISAs
> are enabled, compiler is free to use them in these functions.  Add
> __attribute__ ((target("general-regs-only"))) to CPUID functions
> to avoid vector instructions.

These functions are intended to be inlined, so how does target
attribute affect inlining?

Uros.

>
> gcc/
>
>         PR target/101185
>         * config/i386/cpuid.h (__get_cpuid_max): Add
>         __attribute__ ((target("general-regs-only"))).
>         (__get_cpuid): Likewise.
>         (__get_cpuid_count): Likewise.
>         (__cpuidex): Likewise.
>
> gcc/testsuite/
>
>         PR target/101185
>         * gcc.target/i386/avx512-check.h (check_osxsave): Add
>         __attribute__ ((target("general-regs-only"))).
>         (main): Likewise.
> ---
>  gcc/config/i386/cpuid.h                      | 4 ++++
>  gcc/testsuite/gcc.target/i386/avx512-check.h | 2 ++
>  2 files changed, 6 insertions(+)
>
> diff --git a/gcc/config/i386/cpuid.h b/gcc/config/i386/cpuid.h
> index aebc17c6827..74881ee91e5 100644
> --- a/gcc/config/i386/cpuid.h
> +++ b/gcc/config/i386/cpuid.h
> @@ -243,6 +243,7 @@
>     pointer is non-null, then first four bytes of the signature
>     (as found in ebx register) are returned in location pointed by sig.  */
>
> +__attribute__ ((target("general-regs-only")))
>  static __inline unsigned int
>  __get_cpuid_max (unsigned int __ext, unsigned int *__sig)
>  {
> @@ -298,6 +299,7 @@ __get_cpuid_max (unsigned int __ext, unsigned int *__sig)
>     supported and returns 1 for valid cpuid information or 0 for
>     unsupported cpuid leaf.  All pointers are required to be non-null.  */
>
> +__attribute__ ((target("general-regs-only")))
>  static __inline int
>  __get_cpuid (unsigned int __leaf,
>              unsigned int *__eax, unsigned int *__ebx,
> @@ -315,6 +317,7 @@ __get_cpuid (unsigned int __leaf,
>
>  /* Same as above, but sub-leaf can be specified.  */
>
> +__attribute__ ((target("general-regs-only")))
>  static __inline int
>  __get_cpuid_count (unsigned int __leaf, unsigned int __subleaf,
>                    unsigned int *__eax, unsigned int *__ebx,
> @@ -330,6 +333,7 @@ __get_cpuid_count (unsigned int __leaf, unsigned int __subleaf,
>    return 1;
>  }
>
> +__attribute__ ((target("general-regs-only")))
>  static __inline void
>  __cpuidex (int __cpuid_info[4], int __leaf, int __subleaf)
>  {
> diff --git a/gcc/testsuite/gcc.target/i386/avx512-check.h b/gcc/testsuite/gcc.target/i386/avx512-check.h
> index 0a377dba1d5..406faf8fe03 100644
> --- a/gcc/testsuite/gcc.target/i386/avx512-check.h
> +++ b/gcc/testsuite/gcc.target/i386/avx512-check.h
> @@ -25,6 +25,7 @@ do_test (void)
>  }
>  #endif
>
> +__attribute__ ((target("general-regs-only")))
>  static int
>  check_osxsave (void)
>  {
> @@ -34,6 +35,7 @@ check_osxsave (void)
>    return (ecx & bit_OSXSAVE) != 0;
>  }
>
> +__attribute__ ((target("general-regs-only")))
>  int
>  main ()
>  {
> --
> 2.31.1
>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] x86: Compile CPUID functions with -mgeneral-regs-only
  2021-06-24 16:12 ` Uros Bizjak
@ 2021-06-24 18:00   ` H.J. Lu
  2021-06-25  2:56   ` Hongtao Liu
  1 sibling, 0 replies; 13+ messages in thread
From: H.J. Lu @ 2021-06-24 18:00 UTC (permalink / raw)
  To: Uros Bizjak; +Cc: gcc-patches, Hongtao Liu

On Thu, Jun 24, 2021 at 9:12 AM Uros Bizjak <ubizjak@gmail.com> wrote:
>
> On Thu, Jun 24, 2021 at 2:12 PM H.J. Lu <hjl.tools@gmail.com> wrote:
> >
> > CPUID functions are used to detect CPU features.  If vector ISAs
> > are enabled, compiler is free to use them in these functions.  Add
> > __attribute__ ((target("general-regs-only"))) to CPUID functions
> > to avoid vector instructions.
>
> These functions are intended to be inlined, so how does target
> attribute affect inlining?

CPUID function won't be inlined.   Then we need to rewrite x86 tests
to put CPUID check in a separate function with target attribute to
limit ISAs used.

> Uros.
>
> >
> > gcc/
> >
> >         PR target/101185
> >         * config/i386/cpuid.h (__get_cpuid_max): Add
> >         __attribute__ ((target("general-regs-only"))).
> >         (__get_cpuid): Likewise.
> >         (__get_cpuid_count): Likewise.
> >         (__cpuidex): Likewise.
> >
> > gcc/testsuite/
> >
> >         PR target/101185
> >         * gcc.target/i386/avx512-check.h (check_osxsave): Add
> >         __attribute__ ((target("general-regs-only"))).
> >         (main): Likewise.
> > ---
> >  gcc/config/i386/cpuid.h                      | 4 ++++
> >  gcc/testsuite/gcc.target/i386/avx512-check.h | 2 ++
> >  2 files changed, 6 insertions(+)
> >
> > diff --git a/gcc/config/i386/cpuid.h b/gcc/config/i386/cpuid.h
> > index aebc17c6827..74881ee91e5 100644
> > --- a/gcc/config/i386/cpuid.h
> > +++ b/gcc/config/i386/cpuid.h
> > @@ -243,6 +243,7 @@
> >     pointer is non-null, then first four bytes of the signature
> >     (as found in ebx register) are returned in location pointed by sig.  */
> >
> > +__attribute__ ((target("general-regs-only")))
> >  static __inline unsigned int
> >  __get_cpuid_max (unsigned int __ext, unsigned int *__sig)
> >  {
> > @@ -298,6 +299,7 @@ __get_cpuid_max (unsigned int __ext, unsigned int *__sig)
> >     supported and returns 1 for valid cpuid information or 0 for
> >     unsupported cpuid leaf.  All pointers are required to be non-null.  */
> >
> > +__attribute__ ((target("general-regs-only")))
> >  static __inline int
> >  __get_cpuid (unsigned int __leaf,
> >              unsigned int *__eax, unsigned int *__ebx,
> > @@ -315,6 +317,7 @@ __get_cpuid (unsigned int __leaf,
> >
> >  /* Same as above, but sub-leaf can be specified.  */
> >
> > +__attribute__ ((target("general-regs-only")))
> >  static __inline int
> >  __get_cpuid_count (unsigned int __leaf, unsigned int __subleaf,
> >                    unsigned int *__eax, unsigned int *__ebx,
> > @@ -330,6 +333,7 @@ __get_cpuid_count (unsigned int __leaf, unsigned int __subleaf,
> >    return 1;
> >  }
> >
> > +__attribute__ ((target("general-regs-only")))
> >  static __inline void
> >  __cpuidex (int __cpuid_info[4], int __leaf, int __subleaf)
> >  {
> > diff --git a/gcc/testsuite/gcc.target/i386/avx512-check.h b/gcc/testsuite/gcc.target/i386/avx512-check.h
> > index 0a377dba1d5..406faf8fe03 100644
> > --- a/gcc/testsuite/gcc.target/i386/avx512-check.h
> > +++ b/gcc/testsuite/gcc.target/i386/avx512-check.h
> > @@ -25,6 +25,7 @@ do_test (void)
> >  }
> >  #endif
> >
> > +__attribute__ ((target("general-regs-only")))
> >  static int
> >  check_osxsave (void)
> >  {
> > @@ -34,6 +35,7 @@ check_osxsave (void)
> >    return (ecx & bit_OSXSAVE) != 0;
> >  }
> >
> > +__attribute__ ((target("general-regs-only")))
> >  int
> >  main ()
> >  {
> > --
> > 2.31.1
> >



-- 
H.J.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] x86: Compile CPUID functions with -mgeneral-regs-only
  2021-06-24 16:12 ` Uros Bizjak
  2021-06-24 18:00   ` H.J. Lu
@ 2021-06-25  2:56   ` Hongtao Liu
  2021-06-25  7:49     ` Uros Bizjak
  1 sibling, 1 reply; 13+ messages in thread
From: Hongtao Liu @ 2021-06-25  2:56 UTC (permalink / raw)
  To: Uros Bizjak; +Cc: H.J. Lu, Hongtao Liu, gcc-patches

On Fri, Jun 25, 2021 at 12:13 AM Uros Bizjak via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> On Thu, Jun 24, 2021 at 2:12 PM H.J. Lu <hjl.tools@gmail.com> wrote:
> >
> > CPUID functions are used to detect CPU features.  If vector ISAs
> > are enabled, compiler is free to use them in these functions.  Add
> > __attribute__ ((target("general-regs-only"))) to CPUID functions
> > to avoid vector instructions.
>
> These functions are intended to be inlined, so how does target
> attribute affect inlining?
I guess w/ -O0. they may not be inlined, that's why H.J adds those
attributes to those functions.

pr96814.dump:
0804aa40 <main>:
 804aa40: 8d 4c 24 04          lea    0x4(%esp),%ecx
...
 804aa63: 6a 07                push   $0x7
 804aa65: e8 e0 e7 ff ff        call   804924a <__get_cpuid_count>

Also we need to add a target attribute to avx512f_os_support (), and
that would be enough to fix the AVX512 part.

Moreover, all check functions in below files may also need to deal with:
adx-check.h
aes-avx-check.h
aes-check.h
amx-check.h
attr-nocf-check-1a.c
attr-nocf-check-3a.c
avx2-check.h
avx2-vpop-check.h
avx512bw-check.h
avx512-check.h
avx512dq-check.h
avx512er-check.h
avx512f-check.h
avx512vl-check.h
avx-check.h
bmi2-check.h
bmi-check.h
cf_check-1.c
cf_check-2.c
cf_check-3.c
cf_check-4.c
cf_check-5.c
f16c-check.h
fma4-check.h
fma-check.h
isa-check.h
lzcnt-check.h
m128-check.h
m256-check.h
m512-check.h
mmx-3dnow-check.h
mmx-check.h
pclmul-avx-check.h
pclmul-check.h
pr39315-check.c
rtm-check.h
sha-check.h
spellcheck-options-1.c
spellcheck-options-2.c
spellcheck-options-3.c
spellcheck-options-4.c
spellcheck-options-5.c
sse2-check.h
sse3-check.h
sse4_1-check.h
sse4_2-check.h
sse4a-check.h
sse-check.h
ssse3-check.h
stack-check-11.c
stack-check-12.c
stack-check-17.c
stack-check-18.c
stack-check-19.c
xop-check.h

>
> Uros.
>
> >
> > gcc/
> >
> >         PR target/101185
> >         * config/i386/cpuid.h (__get_cpuid_max): Add
> >         __attribute__ ((target("general-regs-only"))).
> >         (__get_cpuid): Likewise.
> >         (__get_cpuid_count): Likewise.
> >         (__cpuidex): Likewise.
> >
> > gcc/testsuite/
> >
> >         PR target/101185
> >         * gcc.target/i386/avx512-check.h (check_osxsave): Add
> >         __attribute__ ((target("general-regs-only"))).
> >         (main): Likewise.
> > ---
> >  gcc/config/i386/cpuid.h                      | 4 ++++
> >  gcc/testsuite/gcc.target/i386/avx512-check.h | 2 ++
> >  2 files changed, 6 insertions(+)
> >
> > diff --git a/gcc/config/i386/cpuid.h b/gcc/config/i386/cpuid.h
> > index aebc17c6827..74881ee91e5 100644
> > --- a/gcc/config/i386/cpuid.h
> > +++ b/gcc/config/i386/cpuid.h
> > @@ -243,6 +243,7 @@
> >     pointer is non-null, then first four bytes of the signature
> >     (as found in ebx register) are returned in location pointed by sig.  */
> >
> > +__attribute__ ((target("general-regs-only")))
> >  static __inline unsigned int
> >  __get_cpuid_max (unsigned int __ext, unsigned int *__sig)
> >  {
> > @@ -298,6 +299,7 @@ __get_cpuid_max (unsigned int __ext, unsigned int *__sig)
> >     supported and returns 1 for valid cpuid information or 0 for
> >     unsupported cpuid leaf.  All pointers are required to be non-null.  */
> >
> > +__attribute__ ((target("general-regs-only")))
> >  static __inline int
> >  __get_cpuid (unsigned int __leaf,
> >              unsigned int *__eax, unsigned int *__ebx,
> > @@ -315,6 +317,7 @@ __get_cpuid (unsigned int __leaf,
> >
> >  /* Same as above, but sub-leaf can be specified.  */
> >
> > +__attribute__ ((target("general-regs-only")))
> >  static __inline int
> >  __get_cpuid_count (unsigned int __leaf, unsigned int __subleaf,
> >                    unsigned int *__eax, unsigned int *__ebx,
> > @@ -330,6 +333,7 @@ __get_cpuid_count (unsigned int __leaf, unsigned int __subleaf,
> >    return 1;
> >  }
> >
> > +__attribute__ ((target("general-regs-only")))
> >  static __inline void
> >  __cpuidex (int __cpuid_info[4], int __leaf, int __subleaf)
> >  {
> > diff --git a/gcc/testsuite/gcc.target/i386/avx512-check.h b/gcc/testsuite/gcc.target/i386/avx512-check.h
> > index 0a377dba1d5..406faf8fe03 100644
> > --- a/gcc/testsuite/gcc.target/i386/avx512-check.h
> > +++ b/gcc/testsuite/gcc.target/i386/avx512-check.h
> > @@ -25,6 +25,7 @@ do_test (void)
> >  }
> >  #endif
> >
> > +__attribute__ ((target("general-regs-only")))
> >  static int
> >  check_osxsave (void)
> >  {
> > @@ -34,6 +35,7 @@ check_osxsave (void)
> >    return (ecx & bit_OSXSAVE) != 0;
> >  }
> >
> > +__attribute__ ((target("general-regs-only")))
> >  int
> >  main ()
> >  {
> > --
> > 2.31.1
> >



-- 
BR,
Hongtao

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] x86: Compile CPUID functions with -mgeneral-regs-only
  2021-06-25  2:56   ` Hongtao Liu
@ 2021-06-25  7:49     ` Uros Bizjak
  2021-06-25 12:39       ` [PATCH v2] x86: Check AVX512 without mask instructions H.J. Lu
  0 siblings, 1 reply; 13+ messages in thread
From: Uros Bizjak @ 2021-06-25  7:49 UTC (permalink / raw)
  To: Hongtao Liu; +Cc: H.J. Lu, Hongtao Liu, gcc-patches

On Fri, Jun 25, 2021 at 4:51 AM Hongtao Liu <crazylht@gmail.com> wrote:
>
> On Fri, Jun 25, 2021 at 12:13 AM Uros Bizjak via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
> >
> > On Thu, Jun 24, 2021 at 2:12 PM H.J. Lu <hjl.tools@gmail.com> wrote:
> > >
> > > CPUID functions are used to detect CPU features.  If vector ISAs
> > > are enabled, compiler is free to use them in these functions.  Add
> > > __attribute__ ((target("general-regs-only"))) to CPUID functions
> > > to avoid vector instructions.
> >
> > These functions are intended to be inlined, so how does target
> > attribute affect inlining?
> I guess w/ -O0. they may not be inlined, that's why H.J adds those
> attributes to those functions.

The problem is not with these functions, but with surrounding checks
for cpuid features. These checks are implemented with logic
instructions, and nothing prevents RA from allocating mask registers,
and consequently mask insn is emitted. Regarding mentioned functions,
cpuid insn pattern has four GPR single-reg constraints, so mask
registers can't be allocated here.

> pr96814.dump:
> 0804aa40 <main>:
>  804aa40: 8d 4c 24 04          lea    0x4(%esp),%ecx
> ...
>  804aa63: 6a 07                push   $0x7
>  804aa65: e8 e0 e7 ff ff        call   804924a <__get_cpuid_count>
>
> Also we need to add a target attribute to avx512f_os_support (), and
> that would be enough to fix the AVX512 part.
>
> Moreover, all check functions in below files may also need to deal with:
> adx-check.h
> aes-avx-check.h
> aes-check.h
> amx-check.h
> attr-nocf-check-1a.c
> attr-nocf-check-3a.c
> avx2-check.h
> avx2-vpop-check.h
> avx512bw-check.h
> avx512-check.h
> avx512dq-check.h
> avx512er-check.h
> avx512f-check.h
> avx512vl-check.h
> avx-check.h
> bmi2-check.h
> bmi-check.h
> cf_check-1.c
> cf_check-2.c
> cf_check-3.c
> cf_check-4.c
> cf_check-5.c
> f16c-check.h
> fma4-check.h
> fma-check.h
> isa-check.h
> lzcnt-check.h
> m128-check.h
> m256-check.h
> m512-check.h
> mmx-3dnow-check.h
> mmx-check.h
> pclmul-avx-check.h
> pclmul-check.h
> pr39315-check.c
> rtm-check.h
> sha-check.h
> spellcheck-options-1.c
> spellcheck-options-2.c
> spellcheck-options-3.c
> spellcheck-options-4.c
> spellcheck-options-5.c
> sse2-check.h
> sse3-check.h
> sse4_1-check.h
> sse4_2-check.h
> sse4a-check.h
> sse-check.h
> ssse3-check.h
> stack-check-11.c
> stack-check-12.c
> stack-check-17.c
> stack-check-18.c
> stack-check-19.c
> xop-check.h

True, but this would just paper over the real problem. Now, it is
expected that the user decorates the function that checks CPUID
features with the target attribute. I'm not sure if this is OK.

Uros.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH v2] x86: Check AVX512 without mask instructions
  2021-06-25  7:49     ` Uros Bizjak
@ 2021-06-25 12:39       ` H.J. Lu
  2021-07-14 12:27         ` PING^1 " H.J. Lu
  0 siblings, 1 reply; 13+ messages in thread
From: H.J. Lu @ 2021-06-25 12:39 UTC (permalink / raw)
  To: Uros Bizjak; +Cc: Hongtao Liu, Hongtao Liu, gcc-patches

[-- Attachment #1: Type: text/plain, Size: 3151 bytes --]

On Fri, Jun 25, 2021 at 12:50 AM Uros Bizjak <ubizjak@gmail.com> wrote:
>
> On Fri, Jun 25, 2021 at 4:51 AM Hongtao Liu <crazylht@gmail.com> wrote:
> >
> > On Fri, Jun 25, 2021 at 12:13 AM Uros Bizjak via Gcc-patches
> > <gcc-patches@gcc.gnu.org> wrote:
> > >
> > > On Thu, Jun 24, 2021 at 2:12 PM H.J. Lu <hjl.tools@gmail.com> wrote:
> > > >
> > > > CPUID functions are used to detect CPU features.  If vector ISAs
> > > > are enabled, compiler is free to use them in these functions.  Add
> > > > __attribute__ ((target("general-regs-only"))) to CPUID functions
> > > > to avoid vector instructions.
> > >
> > > These functions are intended to be inlined, so how does target
> > > attribute affect inlining?
> > I guess w/ -O0. they may not be inlined, that's why H.J adds those
> > attributes to those functions.
>
> The problem is not with these functions, but with surrounding checks
> for cpuid features. These checks are implemented with logic
> instructions, and nothing prevents RA from allocating mask registers,
> and consequently mask insn is emitted. Regarding mentioned functions,
> cpuid insn pattern has four GPR single-reg constraints, so mask
> registers can't be allocated here.
>
> > pr96814.dump:
> > 0804aa40 <main>:
> >  804aa40: 8d 4c 24 04          lea    0x4(%esp),%ecx
> > ...
> >  804aa63: 6a 07                push   $0x7
> >  804aa65: e8 e0 e7 ff ff        call   804924a <__get_cpuid_count>
> >
> > Also we need to add a target attribute to avx512f_os_support (), and
> > that would be enough to fix the AVX512 part.
> >
> > Moreover, all check functions in below files may also need to deal with:
> > adx-check.h
> > aes-avx-check.h
> > aes-check.h
> > amx-check.h
> > attr-nocf-check-1a.c
> > attr-nocf-check-3a.c
> > avx2-check.h
> > avx2-vpop-check.h
> > avx512bw-check.h
> > avx512-check.h
> > avx512dq-check.h
> > avx512er-check.h
> > avx512f-check.h
> > avx512vl-check.h
> > avx-check.h
> > bmi2-check.h
> > bmi-check.h
> > cf_check-1.c
> > cf_check-2.c
> > cf_check-3.c
> > cf_check-4.c
> > cf_check-5.c
> > f16c-check.h
> > fma4-check.h
> > fma-check.h
> > isa-check.h
> > lzcnt-check.h
> > m128-check.h
> > m256-check.h
> > m512-check.h
> > mmx-3dnow-check.h
> > mmx-check.h
> > pclmul-avx-check.h
> > pclmul-check.h
> > pr39315-check.c
> > rtm-check.h
> > sha-check.h
> > spellcheck-options-1.c
> > spellcheck-options-2.c
> > spellcheck-options-3.c
> > spellcheck-options-4.c
> > spellcheck-options-5.c
> > sse2-check.h
> > sse3-check.h
> > sse4_1-check.h
> > sse4_2-check.h
> > sse4a-check.h
> > sse-check.h
> > ssse3-check.h
> > stack-check-11.c
> > stack-check-12.c
> > stack-check-17.c
> > stack-check-18.c
> > stack-check-19.c
> > xop-check.h
>
> True, but this would just paper over the real problem. Now, it is
> expected that the user decorates the function that checks CPUID
> features with the target attribute. I'm not sure if this is OK.
>
> Uros.

CPUID functions are used to detect CPU features.  If mask instructions
are enabled, compiler is free to use them in these functions.  Disable
AVX512F in AVX512 check with target pragma to avoid mask instructions.

OK for master?

Thanks.

-- 
H.J.

[-- Attachment #2: v2-0001-x86-Check-AVX512-without-mask-instructions.patch --]
[-- Type: text/x-patch, Size: 1317 bytes --]

From 7961c445f27de3e813d7332afc14e9844c0831f7 Mon Sep 17 00:00:00 2001
From: "H.J. Lu" <hjl.tools@gmail.com>
Date: Thu, 24 Jun 2021 04:43:41 -0700
Subject: [PATCH v2] x86: Check AVX512 without mask instructions

CPUID functions are used to detect CPU features.  If mask instructions
are enabled, compiler is free to use them in these functions.  Disable
AVX512F in AVX512 check with target pragma to avoid mask instructions.

	PR target/101185
	* gcc.target/i386/avx512-check.h: Disable AVX512F in AVX512 check
	with target pragma.
---
 gcc/testsuite/gcc.target/i386/avx512-check.h | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/gcc/testsuite/gcc.target/i386/avx512-check.h b/gcc/testsuite/gcc.target/i386/avx512-check.h
index 0a377dba1d5..7ccf730c4f1 100644
--- a/gcc/testsuite/gcc.target/i386/avx512-check.h
+++ b/gcc/testsuite/gcc.target/i386/avx512-check.h
@@ -1,5 +1,8 @@
 #include <stdlib.h>
+#pragma GCC push_options
+#pragma GCC target ("no-mmx,no-sse")
 #include "cpuid.h"
+#pragma GCC pop_options
 #include "m512-check.h"
 #include "avx512f-os-support.h"
 
@@ -25,6 +28,9 @@ do_test (void)
 }
 #endif
 
+#pragma GCC push_options
+#pragma GCC target ("no-mmx,no-sse")
+
 static int
 check_osxsave (void)
 {
@@ -110,3 +116,4 @@ main ()
 #endif
   return 0;
 }
+#pragma GCC pop_options
-- 
2.31.1


^ permalink raw reply	[flat|nested] 13+ messages in thread

* PING^1 [PATCH v2] x86: Check AVX512 without mask instructions
  2021-06-25 12:39       ` [PATCH v2] x86: Check AVX512 without mask instructions H.J. Lu
@ 2021-07-14 12:27         ` H.J. Lu
  2021-07-26  3:33           ` Hongtao Liu
  0 siblings, 1 reply; 13+ messages in thread
From: H.J. Lu @ 2021-07-14 12:27 UTC (permalink / raw)
  To: Uros Bizjak; +Cc: Hongtao Liu, Hongtao Liu, gcc-patches

On Fri, Jun 25, 2021 at 5:39 AM H.J. Lu <hjl.tools@gmail.com> wrote:
>
> On Fri, Jun 25, 2021 at 12:50 AM Uros Bizjak <ubizjak@gmail.com> wrote:
> >
> > On Fri, Jun 25, 2021 at 4:51 AM Hongtao Liu <crazylht@gmail.com> wrote:
> > >
> > > On Fri, Jun 25, 2021 at 12:13 AM Uros Bizjak via Gcc-patches
> > > <gcc-patches@gcc.gnu.org> wrote:
> > > >
> > > > On Thu, Jun 24, 2021 at 2:12 PM H.J. Lu <hjl.tools@gmail.com> wrote:
> > > > >
> > > > > CPUID functions are used to detect CPU features.  If vector ISAs
> > > > > are enabled, compiler is free to use them in these functions.  Add
> > > > > __attribute__ ((target("general-regs-only"))) to CPUID functions
> > > > > to avoid vector instructions.
> > > >
> > > > These functions are intended to be inlined, so how does target
> > > > attribute affect inlining?
> > > I guess w/ -O0. they may not be inlined, that's why H.J adds those
> > > attributes to those functions.
> >
> > The problem is not with these functions, but with surrounding checks
> > for cpuid features. These checks are implemented with logic
> > instructions, and nothing prevents RA from allocating mask registers,
> > and consequently mask insn is emitted. Regarding mentioned functions,
> > cpuid insn pattern has four GPR single-reg constraints, so mask
> > registers can't be allocated here.
> >
> > > pr96814.dump:
> > > 0804aa40 <main>:
> > >  804aa40: 8d 4c 24 04          lea    0x4(%esp),%ecx
> > > ...
> > >  804aa63: 6a 07                push   $0x7
> > >  804aa65: e8 e0 e7 ff ff        call   804924a <__get_cpuid_count>
> > >
> > > Also we need to add a target attribute to avx512f_os_support (), and
> > > that would be enough to fix the AVX512 part.
> > >
> > > Moreover, all check functions in below files may also need to deal with:
> > > adx-check.h
> > > aes-avx-check.h
> > > aes-check.h
> > > amx-check.h
> > > attr-nocf-check-1a.c
> > > attr-nocf-check-3a.c
> > > avx2-check.h
> > > avx2-vpop-check.h
> > > avx512bw-check.h
> > > avx512-check.h
> > > avx512dq-check.h
> > > avx512er-check.h
> > > avx512f-check.h
> > > avx512vl-check.h
> > > avx-check.h
> > > bmi2-check.h
> > > bmi-check.h
> > > cf_check-1.c
> > > cf_check-2.c
> > > cf_check-3.c
> > > cf_check-4.c
> > > cf_check-5.c
> > > f16c-check.h
> > > fma4-check.h
> > > fma-check.h
> > > isa-check.h
> > > lzcnt-check.h
> > > m128-check.h
> > > m256-check.h
> > > m512-check.h
> > > mmx-3dnow-check.h
> > > mmx-check.h
> > > pclmul-avx-check.h
> > > pclmul-check.h
> > > pr39315-check.c
> > > rtm-check.h
> > > sha-check.h
> > > spellcheck-options-1.c
> > > spellcheck-options-2.c
> > > spellcheck-options-3.c
> > > spellcheck-options-4.c
> > > spellcheck-options-5.c
> > > sse2-check.h
> > > sse3-check.h
> > > sse4_1-check.h
> > > sse4_2-check.h
> > > sse4a-check.h
> > > sse-check.h
> > > ssse3-check.h
> > > stack-check-11.c
> > > stack-check-12.c
> > > stack-check-17.c
> > > stack-check-18.c
> > > stack-check-19.c
> > > xop-check.h
> >
> > True, but this would just paper over the real problem. Now, it is
> > expected that the user decorates the function that checks CPUID
> > features with the target attribute. I'm not sure if this is OK.
> >
> > Uros.
>
> CPUID functions are used to detect CPU features.  If mask instructions
> are enabled, compiler is free to use them in these functions.  Disable
> AVX512F in AVX512 check with target pragma to avoid mask instructions.
>
> OK for master?
>

PING:

https://gcc.gnu.org/pipermail/gcc-patches/2021-June/573717.html


-- 
H.J.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: PING^1 [PATCH v2] x86: Check AVX512 without mask instructions
  2021-07-14 12:27         ` PING^1 " H.J. Lu
@ 2021-07-26  3:33           ` Hongtao Liu
  2021-07-30 10:02             ` Uros Bizjak
  0 siblings, 1 reply; 13+ messages in thread
From: Hongtao Liu @ 2021-07-26  3:33 UTC (permalink / raw)
  To: H.J. Lu; +Cc: Uros Bizjak, Hongtao Liu, gcc-patches

On Wed, Jul 14, 2021 at 8:27 PM H.J. Lu <hjl.tools@gmail.com> wrote:
>
> On Fri, Jun 25, 2021 at 5:39 AM H.J. Lu <hjl.tools@gmail.com> wrote:
> >
> > On Fri, Jun 25, 2021 at 12:50 AM Uros Bizjak <ubizjak@gmail.com> wrote:
> > >
> > > On Fri, Jun 25, 2021 at 4:51 AM Hongtao Liu <crazylht@gmail.com> wrote:
> > > >
> > > > On Fri, Jun 25, 2021 at 12:13 AM Uros Bizjak via Gcc-patches
> > > > <gcc-patches@gcc.gnu.org> wrote:
> > > > >
> > > > > On Thu, Jun 24, 2021 at 2:12 PM H.J. Lu <hjl.tools@gmail.com> wrote:
> > > > > >
> > > > > > CPUID functions are used to detect CPU features.  If vector ISAs
> > > > > > are enabled, compiler is free to use them in these functions.  Add
> > > > > > __attribute__ ((target("general-regs-only"))) to CPUID functions
> > > > > > to avoid vector instructions.
> > > > >
> > > > > These functions are intended to be inlined, so how does target
> > > > > attribute affect inlining?
> > > > I guess w/ -O0. they may not be inlined, that's why H.J adds those
> > > > attributes to those functions.
> > >
> > > The problem is not with these functions, but with surrounding checks
> > > for cpuid features. These checks are implemented with logic
> > > instructions, and nothing prevents RA from allocating mask registers,
> > > and consequently mask insn is emitted. Regarding mentioned functions,
> > > cpuid insn pattern has four GPR single-reg constraints, so mask
> > > registers can't be allocated here.
> > >
> > > > pr96814.dump:
> > > > 0804aa40 <main>:
> > > >  804aa40: 8d 4c 24 04          lea    0x4(%esp),%ecx
> > > > ...
> > > >  804aa63: 6a 07                push   $0x7
> > > >  804aa65: e8 e0 e7 ff ff        call   804924a <__get_cpuid_count>
> > > >
> > > > Also we need to add a target attribute to avx512f_os_support (), and
> > > > that would be enough to fix the AVX512 part.
> > > >
> > > > Moreover, all check functions in below files may also need to deal with:
> > > > adx-check.h
> > > > aes-avx-check.h
> > > > aes-check.h
> > > > amx-check.h
> > > > attr-nocf-check-1a.c
> > > > attr-nocf-check-3a.c
> > > > avx2-check.h
> > > > avx2-vpop-check.h
> > > > avx512bw-check.h
> > > > avx512-check.h
> > > > avx512dq-check.h
> > > > avx512er-check.h
> > > > avx512f-check.h
> > > > avx512vl-check.h
> > > > avx-check.h
> > > > bmi2-check.h
> > > > bmi-check.h
> > > > cf_check-1.c
> > > > cf_check-2.c
> > > > cf_check-3.c
> > > > cf_check-4.c
> > > > cf_check-5.c
> > > > f16c-check.h
> > > > fma4-check.h
> > > > fma-check.h
> > > > isa-check.h
> > > > lzcnt-check.h
> > > > m128-check.h
> > > > m256-check.h
> > > > m512-check.h
> > > > mmx-3dnow-check.h
> > > > mmx-check.h
> > > > pclmul-avx-check.h
> > > > pclmul-check.h
> > > > pr39315-check.c
> > > > rtm-check.h
> > > > sha-check.h
> > > > spellcheck-options-1.c
> > > > spellcheck-options-2.c
> > > > spellcheck-options-3.c
> > > > spellcheck-options-4.c
> > > > spellcheck-options-5.c
> > > > sse2-check.h
> > > > sse3-check.h
> > > > sse4_1-check.h
> > > > sse4_2-check.h
> > > > sse4a-check.h
> > > > sse-check.h
> > > > ssse3-check.h
> > > > stack-check-11.c
> > > > stack-check-12.c
> > > > stack-check-17.c
> > > > stack-check-18.c
> > > > stack-check-19.c
> > > > xop-check.h
> > >
> > > True, but this would just paper over the real problem. Now, it is
> > > expected that the user decorates the function that checks CPUID
> > > features with the target attribute. I'm not sure if this is OK.
vmovw is enabled by AVX512FP16, and compile cpuid check function w/
avx512fp16 may result in SIGILL on non-avx512fp16 target(though, we
didn't get a testcase yet).
Would that be a sufficient reason to disable avx512 for cpuid check?
> > >
> > > Uros.
> >
> > CPUID functions are used to detect CPU features.  If mask instructions
> > are enabled, compiler is free to use them in these functions.  Disable
> > AVX512F in AVX512 check with target pragma to avoid mask instructions.
> >
> > OK for master?
> >
>
> PING:
>
> https://gcc.gnu.org/pipermail/gcc-patches/2021-June/573717.html
>
>
> --
> H.J.



-- 
BR,
Hongtao

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: PING^1 [PATCH v2] x86: Check AVX512 without mask instructions
  2021-07-26  3:33           ` Hongtao Liu
@ 2021-07-30 10:02             ` Uros Bizjak
  0 siblings, 0 replies; 13+ messages in thread
From: Uros Bizjak @ 2021-07-30 10:02 UTC (permalink / raw)
  To: Hongtao Liu; +Cc: H.J. Lu, Hongtao Liu, gcc-patches

On Mon, Jul 26, 2021 at 5:33 AM Hongtao Liu <crazylht@gmail.com> wrote:
>
> On Wed, Jul 14, 2021 at 8:27 PM H.J. Lu <hjl.tools@gmail.com> wrote:
> >
> > On Fri, Jun 25, 2021 at 5:39 AM H.J. Lu <hjl.tools@gmail.com> wrote:
> > >
> > > On Fri, Jun 25, 2021 at 12:50 AM Uros Bizjak <ubizjak@gmail.com> wrote:
> > > >
> > > > On Fri, Jun 25, 2021 at 4:51 AM Hongtao Liu <crazylht@gmail.com> wrote:
> > > > >
> > > > > On Fri, Jun 25, 2021 at 12:13 AM Uros Bizjak via Gcc-patches
> > > > > <gcc-patches@gcc.gnu.org> wrote:
> > > > > >
> > > > > > On Thu, Jun 24, 2021 at 2:12 PM H.J. Lu <hjl.tools@gmail.com> wrote:
> > > > > > >
> > > > > > > CPUID functions are used to detect CPU features.  If vector ISAs
> > > > > > > are enabled, compiler is free to use them in these functions.  Add
> > > > > > > __attribute__ ((target("general-regs-only"))) to CPUID functions
> > > > > > > to avoid vector instructions.
> > > > > >
> > > > > > These functions are intended to be inlined, so how does target
> > > > > > attribute affect inlining?
> > > > > I guess w/ -O0. they may not be inlined, that's why H.J adds those
> > > > > attributes to those functions.
> > > >
> > > > The problem is not with these functions, but with surrounding checks
> > > > for cpuid features. These checks are implemented with logic
> > > > instructions, and nothing prevents RA from allocating mask registers,
> > > > and consequently mask insn is emitted. Regarding mentioned functions,
> > > > cpuid insn pattern has four GPR single-reg constraints, so mask
> > > > registers can't be allocated here.
> > > >
> > > > > pr96814.dump:
> > > > > 0804aa40 <main>:
> > > > >  804aa40: 8d 4c 24 04          lea    0x4(%esp),%ecx
> > > > > ...
> > > > >  804aa63: 6a 07                push   $0x7
> > > > >  804aa65: e8 e0 e7 ff ff        call   804924a <__get_cpuid_count>
> > > > >
> > > > > Also we need to add a target attribute to avx512f_os_support (), and
> > > > > that would be enough to fix the AVX512 part.
> > > > >
> > > > > Moreover, all check functions in below files may also need to deal with:
> > > > > adx-check.h
> > > > > aes-avx-check.h
> > > > > aes-check.h
> > > > > amx-check.h
> > > > > attr-nocf-check-1a.c
> > > > > attr-nocf-check-3a.c
> > > > > avx2-check.h
> > > > > avx2-vpop-check.h
> > > > > avx512bw-check.h
> > > > > avx512-check.h
> > > > > avx512dq-check.h
> > > > > avx512er-check.h
> > > > > avx512f-check.h
> > > > > avx512vl-check.h
> > > > > avx-check.h
> > > > > bmi2-check.h
> > > > > bmi-check.h
> > > > > cf_check-1.c
> > > > > cf_check-2.c
> > > > > cf_check-3.c
> > > > > cf_check-4.c
> > > > > cf_check-5.c
> > > > > f16c-check.h
> > > > > fma4-check.h
> > > > > fma-check.h
> > > > > isa-check.h
> > > > > lzcnt-check.h
> > > > > m128-check.h
> > > > > m256-check.h
> > > > > m512-check.h
> > > > > mmx-3dnow-check.h
> > > > > mmx-check.h
> > > > > pclmul-avx-check.h
> > > > > pclmul-check.h
> > > > > pr39315-check.c
> > > > > rtm-check.h
> > > > > sha-check.h
> > > > > spellcheck-options-1.c
> > > > > spellcheck-options-2.c
> > > > > spellcheck-options-3.c
> > > > > spellcheck-options-4.c
> > > > > spellcheck-options-5.c
> > > > > sse2-check.h
> > > > > sse3-check.h
> > > > > sse4_1-check.h
> > > > > sse4_2-check.h
> > > > > sse4a-check.h
> > > > > sse-check.h
> > > > > ssse3-check.h
> > > > > stack-check-11.c
> > > > > stack-check-12.c
> > > > > stack-check-17.c
> > > > > stack-check-18.c
> > > > > stack-check-19.c
> > > > > xop-check.h
> > > >
> > > > True, but this would just paper over the real problem. Now, it is
> > > > expected that the user decorates the function that checks CPUID
> > > > features with the target attribute. I'm not sure if this is OK.
> vmovw is enabled by AVX512FP16, and compile cpuid check function w/
> avx512fp16 may result in SIGILL on non-avx512fp16 target(though, we
> didn't get a testcase yet).

In struct processor_costs (i386.h) we have:

      const int sse_to_integer;    /* cost of moving SSE register to
integer.  */
      const int integer_to_sse;    /* cost of moving integer register to SSE. */
      const int mask_to_integer; /* cost of moving mask register to integer.  */
      const int integer_to_mask; /* cost of moving integer register to mask.  */

These are currently set sufficiently high, so we won't get vmovw for
the same reason we don't get vmovd and vmovq.

> Would that be a sufficient reason to disable avx512 for cpuid check?

We would like to avoid inter-unit moves, and keep values in their
respective register set as much as possible. This is the reason for
relatively high values for the above costs and special passes were
introduced (STV) to avoid excessive moves between register sets.
Without this approach, register allocator is free to generate e.g.
instructions with mask registers instead of integer registers
(especially under register pressure), trading spills with inter-unit
moves.

We tried to spill to SSE registers, and the experiment ended with a
nice list of PRs. See  ix86_spill_class in i386.c.

Decorating the function with general_regs_only would just paper over
the above problem. Regarding mask registers, some sort of STV-like
target pass is needed to control precisely how values are moved
between register sets, loaded to mask registers, handled there and
stored.

Uros.

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2021-07-30 10:02 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-06-24 12:12 [PATCH] x86: Compile CPUID functions with -mgeneral-regs-only H.J. Lu
2021-06-24 12:35 ` Richard Biener
2021-06-24 12:41   ` H.J. Lu
2021-06-24 12:47     ` Richard Biener
2021-06-24 13:00       ` H.J. Lu
2021-06-24 16:12 ` Uros Bizjak
2021-06-24 18:00   ` H.J. Lu
2021-06-25  2:56   ` Hongtao Liu
2021-06-25  7:49     ` Uros Bizjak
2021-06-25 12:39       ` [PATCH v2] x86: Check AVX512 without mask instructions H.J. Lu
2021-07-14 12:27         ` PING^1 " H.J. Lu
2021-07-26  3:33           ` Hongtao Liu
2021-07-30 10:02             ` Uros Bizjak

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).