public inbox for gcc-patches@gcc.gnu.org
* [PATCH] Support -m[no-]gather -m[no-]scatter to enable/disable vectorization for all gather/scatter instructions.
@ 2023-08-10  1:11 liuhongt
  2023-08-10  1:47 ` Xi Ruoyao
                   ` (2 more replies)
  0 siblings, 3 replies; 18+ messages in thread
From: liuhongt @ 2023-08-10  1:11 UTC (permalink / raw)
  To: gcc-patches; +Cc: richard.guenther, ubizjak, hubicka

Currently we have 3 independent tunes for gather:
"use_gather,use_gather_2parts,use_gather_4parts".
Similarly, for scatter there are
"use_scatter,use_scatter_2parts,use_scatter_4parts".

The patch adds 2 standardized options to enable/disable
vectorization for all gather/scatter instructions. The options are
translated by the driver into the 3 corresponding tunes.

bootstrapped and regtested on x86_64-pc-linux-gnu.
Ok for trunk?

gcc/ChangeLog:

	* config/i386/i386.h (DRIVER_SELF_SPECS): Add
	GATHER_SCATTER_DRIVER_SELF_SPECS.
	(GATHER_SCATTER_DRIVER_SELF_SPECS): New macro.
	* config/i386/i386.opt (mgather): New option.
	(mscatter): Ditto.
---
 gcc/config/i386/i386.h   | 12 +++++++++++-
 gcc/config/i386/i386.opt |  8 ++++++++
 2 files changed, 19 insertions(+), 1 deletion(-)

diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index ef342fcee9b..d9ac2c29bde 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -565,7 +565,17 @@ extern GTY(()) tree x86_mfence;
 # define SUBTARGET_DRIVER_SELF_SPECS ""
 #endif
 
-#define DRIVER_SELF_SPECS SUBTARGET_DRIVER_SELF_SPECS
+#ifndef GATHER_SCATTER_DRIVER_SELF_SPECS
+# define GATHER_SCATTER_DRIVER_SELF_SPECS \
+  "%{mno-gather:-mtune-ctrl=^use_gather_2parts,^use_gather_4parts,^use_gather} \
+   %{mgather:-mtune-ctrl=use_gather_2parts,use_gather_4parts,use_gather} \
+   %{mno-scatter:-mtune-ctrl=^use_scatter_2parts,^use_scatter_4parts,^use_scatter} \
+   %{mscatter:-mtune-ctrl=use_scatter_2parts,use_scatter_4parts,use_scatter}"
+#endif
+
+#define DRIVER_SELF_SPECS \
+  SUBTARGET_DRIVER_SELF_SPECS " " \
+  GATHER_SCATTER_DRIVER_SELF_SPECS
 
 /* -march=native handling only makes sense with compiler running on
    an x86 or x86_64 chip.  If changing this condition, also change
diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt
index ddb7f110aa2..99948644a8d 100644
--- a/gcc/config/i386/i386.opt
+++ b/gcc/config/i386/i386.opt
@@ -424,6 +424,14 @@ mdaz-ftz
 Target
 Set the FTZ and DAZ Flags.
 
+mgather
+Target
+Enable vectorization for gather instruction.
+
+mscatter
+Target
+Enable vectorization for scatter instruction.
+
 mpreferred-stack-boundary=
 Target RejectNegative Joined UInteger Var(ix86_preferred_stack_boundary_arg)
 Attempt to keep stack aligned to this power of 2.
-- 
2.31.1



* Re: [PATCH] Support -m[no-]gather -m[no-]scatter to enable/disable vectorization for all gather/scatter instructions.
  2023-08-10  1:11 [PATCH] Support -m[no-]gather -m[no-]scatter to enable/disable vectorization for all gather/scatter instructions liuhongt
@ 2023-08-10  1:47 ` Xi Ruoyao
  2023-08-10  1:52   ` Liu, Hongtao
  2023-08-10  6:04 ` Uros Bizjak
  2023-08-10  7:39 ` Richard Biener
  2 siblings, 1 reply; 18+ messages in thread
From: Xi Ruoyao @ 2023-08-10  1:47 UTC (permalink / raw)
  To: liuhongt, gcc-patches; +Cc: richard.guenther, ubizjak, hubicka

On Thu, 2023-08-10 at 09:11 +0800, liuhongt via Gcc-patches wrote:
> Currently we have 3 different independent tunes for gather
> "use_gather,use_gather_2parts,use_gather_4parts",
> similar for scatter, there're
> "use_scatter,use_scatter_2parts,use_scatter_4parts"
> 
> The patch support 2 standardizing options to enable/disable
> vectorization for all gather/scatter instructions. The options is
> interpreted by driver to 3 tunes.
> 
> bootstrapped and regtested on x86_64-pc-linux-gnu.
> Ok for trunk?

And should we set -mno-gather as the default for GDS affected
processors?  We'll likely apply the ucode update for them, and then the
gathering instructions will be much slower.

> gcc/ChangeLog:
> 
>         * config/i386/i386.h (DRIVER_SELF_SPECS): Add
>         GATHER_SCATTER_DRIVER_SELF_SPECS.
>         (GATHER_SCATTER_DRIVER_SELF_SPECS): New macro.
>         * config/i386/i386.opt (mgather): New option.
>         (mscatter): Ditto.
> ---
>  gcc/config/i386/i386.h   | 12 +++++++++++-
>  gcc/config/i386/i386.opt |  8 ++++++++
>  2 files changed, 19 insertions(+), 1 deletion(-)
> 
> diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
> index ef342fcee9b..d9ac2c29bde 100644
> --- a/gcc/config/i386/i386.h
> +++ b/gcc/config/i386/i386.h
> @@ -565,7 +565,17 @@ extern GTY(()) tree x86_mfence;
>  # define SUBTARGET_DRIVER_SELF_SPECS ""
>  #endif
>  
> -#define DRIVER_SELF_SPECS SUBTARGET_DRIVER_SELF_SPECS
> +#ifndef GATHER_SCATTER_DRIVER_SELF_SPECS
> +# define GATHER_SCATTER_DRIVER_SELF_SPECS \
> +  "%{mno-gather:-mtune-ctrl=^use_gather_2parts,^use_gather_4parts,^use_gather} \
> +   %{mgather:-mtune-ctrl=use_gather_2parts,use_gather_4parts,use_gather} \
> +   %{mno-scatter:-mtune-ctrl=^use_scatter_2parts,^use_scatter_4parts,^use_scatter} \
> +   %{mscatter:-mtune-ctrl=use_scatter_2parts,use_scatter_4parts,use_scatter}"
> +#endif
> +
> +#define DRIVER_SELF_SPECS \
> +  SUBTARGET_DRIVER_SELF_SPECS " " \
> +  GATHER_SCATTER_DRIVER_SELF_SPECS
>  
>  /* -march=native handling only makes sense with compiler running on
>     an x86 or x86_64 chip.  If changing this condition, also change
> diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt
> index ddb7f110aa2..99948644a8d 100644
> --- a/gcc/config/i386/i386.opt
> +++ b/gcc/config/i386/i386.opt
> @@ -424,6 +424,14 @@ mdaz-ftz
>  Target
>  Set the FTZ and DAZ Flags.
>  
> +mgather
> +Target
> +Enable vectorization for gather instruction.
> +
> +mscatter
> +Target
> +Enable vectorization for scatter instruction.
> +
>  mpreferred-stack-boundary=
>  Target RejectNegative Joined UInteger Var(ix86_preferred_stack_boundary_arg)
>  Attempt to keep stack aligned to this power of 2.

-- 
Xi Ruoyao <xry111@xry111.site>
School of Aerospace Science and Technology, Xidian University


* RE: [PATCH] Support -m[no-]gather -m[no-]scatter to enable/disable vectorization for all gather/scatter instructions.
  2023-08-10  1:47 ` Xi Ruoyao
@ 2023-08-10  1:52   ` Liu, Hongtao
  0 siblings, 0 replies; 18+ messages in thread
From: Liu, Hongtao @ 2023-08-10  1:52 UTC (permalink / raw)
  To: Xi Ruoyao, gcc-patches; +Cc: richard.guenther, ubizjak, hubicka



> -----Original Message-----
> From: Xi Ruoyao <xry111@xry111.site>
> Sent: Thursday, August 10, 2023 9:48 AM
> To: Liu, Hongtao <hongtao.liu@intel.com>; gcc-patches@gcc.gnu.org
> Cc: richard.guenther@gmail.com; ubizjak@gmail.com; hubicka@ucw.cz
> Subject: Re: [PATCH] Support -m[no-]gather -m[no-]scatter to enable/disable
> vectorization for all gather/scatter instructions.
> 
> On Thu, 2023-08-10 at 09:11 +0800, liuhongt via Gcc-patches wrote:
> > Currently we have 3 different independent tunes for gather
> > "use_gather,use_gather_2parts,use_gather_4parts",
> > similar for scatter, there're
> > "use_scatter,use_scatter_2parts,use_scatter_4parts"
> >
> > The patch support 2 standardizing options to enable/disable
> > vectorization for all gather/scatter instructions. The options is
> > interpreted by driver to 3 tunes.
> >
> > bootstrapped and regtested on x86_64-pc-linux-gnu.
> > Ok for trunk?
> 
> And should we set -mno-gather as the default for GDS affected processors?
> We'll likely apply the ucode update for them, and then the gathering
> instructions will be much slower.
I assume you're talking about https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/advisory-guidance/gather-data-sampling.html
Yes, there will be a separate patch for microarchitecture tuning.
> 
> > gcc/ChangeLog:
> >
> >         * config/i386/i386.h (DRIVER_SELF_SPECS): Add
> >         GATHER_SCATTER_DRIVER_SELF_SPECS.
> >         (GATHER_SCATTER_DRIVER_SELF_SPECS): New macro.
> >         * config/i386/i386.opt (mgather): New option.
> >         (mscatter): Ditto.
> > ---
> >  gcc/config/i386/i386.h   | 12 +++++++++++-
> >  gcc/config/i386/i386.opt |  8 ++++++++
> >  2 files changed, 19 insertions(+), 1 deletion(-)
> >
> > diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
> > index ef342fcee9b..d9ac2c29bde 100644
> > --- a/gcc/config/i386/i386.h
> > +++ b/gcc/config/i386/i386.h
> > @@ -565,7 +565,17 @@ extern GTY(()) tree x86_mfence;
> >  # define SUBTARGET_DRIVER_SELF_SPECS ""
> >  #endif
> >
> > -#define DRIVER_SELF_SPECS SUBTARGET_DRIVER_SELF_SPECS
> > +#ifndef GATHER_SCATTER_DRIVER_SELF_SPECS
> > +# define GATHER_SCATTER_DRIVER_SELF_SPECS \
> > +  "%{mno-gather:-mtune-ctrl=^use_gather_2parts,^use_gather_4parts,^use_gather} \
> > +   %{mgather:-mtune-ctrl=use_gather_2parts,use_gather_4parts,use_gather} \
> > +   %{mno-scatter:-mtune-ctrl=^use_scatter_2parts,^use_scatter_4parts,^use_scatter} \
> > +   %{mscatter:-mtune-ctrl=use_scatter_2parts,use_scatter_4parts,use_scatter}"
> > +#endif
> > +
> > +#define DRIVER_SELF_SPECS \
> > +  SUBTARGET_DRIVER_SELF_SPECS " " \
> > +  GATHER_SCATTER_DRIVER_SELF_SPECS
> >
> >  /* -march=native handling only makes sense with compiler running on
> >     an x86 or x86_64 chip.  If changing this condition, also change
> > diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt
> > index ddb7f110aa2..99948644a8d 100644
> > --- a/gcc/config/i386/i386.opt
> > +++ b/gcc/config/i386/i386.opt
> > @@ -424,6 +424,14 @@ mdaz-ftz
> >  Target
> >  Set the FTZ and DAZ Flags.
> >
> > +mgather
> > +Target
> > +Enable vectorization for gather instruction.
> > +
> > +mscatter
> > +Target
> > +Enable vectorization for scatter instruction.
> > +
> >  mpreferred-stack-boundary=
> >  Target RejectNegative Joined UInteger Var(ix86_preferred_stack_boundary_arg)
> >  Attempt to keep stack aligned to this power of 2.
> 
> --
> Xi Ruoyao <xry111@xry111.site>
> School of Aerospace Science and Technology, Xidian University


* Re: [PATCH] Support -m[no-]gather -m[no-]scatter to enable/disable vectorization for all gather/scatter instructions.
  2023-08-10  1:11 [PATCH] Support -m[no-]gather -m[no-]scatter to enable/disable vectorization for all gather/scatter instructions liuhongt
  2023-08-10  1:47 ` Xi Ruoyao
@ 2023-08-10  6:04 ` Uros Bizjak
  2023-08-10  6:12   ` Hongtao Liu
  2023-08-10  7:39 ` Richard Biener
  2 siblings, 1 reply; 18+ messages in thread
From: Uros Bizjak @ 2023-08-10  6:04 UTC (permalink / raw)
  To: liuhongt; +Cc: gcc-patches, richard.guenther, hubicka

On Thu, Aug 10, 2023 at 3:13 AM liuhongt <hongtao.liu@intel.com> wrote:
>
> Currently we have 3 different independent tunes for gather
> "use_gather,use_gather_2parts,use_gather_4parts",
> similar for scatter, there're
> "use_scatter,use_scatter_2parts,use_scatter_4parts"
>
> The patch support 2 standardizing options to enable/disable
> vectorization for all gather/scatter instructions. The options is
> interpreted by driver to 3 tunes.
>
> bootstrapped and regtested on x86_64-pc-linux-gnu.
> Ok for trunk?
>
> gcc/ChangeLog:
>
>         * config/i386/i386.h (DRIVER_SELF_SPECS): Add
>         GATHER_SCATTER_DRIVER_SELF_SPECS.
>         (GATHER_SCATTER_DRIVER_SELF_SPECS): New macro.
>         * config/i386/i386.opt (mgather): New option.
>         (mscatter): Ditto.
> ---
>  gcc/config/i386/i386.h   | 12 +++++++++++-
>  gcc/config/i386/i386.opt |  8 ++++++++
>  2 files changed, 19 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
> index ef342fcee9b..d9ac2c29bde 100644
> --- a/gcc/config/i386/i386.h
> +++ b/gcc/config/i386/i386.h
> @@ -565,7 +565,17 @@ extern GTY(()) tree x86_mfence;
>  # define SUBTARGET_DRIVER_SELF_SPECS ""
>  #endif
>
> -#define DRIVER_SELF_SPECS SUBTARGET_DRIVER_SELF_SPECS
> +#ifndef GATHER_SCATTER_DRIVER_SELF_SPECS
> +# define GATHER_SCATTER_DRIVER_SELF_SPECS \
> +  "%{mno-gather:-mtune-ctrl=^use_gather_2parts,^use_gather_4parts,^use_gather} \
> +   %{mgather:-mtune-ctrl=use_gather_2parts,use_gather_4parts,use_gather} \
> +   %{mno-scatter:-mtune-ctrl=^use_scatter_2parts,^use_scatter_4parts,^use_scatter} \
> +   %{mscatter:-mtune-ctrl=use_scatter_2parts,use_scatter_4parts,use_scatter}"
> +#endif
> +
> +#define DRIVER_SELF_SPECS \
> +  SUBTARGET_DRIVER_SELF_SPECS " " \
> +  GATHER_SCATTER_DRIVER_SELF_SPECS
>
>  /* -march=native handling only makes sense with compiler running on
>     an x86 or x86_64 chip.  If changing this condition, also change
> diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt
> index ddb7f110aa2..99948644a8d 100644
> --- a/gcc/config/i386/i386.opt
> +++ b/gcc/config/i386/i386.opt
> @@ -424,6 +424,14 @@ mdaz-ftz
>  Target
>  Set the FTZ and DAZ Flags.
>
> +mgather
> +Target
> +Enable vectorization for gather instruction.
> +
> +mscatter
> +Target
> +Enable vectorization for scatter instruction.

Are gather and scatter instructions affected in a separate way, or
should we use one -mgather-scatter option to cover all gather/scatter
tunings?

Uros.

> +
>  mpreferred-stack-boundary=
>  Target RejectNegative Joined UInteger Var(ix86_preferred_stack_boundary_arg)
>  Attempt to keep stack aligned to this power of 2.
> --
> 2.31.1
>


* Re: [PATCH] Support -m[no-]gather -m[no-]scatter to enable/disable vectorization for all gather/scatter instructions.
  2023-08-10  6:04 ` Uros Bizjak
@ 2023-08-10  6:12   ` Hongtao Liu
  0 siblings, 0 replies; 18+ messages in thread
From: Hongtao Liu @ 2023-08-10  6:12 UTC (permalink / raw)
  To: Uros Bizjak; +Cc: liuhongt, gcc-patches, richard.guenther, hubicka

On Thu, Aug 10, 2023 at 2:04 PM Uros Bizjak via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> On Thu, Aug 10, 2023 at 3:13 AM liuhongt <hongtao.liu@intel.com> wrote:
> >
> > Currently we have 3 different independent tunes for gather
> > "use_gather,use_gather_2parts,use_gather_4parts",
> > similar for scatter, there're
> > "use_scatter,use_scatter_2parts,use_scatter_4parts"
> >
> > The patch support 2 standardizing options to enable/disable
> > vectorization for all gather/scatter instructions. The options is
> > interpreted by driver to 3 tunes.
> >
> > bootstrapped and regtested on x86_64-pc-linux-gnu.
> > Ok for trunk?
> >
> > gcc/ChangeLog:
> >
> >         * config/i386/i386.h (DRIVER_SELF_SPECS): Add
> >         GATHER_SCATTER_DRIVER_SELF_SPECS.
> >         (GATHER_SCATTER_DRIVER_SELF_SPECS): New macro.
> >         * config/i386/i386.opt (mgather): New option.
> >         (mscatter): Ditto.
> > ---
> >  gcc/config/i386/i386.h   | 12 +++++++++++-
> >  gcc/config/i386/i386.opt |  8 ++++++++
> >  2 files changed, 19 insertions(+), 1 deletion(-)
> >
> > diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
> > index ef342fcee9b..d9ac2c29bde 100644
> > --- a/gcc/config/i386/i386.h
> > +++ b/gcc/config/i386/i386.h
> > @@ -565,7 +565,17 @@ extern GTY(()) tree x86_mfence;
> >  # define SUBTARGET_DRIVER_SELF_SPECS ""
> >  #endif
> >
> > -#define DRIVER_SELF_SPECS SUBTARGET_DRIVER_SELF_SPECS
> > +#ifndef GATHER_SCATTER_DRIVER_SELF_SPECS
> > +# define GATHER_SCATTER_DRIVER_SELF_SPECS \
> > +  "%{mno-gather:-mtune-ctrl=^use_gather_2parts,^use_gather_4parts,^use_gather} \
> > +   %{mgather:-mtune-ctrl=use_gather_2parts,use_gather_4parts,use_gather} \
> > +   %{mno-scatter:-mtune-ctrl=^use_scatter_2parts,^use_scatter_4parts,^use_scatter} \
> > +   %{mscatter:-mtune-ctrl=use_scatter_2parts,use_scatter_4parts,use_scatter}"
> > +#endif
> > +
> > +#define DRIVER_SELF_SPECS \
> > +  SUBTARGET_DRIVER_SELF_SPECS " " \
> > +  GATHER_SCATTER_DRIVER_SELF_SPECS
> >
> >  /* -march=native handling only makes sense with compiler running on
> >     an x86 or x86_64 chip.  If changing this condition, also change
> > diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt
> > index ddb7f110aa2..99948644a8d 100644
> > --- a/gcc/config/i386/i386.opt
> > +++ b/gcc/config/i386/i386.opt
> > @@ -424,6 +424,14 @@ mdaz-ftz
> >  Target
> >  Set the FTZ and DAZ Flags.
> >
> > +mgather
> > +Target
> > +Enable vectorization for gather instruction.
> > +
> > +mscatter
> > +Target
> > +Enable vectorization for scatter instruction.
>
> Are gather and scatter instructions affected in a separate way, or
> should we use one -mgather-scatter option to cover all gather/scatter
> tunings?
Separately.
Gather Data Sampling affects only gather instructions.
https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/advisory-guidance/gather-data-sampling.html
>
> Uros.
>
> > +
> >  mpreferred-stack-boundary=
> >  Target RejectNegative Joined UInteger Var(ix86_preferred_stack_boundary_arg)
> >  Attempt to keep stack aligned to this power of 2.
> > --
> > 2.31.1
> >



-- 
BR,
Hongtao


* Re: [PATCH] Support -m[no-]gather -m[no-]scatter to enable/disable vectorization for all gather/scatter instructions.
  2023-08-10  1:11 [PATCH] Support -m[no-]gather -m[no-]scatter to enable/disable vectorization for all gather/scatter instructions liuhongt
  2023-08-10  1:47 ` Xi Ruoyao
  2023-08-10  6:04 ` Uros Bizjak
@ 2023-08-10  7:39 ` Richard Biener
  2023-08-10  7:42   ` Uros Bizjak
  2 siblings, 1 reply; 18+ messages in thread
From: Richard Biener @ 2023-08-10  7:39 UTC (permalink / raw)
  To: liuhongt; +Cc: gcc-patches, ubizjak, hubicka

On Thu, Aug 10, 2023 at 3:13 AM liuhongt <hongtao.liu@intel.com> wrote:
>
> Currently we have 3 different independent tunes for gather
> "use_gather,use_gather_2parts,use_gather_4parts",
> similar for scatter, there're
> "use_scatter,use_scatter_2parts,use_scatter_4parts"
>
> The patch support 2 standardizing options to enable/disable
> vectorization for all gather/scatter instructions. The options is
> interpreted by driver to 3 tunes.
>
> bootstrapped and regtested on x86_64-pc-linux-gnu.
> Ok for trunk?

I think -mgather/-mscatter are too close to -mfma suggesting they
enable part of an ISA but they won't disable the use of intrinsics
or enable gather/scatter on CPUs where the ISA doesn't have them.

May I suggest to invent a more generic "short-cut" to
-mtune-ctrl=^X, maybe -mdisable=X?  And for gather/scatter
tunables add ^use_gather_any to cover all cases?  (or
change what use_gather controls - it seems we changed its
meaning before, and instead add use_gather_8parts and
use_gather_16parts)

That is, what's the point of this?

Richard.

> gcc/ChangeLog:
>
>         * config/i386/i386.h (DRIVER_SELF_SPECS): Add
>         GATHER_SCATTER_DRIVER_SELF_SPECS.
>         (GATHER_SCATTER_DRIVER_SELF_SPECS): New macro.
>         * config/i386/i386.opt (mgather): New option.
>         (mscatter): Ditto.
> ---
>  gcc/config/i386/i386.h   | 12 +++++++++++-
>  gcc/config/i386/i386.opt |  8 ++++++++
>  2 files changed, 19 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
> index ef342fcee9b..d9ac2c29bde 100644
> --- a/gcc/config/i386/i386.h
> +++ b/gcc/config/i386/i386.h
> @@ -565,7 +565,17 @@ extern GTY(()) tree x86_mfence;
>  # define SUBTARGET_DRIVER_SELF_SPECS ""
>  #endif
>
> -#define DRIVER_SELF_SPECS SUBTARGET_DRIVER_SELF_SPECS
> +#ifndef GATHER_SCATTER_DRIVER_SELF_SPECS
> +# define GATHER_SCATTER_DRIVER_SELF_SPECS \
> +  "%{mno-gather:-mtune-ctrl=^use_gather_2parts,^use_gather_4parts,^use_gather} \
> +   %{mgather:-mtune-ctrl=use_gather_2parts,use_gather_4parts,use_gather} \
> +   %{mno-scatter:-mtune-ctrl=^use_scatter_2parts,^use_scatter_4parts,^use_scatter} \
> +   %{mscatter:-mtune-ctrl=use_scatter_2parts,use_scatter_4parts,use_scatter}"
> +#endif
> +
> +#define DRIVER_SELF_SPECS \
> +  SUBTARGET_DRIVER_SELF_SPECS " " \
> +  GATHER_SCATTER_DRIVER_SELF_SPECS
>
>  /* -march=native handling only makes sense with compiler running on
>     an x86 or x86_64 chip.  If changing this condition, also change
> diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt
> index ddb7f110aa2..99948644a8d 100644
> --- a/gcc/config/i386/i386.opt
> +++ b/gcc/config/i386/i386.opt
> @@ -424,6 +424,14 @@ mdaz-ftz
>  Target
>  Set the FTZ and DAZ Flags.
>
> +mgather
> +Target
> +Enable vectorization for gather instruction.
> +
> +mscatter
> +Target
> +Enable vectorization for scatter instruction.
> +
>  mpreferred-stack-boundary=
>  Target RejectNegative Joined UInteger Var(ix86_preferred_stack_boundary_arg)
>  Attempt to keep stack aligned to this power of 2.
> --
> 2.31.1
>


* Re: [PATCH] Support -m[no-]gather -m[no-]scatter to enable/disable vectorization for all gather/scatter instructions.
  2023-08-10  7:39 ` Richard Biener
@ 2023-08-10  7:42   ` Uros Bizjak
  2023-08-10  7:47     ` Richard Biener
  0 siblings, 1 reply; 18+ messages in thread
From: Uros Bizjak @ 2023-08-10  7:42 UTC (permalink / raw)
  To: Richard Biener; +Cc: liuhongt, gcc-patches, hubicka

On Thu, Aug 10, 2023 at 9:40 AM Richard Biener
<richard.guenther@gmail.com> wrote:
>
> On Thu, Aug 10, 2023 at 3:13 AM liuhongt <hongtao.liu@intel.com> wrote:
> >
> > Currently we have 3 different independent tunes for gather
> > "use_gather,use_gather_2parts,use_gather_4parts",
> > similar for scatter, there're
> > "use_scatter,use_scatter_2parts,use_scatter_4parts"
> >
> > The patch support 2 standardizing options to enable/disable
> > vectorization for all gather/scatter instructions. The options is
> > interpreted by driver to 3 tunes.
> >
> > bootstrapped and regtested on x86_64-pc-linux-gnu.
> > Ok for trunk?
>
> I think -mgather/-mscatter are too close to -mfma suggesting they
> enable part of an ISA but they won't disable the use of intrinsics
> or enable gather/scatter on CPUs where the ISA doesn't have them.
>
> May I suggest to invent a more generic "short-cut" to
> -mtune-ctrl=^X, maybe -mdisable=X?  And for gather/scatter
> tunables add ^use_gather_any to cover all cases?  (or
> change what use_gather controls - it seems we changed its
> meaning before, and instead add use_gather_8parts and
> use_gather_16parts)
>
> That is, what's the point of this?

https://www.phoronix.com/review/downfall

that caused:

https://www.phoronix.com/review/intel-downfall-benchmarks

Uros.


* Re: [PATCH] Support -m[no-]gather -m[no-]scatter to enable/disable vectorization for all gather/scatter instructions.
  2023-08-10  7:42   ` Uros Bizjak
@ 2023-08-10  7:47     ` Richard Biener
  2023-08-10  7:55       ` Hongtao Liu
  2023-08-10 12:05       ` Jan Hubicka
  0 siblings, 2 replies; 18+ messages in thread
From: Richard Biener @ 2023-08-10  7:47 UTC (permalink / raw)
  To: Uros Bizjak; +Cc: liuhongt, gcc-patches, hubicka

On Thu, Aug 10, 2023 at 9:42 AM Uros Bizjak <ubizjak@gmail.com> wrote:
>
> On Thu, Aug 10, 2023 at 9:40 AM Richard Biener
> <richard.guenther@gmail.com> wrote:
> >
> > On Thu, Aug 10, 2023 at 3:13 AM liuhongt <hongtao.liu@intel.com> wrote:
> > >
> > > Currently we have 3 different independent tunes for gather
> > > "use_gather,use_gather_2parts,use_gather_4parts",
> > > similar for scatter, there're
> > > "use_scatter,use_scatter_2parts,use_scatter_4parts"
> > >
> > > The patch support 2 standardizing options to enable/disable
> > > vectorization for all gather/scatter instructions. The options is
> > > interpreted by driver to 3 tunes.
> > >
> > > bootstrapped and regtested on x86_64-pc-linux-gnu.
> > > Ok for trunk?
> >
> > I think -mgather/-mscatter are too close to -mfma suggesting they
> > enable part of an ISA but they won't disable the use of intrinsics
> > or enable gather/scatter on CPUs where the ISA doesn't have them.
> >
> > May I suggest to invent a more generic "short-cut" to
> > -mtune-ctrl=^X, maybe -mdisable=X?  And for gather/scatter
> > tunables add ^use_gather_any to cover all cases?  (or
> > change what use_gather controls - it seems we changed its
> > meaning before, and instead add use_gather_8parts and
> > use_gather_16parts)
> >
> > That is, what's the point of this?
>
> https://www.phoronix.com/review/downfall
>
> that caused:
>
> https://www.phoronix.com/review/intel-downfall-benchmarks

Yes, I know.  But there's -mtune-ctrl=<very long line> doing the trick.
GCC 11 had only 'use_gather', covering all numbers of lanes.  I suggest
resurrecting that behavior and adding use_gather_8+parts (or two, IIRC
gather works only on SI/SFmode or larger).

Then -mtune-ctrl=^use_gather works, which I think is nice enough?

Richard.

> Uros.


* Re: [PATCH] Support -m[no-]gather -m[no-]scatter to enable/disable vectorization for all gather/scatter instructions.
  2023-08-10  7:47     ` Richard Biener
@ 2023-08-10  7:55       ` Hongtao Liu
  2023-08-10  8:07         ` Hongtao Liu
  2023-08-10 11:11         ` [PATCH] " Richard Biener
  2023-08-10 12:05       ` Jan Hubicka
  1 sibling, 2 replies; 18+ messages in thread
From: Hongtao Liu @ 2023-08-10  7:55 UTC (permalink / raw)
  To: Richard Biener; +Cc: Uros Bizjak, liuhongt, gcc-patches, hubicka

On Thu, Aug 10, 2023 at 3:49 PM Richard Biener via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> On Thu, Aug 10, 2023 at 9:42 AM Uros Bizjak <ubizjak@gmail.com> wrote:
> >
> > On Thu, Aug 10, 2023 at 9:40 AM Richard Biener
> > <richard.guenther@gmail.com> wrote:
> > >
> > > On Thu, Aug 10, 2023 at 3:13 AM liuhongt <hongtao.liu@intel.com> wrote:
> > > >
> > > > Currently we have 3 different independent tunes for gather
> > > > "use_gather,use_gather_2parts,use_gather_4parts",
> > > > similar for scatter, there're
> > > > "use_scatter,use_scatter_2parts,use_scatter_4parts"
> > > >
> > > > The patch support 2 standardizing options to enable/disable
> > > > vectorization for all gather/scatter instructions. The options is
> > > > interpreted by driver to 3 tunes.
> > > >
> > > > bootstrapped and regtested on x86_64-pc-linux-gnu.
> > > > Ok for trunk?
> > >
> > > I think -mgather/-mscatter are too close to -mfma suggesting they
> > > enable part of an ISA but they won't disable the use of intrinsics
> > > or enable gather/scatter on CPUs where the ISA doesn't have them.
> > >
> > > May I suggest to invent a more generic "short-cut" to
> > > -mtune-ctrl=^X, maybe -mdisable=X?  And for gather/scatter
> > > tunables add ^use_gather_any to cover all cases?  (or
> > > change what use_gather controls - it seems we changed its
> > > meaning before, and instead add use_gather_8parts and
> > > use_gather_16parts)
> > >
> > > That is, what's the point of this?
> >
> > https://www.phoronix.com/review/downfall
> >
> > that caused:
> >
> > https://www.phoronix.com/review/intel-downfall-benchmarks
>
> Yes, I know.  But there's -mtune-ctrl=<very long line> doing the trick.
> GCC 11 had only 'use_gather', covering all numbers of lanes.  I suggest
> resurrecting that behavior and adding use_gather_8+parts (or two, IIRC
> gather works only on SI/SFmode or larger).
>
> Then -mtune-ctrl=^use_gather works, which I think is nice enough?
So basically, -mtune-ctrl=^use_gather is used to turn off all gather
vectorization, but -mtune-ctrl=use_gather doesn't turn on all of them?
We don't have an extra explicit flag for target tunes, just a single bit:
ix86_tune_features[X86_TUNE_USE_GATHER].
>
> Richard.
>
> > Uros.



-- 
BR,
Hongtao


* Re: [PATCH] Support -m[no-]gather -m[no-]scatter to enable/disable vectorization for all gather/scatter instructions.
  2023-08-10  7:55       ` Hongtao Liu
@ 2023-08-10  8:07         ` Hongtao Liu
  2023-08-10  9:16           ` Hongtao Liu
  2023-08-10 11:11         ` [PATCH] " Richard Biener
  1 sibling, 1 reply; 18+ messages in thread
From: Hongtao Liu @ 2023-08-10  8:07 UTC (permalink / raw)
  To: Richard Biener; +Cc: Uros Bizjak, liuhongt, gcc-patches, hubicka

On Thu, Aug 10, 2023 at 3:55 PM Hongtao Liu <crazylht@gmail.com> wrote:
>
> On Thu, Aug 10, 2023 at 3:49 PM Richard Biener via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
> >
> > On Thu, Aug 10, 2023 at 9:42 AM Uros Bizjak <ubizjak@gmail.com> wrote:
> > >
> > > On Thu, Aug 10, 2023 at 9:40 AM Richard Biener
> > > <richard.guenther@gmail.com> wrote:
> > > >
> > > > On Thu, Aug 10, 2023 at 3:13 AM liuhongt <hongtao.liu@intel.com> wrote:
> > > > >
> > > > > Currently we have 3 different independent tunes for gather
> > > > > "use_gather,use_gather_2parts,use_gather_4parts",
> > > > > similar for scatter, there're
> > > > > "use_scatter,use_scatter_2parts,use_scatter_4parts"
> > > > >
> > > > > The patch support 2 standardizing options to enable/disable
> > > > > vectorization for all gather/scatter instructions. The options is
> > > > > interpreted by driver to 3 tunes.
> > > > >
> > > > > bootstrapped and regtested on x86_64-pc-linux-gnu.
> > > > > Ok for trunk?
> > > >
> > > > I think -mgather/-mscatter are too close to -mfma suggesting they
> > > > enable part of an ISA but they won't disable the use of intrinsics
> > > > or enable gather/scatter on CPUs where the ISA doesn't have them.
> > > >
> > > > May I suggest to invent a more generic "short-cut" to
> > > > -mtune-ctrl=^X, maybe -mdisable=X?  And for gather/scatter
> > > > tunables add ^use_gather_any to cover all cases?  (or
> > > > change what use_gather controls - it seems we changed its
> > > > meaning before, and instead add use_gather_8parts and
> > > > use_gather_16parts)
> > > >
> > > > That is, what's the point of this?
> > >
> > > https://www.phoronix.com/review/downfall
> > >
> > > that caused:
> > >
> > > https://www.phoronix.com/review/intel-downfall-benchmarks
> >
> > Yes, I know.  But there's -mtune-ctrl=<very long line> doing the trick.
> > GCC 11 had only 'use_gather', covering all numbers of lanes.  I suggest
> > resurrecting that behavior and adding use_gather_8+parts (or two, IIRC
> > gather works only on SI/SFmode or larger).
> >
> > Then -mtune-ctrl=^use_gather works, which I think is nice enough?
> So basically, -mtune-ctrl=^use_gather is used to turn off all gather
> vectorization, but -mtune-ctrl=use_gather doesn't turn on all of them?
> We don't have an extra explicit flag for target tunes, just a single bit:
> ix86_tune_features[X86_TUNE_USE_GATHER].
Looks like I can handle it specially in parse_mtune_ctrl_str, let me try.
> >
> > Richard.
> >
> > > Uros.
>
>
>
> --
> BR,
> Hongtao



-- 
BR,
Hongtao


* Re: [PATCH] Support -m[no-]gather -m[no-]scatter to enable/disable vectorization for all gather/scatter instructions.
  2023-08-10  8:07         ` Hongtao Liu
@ 2023-08-10  9:16           ` Hongtao Liu
  2023-08-10 11:12             ` Richard Biener
  0 siblings, 1 reply; 18+ messages in thread
From: Hongtao Liu @ 2023-08-10  9:16 UTC (permalink / raw)
  To: Richard Biener; +Cc: Uros Bizjak, liuhongt, gcc-patches, hubicka

On Thu, Aug 10, 2023 at 4:07 PM Hongtao Liu <crazylht@gmail.com> wrote:
>
> On Thu, Aug 10, 2023 at 3:55 PM Hongtao Liu <crazylht@gmail.com> wrote:
> >
> > On Thu, Aug 10, 2023 at 3:49 PM Richard Biener via Gcc-patches
> > <gcc-patches@gcc.gnu.org> wrote:
> > >
> > > On Thu, Aug 10, 2023 at 9:42 AM Uros Bizjak <ubizjak@gmail.com> wrote:
> > > >
> > > > On Thu, Aug 10, 2023 at 9:40 AM Richard Biener
> > > > <richard.guenther@gmail.com> wrote:
> > > > >
> > > > > On Thu, Aug 10, 2023 at 3:13 AM liuhongt <hongtao.liu@intel.com> wrote:
> > > > > >
> > > > > > Currently we have 3 different independent tunes for gather
> > > > > > "use_gather,use_gather_2parts,use_gather_4parts",
> > > > > > similar for scatter, there're
> > > > > > "use_scatter,use_scatter_2parts,use_scatter_4parts"
> > > > > >
> > > > > > The patch support 2 standardizing options to enable/disable
> > > > > > vectorization for all gather/scatter instructions. The options is
> > > > > > interpreted by driver to 3 tunes.
> > > > > >
> > > > > > bootstrapped and regtested on x86_64-pc-linux-gnu.
> > > > > > Ok for trunk?
> > > > >
> > > > > I think -mgather/-mscatter are too close to -mfma suggesting they
> > > > > enable part of an ISA but they won't disable the use of intrinsics
> > > > > or enable gather/scatter on CPUs where the ISA doesn't have them.
> > > > >
> > > > > May I suggest to invent a more generic "short-cut" to
> > > > > -mtune-ctrl=^X, maybe -mdisable=X?  And for gather/scatter
> > > > > tunables add ^use_gather_any to cover all cases?  (or
> > > > > change what use_gather controls - it seems we changed its
> > > > > meaning before, and instead add use_gather_8parts and
> > > > > use_gather_16parts)
> > > > >
> > > > > That is, what's the point of this?
The point is to stay consistent across GCC, LLVM, and ICX (the Intel®
oneAPI DPC++/C++ Compiler). LLVM and ICX will both support that option.
> > > >
> > > > https://www.phoronix.com/review/downfall
> > > >
> > > > that caused:
> > > >
> > > > https://www.phoronix.com/review/intel-downfall-benchmarks
> > >
> > > Yes, I know.  But there's -mtune-ctl=<very long line> doing the trick.
> > > GCC 11 had only 'use_gather', covering all number of lanes.  I suggest
> > > to resurrect that behavior and add use_gather_8+parts (or two, IIRC
> > > gather works only on SI/SFmode or larger).
> > >
> > > Then -mtune-ctl=^use_gather works which I think is nice enough?
> > So basically, -mtune-ctrl=^use_gather is used to turn off all gather
> > vectorization, but -mtune-ctrl=use_gather doesn't turn on all of them?
> > We don't have an extrat explicit flag for target tune, just single bit
> > - ix86_tune_features[X86_TUNE_USE_GATHER]
> Looks like I can handle it specially in parse_mtune_ctrl_str, let me try.
> > >
> > > Richard.
> > >
> > > > Uros.
> >
> >
> >
> > --
> > BR,
> > Hongtao
>
>
>
> --
> BR,
> Hongtao



-- 
BR,
Hongtao

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] Support -m[no-]gather -m[no-]scatter to enable/disable vectorization for all gather/scatter instructions.
  2023-08-10  7:55       ` Hongtao Liu
  2023-08-10  8:07         ` Hongtao Liu
@ 2023-08-10 11:11         ` Richard Biener
  1 sibling, 0 replies; 18+ messages in thread
From: Richard Biener @ 2023-08-10 11:11 UTC (permalink / raw)
  To: Hongtao Liu; +Cc: Uros Bizjak, liuhongt, gcc-patches, hubicka

On Thu, Aug 10, 2023 at 9:55 AM Hongtao Liu <crazylht@gmail.com> wrote:
>
> On Thu, Aug 10, 2023 at 3:49 PM Richard Biener via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
> >
> > On Thu, Aug 10, 2023 at 9:42 AM Uros Bizjak <ubizjak@gmail.com> wrote:
> > >
> > > On Thu, Aug 10, 2023 at 9:40 AM Richard Biener
> > > <richard.guenther@gmail.com> wrote:
> > > >
> > > > On Thu, Aug 10, 2023 at 3:13 AM liuhongt <hongtao.liu@intel.com> wrote:
> > > > >
> > > > > Currently we have 3 different independent tunes for gather
> > > > > "use_gather,use_gather_2parts,use_gather_4parts",
> > > > > similar for scatter, there're
> > > > > "use_scatter,use_scatter_2parts,use_scatter_4parts"
> > > > >
> > > > > The patch support 2 standardizing options to enable/disable
> > > > > vectorization for all gather/scatter instructions. The options is
> > > > > interpreted by driver to 3 tunes.
> > > > >
> > > > > bootstrapped and regtested on x86_64-pc-linux-gnu.
> > > > > Ok for trunk?
> > > >
> > > > I think -mgather/-mscatter are too close to -mfma suggesting they
> > > > enable part of an ISA but they won't disable the use of intrinsics
> > > > or enable gather/scatter on CPUs where the ISA doesn't have them.
> > > >
> > > > May I suggest to invent a more generic "short-cut" to
> > > > -mtune-ctrl=^X, maybe -mdisable=X?  And for gather/scatter
> > > > tunables add ^use_gather_any to cover all cases?  (or
> > > > change what use_gather controls - it seems we changed its
> > > > meaning before, and instead add use_gather_8parts and
> > > > use_gather_16parts)
> > > >
> > > > That is, what's the point of this?
> > >
> > > https://www.phoronix.com/review/downfall
> > >
> > > that caused:
> > >
> > > https://www.phoronix.com/review/intel-downfall-benchmarks
> >
> > Yes, I know.  But there's -mtune-ctl=<very long line> doing the trick.
> > GCC 11 had only 'use_gather', covering all number of lanes.  I suggest
> > to resurrect that behavior and add use_gather_8+parts (or two, IIRC
> > gather works only on SI/SFmode or larger).
> >
> > Then -mtune-ctl=^use_gather works which I think is nice enough?
> So basically, -mtune-ctrl=^use_gather is used to turn off all gather
> vectorization, but -mtune-ctrl=use_gather doesn't turn on all of them?

No, -mtune-ctrl=use_gather should turn them all on as well.

> We don't have an extrat explicit flag for target tune, just single bit
> - ix86_tune_features[X86_TUNE_USE_GATHER]

GCC 11 just had that single bit for all.  I'm not sure how awkward it is
to have use_gather alias use_gather_2parts, use_gather_4parts ...
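
For illustration only, the aliasing could look roughly like the sketch
below.  This is standalone C, not GCC code: the enum, the tune_features
array and apply_tune_entry are hypothetical stand-ins for
ix86_tune_features and the loop in parse_mtune_ctrl_str, and only the
gather names are modelled.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-ins for the real tune feature indices.  */
enum tune_feature
{
  TUNE_USE_GATHER_2PARTS,
  TUNE_USE_GATHER_4PARTS,
  TUNE_USE_GATHER_8PARTS,
  TUNE_LAST
};

static const char *const tune_names[TUNE_LAST] = {
  "use_gather_2parts", "use_gather_4parts", "use_gather_8parts"
};

static unsigned char tune_features[TUNE_LAST];

/* Apply one entry such as "use_gather" or "^use_gather_4parts".
   A leading '^' clears the feature instead of setting it.
   Returns false if the name is not recognized.  */
static bool
apply_tune_entry (const char *entry)
{
  bool clear = false;

  if (*entry == '^')
    {
      clear = true;
      entry++;
    }

  /* Umbrella name: fan out to every per-width gather feature.  */
  if (!strcmp (entry, "use_gather"))
    {
      tune_features[TUNE_USE_GATHER_2PARTS] = !clear;
      tune_features[TUNE_USE_GATHER_4PARTS] = !clear;
      tune_features[TUNE_USE_GATHER_8PARTS] = !clear;
      return true;
    }

  /* Otherwise fall back to an exact match on a single feature name.  */
  for (int i = 0; i < TUNE_LAST; i++)
    if (!strcmp (entry, tune_names[i]))
      {
        tune_features[i] = !clear;
        return true;
      }

  return false;
}
```

With this shape, "use_gather" and "^use_gather" behave symmetrically,
and a later per-width entry can still override the umbrella setting.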

> >
> > Richard.
> >
> > > Uros.
>
>
>
> --
> BR,
> Hongtao

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] Support -m[no-]gather -m[no-]scatter to enable/disable vectorization for all gather/scatter instructions.
  2023-08-10  9:16           ` Hongtao Liu
@ 2023-08-10 11:12             ` Richard Biener
  2023-08-10 13:23               ` Hongtao Liu
  0 siblings, 1 reply; 18+ messages in thread
From: Richard Biener @ 2023-08-10 11:12 UTC (permalink / raw)
  To: Hongtao Liu; +Cc: Uros Bizjak, liuhongt, gcc-patches, hubicka

On Thu, Aug 10, 2023 at 11:16 AM Hongtao Liu <crazylht@gmail.com> wrote:
>
> On Thu, Aug 10, 2023 at 4:07 PM Hongtao Liu <crazylht@gmail.com> wrote:
> >
> > On Thu, Aug 10, 2023 at 3:55 PM Hongtao Liu <crazylht@gmail.com> wrote:
> > >
> > > On Thu, Aug 10, 2023 at 3:49 PM Richard Biener via Gcc-patches
> > > <gcc-patches@gcc.gnu.org> wrote:
> > > >
> > > > On Thu, Aug 10, 2023 at 9:42 AM Uros Bizjak <ubizjak@gmail.com> wrote:
> > > > >
> > > > > On Thu, Aug 10, 2023 at 9:40 AM Richard Biener
> > > > > <richard.guenther@gmail.com> wrote:
> > > > > >
> > > > > > On Thu, Aug 10, 2023 at 3:13 AM liuhongt <hongtao.liu@intel.com> wrote:
> > > > > > >
> > > > > > > Currently we have 3 different independent tunes for gather
> > > > > > > "use_gather,use_gather_2parts,use_gather_4parts",
> > > > > > > similar for scatter, there're
> > > > > > > "use_scatter,use_scatter_2parts,use_scatter_4parts"
> > > > > > >
> > > > > > > The patch support 2 standardizing options to enable/disable
> > > > > > > vectorization for all gather/scatter instructions. The options is
> > > > > > > interpreted by driver to 3 tunes.
> > > > > > >
> > > > > > > bootstrapped and regtested on x86_64-pc-linux-gnu.
> > > > > > > Ok for trunk?
> > > > > >
> > > > > > I think -mgather/-mscatter are too close to -mfma suggesting they
> > > > > > enable part of an ISA but they won't disable the use of intrinsics
> > > > > > or enable gather/scatter on CPUs where the ISA doesn't have them.
> > > > > >
> > > > > > May I suggest to invent a more generic "short-cut" to
> > > > > > -mtune-ctrl=^X, maybe -mdisable=X?  And for gather/scatter
> > > > > > tunables add ^use_gather_any to cover all cases?  (or
> > > > > > change what use_gather controls - it seems we changed its
> > > > > > meaning before, and instead add use_gather_8parts and
> > > > > > use_gather_16parts)
> > > > > >
> > > > > > That is, what's the point of this?
> The point of this is to keep consistent between GCC, LLVM, and
> ICX(Intel® oneAPI DPC++/C++ Compiler) .
> LLVM,ICX will support that option.

GCC has very many options that differ from LLVM's or ICX's, and
I don't see a good reason to special-case this one.  As I said, it's
a very bad name IMHO.

Richard.

> > > > >
> > > > > https://www.phoronix.com/review/downfall
> > > > >
> > > > > that caused:
> > > > >
> > > > > https://www.phoronix.com/review/intel-downfall-benchmarks
> > > >
> > > > Yes, I know.  But there's -mtune-ctl=<very long line> doing the trick.
> > > > GCC 11 had only 'use_gather', covering all number of lanes.  I suggest
> > > > to resurrect that behavior and add use_gather_8+parts (or two, IIRC
> > > > gather works only on SI/SFmode or larger).
> > > >
> > > > Then -mtune-ctl=^use_gather works which I think is nice enough?
> > > So basically, -mtune-ctrl=^use_gather is used to turn off all gather
> > > vectorization, but -mtune-ctrl=use_gather doesn't turn on all of them?
> > > We don't have an extrat explicit flag for target tune, just single bit
> > > - ix86_tune_features[X86_TUNE_USE_GATHER]
> > Looks like I can handle it specially in parse_mtune_ctrl_str, let me try.
> > > >
> > > > Richard.
> > > >
> > > > > Uros.
> > >
> > >
> > >
> > > --
> > > BR,
> > > Hongtao
> >
> >
> >
> > --
> > BR,
> > Hongtao
>
>
>
> --
> BR,
> Hongtao

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] Support -m[no-]gather -m[no-]scatter to enable/disable vectorization for all gather/scatter instructions.
  2023-08-10  7:47     ` Richard Biener
  2023-08-10  7:55       ` Hongtao Liu
@ 2023-08-10 12:05       ` Jan Hubicka
  1 sibling, 0 replies; 18+ messages in thread
From: Jan Hubicka @ 2023-08-10 12:05 UTC (permalink / raw)
  To: Richard Biener; +Cc: Uros Bizjak, liuhongt, gcc-patches

> On Thu, Aug 10, 2023 at 9:42 AM Uros Bizjak <ubizjak@gmail.com> wrote:
> >
> > On Thu, Aug 10, 2023 at 9:40 AM Richard Biener
> > <richard.guenther@gmail.com> wrote:
> > >
> > > On Thu, Aug 10, 2023 at 3:13 AM liuhongt <hongtao.liu@intel.com> wrote:
> > > >
> > > > Currently we have 3 different independent tunes for gather
> > > > "use_gather,use_gather_2parts,use_gather_4parts",
> > > > similar for scatter, there're
> > > > "use_scatter,use_scatter_2parts,use_scatter_4parts"
> > > >
> > > > The patch support 2 standardizing options to enable/disable
> > > > vectorization for all gather/scatter instructions. The options is
> > > > interpreted by driver to 3 tunes.
> > > >
> > > > bootstrapped and regtested on x86_64-pc-linux-gnu.
> > > > Ok for trunk?
> > >
> > > I think -mgather/-mscatter are too close to -mfma suggesting they
> > > enable part of an ISA but they won't disable the use of intrinsics
> > > or enable gather/scatter on CPUs where the ISA doesn't have them.
> > >
> > > May I suggest to invent a more generic "short-cut" to
> > > -mtune-ctrl=^X, maybe -mdisable=X?  And for gather/scatter
> > > tunables add ^use_gather_any to cover all cases?  (or
> > > change what use_gather controls - it seems we changed its
> > > meaning before, and instead add use_gather_8parts and
> > > use_gather_16parts)
> > >
> > > That is, what's the point of this?
> >
> > https://www.phoronix.com/review/downfall
> >
> > that caused:
> >
> > https://www.phoronix.com/review/intel-downfall-benchmarks
> 
> Yes, I know.  But there's -mtune-ctl=<very long line> doing the trick.
> GCC 11 had only 'use_gather', covering all number of lanes.  I suggest
> to resurrect that behavior and add use_gather_8+parts (or two, IIRC
> gather works only on SI/SFmode or larger).
> 
> Then -mtune-ctl=^use_gather works which I think is nice enough?

-mtune-ctrl is really intended for GCC developers.  It is neither backward
compatible nor fully documented, and bad sets of values may trigger ICEs.
If gathers have become very slow, I think normal users may want to disable
them, and in that situation a specialized command line option makes sense
to me.

Honza
> 
> Richard.
> 
> > Uros.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] Support -m[no-]gather -m[no-]scatter to enable/disable vectorization for all gather/scatter instructions.
  2023-08-10 11:12             ` Richard Biener
@ 2023-08-10 13:23               ` Hongtao Liu
  2023-08-11  6:01                 ` [PATCH V2] " liuhongt
  0 siblings, 1 reply; 18+ messages in thread
From: Hongtao Liu @ 2023-08-10 13:23 UTC (permalink / raw)
  To: Richard Biener; +Cc: Uros Bizjak, liuhongt, gcc-patches, hubicka

On Thu, Aug 10, 2023 at 7:13 PM Richard Biener
<richard.guenther@gmail.com> wrote:
>
> On Thu, Aug 10, 2023 at 11:16 AM Hongtao Liu <crazylht@gmail.com> wrote:
> >
> > On Thu, Aug 10, 2023 at 4:07 PM Hongtao Liu <crazylht@gmail.com> wrote:
> > >
> > > On Thu, Aug 10, 2023 at 3:55 PM Hongtao Liu <crazylht@gmail.com> wrote:
> > > >
> > > > On Thu, Aug 10, 2023 at 3:49 PM Richard Biener via Gcc-patches
> > > > <gcc-patches@gcc.gnu.org> wrote:
> > > > >
> > > > > On Thu, Aug 10, 2023 at 9:42 AM Uros Bizjak <ubizjak@gmail.com> wrote:
> > > > > >
> > > > > > On Thu, Aug 10, 2023 at 9:40 AM Richard Biener
> > > > > > <richard.guenther@gmail.com> wrote:
> > > > > > >
> > > > > > > On Thu, Aug 10, 2023 at 3:13 AM liuhongt <hongtao.liu@intel.com> wrote:
> > > > > > > >
> > > > > > > > Currently we have 3 different independent tunes for gather
> > > > > > > > "use_gather,use_gather_2parts,use_gather_4parts",
> > > > > > > > similar for scatter, there're
> > > > > > > > "use_scatter,use_scatter_2parts,use_scatter_4parts"
> > > > > > > >
> > > > > > > > The patch support 2 standardizing options to enable/disable
> > > > > > > > vectorization for all gather/scatter instructions. The options is
> > > > > > > > interpreted by driver to 3 tunes.
> > > > > > > >
> > > > > > > > bootstrapped and regtested on x86_64-pc-linux-gnu.
> > > > > > > > Ok for trunk?
> > > > > > >
> > > > > > > I think -mgather/-mscatter are too close to -mfma suggesting they
> > > > > > > enable part of an ISA but they won't disable the use of intrinsics
> > > > > > > or enable gather/scatter on CPUs where the ISA doesn't have them.
> > > > > > >
> > > > > > > May I suggest to invent a more generic "short-cut" to
> > > > > > > -mtune-ctrl=^X, maybe -mdisable=X?  And for gather/scatter
> > > > > > > tunables add ^use_gather_any to cover all cases?  (or
> > > > > > > change what use_gather controls - it seems we changed its
> > > > > > > meaning before, and instead add use_gather_8parts and
> > > > > > > use_gather_16parts)
> > > > > > >
> > > > > > > That is, what's the point of this?
> > The point of this is to keep consistent between GCC, LLVM, and
> > ICX(Intel® oneAPI DPC++/C++ Compiler) .
> > LLVM,ICX will support that option.
>
> GCC has very many options that are not the same as LLVM or ICX,
> I don't see a good reason to special case this one.  As said, it's
> a very bad name IMHO.
In general terms, yes.
But this is a new option, so wouldn't it be better to be consistent?
The problem with -mfma is mainly that the cpuid bit is itself called fma,
whereas there is no cpuid bit called gather or scatter. With clear
documentation that the option only affects auto-vectorization,
-m{no-,}{gather,scatter} looks fine to me.
As Honza mentioned, users need an option to turn gather/scatter
auto-vectorization on or off, and I don't think they will expect the
option to also apply to intrinsics.
If -mtune-ctrl= is not suitable for direct exposure to users,
then the original proposal should be ok?
Developers will maintain the mapping between -mgather/-mscatter and
-mtune-ctrl=XXX so that it stays consistent between GCC versions.
>
> Richard.
>
> > > > > >
> > > > > > https://www.phoronix.com/review/downfall
> > > > > >
> > > > > > that caused:
> > > > > >
> > > > > > https://www.phoronix.com/review/intel-downfall-benchmarks
> > > > >
> > > > > Yes, I know.  But there's -mtune-ctl=<very long line> doing the trick.
> > > > > GCC 11 had only 'use_gather', covering all number of lanes.  I suggest
> > > > > to resurrect that behavior and add use_gather_8+parts (or two, IIRC
> > > > > gather works only on SI/SFmode or larger).
> > > > >
> > > > > Then -mtune-ctl=^use_gather works which I think is nice enough?
> > > > So basically, -mtune-ctrl=^use_gather is used to turn off all gather
> > > > vectorization, but -mtune-ctrl=use_gather doesn't turn on all of them?
> > > > We don't have an extrat explicit flag for target tune, just single bit
> > > > - ix86_tune_features[X86_TUNE_USE_GATHER]
> > > Looks like I can handle it specially in parse_mtune_ctrl_str, let me try.
> > > > >
> > > > > Richard.
> > > > >
> > > > > > Uros.
> > > >
> > > >
> > > >
> > > > --
> > > > BR,
> > > > Hongtao
> > >
> > >
> > >
> > > --
> > > BR,
> > > Hongtao
> >
> >
> >
> > --
> > BR,
> > Hongtao



-- 
BR,
Hongtao

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH V2] Support -m[no-]gather -m[no-]scatter to enable/disable vectorization for all gather/scatter instructions
  2023-08-10 13:23               ` Hongtao Liu
@ 2023-08-11  6:01                 ` liuhongt
  2023-08-14  2:40                   ` Hongtao Liu
  0 siblings, 1 reply; 18+ messages in thread
From: liuhongt @ 2023-08-11  6:01 UTC (permalink / raw)
  To: gcc-patches; +Cc: richard.guenther, ubizjak, hubicka

Rename the original use_gather to use_gather_8parts. Support
-mtune-ctrl={,^}use_gather to set/clear the tune features
use_gather_{2parts, 4parts, 8parts}. Support the new option -mgather
as an alias of -mtune-ctrl=use_gather; -mno-gather maps to
-mtune-ctrl=^use_gather.

Similar for use_scatter.

How about this version?

gcc/ChangeLog:

	* config/i386/i386-builtins.cc
	(ix86_vectorize_builtin_gather): Adjust for use_gather_8parts.
	* config/i386/i386-options.cc (parse_mtune_ctrl_str):
	Set/Clear tune features use_{gather,scatter}_{2parts, 4parts,
	8parts} for -mtune-ctrl={,^}{use_gather,use_scatter}.
	* config/i386/i386.cc (ix86_vectorize_builtin_scatter): Adjust
	for use_scatter_8parts.
	* config/i386/i386.h (TARGET_USE_GATHER): Rename to ..
	(TARGET_USE_GATHER_8PARTS): .. this.
	(TARGET_USE_SCATTER): Rename to ..
	(TARGET_USE_SCATTER_8PARTS): .. this.
	* config/i386/x86-tune.def (X86_TUNE_USE_GATHER): Rename to ..
	(X86_TUNE_USE_GATHER_8PARTS): .. this.
	(X86_TUNE_USE_SCATTER): Rename to ..
	(X86_TUNE_USE_SCATTER_8PARTS): .. this.
	* config/i386/i386.opt: Add new options mgather, mscatter.
---
 gcc/config/i386/i386-builtins.cc |  2 +-
 gcc/config/i386/i386-options.cc  | 54 +++++++++++++++++++++++---------
 gcc/config/i386/i386.cc          |  2 +-
 gcc/config/i386/i386.h           |  8 ++---
 gcc/config/i386/i386.opt         |  8 +++++
 gcc/config/i386/x86-tune.def     |  4 +--
 6 files changed, 56 insertions(+), 22 deletions(-)

diff --git a/gcc/config/i386/i386-builtins.cc b/gcc/config/i386/i386-builtins.cc
index 356b6dfd5fb..8a0b8dfe073 100644
--- a/gcc/config/i386/i386-builtins.cc
+++ b/gcc/config/i386/i386-builtins.cc
@@ -1657,7 +1657,7 @@ ix86_vectorize_builtin_gather (const_tree mem_vectype,
 	  ? !TARGET_USE_GATHER_2PARTS
 	  : (known_eq (TYPE_VECTOR_SUBPARTS (mem_vectype), 4u)
 	     ? !TARGET_USE_GATHER_4PARTS
-	     : !TARGET_USE_GATHER)))
+	     : !TARGET_USE_GATHER_8PARTS)))
     return NULL_TREE;
 
   if ((TREE_CODE (index_type) != INTEGER_TYPE
diff --git a/gcc/config/i386/i386-options.cc b/gcc/config/i386/i386-options.cc
index 127ee24203c..b8d038af69d 100644
--- a/gcc/config/i386/i386-options.cc
+++ b/gcc/config/i386/i386-options.cc
@@ -1731,20 +1731,46 @@ parse_mtune_ctrl_str (struct gcc_options *opts, bool dump)
           curr_feature_string++;
           clear = true;
         }
-      for (i = 0; i < X86_TUNE_LAST; i++)
-        {
-          if (!strcmp (curr_feature_string, ix86_tune_feature_names[i]))
-            {
-              ix86_tune_features[i] = !clear;
-              if (dump)
-                fprintf (stderr, "Explicitly %s feature %s\n",
-                         clear ? "clear" : "set", ix86_tune_feature_names[i]);
-              break;
-            }
-        }
-      if (i == X86_TUNE_LAST)
-	error ("unknown parameter to option %<-mtune-ctrl%>: %s",
-	       clear ? curr_feature_string - 1 : curr_feature_string);
+
+      if (!strcmp (curr_feature_string, "use_gather"))
+	{
+	  ix86_tune_features[X86_TUNE_USE_GATHER_2PARTS] = !clear;
+	  ix86_tune_features[X86_TUNE_USE_GATHER_4PARTS] = !clear;
+	  ix86_tune_features[X86_TUNE_USE_GATHER_8PARTS] = !clear;
+	  if (dump)
+	    fprintf (stderr, "Explicitly %s features use_gather_2parts,"
+		     " use_gather_4parts, use_gather_8parts\n",
+		     clear ? "clear" : "set");
+
+	}
+      else if (!strcmp (curr_feature_string, "use_scatter"))
+	{
+	  ix86_tune_features[X86_TUNE_USE_SCATTER_2PARTS] = !clear;
+	  ix86_tune_features[X86_TUNE_USE_SCATTER_4PARTS] = !clear;
+	  ix86_tune_features[X86_TUNE_USE_SCATTER_8PARTS] = !clear;
+	  if (dump)
+	    fprintf (stderr, "Explicitly %s features use_scatter_2parts,"
+		     " use_scatter_4parts, use_scatter_8parts\n",
+		     clear ? "clear" : "set");
+	}
+      else
+	{
+	  for (i = 0; i < X86_TUNE_LAST; i++)
+	    {
+	      if (!strcmp (curr_feature_string, ix86_tune_feature_names[i]))
+		{
+		  ix86_tune_features[i] = !clear;
+		  if (dump)
+		    fprintf (stderr, "Explicitly %s feature %s\n",
+			     clear ? "clear" : "set", ix86_tune_feature_names[i]);
+		  break;
+		}
+	    }
+
+	  if (i == X86_TUNE_LAST)
+	    error ("unknown parameter to option %<-mtune-ctrl%>: %s",
+		   clear ? curr_feature_string - 1 : curr_feature_string);
+	}
       curr_feature_string = next_feature_string;
     }
   while (curr_feature_string);
diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index d592ece700a..cd49fb9e47a 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -19193,7 +19193,7 @@ ix86_vectorize_builtin_scatter (const_tree vectype,
       ? !TARGET_USE_SCATTER_2PARTS
       : (known_eq (TYPE_VECTOR_SUBPARTS (vectype), 4u)
 	 ? !TARGET_USE_SCATTER_4PARTS
-	 : !TARGET_USE_SCATTER))
+	 : !TARGET_USE_SCATTER_8PARTS))
     return NULL_TREE;
 
   if ((TREE_CODE (index_type) != INTEGER_TYPE
diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index ef342fcee9b..f7330e818e7 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -403,10 +403,10 @@ extern unsigned char ix86_tune_features[X86_TUNE_LAST];
 	ix86_tune_features[X86_TUNE_USE_GATHER_4PARTS]
 #define TARGET_USE_SCATTER_4PARTS \
 	ix86_tune_features[X86_TUNE_USE_SCATTER_4PARTS]
-#define TARGET_USE_GATHER \
-	ix86_tune_features[X86_TUNE_USE_GATHER]
-#define TARGET_USE_SCATTER \
-	ix86_tune_features[X86_TUNE_USE_SCATTER]
+#define TARGET_USE_GATHER_8PARTS \
+	ix86_tune_features[X86_TUNE_USE_GATHER_8PARTS]
+#define TARGET_USE_SCATTER_8PARTS \
+	ix86_tune_features[X86_TUNE_USE_SCATTER_8PARTS]
 #define TARGET_FUSE_CMP_AND_BRANCH_32 \
 	ix86_tune_features[X86_TUNE_FUSE_CMP_AND_BRANCH_32]
 #define TARGET_FUSE_CMP_AND_BRANCH_64 \
diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt
index 8a43187f703..78b499304a4 100644
--- a/gcc/config/i386/i386.opt
+++ b/gcc/config/i386/i386.opt
@@ -1302,3 +1302,11 @@ msm4
 Target Mask(ISA2_SM4) Var(ix86_isa_flags2) Save
 Support MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX and
 SM4 built-in functions and code generation.
+
+mgather
+Target Alias(mtune-ctrl=, use_gather, ^use_gather)
+Enable vectorization for gather instruction.
+
+mscatter
+Target Alias(mtune-ctrl=, use_scatter, ^use_scatter)
+Enable vectorization for scatter instruction.
diff --git a/gcc/config/i386/x86-tune.def b/gcc/config/i386/x86-tune.def
index 40e04ecddbf..d7f20d3a118 100644
--- a/gcc/config/i386/x86-tune.def
+++ b/gcc/config/i386/x86-tune.def
@@ -511,13 +511,13 @@ DEF_TUNE (X86_TUNE_USE_SCATTER_4PARTS, "use_scatter_4parts",
 
 /* X86_TUNE_USE_GATHER: Use gather instructions for vectors with 8 or more
    elements.  */
-DEF_TUNE (X86_TUNE_USE_GATHER, "use_gather",
+DEF_TUNE (X86_TUNE_USE_GATHER_8PARTS, "use_gather_8parts",
 	  ~(m_ZNVER1 | m_ZNVER2 | m_ZNVER4 | m_ALDERLAKE | m_ARROWLAKE
 	    | m_CORE_ATOM | m_GENERIC))
 
 /* X86_TUNE_USE_SCATTER: Use scater instructions for vectors with 8 or more
    elements.  */
-DEF_TUNE (X86_TUNE_USE_SCATTER, "use_scatter",
+DEF_TUNE (X86_TUNE_USE_SCATTER_8PARTS, "use_scatter_8parts",
 	  ~(m_ZNVER4))
 
 /* X86_TUNE_AVOID_128FMA_CHAINS: Avoid creating loops with tight 128bit or
-- 
2.31.1


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH V2] Support -m[no-]gather -m[no-]scatter to enable/disable vectorization for all gather/scatter instructions
  2023-08-11  6:01                 ` [PATCH V2] " liuhongt
@ 2023-08-14  2:40                   ` Hongtao Liu
  2023-08-16  7:37                     ` Hongtao Liu
  0 siblings, 1 reply; 18+ messages in thread
From: Hongtao Liu @ 2023-08-14  2:40 UTC (permalink / raw)
  To: liuhongt; +Cc: gcc-patches, richard.guenther, ubizjak, hubicka

On Fri, Aug 11, 2023 at 2:02 PM liuhongt via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> Rename original use_gather to use_gather_8parts, Support
> -mtune-ctrl={,^}use_gather to set/clear tune features
> use_gather_{2parts, 4parts, 8parts}. Support the new option -mgather
> as alias of -mtune-ctrl=, use_gather, ^use_gather.
>
> Similar for use_scatter.
>
> How about this version?
I'll commit the patch if there are no objections in the next 24 hours.
>
> gcc/ChangeLog:
>
>         * config/i386/i386-builtins.cc
>         (ix86_vectorize_builtin_gather): Adjust for use_gather_8parts.
>         * config/i386/i386-options.cc (parse_mtune_ctrl_str):
>         Set/Clear tune features use_{gather,scatter}_{2parts, 4parts,
>         8parts} for -mtune-crtl={,^}{use_gather,use_scatter}.
>         * config/i386/i386.cc (ix86_vectorize_builtin_scatter): Adjust
>         for use_scatter_8parts
>         * config/i386/i386.h (TARGET_USE_GATHER): Rename to ..
>         (TARGET_USE_GATHER_8PARTS): .. this.
>         (TARGET_USE_SCATTER): Rename to ..
>         (TARGET_USE_SCATTER_8PARTS): .. this.
>         * config/i386/x86-tune.def (X86_TUNE_USE_GATHER): Rename to
>         (X86_TUNE_USE_GATHER_8PARTS): .. this.
>         (X86_TUNE_USE_SCATTER): Rename to
>         (X86_TUNE_USE_SCATTER_8PARTS): .. this.
>         * config/i386/i386.opt: Add new options mgather, mscatter.
> ---
>  gcc/config/i386/i386-builtins.cc |  2 +-
>  gcc/config/i386/i386-options.cc  | 54 +++++++++++++++++++++++---------
>  gcc/config/i386/i386.cc          |  2 +-
>  gcc/config/i386/i386.h           |  8 ++---
>  gcc/config/i386/i386.opt         |  8 +++++
>  gcc/config/i386/x86-tune.def     |  4 +--
>  6 files changed, 56 insertions(+), 22 deletions(-)
>
> diff --git a/gcc/config/i386/i386-builtins.cc b/gcc/config/i386/i386-builtins.cc
> index 356b6dfd5fb..8a0b8dfe073 100644
> --- a/gcc/config/i386/i386-builtins.cc
> +++ b/gcc/config/i386/i386-builtins.cc
> @@ -1657,7 +1657,7 @@ ix86_vectorize_builtin_gather (const_tree mem_vectype,
>           ? !TARGET_USE_GATHER_2PARTS
>           : (known_eq (TYPE_VECTOR_SUBPARTS (mem_vectype), 4u)
>              ? !TARGET_USE_GATHER_4PARTS
> -            : !TARGET_USE_GATHER)))
> +            : !TARGET_USE_GATHER_8PARTS)))
>      return NULL_TREE;
>
>    if ((TREE_CODE (index_type) != INTEGER_TYPE
> diff --git a/gcc/config/i386/i386-options.cc b/gcc/config/i386/i386-options.cc
> index 127ee24203c..b8d038af69d 100644
> --- a/gcc/config/i386/i386-options.cc
> +++ b/gcc/config/i386/i386-options.cc
> @@ -1731,20 +1731,46 @@ parse_mtune_ctrl_str (struct gcc_options *opts, bool dump)
>            curr_feature_string++;
>            clear = true;
>          }
> -      for (i = 0; i < X86_TUNE_LAST; i++)
> -        {
> -          if (!strcmp (curr_feature_string, ix86_tune_feature_names[i]))
> -            {
> -              ix86_tune_features[i] = !clear;
> -              if (dump)
> -                fprintf (stderr, "Explicitly %s feature %s\n",
> -                         clear ? "clear" : "set", ix86_tune_feature_names[i]);
> -              break;
> -            }
> -        }
> -      if (i == X86_TUNE_LAST)
> -       error ("unknown parameter to option %<-mtune-ctrl%>: %s",
> -              clear ? curr_feature_string - 1 : curr_feature_string);
> +
> +      if (!strcmp (curr_feature_string, "use_gather"))
> +       {
> +         ix86_tune_features[X86_TUNE_USE_GATHER_2PARTS] = !clear;
> +         ix86_tune_features[X86_TUNE_USE_GATHER_4PARTS] = !clear;
> +         ix86_tune_features[X86_TUNE_USE_GATHER_8PARTS] = !clear;
> +         if (dump)
> +           fprintf (stderr, "Explicitly %s features use_gather_2parts,"
> +                    " use_gather_4parts, use_gather_8parts\n",
> +                    clear ? "clear" : "set");
> +
> +       }
> +      else if (!strcmp (curr_feature_string, "use_scatter"))
> +       {
> +         ix86_tune_features[X86_TUNE_USE_SCATTER_2PARTS] = !clear;
> +         ix86_tune_features[X86_TUNE_USE_SCATTER_4PARTS] = !clear;
> +         ix86_tune_features[X86_TUNE_USE_SCATTER_8PARTS] = !clear;
> +         if (dump)
> +           fprintf (stderr, "Explicitly %s features use_scatter_2parts,"
> +                    " use_scatter_4parts, use_scatter_8parts\n",
> +                    clear ? "clear" : "set");
> +       }
> +      else
> +       {
> +         for (i = 0; i < X86_TUNE_LAST; i++)
> +           {
> +             if (!strcmp (curr_feature_string, ix86_tune_feature_names[i]))
> +               {
> +                 ix86_tune_features[i] = !clear;
> +                 if (dump)
> +                   fprintf (stderr, "Explicitly %s feature %s\n",
> +                            clear ? "clear" : "set", ix86_tune_feature_names[i]);
> +                 break;
> +               }
> +           }
> +
> +         if (i == X86_TUNE_LAST)
> +           error ("unknown parameter to option %<-mtune-ctrl%>: %s",
> +                  clear ? curr_feature_string - 1 : curr_feature_string);
> +       }
>        curr_feature_string = next_feature_string;
>      }
>    while (curr_feature_string);
> diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> index d592ece700a..cd49fb9e47a 100644
> --- a/gcc/config/i386/i386.cc
> +++ b/gcc/config/i386/i386.cc
> @@ -19193,7 +19193,7 @@ ix86_vectorize_builtin_scatter (const_tree vectype,
>        ? !TARGET_USE_SCATTER_2PARTS
>        : (known_eq (TYPE_VECTOR_SUBPARTS (vectype), 4u)
>          ? !TARGET_USE_SCATTER_4PARTS
> -        : !TARGET_USE_SCATTER))
> +        : !TARGET_USE_SCATTER_8PARTS))
>      return NULL_TREE;
>
>    if ((TREE_CODE (index_type) != INTEGER_TYPE
> diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
> index ef342fcee9b..f7330e818e7 100644
> --- a/gcc/config/i386/i386.h
> +++ b/gcc/config/i386/i386.h
> @@ -403,10 +403,10 @@ extern unsigned char ix86_tune_features[X86_TUNE_LAST];
>         ix86_tune_features[X86_TUNE_USE_GATHER_4PARTS]
>  #define TARGET_USE_SCATTER_4PARTS \
>         ix86_tune_features[X86_TUNE_USE_SCATTER_4PARTS]
> -#define TARGET_USE_GATHER \
> -       ix86_tune_features[X86_TUNE_USE_GATHER]
> -#define TARGET_USE_SCATTER \
> -       ix86_tune_features[X86_TUNE_USE_SCATTER]
> +#define TARGET_USE_GATHER_8PARTS \
> +       ix86_tune_features[X86_TUNE_USE_GATHER_8PARTS]
> +#define TARGET_USE_SCATTER_8PARTS \
> +       ix86_tune_features[X86_TUNE_USE_SCATTER_8PARTS]
>  #define TARGET_FUSE_CMP_AND_BRANCH_32 \
>         ix86_tune_features[X86_TUNE_FUSE_CMP_AND_BRANCH_32]
>  #define TARGET_FUSE_CMP_AND_BRANCH_64 \
> diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt
> index 8a43187f703..78b499304a4 100644
> --- a/gcc/config/i386/i386.opt
> +++ b/gcc/config/i386/i386.opt
> @@ -1302,3 +1302,11 @@ msm4
>  Target Mask(ISA2_SM4) Var(ix86_isa_flags2) Save
>  Support MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX and
>  SM4 built-in functions and code generation.
> +
> +mgather
> +Target Alias(mtune-ctrl=, use_gather, ^use_gather)
> +Enable vectorization for gather instruction.
> +
> +mscatter
> +Target Alias(mtune-ctrl=, use_scatter, ^use_scatter)
> +Enable vectorization for scatter instruction.
> diff --git a/gcc/config/i386/x86-tune.def b/gcc/config/i386/x86-tune.def
> index 40e04ecddbf..d7f20d3a118 100644
> --- a/gcc/config/i386/x86-tune.def
> +++ b/gcc/config/i386/x86-tune.def
> @@ -511,13 +511,13 @@ DEF_TUNE (X86_TUNE_USE_SCATTER_4PARTS, "use_scatter_4parts",
>
>  /* X86_TUNE_USE_GATHER: Use gather instructions for vectors with 8 or more
>     elements.  */
> -DEF_TUNE (X86_TUNE_USE_GATHER, "use_gather",
> +DEF_TUNE (X86_TUNE_USE_GATHER_8PARTS, "use_gather_8parts",
>           ~(m_ZNVER1 | m_ZNVER2 | m_ZNVER4 | m_ALDERLAKE | m_ARROWLAKE
>             | m_CORE_ATOM | m_GENERIC))
>
>  /* X86_TUNE_USE_SCATTER: Use scater instructions for vectors with 8 or more
>     elements.  */
> -DEF_TUNE (X86_TUNE_USE_SCATTER, "use_scatter",
> +DEF_TUNE (X86_TUNE_USE_SCATTER_8PARTS, "use_scatter_8parts",
>           ~(m_ZNVER4))
>
>  /* X86_TUNE_AVOID_128FMA_CHAINS: Avoid creating loops with tight 128bit or
> --
> 2.31.1
>


-- 
BR,
Hongtao


* Re: [PATCH V2] Support -m[no-]gather -m[no-]scatter to enable/disable vectorization for all gather/scatter instructions
  2023-08-14  2:40                   ` Hongtao Liu
@ 2023-08-16  7:37                     ` Hongtao Liu
  0 siblings, 0 replies; 18+ messages in thread
From: Hongtao Liu @ 2023-08-16  7:37 UTC (permalink / raw)
  To: liuhongt; +Cc: gcc-patches, richard.guenther, ubizjak, hubicka

On Mon, Aug 14, 2023 at 10:40 AM Hongtao Liu <crazylht@gmail.com> wrote:
>
> On Fri, Aug 11, 2023 at 2:02 PM liuhongt via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
> >
> > > Rename the original use_gather to use_gather_8parts. Support
> > > -mtune-ctrl={,^}use_gather to set/clear the tune features
> > > use_gather_{2parts, 4parts, 8parts}. Support the new option -mgather
> > > as an alias of -mtune-ctrl=use_gather, and -mno-gather as an alias
> > > of -mtune-ctrl=^use_gather.
> >
> > Similar for use_scatter.
> >
> > How about this version?
> I'll commit the patch if there's no objections in the next 24 hours.
Pushed to trunk and backported to release/gcc-{13,12,11}.
Note that for GCC 11 the backport only supports -m{no-,}gather, since
the branch doesn't have scatter tunings.
For GCC 12 and GCC 13, both -m{no-,}gather and -m{no-,}scatter are supported.
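
For anyone who wants to try the options, here is a minimal sketch of a loop that the vectorizer may implement with gather instructions. The function name gather_copy and the command lines in the comment are illustrative assumptions and not part of the patch; whether a gather is actually emitted still depends on -march, the optimization level and the cost model.

```c
#include <stddef.h>

/* Hypothetical test kernel (not part of the patch): an indexed load
   that the vectorizer may implement with gather instructions when the
   target supports them (e.g. AVX2 or AVX-512).  */
void
gather_copy (double *restrict out, const double *restrict in,
             const int *restrict idx, size_t n)
{
  for (size_t i = 0; i < n; i++)
    out[i] = in[idx[i]];
}

/* Suggested experiment (assumed command lines; compare the assembly):
     gcc -O3 -march=skylake-avx512 -mgather -S gather.c
     gcc -O3 -march=skylake-avx512 -mno-gather -S gather.c
   Per the patch, -mgather behaves like -mtune-ctrl=use_gather and
   -mno-gather like -mtune-ctrl=^use_gather.  */
```

The scalar semantics are the same with either option; only the generated code should differ.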
> >
> > gcc/ChangeLog:
> >
> >         * config/i386/i386-builtins.cc
> >         (ix86_vectorize_builtin_gather): Adjust for use_gather_8parts.
> >         * config/i386/i386-options.cc (parse_mtune_ctrl_str):
> >         Set/Clear tune features use_{gather,scatter}_{2parts, 4parts,
> > >         8parts} for -mtune-ctrl={,^}{use_gather,use_scatter}.
> > >         * config/i386/i386.cc (ix86_vectorize_builtin_scatter): Adjust
> > >         for use_scatter_8parts.
> >         * config/i386/i386.h (TARGET_USE_GATHER): Rename to ..
> >         (TARGET_USE_GATHER_8PARTS): .. this.
> >         (TARGET_USE_SCATTER): Rename to ..
> >         (TARGET_USE_SCATTER_8PARTS): .. this.
> >         * config/i386/x86-tune.def (X86_TUNE_USE_GATHER): Rename to
> >         (X86_TUNE_USE_GATHER_8PARTS): .. this.
> >         (X86_TUNE_USE_SCATTER): Rename to
> >         (X86_TUNE_USE_SCATTER_8PARTS): .. this.
> >         * config/i386/i386.opt: Add new options mgather, mscatter.
> > ---
> >  gcc/config/i386/i386-builtins.cc |  2 +-
> >  gcc/config/i386/i386-options.cc  | 54 +++++++++++++++++++++++---------
> >  gcc/config/i386/i386.cc          |  2 +-
> >  gcc/config/i386/i386.h           |  8 ++---
> >  gcc/config/i386/i386.opt         |  8 +++++
> >  gcc/config/i386/x86-tune.def     |  4 +--
> >  6 files changed, 56 insertions(+), 22 deletions(-)
> >
> > diff --git a/gcc/config/i386/i386-builtins.cc b/gcc/config/i386/i386-builtins.cc
> > index 356b6dfd5fb..8a0b8dfe073 100644
> > --- a/gcc/config/i386/i386-builtins.cc
> > +++ b/gcc/config/i386/i386-builtins.cc
> > @@ -1657,7 +1657,7 @@ ix86_vectorize_builtin_gather (const_tree mem_vectype,
> >           ? !TARGET_USE_GATHER_2PARTS
> >           : (known_eq (TYPE_VECTOR_SUBPARTS (mem_vectype), 4u)
> >              ? !TARGET_USE_GATHER_4PARTS
> > -            : !TARGET_USE_GATHER)))
> > +            : !TARGET_USE_GATHER_8PARTS)))
> >      return NULL_TREE;
> >
> >    if ((TREE_CODE (index_type) != INTEGER_TYPE
> > diff --git a/gcc/config/i386/i386-options.cc b/gcc/config/i386/i386-options.cc
> > index 127ee24203c..b8d038af69d 100644
> > --- a/gcc/config/i386/i386-options.cc
> > +++ b/gcc/config/i386/i386-options.cc
> > @@ -1731,20 +1731,46 @@ parse_mtune_ctrl_str (struct gcc_options *opts, bool dump)
> >            curr_feature_string++;
> >            clear = true;
> >          }
> > -      for (i = 0; i < X86_TUNE_LAST; i++)
> > -        {
> > -          if (!strcmp (curr_feature_string, ix86_tune_feature_names[i]))
> > -            {
> > -              ix86_tune_features[i] = !clear;
> > -              if (dump)
> > -                fprintf (stderr, "Explicitly %s feature %s\n",
> > -                         clear ? "clear" : "set", ix86_tune_feature_names[i]);
> > -              break;
> > -            }
> > -        }
> > -      if (i == X86_TUNE_LAST)
> > -       error ("unknown parameter to option %<-mtune-ctrl%>: %s",
> > -              clear ? curr_feature_string - 1 : curr_feature_string);
> > +
> > +      if (!strcmp (curr_feature_string, "use_gather"))
> > +       {
> > +         ix86_tune_features[X86_TUNE_USE_GATHER_2PARTS] = !clear;
> > +         ix86_tune_features[X86_TUNE_USE_GATHER_4PARTS] = !clear;
> > +         ix86_tune_features[X86_TUNE_USE_GATHER_8PARTS] = !clear;
> > +         if (dump)
> > +           fprintf (stderr, "Explicitly %s features use_gather_2parts,"
> > +                    " use_gather_4parts, use_gather_8parts\n",
> > +                    clear ? "clear" : "set");
> > +
> > +       }
> > +      else if (!strcmp (curr_feature_string, "use_scatter"))
> > +       {
> > +         ix86_tune_features[X86_TUNE_USE_SCATTER_2PARTS] = !clear;
> > +         ix86_tune_features[X86_TUNE_USE_SCATTER_4PARTS] = !clear;
> > +         ix86_tune_features[X86_TUNE_USE_SCATTER_8PARTS] = !clear;
> > +         if (dump)
> > +           fprintf (stderr, "Explicitly %s features use_scatter_2parts,"
> > +                    " use_scatter_4parts, use_scatter_8parts\n",
> > +                    clear ? "clear" : "set");
> > +       }
> > +      else
> > +       {
> > +         for (i = 0; i < X86_TUNE_LAST; i++)
> > +           {
> > +             if (!strcmp (curr_feature_string, ix86_tune_feature_names[i]))
> > +               {
> > +                 ix86_tune_features[i] = !clear;
> > +                 if (dump)
> > +                   fprintf (stderr, "Explicitly %s feature %s\n",
> > +                            clear ? "clear" : "set", ix86_tune_feature_names[i]);
> > +                 break;
> > +               }
> > +           }
> > +
> > +         if (i == X86_TUNE_LAST)
> > +           error ("unknown parameter to option %<-mtune-ctrl%>: %s",
> > +                  clear ? curr_feature_string - 1 : curr_feature_string);
> > +       }
> >        curr_feature_string = next_feature_string;
> >      }
> >    while (curr_feature_string);
> > diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> > index d592ece700a..cd49fb9e47a 100644
> > --- a/gcc/config/i386/i386.cc
> > +++ b/gcc/config/i386/i386.cc
> > @@ -19193,7 +19193,7 @@ ix86_vectorize_builtin_scatter (const_tree vectype,
> >        ? !TARGET_USE_SCATTER_2PARTS
> >        : (known_eq (TYPE_VECTOR_SUBPARTS (vectype), 4u)
> >          ? !TARGET_USE_SCATTER_4PARTS
> > -        : !TARGET_USE_SCATTER))
> > +        : !TARGET_USE_SCATTER_8PARTS))
> >      return NULL_TREE;
> >
> >    if ((TREE_CODE (index_type) != INTEGER_TYPE
> > diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
> > index ef342fcee9b..f7330e818e7 100644
> > --- a/gcc/config/i386/i386.h
> > +++ b/gcc/config/i386/i386.h
> > @@ -403,10 +403,10 @@ extern unsigned char ix86_tune_features[X86_TUNE_LAST];
> >         ix86_tune_features[X86_TUNE_USE_GATHER_4PARTS]
> >  #define TARGET_USE_SCATTER_4PARTS \
> >         ix86_tune_features[X86_TUNE_USE_SCATTER_4PARTS]
> > -#define TARGET_USE_GATHER \
> > -       ix86_tune_features[X86_TUNE_USE_GATHER]
> > -#define TARGET_USE_SCATTER \
> > -       ix86_tune_features[X86_TUNE_USE_SCATTER]
> > +#define TARGET_USE_GATHER_8PARTS \
> > +       ix86_tune_features[X86_TUNE_USE_GATHER_8PARTS]
> > +#define TARGET_USE_SCATTER_8PARTS \
> > +       ix86_tune_features[X86_TUNE_USE_SCATTER_8PARTS]
> >  #define TARGET_FUSE_CMP_AND_BRANCH_32 \
> >         ix86_tune_features[X86_TUNE_FUSE_CMP_AND_BRANCH_32]
> >  #define TARGET_FUSE_CMP_AND_BRANCH_64 \
> > diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt
> > index 8a43187f703..78b499304a4 100644
> > --- a/gcc/config/i386/i386.opt
> > +++ b/gcc/config/i386/i386.opt
> > @@ -1302,3 +1302,11 @@ msm4
> >  Target Mask(ISA2_SM4) Var(ix86_isa_flags2) Save
> >  Support MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX and
> >  SM4 built-in functions and code generation.
> > +
> > +mgather
> > +Target Alias(mtune-ctrl=, use_gather, ^use_gather)
> > +Enable vectorization for gather instruction.
> > +
> > +mscatter
> > +Target Alias(mtune-ctrl=, use_scatter, ^use_scatter)
> > +Enable vectorization for scatter instruction.
> > diff --git a/gcc/config/i386/x86-tune.def b/gcc/config/i386/x86-tune.def
> > index 40e04ecddbf..d7f20d3a118 100644
> > --- a/gcc/config/i386/x86-tune.def
> > +++ b/gcc/config/i386/x86-tune.def
> > @@ -511,13 +511,13 @@ DEF_TUNE (X86_TUNE_USE_SCATTER_4PARTS, "use_scatter_4parts",
> >
> >  /* X86_TUNE_USE_GATHER: Use gather instructions for vectors with 8 or more
> >     elements.  */
> > -DEF_TUNE (X86_TUNE_USE_GATHER, "use_gather",
> > +DEF_TUNE (X86_TUNE_USE_GATHER_8PARTS, "use_gather_8parts",
> >           ~(m_ZNVER1 | m_ZNVER2 | m_ZNVER4 | m_ALDERLAKE | m_ARROWLAKE
> >             | m_CORE_ATOM | m_GENERIC))
> >
> >  /* X86_TUNE_USE_SCATTER: Use scater instructions for vectors with 8 or more
> >     elements.  */
> > -DEF_TUNE (X86_TUNE_USE_SCATTER, "use_scatter",
> > +DEF_TUNE (X86_TUNE_USE_SCATTER_8PARTS, "use_scatter_8parts",
> >           ~(m_ZNVER4))
> >
> >  /* X86_TUNE_AVOID_128FMA_CHAINS: Avoid creating loops with tight 128bit or
> > --
> > 2.31.1
> >
>
>
> --
> BR,
> Hongtao



-- 
BR,
Hongtao


Thread overview: 18+ messages
2023-08-10  1:11 [PATCH] Support -m[no-]gather -m[no-]scatter to enable/disable vectorization for all gather/scatter instructions liuhongt
2023-08-10  1:47 ` Xi Ruoyao
2023-08-10  1:52   ` Liu, Hongtao
2023-08-10  6:04 ` Uros Bizjak
2023-08-10  6:12   ` Hongtao Liu
2023-08-10  7:39 ` Richard Biener
2023-08-10  7:42   ` Uros Bizjak
2023-08-10  7:47     ` Richard Biener
2023-08-10  7:55       ` Hongtao Liu
2023-08-10  8:07         ` Hongtao Liu
2023-08-10  9:16           ` Hongtao Liu
2023-08-10 11:12             ` Richard Biener
2023-08-10 13:23               ` Hongtao Liu
2023-08-11  6:01                 ` [PATCH V2] " liuhongt
2023-08-14  2:40                   ` Hongtao Liu
2023-08-16  7:37                     ` Hongtao Liu
2023-08-10 11:11         ` [PATCH] " Richard Biener
2023-08-10 12:05       ` Jan Hubicka
