* [PATCH] Support -m[no-]gather -m[no-]scatter to enable/disable vectorization for all gather/scatter instructions.
@ 2023-08-10  1:11 liuhongt
  2023-08-10  1:47 ` Xi Ruoyao
                   ` (2 more replies)
  0 siblings, 3 replies; 18+ messages in thread
From: liuhongt @ 2023-08-10  1:11 UTC (permalink / raw)
  To: gcc-patches; +Cc: richard.guenther, ubizjak, hubicka

Currently we have 3 different independent tunes for gather
"use_gather,use_gather_2parts,use_gather_4parts",
and similarly for scatter there are
"use_scatter,use_scatter_2parts,use_scatter_4parts".

The patch supports 2 standard options to enable/disable
vectorization for all gather/scatter instructions. The options are
interpreted by the driver into the 3 tunes.

Bootstrapped and regtested on x86_64-pc-linux-gnu.
Ok for trunk?

gcc/ChangeLog:

        * config/i386/i386.h (DRIVER_SELF_SPECS): Add
        GATHER_SCATTER_DRIVER_SELF_SPECS.
        (GATHER_SCATTER_DRIVER_SELF_SPECS): New macro.
        * config/i386/i386.opt (mgather): New option.
        (mscatter): Ditto.
---
 gcc/config/i386/i386.h   | 12 +++++++++++-
 gcc/config/i386/i386.opt |  8 ++++++++
 2 files changed, 19 insertions(+), 1 deletion(-)

diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index ef342fcee9b..d9ac2c29bde 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -565,7 +565,17 @@ extern GTY(()) tree x86_mfence;
 # define SUBTARGET_DRIVER_SELF_SPECS ""
 #endif
 
-#define DRIVER_SELF_SPECS SUBTARGET_DRIVER_SELF_SPECS
+#ifndef GATHER_SCATTER_DRIVER_SELF_SPECS
+# define GATHER_SCATTER_DRIVER_SELF_SPECS \
+  "%{mno-gather:-mtune-ctrl=^use_gather_2parts,^use_gather_4parts,^use_gather} \
+   %{mgather:-mtune-ctrl=use_gather_2parts,use_gather_4parts,use_gather} \
+   %{mno-scatter:-mtune-ctrl=^use_scatter_2parts,^use_scatter_4parts,^use_scatter} \
+   %{mscatter:-mtune-ctrl=use_scatter_2parts,use_scatter_4parts,use_scatter}"
+#endif
+
+#define DRIVER_SELF_SPECS \
+  SUBTARGET_DRIVER_SELF_SPECS " " \
+  GATHER_SCATTER_DRIVER_SELF_SPECS
 
 /* -march=native handling only makes sense with compiler running on an
    x86 or x86_64 chip.  If changing this condition, also change
diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt
index ddb7f110aa2..99948644a8d 100644
--- a/gcc/config/i386/i386.opt
+++ b/gcc/config/i386/i386.opt
@@ -424,6 +424,14 @@ mdaz-ftz
 Target
 Set the FTZ and DAZ Flags.
 
+mgather
+Target
+Enable vectorization for gather instruction.
+
+mscatter
+Target
+Enable vectorization for scatter instruction.
+
 mpreferred-stack-boundary=
 Target RejectNegative Joined UInteger Var(ix86_preferred_stack_boundary_arg)
 Attempt to keep stack aligned to this power of 2.
-- 
2.31.1

^ permalink raw reply	[flat|nested] 18+ messages in thread
* Re: [PATCH] Support -m[no-]gather -m[no-]scatter to enable/disable vectorization for all gather/scatter instructions.
From: Xi Ruoyao @ 2023-08-10  1:47 UTC
  To: liuhongt, gcc-patches; +Cc: richard.guenther, ubizjak, hubicka

On Thu, 2023-08-10 at 09:11 +0800, liuhongt via Gcc-patches wrote:
> Currently we have 3 different independent tunes for gather
> "use_gather,use_gather_2parts,use_gather_4parts",
> and similarly for scatter there are
> "use_scatter,use_scatter_2parts,use_scatter_4parts".
>
> The patch supports 2 standard options to enable/disable
> vectorization for all gather/scatter instructions. The options are
> interpreted by the driver into the 3 tunes.
>
> Bootstrapped and regtested on x86_64-pc-linux-gnu.
> Ok for trunk?

And should we set -mno-gather as the default for GDS-affected processors?
We'll likely apply the ucode update for them, and then the gather
instructions will be much slower.

> gcc/ChangeLog:
>
>         * config/i386/i386.h (DRIVER_SELF_SPECS): Add
>         GATHER_SCATTER_DRIVER_SELF_SPECS.
>         (GATHER_SCATTER_DRIVER_SELF_SPECS): New macro.
>         * config/i386/i386.opt (mgather): New option.
>         (mscatter): Ditto.
-- 
Xi Ruoyao <xry111@xry111.site>
School of Aerospace Science and Technology, Xidian University
* RE: [PATCH] Support -m[no-]gather -m[no-]scatter to enable/disable vectorization for all gather/scatter instructions.
From: Liu, Hongtao @ 2023-08-10  1:52 UTC
  To: Xi Ruoyao, gcc-patches; +Cc: richard.guenther, ubizjak, hubicka

> -----Original Message-----
> From: Xi Ruoyao <xry111@xry111.site>
> Sent: Thursday, August 10, 2023 9:48 AM
> To: Liu, Hongtao <hongtao.liu@intel.com>; gcc-patches@gcc.gnu.org
> Cc: richard.guenther@gmail.com; ubizjak@gmail.com; hubicka@ucw.cz
> Subject: Re: [PATCH] Support -m[no-]gather -m[no-]scatter to enable/disable
> vectorization for all gather/scatter instructions.
>
> On Thu, 2023-08-10 at 09:11 +0800, liuhongt via Gcc-patches wrote:
> > Currently we have 3 different independent tunes for gather
> > "use_gather,use_gather_2parts,use_gather_4parts",
> > and similarly for scatter there are
> > "use_scatter,use_scatter_2parts,use_scatter_4parts".
> >
> > The patch supports 2 standard options to enable/disable
> > vectorization for all gather/scatter instructions. The options are
> > interpreted by the driver into the 3 tunes.
> >
> > Bootstrapped and regtested on x86_64-pc-linux-gnu.
> > Ok for trunk?
>
> And should we set -mno-gather as the default for GDS-affected processors?
> We'll likely apply the ucode update for them, and then the gather
> instructions will be much slower.

I assume you're talking about
https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/advisory-guidance/gather-data-sampling.html

Yes, there will be a separate patch for microarchitecture tuning.

> > gcc/ChangeLog:
> >
> >         * config/i386/i386.h (DRIVER_SELF_SPECS): Add
> >         GATHER_SCATTER_DRIVER_SELF_SPECS.
> >         (GATHER_SCATTER_DRIVER_SELF_SPECS): New macro.
> >         * config/i386/i386.opt (mgather): New option.
> >         (mscatter): Ditto.
* Re: [PATCH] Support -m[no-]gather -m[no-]scatter to enable/disable vectorization for all gather/scatter instructions.
From: Uros Bizjak @ 2023-08-10  6:04 UTC
  To: liuhongt; +Cc: gcc-patches, richard.guenther, hubicka

On Thu, Aug 10, 2023 at 3:13 AM liuhongt <hongtao.liu@intel.com> wrote:
>
> Currently we have 3 different independent tunes for gather
> "use_gather,use_gather_2parts,use_gather_4parts",
> and similarly for scatter there are
> "use_scatter,use_scatter_2parts,use_scatter_4parts".
>
> The patch supports 2 standard options to enable/disable
> vectorization for all gather/scatter instructions. The options are
> interpreted by the driver into the 3 tunes.
>
> Bootstrapped and regtested on x86_64-pc-linux-gnu.
> Ok for trunk?
>
> gcc/ChangeLog:
>
>         * config/i386/i386.h (DRIVER_SELF_SPECS): Add
>         GATHER_SCATTER_DRIVER_SELF_SPECS.
>         (GATHER_SCATTER_DRIVER_SELF_SPECS): New macro.
>         * config/i386/i386.opt (mgather): New option.
>         (mscatter): Ditto.
> ---
>  gcc/config/i386/i386.h   | 12 +++++++++++-
>  gcc/config/i386/i386.opt |  8 ++++++++
>  2 files changed, 19 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
> index ef342fcee9b..d9ac2c29bde 100644
> --- a/gcc/config/i386/i386.h
> +++ b/gcc/config/i386/i386.h
> @@ -565,7 +565,17 @@ extern GTY(()) tree x86_mfence;
>  # define SUBTARGET_DRIVER_SELF_SPECS ""
>  #endif
>
> -#define DRIVER_SELF_SPECS SUBTARGET_DRIVER_SELF_SPECS
> +#ifndef GATHER_SCATTER_DRIVER_SELF_SPECS
> +# define GATHER_SCATTER_DRIVER_SELF_SPECS \
> +  "%{mno-gather:-mtune-ctrl=^use_gather_2parts,^use_gather_4parts,^use_gather} \
> +   %{mgather:-mtune-ctrl=use_gather_2parts,use_gather_4parts,use_gather} \
> +   %{mno-scatter:-mtune-ctrl=^use_scatter_2parts,^use_scatter_4parts,^use_scatter} \
> +   %{mscatter:-mtune-ctrl=use_scatter_2parts,use_scatter_4parts,use_scatter}"
> +#endif
> +
> +#define DRIVER_SELF_SPECS \
> +  SUBTARGET_DRIVER_SELF_SPECS " " \
> +  GATHER_SCATTER_DRIVER_SELF_SPECS
>
>  /* -march=native handling only makes sense with compiler running on an
>     x86 or x86_64 chip.  If changing this condition, also change
> diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt
> index ddb7f110aa2..99948644a8d 100644
> --- a/gcc/config/i386/i386.opt
> +++ b/gcc/config/i386/i386.opt
> @@ -424,6 +424,14 @@ mdaz-ftz
>  Target
>  Set the FTZ and DAZ Flags.
>
> +mgather
> +Target
> +Enable vectorization for gather instruction.
> +
> +mscatter
> +Target
> +Enable vectorization for scatter instruction.

Are gather and scatter instructions affected in a separate way, or
should we use one -mgather-scatter option to cover all gather/scatter
tunings?

Uros.

> +
>  mpreferred-stack-boundary=
>  Target RejectNegative Joined UInteger Var(ix86_preferred_stack_boundary_arg)
>  Attempt to keep stack aligned to this power of 2.
> --
> 2.31.1
>
* Re: [PATCH] Support -m[no-]gather -m[no-]scatter to enable/disable vectorization for all gather/scatter instructions.
From: Hongtao Liu @ 2023-08-10  6:12 UTC
  To: Uros Bizjak; +Cc: liuhongt, gcc-patches, richard.guenther, hubicka

On Thu, Aug 10, 2023 at 2:04 PM Uros Bizjak via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> On Thu, Aug 10, 2023 at 3:13 AM liuhongt <hongtao.liu@intel.com> wrote:
> >
> > Currently we have 3 different independent tunes for gather
> > "use_gather,use_gather_2parts,use_gather_4parts",
> > and similarly for scatter there are
> > "use_scatter,use_scatter_2parts,use_scatter_4parts".
> >
> > The patch supports 2 standard options to enable/disable
> > vectorization for all gather/scatter instructions. The options are
> > interpreted by the driver into the 3 tunes.
> >
> > Bootstrapped and regtested on x86_64-pc-linux-gnu.
> > Ok for trunk?
> >
> > gcc/ChangeLog:
> >
> >         * config/i386/i386.h (DRIVER_SELF_SPECS): Add
> >         GATHER_SCATTER_DRIVER_SELF_SPECS.
> >         (GATHER_SCATTER_DRIVER_SELF_SPECS): New macro.
> >         * config/i386/i386.opt (mgather): New option.
> >         (mscatter): Ditto.
> > ---
> >  gcc/config/i386/i386.h   | 12 +++++++++++-
> >  gcc/config/i386/i386.opt |  8 ++++++++
> >  2 files changed, 19 insertions(+), 1 deletion(-)
> >
> > diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
> > index ef342fcee9b..d9ac2c29bde 100644
> > --- a/gcc/config/i386/i386.h
> > +++ b/gcc/config/i386/i386.h
> > @@ -565,7 +565,17 @@ extern GTY(()) tree x86_mfence;
> >  # define SUBTARGET_DRIVER_SELF_SPECS ""
> >  #endif
> >
> > -#define DRIVER_SELF_SPECS SUBTARGET_DRIVER_SELF_SPECS
> > +#ifndef GATHER_SCATTER_DRIVER_SELF_SPECS
> > +# define GATHER_SCATTER_DRIVER_SELF_SPECS \
> > +  "%{mno-gather:-mtune-ctrl=^use_gather_2parts,^use_gather_4parts,^use_gather} \
> > +   %{mgather:-mtune-ctrl=use_gather_2parts,use_gather_4parts,use_gather} \
> > +   %{mno-scatter:-mtune-ctrl=^use_scatter_2parts,^use_scatter_4parts,^use_scatter} \
> > +   %{mscatter:-mtune-ctrl=use_scatter_2parts,use_scatter_4parts,use_scatter}"
> > +#endif
> > +
> > +#define DRIVER_SELF_SPECS \
> > +  SUBTARGET_DRIVER_SELF_SPECS " " \
> > +  GATHER_SCATTER_DRIVER_SELF_SPECS
> >
> >  /* -march=native handling only makes sense with compiler running on an
> >     x86 or x86_64 chip.  If changing this condition, also change
> > diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt
> > index ddb7f110aa2..99948644a8d 100644
> > --- a/gcc/config/i386/i386.opt
> > +++ b/gcc/config/i386/i386.opt
> > @@ -424,6 +424,14 @@ mdaz-ftz
> >  Target
> >  Set the FTZ and DAZ Flags.
> >
> > +mgather
> > +Target
> > +Enable vectorization for gather instruction.
> > +
> > +mscatter
> > +Target
> > +Enable vectorization for scatter instruction.
>
> Are gather and scatter instructions affected in a separate way, or
> should we use one -mgather-scatter option to cover all gather/scatter
> tunings?

They are affected separately. Gather Data Sampling only concerns gather.
https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/advisory-guidance/gather-data-sampling.html

>
> Uros.
>
> > +
> >  mpreferred-stack-boundary=
> >  Target RejectNegative Joined UInteger Var(ix86_preferred_stack_boundary_arg)
> >  Attempt to keep stack aligned to this power of 2.
> > --
> > 2.31.1
> >

-- 
BR,
Hongtao
* Re: [PATCH] Support -m[no-]gather -m[no-]scatter to enable/disable vectorization for all gather/scatter instructions.
From: Richard Biener @ 2023-08-10  7:39 UTC
  To: liuhongt; +Cc: gcc-patches, ubizjak, hubicka

On Thu, Aug 10, 2023 at 3:13 AM liuhongt <hongtao.liu@intel.com> wrote:
>
> Currently we have 3 different independent tunes for gather
> "use_gather,use_gather_2parts,use_gather_4parts",
> and similarly for scatter there are
> "use_scatter,use_scatter_2parts,use_scatter_4parts".
>
> The patch supports 2 standard options to enable/disable
> vectorization for all gather/scatter instructions. The options are
> interpreted by the driver into the 3 tunes.
>
> Bootstrapped and regtested on x86_64-pc-linux-gnu.
> Ok for trunk?

I think -mgather/-mscatter are too close to -mfma suggesting they
enable part of an ISA but they won't disable the use of intrinsics
or enable gather/scatter on CPUs where the ISA doesn't have them.

May I suggest to invent a more generic "short-cut" to
-mtune-ctrl=^X, maybe -mdisable=X?  And for gather/scatter
tunables add ^use_gather_any to cover all cases?  (or
change what use_gather controls - it seems we changed its
meaning before, and instead add use_gather_8parts and
use_gather_16parts)

That is, what's the point of this?

Richard.

> gcc/ChangeLog:
>
>         * config/i386/i386.h (DRIVER_SELF_SPECS): Add
>         GATHER_SCATTER_DRIVER_SELF_SPECS.
>         (GATHER_SCATTER_DRIVER_SELF_SPECS): New macro.
>         * config/i386/i386.opt (mgather): New option.
>         (mscatter): Ditto.
* Re: [PATCH] Support -m[no-]gather -m[no-]scatter to enable/disable vectorization for all gather/scatter instructions.
From: Uros Bizjak @ 2023-08-10  7:42 UTC
  To: Richard Biener; +Cc: liuhongt, gcc-patches, hubicka

On Thu, Aug 10, 2023 at 9:40 AM Richard Biener
<richard.guenther@gmail.com> wrote:
>
> On Thu, Aug 10, 2023 at 3:13 AM liuhongt <hongtao.liu@intel.com> wrote:
> >
> > Currently we have 3 different independent tunes for gather
> > "use_gather,use_gather_2parts,use_gather_4parts",
> > similar for scatter, there're
> > "use_scatter,use_scatter_2parts,use_scatter_4parts"
> >
> > The patch support 2 standardizing options to enable/disable
> > vectorization for all gather/scatter instructions. The options is
> > interpreted by driver to 3 tunes.
> >
> > bootstrapped and regtested on x86_64-pc-linux-gnu.
> > Ok for trunk?
>
> I think -mgather/-mscatter are too close to -mfma suggesting they
> enable part of an ISA but they won't disable the use of intrinsics
> or enable gather/scatter on CPUs where the ISA doesn't have them.
>
> May I suggest to invent a more generic "short-cut" to
> -mtune-ctrl=^X, maybe -mdisable=X?  And for gather/scatter
> tunables add ^use_gather_any to cover all cases?  (or
> change what use_gather controls - it seems we changed its
> meaning before, and instead add use_gather_8parts and
> use_gather_16parts)
>
> That is, what's the point of this?

https://www.phoronix.com/review/downfall

that caused:

https://www.phoronix.com/review/intel-downfall-benchmarks

Uros.
* Re: [PATCH] Support -m[no-]gather -m[no-]scatter to enable/disable vectorization for all gather/scatter instructions.
From: Richard Biener @ 2023-08-10  7:47 UTC
  To: Uros Bizjak; +Cc: liuhongt, gcc-patches, hubicka

On Thu, Aug 10, 2023 at 9:42 AM Uros Bizjak <ubizjak@gmail.com> wrote:
>
> On Thu, Aug 10, 2023 at 9:40 AM Richard Biener
> <richard.guenther@gmail.com> wrote:
> >
> > On Thu, Aug 10, 2023 at 3:13 AM liuhongt <hongtao.liu@intel.com> wrote:
> > >
> > > Currently we have 3 different independent tunes for gather
> > > "use_gather,use_gather_2parts,use_gather_4parts",
> > > and similarly for scatter there are
> > > "use_scatter,use_scatter_2parts,use_scatter_4parts".
> > >
> > > The patch supports 2 standard options to enable/disable
> > > vectorization for all gather/scatter instructions. The options are
> > > interpreted by the driver into the 3 tunes.
> > >
> > > Bootstrapped and regtested on x86_64-pc-linux-gnu.
> > > Ok for trunk?
> >
> > I think -mgather/-mscatter are too close to -mfma suggesting they
> > enable part of an ISA but they won't disable the use of intrinsics
> > or enable gather/scatter on CPUs where the ISA doesn't have them.
> >
> > May I suggest to invent a more generic "short-cut" to
> > -mtune-ctrl=^X, maybe -mdisable=X?  And for gather/scatter
> > tunables add ^use_gather_any to cover all cases?  (or
> > change what use_gather controls - it seems we changed its
> > meaning before, and instead add use_gather_8parts and
> > use_gather_16parts)
> >
> > That is, what's the point of this?
>
> https://www.phoronix.com/review/downfall
>
> that caused:
>
> https://www.phoronix.com/review/intel-downfall-benchmarks

Yes, I know.  But there's -mtune-ctrl=<very long line> doing the trick.
GCC 11 had only 'use_gather', covering all numbers of lanes.  I suggest
to resurrect that behavior and add use_gather_8+parts (or two, IIRC
gather works only on SI/SFmode or larger).

Then -mtune-ctrl=^use_gather works, which I think is nice enough?

Richard.

> Uros.
* Re: [PATCH] Support -m[no-]gather -m[no-]scatter to enable/disable vectorization for all gather/scatter instructions.
From: Hongtao Liu @ 2023-08-10  7:55 UTC
  To: Richard Biener; +Cc: Uros Bizjak, liuhongt, gcc-patches, hubicka

On Thu, Aug 10, 2023 at 3:49 PM Richard Biener via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> On Thu, Aug 10, 2023 at 9:42 AM Uros Bizjak <ubizjak@gmail.com> wrote:
> >
> > On Thu, Aug 10, 2023 at 9:40 AM Richard Biener
> > <richard.guenther@gmail.com> wrote:
> > >
> > > On Thu, Aug 10, 2023 at 3:13 AM liuhongt <hongtao.liu@intel.com> wrote:
> > > >
> > > > Currently we have 3 different independent tunes for gather
> > > > "use_gather,use_gather_2parts,use_gather_4parts",
> > > > and similarly for scatter there are
> > > > "use_scatter,use_scatter_2parts,use_scatter_4parts".
> > > >
> > > > The patch supports 2 standard options to enable/disable
> > > > vectorization for all gather/scatter instructions. The options are
> > > > interpreted by the driver into the 3 tunes.
> > > >
> > > > Bootstrapped and regtested on x86_64-pc-linux-gnu.
> > > > Ok for trunk?
> > >
> > > I think -mgather/-mscatter are too close to -mfma suggesting they
> > > enable part of an ISA but they won't disable the use of intrinsics
> > > or enable gather/scatter on CPUs where the ISA doesn't have them.
> > >
> > > May I suggest to invent a more generic "short-cut" to
> > > -mtune-ctrl=^X, maybe -mdisable=X?  And for gather/scatter
> > > tunables add ^use_gather_any to cover all cases?  (or
> > > change what use_gather controls - it seems we changed its
> > > meaning before, and instead add use_gather_8parts and
> > > use_gather_16parts)
> > >
> > > That is, what's the point of this?
> >
> > https://www.phoronix.com/review/downfall
> >
> > that caused:
> >
> > https://www.phoronix.com/review/intel-downfall-benchmarks
>
> Yes, I know.  But there's -mtune-ctrl=<very long line> doing the trick.
> GCC 11 had only 'use_gather', covering all numbers of lanes.  I suggest
> to resurrect that behavior and add use_gather_8+parts (or two, IIRC
> gather works only on SI/SFmode or larger).
>
> Then -mtune-ctrl=^use_gather works, which I think is nice enough?

So basically, -mtune-ctrl=^use_gather is used to turn off all gather
vectorization, but -mtune-ctrl=use_gather doesn't turn on all of them?
We don't have an extra explicit flag for the target tune, just a single bit
- ix86_tune_features[X86_TUNE_USE_GATHER]

>
> Richard.
>
> > Uros.

-- 
BR,
Hongtao
* Re: [PATCH] Support -m[no-]gather -m[no-]scatter to enable/disable vectorization for all gather/scatter instructions.
From: Hongtao Liu @ 2023-08-10  8:07 UTC
  To: Richard Biener; +Cc: Uros Bizjak, liuhongt, gcc-patches, hubicka

On Thu, Aug 10, 2023 at 3:55 PM Hongtao Liu <crazylht@gmail.com> wrote:
>
> On Thu, Aug 10, 2023 at 3:49 PM Richard Biener via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
> >
> > On Thu, Aug 10, 2023 at 9:42 AM Uros Bizjak <ubizjak@gmail.com> wrote:
> > >
> > > On Thu, Aug 10, 2023 at 9:40 AM Richard Biener
> > > <richard.guenther@gmail.com> wrote:
> > > >
> > > > On Thu, Aug 10, 2023 at 3:13 AM liuhongt <hongtao.liu@intel.com> wrote:
> > > > >
> > > > > Currently we have 3 different independent tunes for gather
> > > > > "use_gather,use_gather_2parts,use_gather_4parts",
> > > > > and similarly for scatter there are
> > > > > "use_scatter,use_scatter_2parts,use_scatter_4parts".
> > > > >
> > > > > The patch supports 2 standard options to enable/disable
> > > > > vectorization for all gather/scatter instructions. The options are
> > > > > interpreted by the driver into the 3 tunes.
> > > > >
> > > > > Bootstrapped and regtested on x86_64-pc-linux-gnu.
> > > > > Ok for trunk?
> > > >
> > > > I think -mgather/-mscatter are too close to -mfma suggesting they
> > > > enable part of an ISA but they won't disable the use of intrinsics
> > > > or enable gather/scatter on CPUs where the ISA doesn't have them.
> > > >
> > > > May I suggest to invent a more generic "short-cut" to
> > > > -mtune-ctrl=^X, maybe -mdisable=X?  And for gather/scatter
> > > > tunables add ^use_gather_any to cover all cases?  (or
> > > > change what use_gather controls - it seems we changed its
> > > > meaning before, and instead add use_gather_8parts and
> > > > use_gather_16parts)
> > > >
> > > > That is, what's the point of this?
> > >
> > > https://www.phoronix.com/review/downfall
> > >
> > > that caused:
> > >
> > > https://www.phoronix.com/review/intel-downfall-benchmarks
> >
> > Yes, I know.  But there's -mtune-ctrl=<very long line> doing the trick.
> > GCC 11 had only 'use_gather', covering all numbers of lanes.  I suggest
> > to resurrect that behavior and add use_gather_8+parts (or two, IIRC
> > gather works only on SI/SFmode or larger).
> >
> > Then -mtune-ctrl=^use_gather works, which I think is nice enough?
> So basically, -mtune-ctrl=^use_gather is used to turn off all gather
> vectorization, but -mtune-ctrl=use_gather doesn't turn on all of them?
> We don't have an extra explicit flag for the target tune, just a single bit
> - ix86_tune_features[X86_TUNE_USE_GATHER]

Looks like I can handle it specially in parse_mtune_ctrl_str, let me try.

> >
> > Richard.
> >
> > > Uros.
>
>
>
> --
> BR,
> Hongtao

-- 
BR,
Hongtao
* Re: [PATCH] Support -m[no-]gather -m[no-]scatter to enable/disable vectorization for all gather/scatter instructions.
From: Hongtao Liu @ 2023-08-10  9:16 UTC
  To: Richard Biener; +Cc: Uros Bizjak, liuhongt, gcc-patches, hubicka

On Thu, Aug 10, 2023 at 4:07 PM Hongtao Liu <crazylht@gmail.com> wrote:
>
> On Thu, Aug 10, 2023 at 3:55 PM Hongtao Liu <crazylht@gmail.com> wrote:
> >
> > On Thu, Aug 10, 2023 at 3:49 PM Richard Biener via Gcc-patches
> > <gcc-patches@gcc.gnu.org> wrote:
> > >
> > > On Thu, Aug 10, 2023 at 9:42 AM Uros Bizjak <ubizjak@gmail.com> wrote:
> > > >
> > > > On Thu, Aug 10, 2023 at 9:40 AM Richard Biener
> > > > <richard.guenther@gmail.com> wrote:
> > > > >
> > > > > On Thu, Aug 10, 2023 at 3:13 AM liuhongt <hongtao.liu@intel.com> wrote:
> > > > > >
> > > > > > Currently we have 3 different independent tunes for gather
> > > > > > "use_gather,use_gather_2parts,use_gather_4parts",
> > > > > > and similarly for scatter there are
> > > > > > "use_scatter,use_scatter_2parts,use_scatter_4parts".
> > > > > >
> > > > > > The patch supports 2 standard options to enable/disable
> > > > > > vectorization for all gather/scatter instructions. The options are
> > > > > > interpreted by the driver into the 3 tunes.
> > > > > >
> > > > > > Bootstrapped and regtested on x86_64-pc-linux-gnu.
> > > > > > Ok for trunk?
> > > > >
> > > > > I think -mgather/-mscatter are too close to -mfma suggesting they
> > > > > enable part of an ISA but they won't disable the use of intrinsics
> > > > > or enable gather/scatter on CPUs where the ISA doesn't have them.
> > > > >
> > > > > May I suggest to invent a more generic "short-cut" to
> > > > > -mtune-ctrl=^X, maybe -mdisable=X?  And for gather/scatter
> > > > > tunables add ^use_gather_any to cover all cases?  (or
> > > > > change what use_gather controls - it seems we changed its
> > > > > meaning before, and instead add use_gather_8parts and
> > > > > use_gather_16parts)
> > > > >
> > > > > That is, what's the point of this?

The point of this is to stay consistent between GCC, LLVM, and
ICX (Intel® oneAPI DPC++/C++ Compiler).
LLVM and ICX will support that option.

> > > >
> > > > https://www.phoronix.com/review/downfall
> > > >
> > > > that caused:
> > > >
> > > > https://www.phoronix.com/review/intel-downfall-benchmarks
> > >
> > > Yes, I know.  But there's -mtune-ctrl=<very long line> doing the trick.
> > > GCC 11 had only 'use_gather', covering all numbers of lanes.  I suggest
> > > to resurrect that behavior and add use_gather_8+parts (or two, IIRC
> > > gather works only on SI/SFmode or larger).
> > >
> > > Then -mtune-ctrl=^use_gather works, which I think is nice enough?
> > So basically, -mtune-ctrl=^use_gather is used to turn off all gather
> > vectorization, but -mtune-ctrl=use_gather doesn't turn on all of them?
> > We don't have an extra explicit flag for the target tune, just a single bit
> > - ix86_tune_features[X86_TUNE_USE_GATHER]
> Looks like I can handle it specially in parse_mtune_ctrl_str, let me try.
> > >
> > > Richard.
> > >
> > > > Uros.
> >
> >
> >
> > --
> > BR,
> > Hongtao
>
>
>
> --
> BR,
> Hongtao

-- 
BR,
Hongtao
* Re: [PATCH] Support -m[no-]gather -m[no-]scatter to enable/disable vectorization for all gather/scatter instructions.
From: Richard Biener @ 2023-08-10 11:12 UTC
To: Hongtao Liu; +Cc: Uros Bizjak, liuhongt, gcc-patches, hubicka

On Thu, Aug 10, 2023 at 11:16 AM Hongtao Liu <crazylht@gmail.com> wrote:
>
> On Thu, Aug 10, 2023 at 4:07 PM Hongtao Liu <crazylht@gmail.com> wrote:
> >
> > On Thu, Aug 10, 2023 at 3:55 PM Hongtao Liu <crazylht@gmail.com> wrote:
> > >
> > > On Thu, Aug 10, 2023 at 3:49 PM Richard Biener via Gcc-patches
> > > <gcc-patches@gcc.gnu.org> wrote:
> > > >
> > > > On Thu, Aug 10, 2023 at 9:42 AM Uros Bizjak <ubizjak@gmail.com> wrote:
> > > > >
> > > > > On Thu, Aug 10, 2023 at 9:40 AM Richard Biener
> > > > > <richard.guenther@gmail.com> wrote:
> > > > > >
> > > > > > On Thu, Aug 10, 2023 at 3:13 AM liuhongt <hongtao.liu@intel.com> wrote:
> > > > > > >
> > > > > > > Currently we have 3 different independent tunes for gather
> > > > > > > "use_gather,use_gather_2parts,use_gather_4parts",
> > > > > > > similar for scatter, there're
> > > > > > > "use_scatter,use_scatter_2parts,use_scatter_4parts"
> > > > > > >
> > > > > > > The patch support 2 standardizing options to enable/disable
> > > > > > > vectorization for all gather/scatter instructions. The options is
> > > > > > > interpreted by driver to 3 tunes.
> > > > > > >
> > > > > > > bootstrapped and regtested on x86_64-pc-linux-gnu.
> > > > > > > Ok for trunk?
> > > > > >
> > > > > > I think -mgather/-mscatter are too close to -mfma suggesting they
> > > > > > enable part of an ISA but they won't disable the use of intrinsics
> > > > > > or enable gather/scatter on CPUs where the ISA doesn't have them.
> > > > > >
> > > > > > May I suggest to invent a more generic "short-cut" to
> > > > > > -mtune-ctrl=^X, maybe -mdisable=X? And for gather/scatter
> > > > > > tunables add ^use_gather_any to cover all cases? (or
> > > > > > change what use_gather controls - it seems we changed its
> > > > > > meaning before, and instead add use_gather_8parts and
> > > > > > use_gather_16parts)
> > > > > >
> > > > > > That is, what's the point of this?
> The point of this is to keep consistent between GCC, LLVM, and
> ICX(Intel® oneAPI DPC++/C++ Compiler) .
> LLVM,ICX will support that option.

GCC has very many options that are not the same as in LLVM or ICX;
I don't see a good reason to special-case this one. As said, it's
a very bad name IMHO.

Richard.

> > > > >
> > > > > https://www.phoronix.com/review/downfall
> > > > >
> > > > > that caused:
> > > > >
> > > > > https://www.phoronix.com/review/intel-downfall-benchmarks
> > > >
> > > > Yes, I know. But there's -mtune-ctl=<very long line> doing the trick.
> > > > GCC 11 had only 'use_gather', covering all number of lanes. I suggest
> > > > to resurrect that behavior and add use_gather_8+parts (or two, IIRC
> > > > gather works only on SI/SFmode or larger).
> > > >
> > > > Then -mtune-ctl=^use_gather works which I think is nice enough?
> > > So basically, -mtune-ctrl=^use_gather is used to turn off all gather
> > > vectorization, but -mtune-ctrl=use_gather doesn't turn on all of them?
> > > We don't have an extrat explicit flag for target tune, just single bit
> > > - ix86_tune_features[X86_TUNE_USE_GATHER]
> > Looks like I can handle it specially in parse_mtune_ctrl_str, let me try.
> > > >
> > > > Richard.
> > > >
> > > > > Uros.
> > >
> > >
> > >
> > > --
> > > BR,
> > > Hongtao
> >
> >
> >
> > --
> > BR,
> > Hongtao
>
>
>
> --
> BR,
> Hongtao
* Re: [PATCH] Support -m[no-]gather -m[no-]scatter to enable/disable vectorization for all gather/scatter instructions.
From: Hongtao Liu @ 2023-08-10 13:23 UTC
To: Richard Biener; +Cc: Uros Bizjak, liuhongt, gcc-patches, hubicka

On Thu, Aug 10, 2023 at 7:13 PM Richard Biener
<richard.guenther@gmail.com> wrote:
>
> On Thu, Aug 10, 2023 at 11:16 AM Hongtao Liu <crazylht@gmail.com> wrote:
> >
> > On Thu, Aug 10, 2023 at 4:07 PM Hongtao Liu <crazylht@gmail.com> wrote:
> > >
> > > On Thu, Aug 10, 2023 at 3:55 PM Hongtao Liu <crazylht@gmail.com> wrote:
> > > >
> > > > On Thu, Aug 10, 2023 at 3:49 PM Richard Biener via Gcc-patches
> > > > <gcc-patches@gcc.gnu.org> wrote:
> > > > >
> > > > > On Thu, Aug 10, 2023 at 9:42 AM Uros Bizjak <ubizjak@gmail.com> wrote:
> > > > > >
> > > > > > On Thu, Aug 10, 2023 at 9:40 AM Richard Biener
> > > > > > <richard.guenther@gmail.com> wrote:
> > > > > > >
> > > > > > > On Thu, Aug 10, 2023 at 3:13 AM liuhongt <hongtao.liu@intel.com> wrote:
> > > > > > > >
> > > > > > > > Currently we have 3 different independent tunes for gather
> > > > > > > > "use_gather,use_gather_2parts,use_gather_4parts",
> > > > > > > > similar for scatter, there're
> > > > > > > > "use_scatter,use_scatter_2parts,use_scatter_4parts"
> > > > > > > >
> > > > > > > > The patch support 2 standardizing options to enable/disable
> > > > > > > > vectorization for all gather/scatter instructions. The options is
> > > > > > > > interpreted by driver to 3 tunes.
> > > > > > > >
> > > > > > > > bootstrapped and regtested on x86_64-pc-linux-gnu.
> > > > > > > > Ok for trunk?
> > > > > > >
> > > > > > > I think -mgather/-mscatter are too close to -mfma suggesting they
> > > > > > > enable part of an ISA but they won't disable the use of intrinsics
> > > > > > > or enable gather/scatter on CPUs where the ISA doesn't have them.
> > > > > > >
> > > > > > > May I suggest to invent a more generic "short-cut" to
> > > > > > > -mtune-ctrl=^X, maybe -mdisable=X? And for gather/scatter
> > > > > > > tunables add ^use_gather_any to cover all cases? (or
> > > > > > > change what use_gather controls - it seems we changed its
> > > > > > > meaning before, and instead add use_gather_8parts and
> > > > > > > use_gather_16parts)
> > > > > > >
> > > > > > > That is, what's the point of this?
> > The point of this is to keep consistent between GCC, LLVM, and
> > ICX(Intel® oneAPI DPC++/C++ Compiler) .
> > LLVM,ICX will support that option.
>
> GCC has very many options that are not the same as LLVM or ICX,
> I don't see a good reason to special case this one. As said, it's
> a very bad name IMHO.
In general terms, yes. But this is a new option; wouldn't it be better
to be consistent?
The problem with -mfma is mainly that the CPUID feature is also called
fma, but there is no CPUID feature called gather or scatter. With clear
documentation that the option only affects auto-vectorization,
-m{no-,}{gather,scatter} looks fine to me.
As Honza mentioned, users need an option to turn gather/scatter
auto-vectorization on and off; I don't think they will expect the option
to also apply to intrinsics.
If -mtune-ctrl= is not suitable for direct exposure to users, then the
original proposal should be OK?
Developers will maintain the mapping between -m{gather,scatter} and
-mtune-ctrl=XXX so that it stays consistent across GCC versions.
>
> Richard.
>
> > > > > >
> > > > > > https://www.phoronix.com/review/downfall
> > > > > >
> > > > > > that caused:
> > > > > >
> > > > > > https://www.phoronix.com/review/intel-downfall-benchmarks
> > > > >
> > > > > Yes, I know. But there's -mtune-ctl=<very long line> doing the trick.
> > > > > GCC 11 had only 'use_gather', covering all number of lanes. I suggest
> > > > > to resurrect that behavior and add use_gather_8+parts (or two, IIRC
> > > > > gather works only on SI/SFmode or larger).
> > > > >
> > > > > Then -mtune-ctl=^use_gather works which I think is nice enough?
> > > > So basically, -mtune-ctrl=^use_gather is used to turn off all gather
> > > > vectorization, but -mtune-ctrl=use_gather doesn't turn on all of them?
> > > > We don't have an extrat explicit flag for target tune, just single bit
> > > > - ix86_tune_features[X86_TUNE_USE_GATHER]
> > > Looks like I can handle it specially in parse_mtune_ctrl_str, let me try.
> > > > >
> > > > > Richard.
> > > > >
> > > > > > Uros.
> > > >
> > > >
> > > >
> > > > --
> > > > BR,
> > > > Hongtao
> > >
> > >
> > >
> > > --
> > > BR,
> > > Hongtao
> >
> >
> >
> > --
> > BR,
> > Hongtao

--
BR,
Hongtao
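For context, the auto-vectorization being debated applies to loops with indexed loads such as the one sketched below. This is a minimal illustration and not code from the patch: the name gather_loop is hypothetical, and whether GCC actually emits a gather instruction for it depends on -march, the optimization level, and the tuning flags discussed in this thread.

```c
#include <stddef.h>

/* Indexed (indirect) load: iteration i reads table[idx[i]].
   With AVX2 or AVX-512 enabled, the auto-vectorizer may implement
   this loop with gather instructions; the tuning discussed in this
   thread only influences that decision, not the result.  */
void
gather_loop (float *restrict out, const float *restrict table,
             const int *restrict idx, size_t n)
{
  for (size_t i = 0; i < n; i++)
    out[i] = table[idx[i]];
}
```

Comparing the assembly produced with and without the proposed -mno-gather (for example at -O3 with -mavx2) is one way to see the effect. Explicit gather intrinsics such as _mm256_i32gather_ps are a separate matter: as noted earlier in the thread, the option does not disable them.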
* [PATCH V2] Support -m[no-]gather -m[no-]scatter to enable/disable vectorization for all gather/scatter instructions
From: liuhongt @ 2023-08-11 6:01 UTC
To: gcc-patches; +Cc: richard.guenther, ubizjak, hubicka

Rename the original use_gather to use_gather_8parts. Support
-mtune-ctrl={,^}use_gather to set/clear the tune features
use_gather_{2parts,4parts,8parts}. Support the new option -m[no-]gather
as an alias of -mtune-ctrl={,^}use_gather.

Similar for use_scatter.

How about this version?

gcc/ChangeLog:

	* config/i386/i386-builtins.cc
	(ix86_vectorize_builtin_gather): Adjust for use_gather_8parts.
	* config/i386/i386-options.cc (parse_mtune_ctrl_str):
	Set/Clear tune features use_{gather,scatter}_{2parts, 4parts,
	8parts} for -mtune-ctrl={,^}{use_gather,use_scatter}.
	* config/i386/i386.cc (ix86_vectorize_builtin_scatter): Adjust
	for use_scatter_8parts.
	* config/i386/i386.h (TARGET_USE_GATHER): Rename to ..
	(TARGET_USE_GATHER_8PARTS): .. this.
	(TARGET_USE_SCATTER): Rename to ..
	(TARGET_USE_SCATTER_8PARTS): .. this.
	* config/i386/x86-tune.def (X86_TUNE_USE_GATHER): Rename to
	(X86_TUNE_USE_GATHER_8PARTS): .. this.
	(X86_TUNE_USE_SCATTER): Rename to
	(X86_TUNE_USE_SCATTER_8PARTS): .. this.
	* config/i386/i386.opt: Add new options mgather, mscatter.
---
 gcc/config/i386/i386-builtins.cc |  2 +-
 gcc/config/i386/i386-options.cc  | 54 +++++++++++++++++++++++---------
 gcc/config/i386/i386.cc          |  2 +-
 gcc/config/i386/i386.h           |  8 ++---
 gcc/config/i386/i386.opt         |  8 +++++
 gcc/config/i386/x86-tune.def     |  4 +--
 6 files changed, 56 insertions(+), 22 deletions(-)

diff --git a/gcc/config/i386/i386-builtins.cc b/gcc/config/i386/i386-builtins.cc
index 356b6dfd5fb..8a0b8dfe073 100644
--- a/gcc/config/i386/i386-builtins.cc
+++ b/gcc/config/i386/i386-builtins.cc
@@ -1657,7 +1657,7 @@ ix86_vectorize_builtin_gather (const_tree mem_vectype,
           ? !TARGET_USE_GATHER_2PARTS
           : (known_eq (TYPE_VECTOR_SUBPARTS (mem_vectype), 4u)
              ? !TARGET_USE_GATHER_4PARTS
-             : !TARGET_USE_GATHER)))
+             : !TARGET_USE_GATHER_8PARTS)))
     return NULL_TREE;
 
   if ((TREE_CODE (index_type) != INTEGER_TYPE
diff --git a/gcc/config/i386/i386-options.cc b/gcc/config/i386/i386-options.cc
index 127ee24203c..b8d038af69d 100644
--- a/gcc/config/i386/i386-options.cc
+++ b/gcc/config/i386/i386-options.cc
@@ -1731,20 +1731,46 @@ parse_mtune_ctrl_str (struct gcc_options *opts, bool dump)
           curr_feature_string++;
           clear = true;
         }
-      for (i = 0; i < X86_TUNE_LAST; i++)
-        {
-          if (!strcmp (curr_feature_string, ix86_tune_feature_names[i]))
-            {
-              ix86_tune_features[i] = !clear;
-              if (dump)
-                fprintf (stderr, "Explicitly %s feature %s\n",
-                         clear ? "clear" : "set", ix86_tune_feature_names[i]);
-              break;
-            }
-        }
-      if (i == X86_TUNE_LAST)
-        error ("unknown parameter to option %<-mtune-ctrl%>: %s",
-               clear ? curr_feature_string - 1 : curr_feature_string);
+
+      if (!strcmp (curr_feature_string, "use_gather"))
+        {
+          ix86_tune_features[X86_TUNE_USE_GATHER_2PARTS] = !clear;
+          ix86_tune_features[X86_TUNE_USE_GATHER_4PARTS] = !clear;
+          ix86_tune_features[X86_TUNE_USE_GATHER_8PARTS] = !clear;
+          if (dump)
+            fprintf (stderr, "Explicitly %s features use_gather_2parts,"
+                     " use_gather_4parts, use_gather_8parts\n",
+                     clear ? "clear" : "set");
+
+        }
+      else if (!strcmp (curr_feature_string, "use_scatter"))
+        {
+          ix86_tune_features[X86_TUNE_USE_SCATTER_2PARTS] = !clear;
+          ix86_tune_features[X86_TUNE_USE_SCATTER_4PARTS] = !clear;
+          ix86_tune_features[X86_TUNE_USE_SCATTER_8PARTS] = !clear;
+          if (dump)
+            fprintf (stderr, "Explicitly %s features use_scatter_2parts,"
+                     " use_scatter_4parts, use_scatter_8parts\n",
+                     clear ? "clear" : "set");
+        }
+      else
+        {
+          for (i = 0; i < X86_TUNE_LAST; i++)
+            {
+              if (!strcmp (curr_feature_string, ix86_tune_feature_names[i]))
+                {
+                  ix86_tune_features[i] = !clear;
+                  if (dump)
+                    fprintf (stderr, "Explicitly %s feature %s\n",
+                             clear ? "clear" : "set", ix86_tune_feature_names[i]);
+                  break;
+                }
+            }
+
+          if (i == X86_TUNE_LAST)
+            error ("unknown parameter to option %<-mtune-ctrl%>: %s",
+                   clear ? curr_feature_string - 1 : curr_feature_string);
+        }
       curr_feature_string = next_feature_string;
     }
   while (curr_feature_string);
diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index d592ece700a..cd49fb9e47a 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -19193,7 +19193,7 @@ ix86_vectorize_builtin_scatter (const_tree vectype,
       ? !TARGET_USE_SCATTER_2PARTS
       : (known_eq (TYPE_VECTOR_SUBPARTS (vectype), 4u)
          ? !TARGET_USE_SCATTER_4PARTS
-         : !TARGET_USE_SCATTER))
+         : !TARGET_USE_SCATTER_8PARTS))
     return NULL_TREE;
 
   if ((TREE_CODE (index_type) != INTEGER_TYPE
diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index ef342fcee9b..f7330e818e7 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -403,10 +403,10 @@ extern unsigned char ix86_tune_features[X86_TUNE_LAST];
         ix86_tune_features[X86_TUNE_USE_GATHER_4PARTS]
 #define TARGET_USE_SCATTER_4PARTS \
         ix86_tune_features[X86_TUNE_USE_SCATTER_4PARTS]
-#define TARGET_USE_GATHER \
-        ix86_tune_features[X86_TUNE_USE_GATHER]
-#define TARGET_USE_SCATTER \
-        ix86_tune_features[X86_TUNE_USE_SCATTER]
+#define TARGET_USE_GATHER_8PARTS \
+        ix86_tune_features[X86_TUNE_USE_GATHER_8PARTS]
+#define TARGET_USE_SCATTER_8PARTS \
+        ix86_tune_features[X86_TUNE_USE_SCATTER_8PARTS]
 #define TARGET_FUSE_CMP_AND_BRANCH_32 \
         ix86_tune_features[X86_TUNE_FUSE_CMP_AND_BRANCH_32]
 #define TARGET_FUSE_CMP_AND_BRANCH_64 \
diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt
index 8a43187f703..78b499304a4 100644
--- a/gcc/config/i386/i386.opt
+++ b/gcc/config/i386/i386.opt
@@ -1302,3 +1302,11 @@ msm4
 Target Mask(ISA2_SM4) Var(ix86_isa_flags2) Save
 Support MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX and
 SM4 built-in functions and code generation.
+
+mgather
+Target Alias(mtune-ctrl=, use_gather, ^use_gather)
+Enable vectorization for gather instruction.
+
+mscatter
+Target Alias(mtune-ctrl=, use_scatter, ^use_scatter)
+Enable vectorization for scatter instruction.
diff --git a/gcc/config/i386/x86-tune.def b/gcc/config/i386/x86-tune.def
index 40e04ecddbf..d7f20d3a118 100644
--- a/gcc/config/i386/x86-tune.def
+++ b/gcc/config/i386/x86-tune.def
@@ -511,13 +511,13 @@ DEF_TUNE (X86_TUNE_USE_SCATTER_4PARTS, "use_scatter_4parts",
 
 /* X86_TUNE_USE_GATHER: Use gather instructions for vectors with 8 or more
    elements.  */
-DEF_TUNE (X86_TUNE_USE_GATHER, "use_gather",
+DEF_TUNE (X86_TUNE_USE_GATHER_8PARTS, "use_gather_8parts",
           ~(m_ZNVER1 | m_ZNVER2 | m_ZNVER4 | m_ALDERLAKE | m_ARROWLAKE
             | m_CORE_ATOM | m_GENERIC))
 
 /* X86_TUNE_USE_SCATTER: Use scater instructions for vectors with 8 or more
    elements.  */
-DEF_TUNE (X86_TUNE_USE_SCATTER, "use_scatter",
+DEF_TUNE (X86_TUNE_USE_SCATTER_8PARTS, "use_scatter_8parts",
           ~(m_ZNVER4))
 
 /* X86_TUNE_AVOID_128FMA_CHAINS: Avoid creating loops with tight 128bit or
--
2.31.1
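The builtin hooks touched by the patch choose one of three tune bits based on the number of vector elements. A minimal standalone sketch of that selection follows. The names tune_features, TUNE_GATHER_* and gather_allowed are hypothetical stand-ins for ix86_tune_features[], the X86_TUNE_USE_GATHER_* entries and the check inside ix86_vectorize_builtin_gather; this is a simplified model, not GCC code.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the three X86_TUNE_USE_GATHER_* entries.  */
enum
{
  TUNE_GATHER_2PARTS,
  TUNE_GATHER_4PARTS,
  TUNE_GATHER_8PARTS,
  TUNE_GATHER_LAST
};

/* Hypothetical stand-in for ix86_tune_features[].  */
unsigned char tune_features[TUNE_GATHER_LAST];

/* Return true if gather vectorization is allowed for a vector with
   NELTS elements.  This mirrors the nested conditional in the patch:
   2 elements use the 2parts bit, 4 elements the 4parts bit, and
   every other element count the 8parts bit.  */
bool
gather_allowed (unsigned int nelts)
{
  if (nelts == 2)
    return tune_features[TUNE_GATHER_2PARTS];
  if (nelts == 4)
    return tune_features[TUNE_GATHER_4PARTS];
  return tune_features[TUNE_GATHER_8PARTS];
}
```

In this model the 8parts bit is the fallback for any element count other than 2 or 4, which matches the "8 or more elements" wording in the x86-tune.def comment.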
* Re: [PATCH V2] Support -m[no-]gather -m[no-]scatter to enable/disable vectorization for all gather/scatter instructions
From: Hongtao Liu @ 2023-08-14 2:40 UTC
To: liuhongt; +Cc: gcc-patches, richard.guenther, ubizjak, hubicka

On Fri, Aug 11, 2023 at 2:02 PM liuhongt via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> Rename original use_gather to use_gather_8parts, Support
> -mtune-ctrl={,^}use_gather to set/clear tune features
> use_gather_{2parts, 4parts, 8parts}. Support the new option -mgather
> as alias of -mtune-ctrl=, use_gather, ^use_gather.
>
> Similar for use_scatter.
>
> How about this version?
I'll commit the patch if there are no objections in the next 24 hours.

--
BR,
Hongtao
* Re: [PATCH V2] Support -m[no-]gather -m[no-]scatter to enable/disable vectorization for all gather/scatter instructions
From: Hongtao Liu @ 2023-08-16 7:37 UTC
To: liuhongt; +Cc: gcc-patches, richard.guenther, ubizjak, hubicka

On Mon, Aug 14, 2023 at 10:40 AM Hongtao Liu <crazylht@gmail.com> wrote:
>
> On Fri, Aug 11, 2023 at 2:02 PM liuhongt via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
> >
> > Rename original use_gather to use_gather_8parts, Support
> > -mtune-ctrl={,^}use_gather to set/clear tune features
> > use_gather_{2parts, 4parts, 8parts}. Support the new option -mgather
> > as alias of -mtune-ctrl=, use_gather, ^use_gather.
> >
> > Similar for use_scatter.
> >
> > How about this version?
> I'll commit the patch if there's no objections in the next 24 hours.
Pushed to trunk and backported to release/gcc-{13,12,11}.
Note that for GCC 11 the backport only supports -m{no-,}gather, since
that branch doesn't have scatter tunings.
For GCC 12 and GCC 13, both -m{no-,}gather and -m{no-,}scatter are
supported.

--
BR,
Hongtao
* Re: [PATCH] Support -m[no-]gather -m[no-]scatter to enable/disable vectorization for all gather/scatter instructions. 2023-08-10 7:55 ` Hongtao Liu 2023-08-10 8:07 ` Hongtao Liu @ 2023-08-10 11:11 ` Richard Biener 1 sibling, 0 replies; 18+ messages in thread From: Richard Biener @ 2023-08-10 11:11 UTC (permalink / raw) To: Hongtao Liu; +Cc: Uros Bizjak, liuhongt, gcc-patches, hubicka On Thu, Aug 10, 2023 at 9:55 AM Hongtao Liu <crazylht@gmail.com> wrote: > > On Thu, Aug 10, 2023 at 3:49 PM Richard Biener via Gcc-patches > <gcc-patches@gcc.gnu.org> wrote: > > > > On Thu, Aug 10, 2023 at 9:42 AM Uros Bizjak <ubizjak@gmail.com> wrote: > > > > > > On Thu, Aug 10, 2023 at 9:40 AM Richard Biener > > > <richard.guenther@gmail.com> wrote: > > > > > > > > On Thu, Aug 10, 2023 at 3:13 AM liuhongt <hongtao.liu@intel.com> wrote: > > > > > > > > > > Currently we have 3 different independent tunes for gather > > > > > "use_gather,use_gather_2parts,use_gather_4parts", > > > > > similar for scatter, there're > > > > > "use_scatter,use_scatter_2parts,use_scatter_4parts" > > > > > > > > > > The patch support 2 standardizing options to enable/disable > > > > > vectorization for all gather/scatter instructions. The options is > > > > > interpreted by driver to 3 tunes. > > > > > > > > > > bootstrapped and regtested on x86_64-pc-linux-gnu. > > > > > Ok for trunk? > > > > > > > > I think -mgather/-mscatter are too close to -mfma suggesting they > > > > enable part of an ISA but they won't disable the use of intrinsics > > > > or enable gather/scatter on CPUs where the ISA doesn't have them. > > > > > > > > May I suggest to invent a more generic "short-cut" to > > > > -mtune-ctrl=^X, maybe -mdisable=X? And for gather/scatter > > > > tunables add ^use_gather_any to cover all cases? 
> > > > (or
> > > > change what use_gather controls - it seems we changed its
> > > > meaning before, and instead add use_gather_8parts and
> > > > use_gather_16parts)
> > > >
> > > > That is, what's the point of this?
> > >
> > > https://www.phoronix.com/review/downfall
> > >
> > > that caused:
> > >
> > > https://www.phoronix.com/review/intel-downfall-benchmarks
> >
> > Yes, I know. But there's -mtune-ctrl=<very long line> doing the trick.
> > GCC 11 had only 'use_gather', covering all numbers of lanes. I suggest
> > to resurrect that behavior and add use_gather_8+parts (or two, IIRC
> > gather works only on SI/SFmode or larger).
> >
> > Then -mtune-ctrl=^use_gather works which I think is nice enough?
> So basically, -mtune-ctrl=^use_gather is used to turn off all gather
> vectorization, but -mtune-ctrl=use_gather doesn't turn on all of them?

No, -mtune-ctrl=use_gather should turn them all on as well.

> We don't have an extra explicit flag for target tune, just single bit
> - ix86_tune_features[X86_TUNE_USE_GATHER]

GCC 11 just had that single bit for all. I'm not sure how awkward
it is to have use_gather alias use_gather_2parts, use_gather_4parts ...

> >
> > Richard.
> >
> > > Uros.
> >
>
> --
> BR,
> Hongtao

^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] Support -m[no-]gather -m[no-]scatter to enable/disable vectorization for all gather/scatter instructions.
  2023-08-10 7:47 ` Richard Biener
  2023-08-10 7:55 ` Hongtao Liu
@ 2023-08-10 12:05 ` Jan Hubicka
  1 sibling, 0 replies; 18+ messages in thread
From: Jan Hubicka @ 2023-08-10 12:05 UTC (permalink / raw)
  To: Richard Biener; +Cc: Uros Bizjak, liuhongt, gcc-patches

> On Thu, Aug 10, 2023 at 9:42 AM Uros Bizjak <ubizjak@gmail.com> wrote:
> >
> > On Thu, Aug 10, 2023 at 9:40 AM Richard Biener
> > <richard.guenther@gmail.com> wrote:
> > >
> > > On Thu, Aug 10, 2023 at 3:13 AM liuhongt <hongtao.liu@intel.com> wrote:
> > > >
> > > > Currently we have 3 different independent tunes for gather
> > > > "use_gather,use_gather_2parts,use_gather_4parts",
> > > > similar for scatter, there're
> > > > "use_scatter,use_scatter_2parts,use_scatter_4parts"
> > > >
> > > > The patch support 2 standardizing options to enable/disable
> > > > vectorization for all gather/scatter instructions. The options is
> > > > interpreted by driver to 3 tunes.
> > > >
> > > > bootstrapped and regtested on x86_64-pc-linux-gnu.
> > > > Ok for trunk?
> > >
> > > I think -mgather/-mscatter are too close to -mfma suggesting they
> > > enable part of an ISA but they won't disable the use of intrinsics
> > > or enable gather/scatter on CPUs where the ISA doesn't have them.
> > >
> > > May I suggest to invent a more generic "short-cut" to
> > > -mtune-ctrl=^X, maybe -mdisable=X? And for gather/scatter
> > > tunables add ^use_gather_any to cover all cases? (or
> > > change what use_gather controls - it seems we changed its
> > > meaning before, and instead add use_gather_8parts and
> > > use_gather_16parts)
> > >
> > > That is, what's the point of this?
> >
> > https://www.phoronix.com/review/downfall
> >
> > that caused:
> >
> > https://www.phoronix.com/review/intel-downfall-benchmarks
>
> Yes, I know. But there's -mtune-ctrl=<very long line> doing the trick.
> GCC 11 had only 'use_gather', covering all numbers of lanes. I suggest
> to resurrect that behavior and add use_gather_8+parts (or two, IIRC
> gather works only on SI/SFmode or larger).
>
> Then -mtune-ctrl=^use_gather works which I think is nice enough?

-mtune-ctrl is really intended for GCC developers. It is neither backward
compatible nor fully documented, and bad sets of values may trigger ICEs.
If gathers became very slow, I think normal users may want to disable them,
and in such a situation a specialized command line option makes sense to me.

Honza
>
> Richard.
>
> > Uros.

^ permalink raw reply [flat|nested] 18+ messages in thread
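For readers following the discussion, here is a minimal sketch (not part of
the posted patch) of the kind of loop at issue: an indexed load that the
vectorizer may implement with a gather instruction when the use_gather*
tunings allow it. The function name gather_copy and the file name
gather_demo.c are made up for illustration. The -mtune-ctrl string in the
comment is the one the posted patch expands -mno-gather to; whether a gather
is actually emitted depends on the GCC version, the -march setting and the
vectorizer cost model.

```c
#include <stddef.h>

/* Indexed load: out[i] = table[idx[i]].
   With AVX2 or AVX-512 enabled, the vectorizer may use a gather
   instruction for the load, subject to the use_gather* tunings
   discussed in this thread.  */
void
gather_copy (double *restrict out, const double *restrict table,
             const int *restrict idx, size_t n)
{
  for (size_t i = 0; i < n; i++)
    out[i] = table[idx[i]];
}

/* Hypothetical build lines to compare the generated assembly
   (gather_demo.c is an assumed file name):

     gcc -O3 -march=skylake-avx512 -S gather_demo.c

     gcc -O3 -march=skylake-avx512 -S gather_demo.c
         -mtune-ctrl=^use_gather_2parts,^use_gather_4parts,^use_gather

   The second command is written on two lines only for readability.
   It spells out the three tunes that the proposed -mno-gather
   would set in one go.  */
```

The tuning only affects which instructions are chosen; the function computes
the same result either way, so a difference should show up in the assembly
and in timing, not in the output.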
end of thread, other threads:[~2023-08-16 7:29 UTC | newest]

Thread overview: 18+ messages
2023-08-10 1:11 [PATCH] Support -m[no-]gather -m[no-]scatter to enable/disable vectorization for all gather/scatter instructions liuhongt
2023-08-10 1:47 ` Xi Ruoyao
2023-08-10 1:52 ` Liu, Hongtao
2023-08-10 6:04 ` Uros Bizjak
2023-08-10 6:12 ` Hongtao Liu
2023-08-10 7:39 ` Richard Biener
2023-08-10 7:42 ` Uros Bizjak
2023-08-10 7:47 ` Richard Biener
2023-08-10 7:55 ` Hongtao Liu
2023-08-10 8:07 ` Hongtao Liu
2023-08-10 9:16 ` Hongtao Liu
2023-08-10 11:12 ` Richard Biener
2023-08-10 13:23 ` Hongtao Liu
2023-08-11 6:01 ` [PATCH V2] " liuhongt
2023-08-14 2:40 ` Hongtao Liu
2023-08-16 7:37 ` Hongtao Liu
2023-08-10 11:11 ` [PATCH] " Richard Biener
2023-08-10 12:05 ` Jan Hubicka