public inbox for gcc-patches@gcc.gnu.org
* [PATCH][X86] Enable X86_TUNE_AVX256_UNALIGNED_{LOAD,STORE}_OPTIMAL for generic tune [PR target/98172]
@ 2021-01-28  6:26 Hongtao Liu
  2021-01-28  9:05 ` [PATCH][X86] Enable X86_TUNE_AVX256_UNALIGNED_{LOAD, STORE}_OPTIMAL " Richard Biener
  0 siblings, 1 reply; 8+ messages in thread
From: Hongtao Liu @ 2021-01-28  6:26 UTC (permalink / raw)
  To: Uros Bizjak, Jan Hubicka, GCC Patches

[-- Attachment #1: Type: text/plain, Size: 502 bytes --]

Hi:
   GCC 11 will be the system GCC two years from now, and the processors
of that time should not need to split a 256-bit vector into two 128-bit
vectors.
   For example, SPEC2017 runs with the two option sets below on Zen3/ICL
show that Option B is better than Option A.
Option A:
-march=x86-64 -mtune=generic -mavx2 -mfma -Ofast

Option B:
Option A + -mtune-ctrl="256_unaligned_load_optimal,256_unaligned_store_optimal"
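
For illustration only, here is a minimal sketch of the kind of loop these
tunings affect. The function name scale_add and the file name kernel.c are
hypothetical and are not taken from SPEC or from the patch. The pointers
carry no alignment guarantee, so a vectorizer targeting AVX2 has to choose
between full unaligned 256-bit accesses and pairs of 128-bit accesses.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical kernel, not part of the patch.  Nothing tells the
   compiler that dst or src is 32-byte aligned, so any 256-bit memory
   access it generates for this loop is an unaligned one.  */
void
scale_add (float *restrict dst, const float *restrict src, float k, size_t n)
{
  assert (dst != NULL && src != NULL);
  for (size_t i = 0; i < n; i++)
    dst[i] += k * src[i];
}
```

Compiling this with "gcc -S" once under Option A and once under Option B
and comparing the assembly should show whether the vector loads and stores
in the loop body are split; the exact instruction selection depends on the
GCC version.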

  Bootstrapped and regtested on x86_64-linux-gnu{-m32,}.
  Ok for trunk?




-- 
BR,
Hongtao

[-- Attachment #2: 0001-Enable-X86_TUNE_AVX256_UNALIGNED_-LOAD-STORE-_OPTIMA.patch --]
[-- Type: application/octet-stream, Size: 1674 bytes --]

From c7b3af0a55bad991291e212961a6e0c40cefe7a8 Mon Sep 17 00:00:00 2001
From: liuhongt <hongtao.liu@intel.com>
Date: Thu, 28 Jan 2021 14:07:00 +0800
Subject: [PATCH] Enable X86_TUNE_AVX256_UNALIGNED_{LOAD,STORE}_OPTIMAL in generic tune.
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

gcc/ChangeLog:

	PR target/98172
	* config/i386/x86-tune.def (X86_TUNE_AVX256_UNALIGNED_LOAD_OPTIMAL):
	(X86_TUNE_AVX256_UNALIGNED_STORE_OPTIMAL): Remove m_GENERIC
	from ~list.
---
 gcc/config/i386/x86-tune.def | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/config/i386/x86-tune.def b/gcc/config/i386/x86-tune.def
index 7ace8da7989..f0e07c0f81f 100644
--- a/gcc/config/i386/x86-tune.def
+++ b/gcc/config/i386/x86-tune.def
@@ -453,12 +453,12 @@ DEF_TUNE (X86_TUNE_AVOID_256FMA_CHAINS, "avoid_fma256_chains", m_ZNVER2 | m_ZNVE
 /* X86_TUNE_AVX256_UNALIGNED_LOAD_OPTIMAL: if false, unaligned loads are
    split.  */
 DEF_TUNE (X86_TUNE_AVX256_UNALIGNED_LOAD_OPTIMAL, "256_unaligned_load_optimal",
-          ~(m_NEHALEM | m_SANDYBRIDGE | m_GENERIC))
+          ~(m_NEHALEM | m_SANDYBRIDGE))
 
 /* X86_TUNE_AVX256_UNALIGNED_STORE_OPTIMAL: if false, unaligned stores are
    split.  */
 DEF_TUNE (X86_TUNE_AVX256_UNALIGNED_STORE_OPTIMAL, "256_unaligned_store_optimal",
-	  ~(m_NEHALEM | m_SANDYBRIDGE | m_BDVER | m_ZNVER1 | m_GENERIC))
+	  ~(m_NEHALEM | m_SANDYBRIDGE | m_BDVER | m_ZNVER1))
 
 /* X86_TUNE_AVX256_SPLIT_REGS: if true, AVX256 ops are split into two AVX128 ops.  */
 DEF_TUNE (X86_TUNE_AVX256_SPLIT_REGS, "avx256_split_regs",m_BDVER | m_BTVER2
-- 
2.18.1
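
A minimal sketch of how the mask in the hunk above can be read, assuming
the third DEF_TUNE argument is a bitmask of the processors for which the
tuning is enabled. The M_* bit values and the helper tune_enabled are
hypothetical; the real m_* masks are defined elsewhere in GCC and have
different values.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical one-bit-per-processor values, for illustration only.  */
#define M_NEHALEM     (1u << 0)
#define M_SANDYBRIDGE (1u << 1)
#define M_BDVER       (1u << 2)
#define M_ZNVER1      (1u << 3)
#define M_GENERIC     (1u << 4)
#define M_ZNVER3      (1u << 5)

/* The store tuning mask before and after the patch: every processor
   except the listed ones has the tuning enabled.  */
#define STORE_OPTIMAL_OLD \
  (~(M_NEHALEM | M_SANDYBRIDGE | M_BDVER | M_ZNVER1 | M_GENERIC))
#define STORE_OPTIMAL_NEW \
  (~(M_NEHALEM | M_SANDYBRIDGE | M_BDVER | M_ZNVER1))

/* Return true if the tuning described by tune_mask applies to the
   processor identified by the single bit cpu_bit.  */
static bool
tune_enabled (unsigned tune_mask, unsigned cpu_bit)
{
  assert (cpu_bit != 0 && (cpu_bit & (cpu_bit - 1)) == 0);
  return (tune_mask & cpu_bit) != 0;
}
```

Under these assumptions, tune_enabled (STORE_OPTIMAL_OLD, M_GENERIC) is
false and tune_enabled (STORE_OPTIMAL_NEW, M_GENERIC) is true, while the
answer for M_ZNVER1 stays false in both cases.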



* Re: [PATCH][X86] Enable X86_TUNE_AVX256_UNALIGNED_{LOAD, STORE}_OPTIMAL for generic tune [PR target/98172]
  2021-01-28  6:26 [PATCH][X86] Enable X86_TUNE_AVX256_UNALIGNED_{LOAD,STORE}_OPTIMAL for generic tune [PR target/98172] Hongtao Liu
@ 2021-01-28  9:05 ` Richard Biener
  2021-01-28 13:17   ` H.J. Lu
  2021-02-04 10:31   ` Martin Jambor
  0 siblings, 2 replies; 8+ messages in thread
From: Richard Biener @ 2021-01-28  9:05 UTC (permalink / raw)
  To: Hongtao Liu, Martin Jambor; +Cc: Uros Bizjak, Jan Hubicka, GCC Patches

On Thu, Jan 28, 2021 at 7:32 AM Hongtao Liu via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> Hi:
>    GCC11 will be the system GCC 2 years from now, and for the
> processors then, they shouldn't even need to split a 256-bit vector
> into 2 128-bits vectors.
>    .i.e. Test SPEC2017 with the below 2 options on Zen3/ICL show
> option B is better than Option A.
> Option A:
> -march=x86-64 -mtune=generic -mavx2 -mfma -Ofast
>
> Option B:
> Option A + -mtune-ctrl="256_unaligned_load_optimal,256_unaligned_store_optimal"
>
>   Bootstrapped and regtested on x86-64_iinux-gnu{-m32,}.

Given the explicit list for unaligned loads it's a no-brainer to change that
for X86_TUNE_AVX256_UNALIGNED_LOAD_OPTIMAL.  Given both
BDVER and ZNVER1 are listed for X86_TUNE_AVX256_UNALIGNED_STORE_OPTIMAL
we should try to benchmark the effect on ZNVER1 - Martin, do we still
have a znver1 machine around?

Note that if the settings differ so that stores are split but loads are
not, loading a just-stored value can cause a store-to-load forwarding
(STLF) failure and quite a performance hit.  Since znver1 has 128-bit data
paths that shouldn't be an issue there, but it would be an issue for
actually aligned data on CPUs with 256-bit data paths.
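
The access pattern in question can be sketched in plain C as below. This
only models the shape of the accesses (two 16-byte stores followed by one
32-byte load of the same bytes); the type vec256 and both helper names are
hypothetical, and which instructions a compiler actually emits for the
memcpy calls is not guaranteed.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for a 256-bit vector value.  */
typedef struct { uint8_t bytes[32]; } vec256;

/* Write the value as two 16-byte halves, the way a split unaligned
   256-bit store would write it.  */
static void
store_split (uint8_t *dst, const vec256 *v)
{
  assert (dst != NULL && v != NULL);
  memcpy (dst, v->bytes, 16);
  memcpy (dst + 16, v->bytes + 16, 16);
}

/* Read all 32 bytes back in one access, the way an unsplit 256-bit
   load would read them.  The load spans both earlier stores, which is
   the situation where forwarding from a single store is not possible.  */
static vec256
load_whole (const uint8_t *src)
{
  vec256 v;
  assert (src != NULL);
  memcpy (v.bytes, src, sizeof v.bytes);
  return v;
}
```

The result of the load is of course correct either way; the concern above
is only about the latency of the reload.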

Thanks,
Richard.

>   Ok for trunk?
>
>
>
>
> --
> BR,
> Hongtao


* Re: [PATCH][X86] Enable X86_TUNE_AVX256_UNALIGNED_{LOAD, STORE}_OPTIMAL for generic tune [PR target/98172]
  2021-01-28  9:05 ` [PATCH][X86] Enable X86_TUNE_AVX256_UNALIGNED_{LOAD, STORE}_OPTIMAL " Richard Biener
@ 2021-01-28 13:17   ` H.J. Lu
  2021-02-04  4:28     ` Hongtao Liu
  2021-02-04 10:31   ` Martin Jambor
  1 sibling, 1 reply; 8+ messages in thread
From: H.J. Lu @ 2021-01-28 13:17 UTC (permalink / raw)
  To: Richard Biener
  Cc: Hongtao Liu, Martin Jambor, Jan Hubicka, Uros Bizjak, GCC Patches

On Thu, Jan 28, 2021 at 1:21 AM Richard Biener via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> On Thu, Jan 28, 2021 at 7:32 AM Hongtao Liu via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
> >
> > Hi:
> >    GCC11 will be the system GCC 2 years from now, and for the
> > processors then, they shouldn't even need to split a 256-bit vector
> > into 2 128-bits vectors.
> >    .i.e. Test SPEC2017 with the below 2 options on Zen3/ICL show
> > option B is better than Option A.
> > Option A:
> > -march=x86-64 -mtune=generic -mavx2 -mfma -Ofast
> >
> > Option B:
> > Option A + -mtune-ctrl="256_unaligned_load_optimal,256_unaligned_store_optimal"
> >
> >   Bootstrapped and regtested on x86-64_iinux-gnu{-m32,}.
>
> Given the explicit list for unaligned loads it's a no-brainer to change that
> for X86_TUNE_AVX256_UNALIGNED_LOAD_OPTIMAL.  Given both
> BDVER and ZNVER1 are listed for X86_TUNE_AVX256_UNALIGNED_STORE_OPTIMAL
> we should try to benchmark the effect on ZNVER1 - Martin, do we still
> have a znver1 machine around?

They are also turned on for Sandybridge.  I don't believe we should keep it
in GCC 11 to penalize today's CPUs as well as CPUs in 2024.

> Note that with the settings differing in a way to split stores but not to split
> loads, loading a just stored value can cause bad STLF and quite a
> performance hit (since znver1 has 128bit data paths that shouldn't
> be an issue there but it would have an issue for actually aligned data
> on CPUs with 256bit data paths).
>
> Thanks,
> Richard.
>
> >   Ok for trunk?
> >
> >
> >
> >
> > --
> > BR,
> > Hongtao



-- 
H.J.


* Re: [PATCH][X86] Enable X86_TUNE_AVX256_UNALIGNED_{LOAD, STORE}_OPTIMAL for generic tune [PR target/98172]
  2021-01-28 13:17   ` H.J. Lu
@ 2021-02-04  4:28     ` Hongtao Liu
  2021-02-04  6:45       ` Uros Bizjak
  0 siblings, 1 reply; 8+ messages in thread
From: Hongtao Liu @ 2021-02-04  4:28 UTC (permalink / raw)
  To: Uros Bizjak
  Cc: Richard Biener, Martin Jambor, Jan Hubicka, GCC Patches, H.J. Lu

On Thu, Jan 28, 2021 at 9:18 PM H.J. Lu <hjl.tools@gmail.com> wrote:
>
> On Thu, Jan 28, 2021 at 1:21 AM Richard Biener via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
> >
> > On Thu, Jan 28, 2021 at 7:32 AM Hongtao Liu via Gcc-patches
> > <gcc-patches@gcc.gnu.org> wrote:
> > >
> > > Hi:
> > >    GCC11 will be the system GCC 2 years from now, and for the
> > > processors then, they shouldn't even need to split a 256-bit vector
> > > into 2 128-bits vectors.
> > >    .i.e. Test SPEC2017 with the below 2 options on Zen3/ICL show
> > > option B is better than Option A.
> > > Option A:
> > > -march=x86-64 -mtune=generic -mavx2 -mfma -Ofast
> > >
> > > Option B:
> > > Option A + -mtune-ctrl="256_unaligned_load_optimal,256_unaligned_store_optimal"
> > >
> > >   Bootstrapped and regtested on x86-64_iinux-gnu{-m32,}.
> >
> > Given the explicit list for unaligned loads it's a no-brainer to change that
> > for X86_TUNE_AVX256_UNALIGNED_LOAD_OPTIMAL.  Given both
> > BDVER and ZNVER1 are listed for X86_TUNE_AVX256_UNALIGNED_STORE_OPTIMAL
> > we should try to benchmark the effect on ZNVER1 - Martin, do we still
> > have a znver1 machine around?
>
> They are also turned on for Sandybridge.  I don't believe we should keep it
> in GCC 11 to penalize today's CPUs as well as CPUs in 2024.
>
I agree with H.J, and I would also like to hear Uros' opinion.
> > Note that with the settings differing in a way to split stores but not to split
> > loads, loading a just stored value can cause bad STLF and quite a
> > performance hit (since znver1 has 128bit data paths that shouldn't
> > be an issue there but it would have an issue for actually aligned data
> > on CPUs with 256bit data paths).
> >
> > Thanks,
> > Richard.
> >
> > >   Ok for trunk?
> > >
> > >
> > >
> > >
> > > --
> > > BR,
> > > Hongtao
>
>
>
> --
> H.J.



-- 
BR,
Hongtao


* Re: [PATCH][X86] Enable X86_TUNE_AVX256_UNALIGNED_{LOAD, STORE}_OPTIMAL for generic tune [PR target/98172]
  2021-02-04  4:28     ` Hongtao Liu
@ 2021-02-04  6:45       ` Uros Bizjak
  2021-02-04  8:52         ` Richard Biener
  0 siblings, 1 reply; 8+ messages in thread
From: Uros Bizjak @ 2021-02-04  6:45 UTC (permalink / raw)
  To: Hongtao Liu
  Cc: Richard Biener, Martin Jambor, Jan Hubicka, GCC Patches, H.J. Lu

On Thu, Feb 4, 2021 at 5:28 AM Hongtao Liu <crazylht@gmail.com> wrote:

> > > >    GCC11 will be the system GCC 2 years from now, and for the
> > > > processors then, they shouldn't even need to split a 256-bit vector
> > > > into 2 128-bits vectors.
> > > >    .i.e. Test SPEC2017 with the below 2 options on Zen3/ICL show
> > > > option B is better than Option A.
> > > > Option A:
> > > > -march=x86-64 -mtune=generic -mavx2 -mfma -Ofast
> > > >
> > > > Option B:
> > > > Option A + -mtune-ctrl="256_unaligned_load_optimal,256_unaligned_store_optimal"
> > > >
> > > >   Bootstrapped and regtested on x86-64_iinux-gnu{-m32,}.
> > >
> > > Given the explicit list for unaligned loads it's a no-brainer to change that
> > > for X86_TUNE_AVX256_UNALIGNED_LOAD_OPTIMAL.  Given both
> > > BDVER and ZNVER1 are listed for X86_TUNE_AVX256_UNALIGNED_STORE_OPTIMAL
> > > we should try to benchmark the effect on ZNVER1 - Martin, do we still
> > > have a znver1 machine around?
> >
> > They are also turned on for Sandybridge.  I don't believe we should keep it
> > in GCC 11 to penalize today's CPUs as well as CPUs in 2024.
> >
> I agree with H.J, and I would also like to hear Uros' opinion.

I don't have any benchmark data to base an opinion on, but I
definitely agree that the compiler should tune for the newer processors
where speed matters the most; 10-year-old processors are irrelevant as
far as speed is concerned.

So, if gcc-11 is expected to be most used 2-3 years from now, it should
by default target the architecture that will be most used at that time.
But I think that distribution maintainers should decide here.

Uros.


* Re: [PATCH][X86] Enable X86_TUNE_AVX256_UNALIGNED_{LOAD, STORE}_OPTIMAL for generic tune [PR target/98172]
  2021-02-04  6:45       ` Uros Bizjak
@ 2021-02-04  8:52         ` Richard Biener
  2021-02-05  1:43           ` Hongtao Liu
  0 siblings, 1 reply; 8+ messages in thread
From: Richard Biener @ 2021-02-04  8:52 UTC (permalink / raw)
  To: Uros Bizjak; +Cc: Hongtao Liu, Martin Jambor, Jan Hubicka, GCC Patches, H.J. Lu

On Thu, Feb 4, 2021 at 7:45 AM Uros Bizjak <ubizjak@gmail.com> wrote:
>
> On Thu, Feb 4, 2021 at 5:28 AM Hongtao Liu <crazylht@gmail.com> wrote:
>
> > > > >    GCC11 will be the system GCC 2 years from now, and for the
> > > > > processors then, they shouldn't even need to split a 256-bit vector
> > > > > into 2 128-bits vectors.
> > > > >    .i.e. Test SPEC2017 with the below 2 options on Zen3/ICL show
> > > > > option B is better than Option A.
> > > > > Option A:
> > > > > -march=x86-64 -mtune=generic -mavx2 -mfma -Ofast
> > > > >
> > > > > Option B:
> > > > > Option A + -mtune-ctrl="256_unaligned_load_optimal,256_unaligned_store_optimal"
> > > > >
> > > > >   Bootstrapped and regtested on x86-64_iinux-gnu{-m32,}.
> > > >
> > > > Given the explicit list for unaligned loads it's a no-brainer to change that
> > > > for X86_TUNE_AVX256_UNALIGNED_LOAD_OPTIMAL.  Given both
> > > > BDVER and ZNVER1 are listed for X86_TUNE_AVX256_UNALIGNED_STORE_OPTIMAL
> > > > we should try to benchmark the effect on ZNVER1 - Martin, do we still
> > > > have a znver1 machine around?
> > >
> > > They are also turned on for Sandybridge.  I don't believe we should keep it
> > > in GCC 11 to penalize today's CPUs as well as CPUs in 2024.
> > >
> > I agree with H.J, and I would also like to hear Uros' opinion.
>
> I don't have any benchmark data to form my opinion on, but I
> definitely agree that the compiler should tune for the newer processor
> where speed matters the most, and 10 years old processors are
> irrelevant as far as speed is concerned.
>
> So, if it is expected that gcc-11 will be most used in 2-3 years from
> now, it should by default target the architecture that will be most
> used at that time. But I think that distribution maintainers should
> decide here.

I'm all for the change - the case where it could regress is odd anyway, as it
needs AVX2 enabled, and on CPUs with a 128-bit data path those shouldn't be
the preferred multilibs (thinking of this new x86_64-v2/v3 stuff).

Richard.

> Uros.


* Re: [PATCH][X86] Enable X86_TUNE_AVX256_UNALIGNED_{LOAD, STORE}_OPTIMAL for generic tune [PR target/98172]
  2021-01-28  9:05 ` [PATCH][X86] Enable X86_TUNE_AVX256_UNALIGNED_{LOAD, STORE}_OPTIMAL " Richard Biener
  2021-01-28 13:17   ` H.J. Lu
@ 2021-02-04 10:31   ` Martin Jambor
  1 sibling, 0 replies; 8+ messages in thread
From: Martin Jambor @ 2021-02-04 10:31 UTC (permalink / raw)
  To: Richard Biener, Hongtao Liu; +Cc: Uros Bizjak, Jan Hubicka, GCC Patches

Hi,

On Thu, Jan 28 2021, Richard Biener wrote:
> On Thu, Jan 28, 2021 at 7:32 AM Hongtao Liu via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
>>
>> Hi:
>>    GCC11 will be the system GCC 2 years from now, and for the
>> processors then, they shouldn't even need to split a 256-bit vector
>> into 2 128-bits vectors.
>>    .i.e. Test SPEC2017 with the below 2 options on Zen3/ICL show
>> option B is better than Option A.
>> Option A:
>> -march=x86-64 -mtune=generic -mavx2 -mfma -Ofast
>>
>> Option B:
>> Option A + -mtune-ctrl="256_unaligned_load_optimal,256_unaligned_store_optimal"
>>
>>   Bootstrapped and regtested on x86-64_iinux-gnu{-m32,}.
>
> Given the explicit list for unaligned loads it's a no-brainer to change that
> for X86_TUNE_AVX256_UNALIGNED_LOAD_OPTIMAL.  Given both
> BDVER and ZNVER1 are listed for X86_TUNE_AVX256_UNALIGNED_STORE_OPTIMAL
> we should try to benchmark the effect on ZNVER1 - Martin, do we still
> have a znver1 machine around?

Sorry, I kept forgetting about this and when I did not, the machine was
busy.  I did just one SPEC CPUrate (single threaded) reference run
without and with the patch (on top of 6b1633378b7), both times with
-Ofast -mavx2 -mtune=generic and with LTO, and the results were actually
rather good (smaller is better):

  SPEC 2017 FPrate (time):
  | Benchmark       | Before | After |      % |
  |-----------------+--------+-------+--------|
  | 503.bwaves_r    |    217 |   209 |  -3.69 |
  | 507.cactuBSSN_r |    236 |   235 |  -0.42 |
  | 508.namd_r      |    252 |   242 |  -3.97 |
  | 510.parest_r    |    384 |   383 |  -0.26 |
  | 511.povray_r    |    486 |   495 |  +1.85 |
  | 519.lbm_r       |    172 |   173 |  +0.58 |
  | 521.wrf_r       |    292 |   277 |  -5.14 |
  | 526.blender_r   |    300 |   303 |  +1.00 |
  | 527.cam4_r      |    255 |   248 |  -2.75 |
  | 538.imagick_r   |    400 |   400 |  +0.00 |
  | 544.nab_r       |    316 |   316 |  +0.00 |
  | 549.fotonik3d_r |    366 |   351 |  -4.10 |
  | 554.roms_r      |    283 |   248 | -12.37 |
  #+TBLFM: $4=100*$3/$2-100;%+.2f


  SPEC 2017 INTrate (time):
  | Benchmark       | Before | After |     % |
  |-----------------+--------+-------+-------|
  | 500.perlbench_r |    446 |   443 | -0.67 |
  | 502.gcc_r       |    267 |   267 | +0.00 |
  | 505.mcf_r       |    285 |   285 | +0.00 |
  | 520.omnetpp_r   |    437 |   436 | -0.23 |
  | 523.xalancbmk_r |    302 |   308 | +1.99 |
  | 525.x264_r      |    217 |   219 | +0.92 |
  | 531.deepsjeng_r |    316 |   311 | -1.58 |
  | 541.leela_r     |    500 |   499 | -0.20 |
  | 548.exchange2_r |    314 |   315 | +0.32 |
  | 557.xz_r        |    391 |   392 | +0.26 |
  #+TBLFM: $4=100*$3/$2-100;%+.2f

If we regard any regressions smaller than 2% as noise then there were
none.  And 554.roms_r really liked the change, even on znver1.
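
For reference, the % column follows the table formula
$4=100*$3/$2-100, i.e. the relative change in run time from Before to
After. A minimal sketch of the same arithmetic in C (the function name
percent_change is hypothetical):

```c
#include <assert.h>

/* Relative change in run time, in percent.  Negative values mean the
   run with the patch was faster.  */
static double
percent_change (double before, double after)
{
  assert (before > 0.0);
  return 100.0 * after / before - 100.0;
}
```

For 554.roms_r this gives 100 * 248 / 283 - 100, which rounds to the
-12.37 shown in the table.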

Martin


* Re: [PATCH][X86] Enable X86_TUNE_AVX256_UNALIGNED_{LOAD, STORE}_OPTIMAL for generic tune [PR target/98172]
  2021-02-04  8:52         ` Richard Biener
@ 2021-02-05  1:43           ` Hongtao Liu
  0 siblings, 0 replies; 8+ messages in thread
From: Hongtao Liu @ 2021-02-05  1:43 UTC (permalink / raw)
  To: Richard Biener
  Cc: Uros Bizjak, Martin Jambor, Jan Hubicka, GCC Patches, H.J. Lu

On Thu, Feb 4, 2021 at 4:52 PM Richard Biener
<richard.guenther@gmail.com> wrote:
>
> On Thu, Feb 4, 2021 at 7:45 AM Uros Bizjak <ubizjak@gmail.com> wrote:
> >
> > On Thu, Feb 4, 2021 at 5:28 AM Hongtao Liu <crazylht@gmail.com> wrote:
> >
> > > > > >    GCC11 will be the system GCC 2 years from now, and for the
> > > > > > processors then, they shouldn't even need to split a 256-bit vector
> > > > > > into 2 128-bits vectors.
> > > > > >    .i.e. Test SPEC2017 with the below 2 options on Zen3/ICL show
> > > > > > option B is better than Option A.
> > > > > > Option A:
> > > > > > -march=x86-64 -mtune=generic -mavx2 -mfma -Ofast
> > > > > >
> > > > > > Option B:
> > > > > > Option A + -mtune-ctrl="256_unaligned_load_optimal,256_unaligned_store_optimal"
> > > > > >
> > > > > >   Bootstrapped and regtested on x86-64_iinux-gnu{-m32,}.
> > > > >
> > > > > Given the explicit list for unaligned loads it's a no-brainer to change that
> > > > > for X86_TUNE_AVX256_UNALIGNED_LOAD_OPTIMAL.  Given both
> > > > > BDVER and ZNVER1 are listed for X86_TUNE_AVX256_UNALIGNED_STORE_OPTIMAL
> > > > > we should try to benchmark the effect on ZNVER1 - Martin, do we still
> > > > > have a znver1 machine around?
> > > >
> > > > They are also turned on for Sandybridge.  I don't believe we should keep it
> > > > in GCC 11 to penalize today's CPUs as well as CPUs in 2024.
> > > >
> > > I agree with H.J, and I would also like to hear Uros' opinion.
> >
> > I don't have any benchmark data to form my opinion on, but I
> > definitely agree that the compiler should tune for the newer processor
> > where speed matters the most, and 10 years old processors are
> > irrelevant as far as speed is concerned.
> >
> > So, if it is expected that gcc-11 will be most used in 2-3 years from
> > now, it should by default target the architecture that will be most
> > used at that time. But I think that distribution maintainers should
> > decide here.
>
> I'm all for the change - the case it could regress is odd anyway as it needs
> AVX2 enabled and on CPUs with a 128bit data path those shouldn't be
> prefered mutlilibs (thinking of this new x86_64-v2/v3 stuff).
>
I'm going to check in the patch.
> Richard.
>
> > Uros.



-- 
BR,
Hongtao
