[PATCH, PR 49089] Don't split AVX256 unaligned loads by default on bdver1 and generic

public inbox for gcc-patches@gcc.gnu.org

* [PATCH, PR 49089] Don't split AVX256 unaligned loads by default on bdver1 and generic
@ 2011-06-14  1:32 Fang, Changpeng
  2011-06-14 10:17 ` Richard Guenther
  0 siblings, 1 reply; 22+ messages in thread
From: Fang, Changpeng @ 2011-06-14  1:32 UTC (permalink / raw)
  To: gcc-patches; +Cc: hjl.tools, Fang, Changpeng

[-- Attachment #1: Type: text/plain, Size: 534 bytes --]

Hi,

The patch ( http://gcc.gnu.org/ml/gcc-patches/2011-02/txt00059.txt ) introduced splitting of avx256 unaligned loads.
However, we found that it causes significant regressions on cpu2006 ( http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49089 ).

In this work, we introduce a tune option that enables splitting of unaligned loads by default only on CPUs for which such splitting
is beneficial.

The patch passed bootstrap and regression tests on an x86_64-unknown-linux-gnu system.

Is it OK to commit?

Thanks,

Changpeng 

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0001-pr49089-enable-avx256-splitting-unaligned-load-only-.patch --]
[-- Type: text/x-patch; name="0001-pr49089-enable-avx256-splitting-unaligned-load-only-.patch", Size: 2885 bytes --]

From 415012803abf2cac95c067394504c55dd968f4f5 Mon Sep 17 00:00:00 2001
From: Changpeng Fang <chfang@huainan.(none)>
Date: Mon, 13 Jun 2011 13:13:32 -0700
Subject: [PATCH] pr49089: enable avx256 splitting unaligned load only when beneficial

	* config/i386/i386.h (ix86_tune_indices): Introduce
	  X86_TUNE_AVX256_SPLIT_UNALIGNED_LOAD_OPTIMAL.
	  (TARGET_AVX256_SPLIT_UNALIGNED_LOAD_OPTIMAL): New definition.

	* config/i386/i386.c (ix86_tune_features): Add entry for
	  X86_TUNE_AVX256_SPLIT_UNALIGNED_LOAD_OPTIMAL.
	  (ix86_option_override_internal): Enable avx256 unaligned load splitting
	  only when TARGET_AVX256_SPLIT_UNALIGNED_LOAD_OPTIMAL is set.
---
 gcc/config/i386/i386.c |    9 +++++++--
 gcc/config/i386/i386.h |    3 +++
 2 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 7b266b9..d5f358f 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -2088,7 +2088,11 @@ static unsigned int initial_ix86_tune_features[X86_TUNE_LAST] = {
   /* X86_SOFTARE_PREFETCHING_BENEFICIAL: Enable software prefetching
      at -O3.  For the moment, the prefetching seems badly tuned for Intel
      chips.  */
-  m_K6_GEODE | m_AMD_MULTIPLE
+  m_K6_GEODE | m_AMD_MULTIPLE,
+
+  /* X86_TUNE_AVX256_SPLIT_UNALIGNED_LOAD_OPTIMAL: Enable splitting 256-bit
+     unaligned load.  It hurts the performance on Bulldozer.  */
+  m_COREI7
 };
 
 /* Feature tests against the various architecture variations.  */
@@ -4194,7 +4198,8 @@ ix86_option_override_internal (bool main_args_p)
 	  if (flag_expensive_optimizations
 	      && !(target_flags_explicit & MASK_VZEROUPPER))
 	    target_flags |= MASK_VZEROUPPER;
-	  if (!(target_flags_explicit & MASK_AVX256_SPLIT_UNALIGNED_LOAD))
+	  if (TARGET_AVX256_SPLIT_UNALIGNED_LOAD_OPTIMAL
+	      && !(target_flags_explicit & MASK_AVX256_SPLIT_UNALIGNED_LOAD))
 	    target_flags |= MASK_AVX256_SPLIT_UNALIGNED_LOAD;
 	  if (!(target_flags_explicit & MASK_AVX256_SPLIT_UNALIGNED_STORE))
 	    target_flags |= MASK_AVX256_SPLIT_UNALIGNED_STORE;
diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index 8badcbb..b2a1bc8 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -312,6 +312,7 @@ enum ix86_tune_indices {
   X86_TUNE_OPT_AGU,
   X86_TUNE_VECTORIZE_DOUBLE,
   X86_TUNE_SOFTWARE_PREFETCHING_BENEFICIAL,
+  X86_TUNE_AVX256_SPLIT_UNALIGNED_LOAD_OPTIMAL,
 
   X86_TUNE_LAST
 };
@@ -410,6 +411,8 @@ extern unsigned char ix86_tune_features[X86_TUNE_LAST];
 	ix86_tune_features[X86_TUNE_VECTORIZE_DOUBLE]
 #define TARGET_SOFTWARE_PREFETCHING_BENEFICIAL \
 	ix86_tune_features[X86_TUNE_SOFTWARE_PREFETCHING_BENEFICIAL]
+#define TARGET_AVX256_SPLIT_UNALIGNED_LOAD_OPTIMAL \
+	ix86_tune_features[X86_TUNE_AVX256_SPLIT_UNALIGNED_LOAD_OPTIMAL]
 
 /* Feature tests against the various architecture variations.  */
 enum ix86_arch_indices {
-- 
1.7.0.4
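
The gating logic in the patch above can be sketched in isolation as follows. This is a minimal, self-contained model and not GCC code: the CPU_* and OPT_* bits, the tune_index enum, and the functions select_tuning and apply_default are hypothetical stand-ins for m_COREI7/m_BDVER1/m_GENERIC, MASK_AVX256_SPLIT_UNALIGNED_LOAD, ix86_tune_indices, and the relevant part of ix86_option_override_internal.

```c
/* Hypothetical processor bits (stand-ins for m_COREI7, m_BDVER1, m_GENERIC). */
#define CPU_COREI7  (1u << 0)
#define CPU_BDVER1  (1u << 1)
#define CPU_GENERIC (1u << 2)

/* Hypothetical option bit (stand-in for MASK_AVX256_SPLIT_UNALIGNED_LOAD). */
#define OPT_SPLIT_LOAD (1u << 0)

enum tune_index { TUNE_SPLIT_LOAD_OPTIMAL, TUNE_LAST };

/* One processor mask per tuning knob, mirroring the table entry in the patch. */
static const unsigned initial_tune[TUNE_LAST] = {
    [TUNE_SPLIT_LOAD_OPTIMAL] = CPU_COREI7,
};

/* Per-knob yes/no answers for the CPU currently being tuned for. */
static unsigned char tune_features[TUNE_LAST];

/* Collapse each processor mask to a boolean for the selected CPU. */
void select_tuning(unsigned tune_cpu)
{
    for (int i = 0; i < TUNE_LAST; ++i)
        tune_features[i] = (initial_tune[i] & tune_cpu) != 0;
}

/* Enable splitting by default only where it is marked optimal, and only
   when the user gave neither the -m nor the -mno- form of the option
   (i.e. the option's bit in explicit_bits is clear).  */
unsigned apply_default(unsigned flags, unsigned explicit_bits)
{
    if (tune_features[TUNE_SPLIT_LOAD_OPTIMAL]
        && !(explicit_bits & OPT_SPLIT_LOAD))
        flags |= OPT_SPLIT_LOAD;
    return flags;
}
```

With this model, select_tuning(CPU_COREI7) followed by apply_default(0, 0) turns the option on, while select_tuning(CPU_BDVER1) leaves it off. An explicit user choice is never overridden in either case.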


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH, PR 49089] Don't split AVX256 unaligned loads by default on bdver1 and generic
  2011-06-14  1:32 [PATCH, PR 49089] Don't split AVX256 unaligned loads by default on bdver1 and generic Fang, Changpeng
@ 2011-06-14 10:17 ` Richard Guenther
  2011-06-14 10:20   ` Jakub Jelinek
  2011-06-14 22:47   ` Fang, Changpeng
  0 siblings, 2 replies; 22+ messages in thread
From: Richard Guenther @ 2011-06-14 10:17 UTC (permalink / raw)
  To: Fang, Changpeng; +Cc: gcc-patches, hjl.tools

On Tue, Jun 14, 2011 at 1:59 AM, Fang, Changpeng <Changpeng.Fang@amd.com> wrote:
> Hi,
>
> The patch ( http://gcc.gnu.org/ml/gcc-patches/2011-02/txt00059.txt ) which introduces splitting avx256 unaligned loads.
> However, we found that it causes significant regressions for cpu2006 ( http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49089 ).
>
> In this work, we introduce a tune option that sets splitting unaligned loads default only for such CPUs that such splitting
> is beneficial.
>
> The patch passed bootstrapping and regression tests on x86_64-unknown-linux-gnu system.
>
> Is it OK to commit?

It probably should go to the 4.6 branch as well.  Note that I find the
X86_TUNE_AVX256_SPLIT_UNALIGNED_LOAD_OPTIMAL odd,
why not call it simply X86_TUNE_AVX256_SPLIT_UNALIGNED_LOAD?

I'll defer to x86 maintainers for approval.

Richard.

> Thanks,
>
> Changpeng

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH, PR 49089] Don't split AVX256 unaligned loads by default on bdver1 and generic
  2011-06-14 10:17 ` Richard Guenther
@ 2011-06-14 10:20   ` Jakub Jelinek
  2011-06-14 13:24     ` H.J. Lu
  2011-06-14 22:47   ` Fang, Changpeng
  1 sibling, 1 reply; 22+ messages in thread
From: Jakub Jelinek @ 2011-06-14 10:20 UTC (permalink / raw)
  To: Richard Guenther; +Cc: Fang, Changpeng, gcc-patches, hjl.tools

On Tue, Jun 14, 2011 at 12:13:47PM +0200, Richard Guenther wrote:
> On Tue, Jun 14, 2011 at 1:59 AM, Fang, Changpeng <Changpeng.Fang@amd.com> wrote:
> > The patch ( http://gcc.gnu.org/ml/gcc-patches/2011-02/txt00059.txt ) which introduces splitting avx256 unaligned loads.
> > However, we found that it causes significant regressions for cpu2006 ( http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49089 ).
> >
> > In this work, we introduce a tune option that sets splitting unaligned loads default only for such CPUs that such splitting
> > is beneficial.
> >
> > The patch passed bootstrapping and regression tests on x86_64-unknown-linux-gnu system.
> >
> > Is it OK to commit?
> 
> It probably should go to the 4.6 branch as well.  Note that I find the
> X86_TUNE_AVX256_SPLIT_UNALIGNED_LOAD_OPTIMAL odd,
> why not call it simply X86_TUNE_AVX256_SPLIT_UNALIGNED_LOAD?

I also wonder what we should do for -mtune=generic.  Should we split or not?
How big improvement is it on Intel chips, how big degradation does it
cause on AMD chips (I assume no other chip maker currently supports AVX)?

	Jakub

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH, PR 49089] Don't split AVX256 unaligned loads by default on bdver1 and generic
  2011-06-14 10:20   ` Jakub Jelinek
@ 2011-06-14 13:24     ` H.J. Lu
  2011-06-14 23:22       ` Fang, Changpeng
  0 siblings, 1 reply; 22+ messages in thread
From: H.J. Lu @ 2011-06-14 13:24 UTC (permalink / raw)
  To: Jakub Jelinek, sergos.gnu; +Cc: Richard Guenther, Fang, Changpeng, gcc-patches

On Tue, Jun 14, 2011 at 3:16 AM, Jakub Jelinek <jakub@redhat.com> wrote:
> On Tue, Jun 14, 2011 at 12:13:47PM +0200, Richard Guenther wrote:
>> On Tue, Jun 14, 2011 at 1:59 AM, Fang, Changpeng <Changpeng.Fang@amd.com> wrote:
>> > The patch ( http://gcc.gnu.org/ml/gcc-patches/2011-02/txt00059.txt ) which introduces splitting avx256 unaligned loads.
>> > However, we found that it causes significant regressions for cpu2006 ( http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49089 ).
>> >
>> > In this work, we introduce a tune option that sets splitting unaligned loads default only for such CPUs that such splitting
>> > is beneficial.
>> >
>> > The patch passed bootstrapping and regression tests on x86_64-unknown-linux-gnu system.
>> >
>> > Is it OK to commit?
>>
>> It probably should go to the 4.6 branch as well.  Note that I find the
>> X86_TUNE_AVX256_SPLIT_UNALIGNED_LOAD_OPTIMAL odd,
>> why not call it simply X86_TUNE_AVX256_SPLIT_UNALIGNED_LOAD?
>
> I also wonder what we should do for -mtune=generic.  Should we split or not?
> How big improvement is it on Intel chips, how big degradation does it
> cause on AMD chips (I assume no other chip maker currently supports AVX)?
>

Simply turning off 32byte aligned load split, which introduces
performance regressions on
Intel Sandy Bridge processors, isn't an appropriate solution.

I am proposing a different approach so that we can improve
-mtune=generic performance
on current Intel and AMD processors.

The current default GCC tuning, -mtune=generic, was implemented in
2005 for Intel
Pentium 4, Core 2 and AMD K8 processors.  Many optimization choices
are no longer
applicable to the current Intel nor AMD processors.

We should choose a set of optimization choices for -mtune=generic,
including 32byte
unaligned load split, for the current Intel and AMD processors,  which
should improve
performance with no performance regressions.


-- 
H.J.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* RE: [PATCH, PR 49089] Don't split AVX256 unaligned loads by default on bdver1 and generic
  2011-06-14 10:17 ` Richard Guenther
  2011-06-14 10:20   ` Jakub Jelinek
@ 2011-06-14 22:47   ` Fang, Changpeng
  2011-06-14 23:15     ` H.J. Lu
  1 sibling, 1 reply; 22+ messages in thread
From: Fang, Changpeng @ 2011-06-14 22:47 UTC (permalink / raw)
  To: Richard Guenther; +Cc: gcc-patches, hjl.tools

>It probably should go to the 4.6 branch as well.

H.J. Lu's original patch that splits unaligned loads and stores was checked in to the gcc 4.7
trunk. We found that splitting unaligned stores is beneficial on bdver1, while splitting unaligned
loads degrades cfp2006 by 1.3% in geomean on Bulldozer. As a result, we suggest backporting
unaligned store splitting (H.J.'s original patch + this one) to the 4.6 branch.

 >Note that I find the
>X86_TUNE_AVX256_SPLIT_UNALIGNED_LOAD_OPTIMAL odd,
>why not call it simply X86_TUNE_AVX256_SPLIT_UNALIGNED_LOAD?

AVX256_SPLIT_UNALIGNED_LOAD is already used for the flag
-mavx256-split-unaligned-load, and we intend to keep that flag for performance tuning.
As a result, we appended _OPTIMAL (or _BENEFICIAL) to the name of the default setting.

>I'll defer to x86 maintainers for approval.

So, is it OK to commit this patch to trunk, and H.J's original patch + this to 4.6 branch?

Thanks,

Changpeng

>Richard.

> Thanks,
>
> Changpeng

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH, PR 49089] Don't split AVX256 unaligned loads by default on bdver1 and generic
  2011-06-14 22:47   ` Fang, Changpeng
@ 2011-06-14 23:15     ` H.J. Lu
  2011-06-15  0:35       ` Fang, Changpeng
                         ` (2 more replies)
  0 siblings, 3 replies; 22+ messages in thread
From: H.J. Lu @ 2011-06-14 23:15 UTC (permalink / raw)
  To: Fang, Changpeng; +Cc: Richard Guenther, gcc-patches

On Tue, Jun 14, 2011 at 3:41 PM, Fang, Changpeng <Changpeng.Fang@amd.com> wrote:
>>It probably should go to the 4.6 branch as well.
>
> H.J. Lu's original patch that splits unaligned load and store was checked in gcc 4.7
> trunk. We found that,  splitting unaligned store is beneficial to bdver1, splitting unaligned
> load degrades cfp2006 by 1.3% in geomean on Bulldozer. As a result, we suggest putting
> unaligned store splitting (H.J. original patch + this one) back to 4.6 branch.
>
>  >Note that I find the
>>X86_TUNE_AVX256_SPLIT_UNALIGNED_LOAD_OPTIMAL odd,
>>why not call it simply X86_TUNE_AVX256_SPLIT_UNALIGNED_LOAD?
>
> AVX256_SPLIT_UNALIGNED_LOAD has already  been used for the flag:
> -mavx256-split-unaligned-load, and we intend to keep that flag for performance tuning.
> As a result, we put _OPTIMAL (or _BENEFICAL) at the end for default setting.
>
>
>>I'll defer to x86 maintainers for approval.
>
> So, is it OK to commit this patch to trunk, and H.J's original patch + this to 4.6 branch?
>
>

I have no problems on -mtune=Bulldozer.  But I object -mtune=generic
change and did suggest a different approach for -mtune=generic.

-- 
H.J.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* RE: [PATCH, PR 49089] Don't split AVX256 unaligned loads by default on bdver1 and generic
  2011-06-14 13:24     ` H.J. Lu
@ 2011-06-14 23:22       ` Fang, Changpeng
  2011-06-15  0:21         ` H.J. Lu
  0 siblings, 1 reply; 22+ messages in thread
From: Fang, Changpeng @ 2011-06-14 23:22 UTC (permalink / raw)
  To: H.J. Lu, Jakub Jelinek, sergos.gnu; +Cc: Richard Guenther, gcc-patches

A similar argument applies to software prefetching, for which we observed a ~2% benefit on greyhound (not that much
for Bulldozer). We would also prefer turning on software prefetching at -O3 for -mtune=generic.

--Changpeng

________________________________________
From: H.J. Lu [hjl.tools@gmail.com]
Sent: Tuesday, June 14, 2011 8:05 AM
To: Jakub Jelinek; sergos.gnu@gmail.com
Cc: Richard Guenther; Fang, Changpeng; gcc-patches@gcc.gnu.org
Subject: Re: [PATCH, PR 49089] Don't split AVX256 unaligned loads by default on bdver1 and generic

On Tue, Jun 14, 2011 at 3:16 AM, Jakub Jelinek <jakub@redhat.com> wrote:
> On Tue, Jun 14, 2011 at 12:13:47PM +0200, Richard Guenther wrote:
>> On Tue, Jun 14, 2011 at 1:59 AM, Fang, Changpeng <Changpeng.Fang@amd.com> wrote:
>> > The patch ( http://gcc.gnu.org/ml/gcc-patches/2011-02/txt00059.txt ) which introduces splitting avx256 unaligned loads.
>> > However, we found that it causes significant regressions for cpu2006 ( http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49089 ).
>> >
>> > In this work, we introduce a tune option that sets splitting unaligned loads default only for such CPUs that such splitting
>> > is beneficial.
>> >
>> > The patch passed bootstrapping and regression tests on x86_64-unknown-linux-gnu system.
>> >
>> > Is it OK to commit?
>>
>> It probably should go to the 4.6 branch as well.  Note that I find the
>> X86_TUNE_AVX256_SPLIT_UNALIGNED_LOAD_OPTIMAL odd,
>> why not call it simply X86_TUNE_AVX256_SPLIT_UNALIGNED_LOAD?
>
> I also wonder what we should do for -mtune=generic.  Should we split or not?
> How big improvement is it on Intel chips, how big degradation does it
> cause on AMD chips (I assume no other chip maker currently supports AVX)?
>

Simply turning off 32byte aligned load split, which introduces
performance regressions on
Intel Sandy Bridge processors, isn't an appropriate solution.

I am proposing a different approach so that we can improve
-mtune=generic performance
on current Intel and AMD processors.

The current default GCC tuning, -mtune=generic, was implemented in
2005 for Intel
Pentium 4, Core 2 and AMD K8 processors.  Many optimization choices
are no longer
applicable to the current Intel nor AMD processors.

We should choose a set of optimization choices for -mtune=generic,
including 32byte
unaligned load split, for the current Intel and AMD processors,  which
should improve
performance with no performance regressions.

--
H.J.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH, PR 49089] Don't split AVX256 unaligned loads by default on bdver1 and generic
  2011-06-14 23:22       ` Fang, Changpeng
@ 2011-06-15  0:21         ` H.J. Lu
  0 siblings, 0 replies; 22+ messages in thread
From: H.J. Lu @ 2011-06-15  0:21 UTC (permalink / raw)
  To: Fang, Changpeng; +Cc: Jakub Jelinek, sergos.gnu, Richard Guenther, gcc-patches

On Tue, Jun 14, 2011 at 4:01 PM, Fang, Changpeng <Changpeng.Fang@amd.com> wrote:
> A similar argument is for software prefetching, which we observed a ~2% benefit on greyhound (not that much
> for Bulldozer). We would also prefer turning on software prefetching at -O3 for -mtune=generic.

Sure, we can put everything on the table and take a look.

> Simply turning off 32byte aligned load split, which introduces
> performance regressions on
> Intel Sandy Bridge processors, isn't an appropriate solution.
>
> I am proposing a different approach so that we can improve
> -mtune=generic performance
> on current Intel and AMD processors.
>
> The current default GCC tuning, -mtune=generic, was implemented in
> 2005 for Intel
> Pentium 4, Core 2 and AMD K8 processors.  Many optimization choices
> are no longer
> applicable to the current Intel nor AMD processors.
>
> We should choose a set of optimization choices for -mtune=generic,
> including 32byte
> unaligned load split, for the current Intel and AMD processors,  which
> should improve
> performance with no performance regressions.
>
>


-- 
H.J.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* RE: [PATCH, PR 49089] Don't split AVX256 unaligned loads by default on bdver1 and generic
  2011-06-14 23:15     ` H.J. Lu
@ 2011-06-15  0:35       ` Fang, Changpeng
  2011-06-15  2:54         ` H.J. Lu
  2011-06-15 22:07       ` Fang, Changpeng
  2011-06-17  0:53       ` Fang, Changpeng
  2 siblings, 1 reply; 22+ messages in thread
From: Fang, Changpeng @ 2011-06-15  0:35 UTC (permalink / raw)
  To: H.J. Lu; +Cc: Richard Guenther, gcc-patches



>
> So, is it OK to commit this patch to trunk, and H.J's original patch + this to 4.6 branch?

>I have no problems on -mtune=Bulldozer.  But I object -mtune=generic
>change and did suggest a different approach for -mtune=generic.

What's your suggested approach for -mtune=generic?

My understanding is that something goes into -mtune=generic only if everyone is happy with it, or
if those who lose are willing to accept the loss. This rule guided us not to turn on software prefetching for
-mtune=generic.

I would like to hear your suggestion for how Sandy Bridge could get the performance while
Bulldozer does not lose from splitting avx256 unaligned loads!

Thanks,

Changpeng



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH, PR 49089] Don't split AVX256 unaligned loads by default on bdver1 and generic
  2011-06-15  0:35       ` Fang, Changpeng
@ 2011-06-15  2:54         ` H.J. Lu
  0 siblings, 0 replies; 22+ messages in thread
From: H.J. Lu @ 2011-06-15  2:54 UTC (permalink / raw)
  To: Fang, Changpeng; +Cc: Richard Guenther, gcc-patches

On Tue, Jun 14, 2011 at 4:59 PM, Fang, Changpeng <Changpeng.Fang@amd.com> wrote:
>
>
>>
>> So, is it OK to commit this patch to trunk, and H.J's original patch + this to 4.6 branch?
>
>>I have no problems on -mtune=Bulldozer.  But I object -mtune=generic
>>change and did suggest a different approach for -mtune=generic.
>
> What's your suggested approach for -mtune=generic?
>
> My understanding of putting something into -mtune=generic is that all are happy, or
> some are willing to lose. This rule guided us not to turn on software prefetching for
> -mtune=generic.
>
> I would like to hear your suggestion that sandy bridge could get the performance while
> bulldozer does not lose from splitting avx256 unaligned loads!
>
> Thanks,

See:

http://gcc.gnu.org/ml/gcc-patches/2006-01/msg01045.html

for how we did it last time.

-- 
H.J.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* RE: [PATCH, PR 49089] Don't split AVX256 unaligned loads by default on bdver1 and generic
  2011-06-14 23:15     ` H.J. Lu
  2011-06-15  0:35       ` Fang, Changpeng
@ 2011-06-15 22:07       ` Fang, Changpeng
  2011-06-16  7:51         ` Richard Guenther
  2011-06-17  0:53       ` Fang, Changpeng
  2 siblings, 1 reply; 22+ messages in thread
From: Fang, Changpeng @ 2011-06-15 22:07 UTC (permalink / raw)
  To: H.J. Lu; +Cc: Richard Guenther, gcc-patches

>I have no problems on -mtune=Bulldozer.  But I object -mtune=generic
>change and did suggest a different approach for -mtune=generic.

Something must be broken in unaligned load splitting in generic mode.

While we lose 1.3% on CFP2006 in geomean by splitting unaligned loads for -mtune=bdver1, splitting
unaligned loads in generic mode is KILLING us:

For 459.GemsFDTD (ref) on Bulldozer:
  -Ofast -mavx -mno-avx256-split-unaligned-load:   480s
  -Ofast -mavx:                                   2527s

So, splitting unaligned loads makes the program run 5~6 times slower!

For 434.zeusmp (train run):
  -Ofast -mavx -mno-avx256-split-unaligned-load:  32.5s
  -Ofast -mavx:                                    106s

Other tests are on-going!
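
For reference, the slowdown factors implied by the timings above can be computed as the time with splitting divided by the time without. The helper name slowdown below is hypothetical and exists only for this illustration.

```c
/* Slowdown factor: run time with load splitting divided by run time
   without it.  A value above 1.0 means splitting made the run slower.  */
double slowdown(double split_seconds, double nosplit_seconds)
{
    return split_seconds / nosplit_seconds;
}
```

Using the numbers reported in this message, slowdown(2527.0, 480.0) is about 5.3 for 459.GemsFDTD and slowdown(106.0, 32.5) is about 3.3 for 434.zeusmp.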


Changpeng.


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH, PR 49089] Don't split AVX256 unaligned loads by default on bdver1 and generic
  2011-06-15 22:07       ` Fang, Changpeng
@ 2011-06-16  7:51         ` Richard Guenther
  0 siblings, 0 replies; 22+ messages in thread
From: Richard Guenther @ 2011-06-16  7:51 UTC (permalink / raw)
  To: Fang, Changpeng; +Cc: H.J. Lu, gcc-patches

On Wed, Jun 15, 2011 at 11:06 PM, Fang, Changpeng
<Changpeng.Fang@amd.com> wrote:
>>I have no problems on -mtune=Bulldozer.  But I object -mtune=generic
>>change and did suggest a different approach for -mtune=generic.
>
> Something must have been broken for the unaligned load splitting in generic mode.
>
> While we lose 1.3% on CFP2006 in geomean by splitting unaligned loads for -mtune=bdver1, splitting
> unaligned loads in generic mode is KILLING us:
>
> For 459.GemsFDTD (ref) on Bulldozer,
>  -Ofast -mavx -mno-avx256-split-unaligned-load:   480s
> -Ofast -mavx                                                       :    2527s
>
> So, splitting unaligned loads results in the program to run 5~6 times slower!
>
> For 434.zeusmp train run
>  -Ofast -mavx -mno-avx256-split-unaligned-load:   32.5s
> -Ofast -mavx                                                       :    106s
>
> Other tests are on-going!

I suspect that the split loads get further split into mov[lh]ps pieces?
We do that for SSE moves with generic tuning at least IIRC.
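
To make the terminology concrete, here is a portable model of what "splitting" a 256-bit unaligned load means. It is a sketch under stated assumptions, not what the compiler emits: the vec256 type and the functions load_whole and load_split are hypothetical, and plain memcpy stands in for the vector instructions. On real AVX hardware the split form corresponds, as I understand it, to a 128-bit unaligned load of the low half followed by an insert of the high half; the suspicion above is that each 128-bit half is then split once more into 64-bit pieces.

```c
#include <string.h>

/* Portable model of a 256-bit vector register: eight floats. */
typedef struct { float lane[8]; } vec256;

/* Unsplit form: one 32-byte load from a possibly unaligned address. */
vec256 load_whole(const float *p)
{
    vec256 v;
    memcpy(v.lane, p, sizeof v.lane);
    return v;
}

/* Split form: load the low 16 bytes, then insert the high 16 bytes. */
vec256 load_split(const float *p)
{
    vec256 v;
    memcpy(&v.lane[0], p, 4 * sizeof(float));      /* low 128 bits  */
    memcpy(&v.lane[4], p + 4, 4 * sizeof(float));  /* high 128 bits */
    return v;
}
```

Both functions produce the same value for any input; the difference discussed in this thread is purely one of cost, which depends on the processor.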

Richard.

>
> Changpeng.
>
>
>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* RE: [PATCH, PR 49089] Don't split AVX256 unaligned loads by default on bdver1 and generic
  2011-06-14 23:15     ` H.J. Lu
  2011-06-15  0:35       ` Fang, Changpeng
  2011-06-15 22:07       ` Fang, Changpeng
@ 2011-06-17  0:53       ` Fang, Changpeng
  2011-06-17  1:10         ` H.J. Lu
  2 siblings, 1 reply; 22+ messages in thread
From: Fang, Changpeng @ 2011-06-17  0:53 UTC (permalink / raw)
  To: H.J. Lu; +Cc: Richard Guenther, gcc-patches

[-- Attachment #1: Type: text/plain, Size: 640 bytes --]

Hi, 

 I modified the patch to disable unaligned load splitting only for bdver1 for the moment.
Unaligned load splitting degrades CFP2006 by 1.3% in geomean for both -mtune=bdver1 and
-mtune=generic on Bulldozer. However, we agree with H.J.'s suggestion to determine
the optimal optimization sets for modern cpus.

Is it OK to commit the attached patch?

Thanks,

Changpeng



>> So, is it OK to commit this patch to trunk, and H.J's original patch + this to 4.6 branch?
>>
>>

>I have no problems on -mtune=Bulldozer.  But I object -mtune=generic
>change and did suggest a different approach for -mtune=generic.


.


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0002-pr49089-enable-avx256-splitting-unaligned-load-only-.patch --]
[-- Type: text/x-patch; name="0002-pr49089-enable-avx256-splitting-unaligned-load-only-.patch", Size: 2965 bytes --]

From 913a31b425759ac3427a365646de866161a7908a Mon Sep 17 00:00:00 2001
From: Changpeng Fang <chfang@huainan.(none)>
Date: Mon, 13 Jun 2011 13:13:32 -0700
Subject: [PATCH 2/2] pr49089: enable avx256 splitting unaligned load only when beneficial

	* config/i386/i386.h (ix86_tune_indices): Introduce
	  X86_TUNE_AVX256_SPLIT_UNALIGNED_LOAD_OPTIMAL.
	  (TARGET_AVX256_SPLIT_UNALIGNED_LOAD_OPTIMAL): New definition.

	* config/i386/i386.c (ix86_tune_features): Add entry for
	  X86_TUNE_AVX256_SPLIT_UNALIGNED_LOAD_OPTIMAL.
	  (ix86_option_override_internal): Enable avx256 unaligned load splitting
	  only when TARGET_AVX256_SPLIT_UNALIGNED_LOAD_OPTIMAL is set.
---
 gcc/config/i386/i386.c |   10 ++++++++--
 gcc/config/i386/i386.h |    3 +++
 2 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 7b266b9..82e6d3e 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -2088,7 +2088,12 @@ static unsigned int initial_ix86_tune_features[X86_TUNE_LAST] = {
   /* X86_SOFTARE_PREFETCHING_BENEFICIAL: Enable software prefetching
      at -O3.  For the moment, the prefetching seems badly tuned for Intel
      chips.  */
-  m_K6_GEODE | m_AMD_MULTIPLE
+  m_K6_GEODE | m_AMD_MULTIPLE,
+
+  /* X86_TUNE_AVX256_SPLIT_UNALIGNED_LOAD_OPTIMAL: Enable splitting 256-bit
+     unaligned load.  It hurts the performance on Bulldozer. We need to
+     re-tune the generic options for current cpus!  */
+  m_COREI7 | m_GENERIC
 };
 
 /* Feature tests against the various architecture variations.  */
@@ -4194,7 +4199,8 @@ ix86_option_override_internal (bool main_args_p)
 	  if (flag_expensive_optimizations
 	      && !(target_flags_explicit & MASK_VZEROUPPER))
 	    target_flags |= MASK_VZEROUPPER;
-	  if (!(target_flags_explicit & MASK_AVX256_SPLIT_UNALIGNED_LOAD))
+	  if (TARGET_AVX256_SPLIT_UNALIGNED_LOAD_OPTIMAL
+	      && !(target_flags_explicit & MASK_AVX256_SPLIT_UNALIGNED_LOAD))
 	    target_flags |= MASK_AVX256_SPLIT_UNALIGNED_LOAD;
 	  if (!(target_flags_explicit & MASK_AVX256_SPLIT_UNALIGNED_STORE))
 	    target_flags |= MASK_AVX256_SPLIT_UNALIGNED_STORE;
diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index 8badcbb..b2a1bc8 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -312,6 +312,7 @@ enum ix86_tune_indices {
   X86_TUNE_OPT_AGU,
   X86_TUNE_VECTORIZE_DOUBLE,
   X86_TUNE_SOFTWARE_PREFETCHING_BENEFICIAL,
+  X86_TUNE_AVX256_SPLIT_UNALIGNED_LOAD_OPTIMAL,
 
   X86_TUNE_LAST
 };
@@ -410,6 +411,8 @@ extern unsigned char ix86_tune_features[X86_TUNE_LAST];
 	ix86_tune_features[X86_TUNE_VECTORIZE_DOUBLE]
 #define TARGET_SOFTWARE_PREFETCHING_BENEFICIAL \
 	ix86_tune_features[X86_TUNE_SOFTWARE_PREFETCHING_BENEFICIAL]
+#define TARGET_AVX256_SPLIT_UNALIGNED_LOAD_OPTIMAL \
+	ix86_tune_features[X86_TUNE_AVX256_SPLIT_UNALIGNED_LOAD_OPTIMAL]
 
 /* Feature tests against the various architecture variations.  */
 enum ix86_arch_indices {
-- 
1.7.0.4


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH, PR 49089] Don't split AVX256 unaligned loads by default on bdver1 and generic
  2011-06-17  0:53       ` Fang, Changpeng
@ 2011-06-17  1:10         ` H.J. Lu
  2011-06-17 18:08           ` Fang, Changpeng
  0 siblings, 1 reply; 22+ messages in thread
From: H.J. Lu @ 2011-06-17  1:10 UTC (permalink / raw)
  To: Fang, Changpeng; +Cc: Richard Guenther, gcc-patches

On Thu, Jun 16, 2011 at 4:54 PM, Fang, Changpeng <Changpeng.Fang@amd.com> wrote:
> Hi,
>
>  I modify the patch to disable unaligned load splitting only for bdver1 at this moment.
> Unaligned load splitting degrades CFP2006 by 1.3% in geomean for both -mtune=bdver1 and
> -mtune=generic on Bulldozer. However, we agree with H.J's suggestion to determine
> the optimal optimization sets for modern cpus.
>
> Is is OK to commit the attached patch?
>
> Thanks,

Why not just move AVX256_SPLIT_UNALIGNED_STORE
and AVX256_SPLIT_UNALIGNED_LOAD to ix86_tune_indices?

H.J.
> Changpeng
>
>
>
>>> So, is it OK to commit this patch to trunk, and H.J's original patch + this to 4.6 branch?
>>>
>>>
>
>>I have no problems on -mtune=Bulldozer.  But I object -mtune=generic
>>change and did suggest a different approach for -mtune=generic.
>
>
> .
>
>



-- 
H.J.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* RE: [PATCH, PR 49089] Don't split AVX256 unaligned loads by default on bdver1 and generic
  2011-06-17  1:10         ` H.J. Lu
@ 2011-06-17 18:08           ` Fang, Changpeng
  2011-06-17 18:33             ` H.J. Lu
  0 siblings, 1 reply; 22+ messages in thread
From: Fang, Changpeng @ 2011-06-17 18:08 UTC (permalink / raw)
  To: H.J. Lu; +Cc: Richard Guenther, gcc-patches

>Why not just move AVX256_SPLIT_UNALIGNED_STORE
>and AVX256_SPLIT_UNALIGNED_LOAD to ix86_tune_indices?

I would like to keep the -m option so that at least we can explicitly turn
off the splitting when regressions are found!

By the way, I can add an index for store splitting, if you want.

Thanks,

Changpeng






^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH, PR 49089] Don't split AVX256 unaligned loads by default on bdver1 and generic
  2011-06-17 18:08           ` Fang, Changpeng
@ 2011-06-17 18:33             ` H.J. Lu
  2011-06-17 22:49               ` Fang, Changpeng
  0 siblings, 1 reply; 22+ messages in thread
From: H.J. Lu @ 2011-06-17 18:33 UTC (permalink / raw)
  To: Fang, Changpeng; +Cc: Richard Guenther, gcc-patches

On Fri, Jun 17, 2011 at 10:45 AM, Fang, Changpeng
<Changpeng.Fang@amd.com> wrote:
>>Why not just move AVX256_SPLIT_UNALIGNED_STORE
>>and AVX256_SPLIT_UNALIGNED_LOAD to ix86_tune_indices?
>
> I would like to keep the -m option so that at least we can explicitly turn
> off the splittings when regressions are found!

I prefer to implement it the same way as:

x86_accumulate_outgoing_args
x86_arch_always_fancy_math_387

> By the way, I can add an index for store splitting, if you want.
>

Yes, please.
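
The style referred to above, as I understand it, keeps a standalone processor mask per option and tests it directly against the mask of the CPU being tuned for, rather than going through the tune-index table. A minimal sketch of that pattern follows. All names in it are hypothetical, and the processor sets are taken from the revised patches in this thread rather than from any committed code.

```c
/* Hypothetical processor bits (stand-ins for m_COREI7, m_BDVER1, m_GENERIC). */
#define CPU_COREI7  (1u << 0)
#define CPU_BDVER1  (1u << 1)
#define CPU_GENERIC (1u << 2)

/* Hypothetical option bits for the two -mavx256-split-unaligned-* flags. */
#define OPT_SPLIT_LOAD  (1u << 0)
#define OPT_SPLIT_STORE (1u << 1)

/* Standalone masks: the CPUs on which each split is enabled by default. */
static const unsigned split_load_cpus  = CPU_COREI7 | CPU_GENERIC;
static const unsigned split_store_cpus = CPU_COREI7 | CPU_BDVER1 | CPU_GENERIC;

/* Apply the per-CPU defaults, leaving any explicitly given option alone. */
unsigned default_splitting(unsigned flags, unsigned explicit_bits,
                           unsigned tune_mask)
{
    if ((split_load_cpus & tune_mask) && !(explicit_bits & OPT_SPLIT_LOAD))
        flags |= OPT_SPLIT_LOAD;
    if ((split_store_cpus & tune_mask) && !(explicit_bits & OPT_SPLIT_STORE))
        flags |= OPT_SPLIT_STORE;
    return flags;
}
```

In this model, tuning for CPU_BDVER1 enables only store splitting by default, while the -m options remain available to override either default.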

-- 
H.J.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* RE: [PATCH, PR 49089] Don't split AVX256 unaligned loads by default on bdver1 and generic
  2011-06-17 18:33             ` H.J. Lu
@ 2011-06-17 22:49               ` Fang, Changpeng
  2011-06-17 23:12                 ` H.J. Lu
  0 siblings, 1 reply; 22+ messages in thread
From: Fang, Changpeng @ 2011-06-17 22:49 UTC (permalink / raw)
  To: H.J. Lu; +Cc: Richard Guenther, gcc-patches

[-- Attachment #1: Type: text/plain, Size: 1007 bytes --]

Hi, 

 I added AVX256_SPLIT_UNALIGNED_STORE to ix86_tune_indices
and put m_COREI7, m_BDVER1 and m_GENERIC as the targets that
enable it.

Is this OK?

Thanks,

Changpeng



________________________________________
From: H.J. Lu [hjl.tools@gmail.com]
Sent: Friday, June 17, 2011 1:08 PM
To: Fang, Changpeng
Cc: Richard Guenther; gcc-patches@gcc.gnu.org
Subject: Re: [PATCH, PR 49089] Don't split AVX256 unaligned loads by default on bdver1 and generic

On Fri, Jun 17, 2011 at 10:45 AM, Fang, Changpeng
<Changpeng.Fang@amd.com> wrote:
>>Why not just move AVX256_SPLIT_UNALIGNED_STORE
>>and AVX256_SPLIT_UNALIGNED_LOAD to ix86_tune_indices?
>
> I would like to keep the -m option so that at least we can explicitly turn
> off the splittings when regressions are found!

I prefer to implement it the same way as:

x86_accumulate_outgoing_args
x86_arch_always_fancy_math_387

> By the way, I can add an index for store splitting, if you want.
>

Yes, please.

--
H.J.


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0002-pr49089-enable-avx256-splitting-unaligned-load-store.patch --]
[-- Type: text/x-patch; name="0002-pr49089-enable-avx256-splitting-unaligned-load-store.patch", Size: 3496 bytes --]

From 91e715213bb37d089cb490e769b115d1d131918f Mon Sep 17 00:00:00 2001
From: Changpeng Fang <chfang@huainan.(none)>
Date: Mon, 13 Jun 2011 13:13:32 -0700
Subject: [PATCH 2/2] pr49089: enable avx256 splitting unaligned load/store only when beneficial

	* config/i386/i386.h (ix86_tune_indices): Introduce
	  X86_TUNE_AVX256_SPLIT_UNALIGNED_LOAD_OPTIMAL and
	  X86_TUNE_AVX256_SPLIT_UNALIGNED_STORE_OPTIMAL.
	  (TARGET_AVX256_SPLIT_UNALIGNED_LOAD_OPTIMAL): New definition.
	  (TARGET_AVX256_SPLIT_UNALIGNED_STORE_OPTIMAL): New definition.

	* config/i386/i386.c (ix86_tune_features): Add entries for
	  X86_TUNE_AVX256_SPLIT_UNALIGNED_LOAD_OPTIMAL and
	  X86_TUNE_AVX256_SPLIT_UNALIGNED_STORE_OPTIMAL.
	  (ix86_option_override_internal): Enable avx256 unaligned load(store)
	  splitting when TARGET_AVX256_SPLIT_UNALIGNED_LOAD(STORE)_OPTIMAL
	  is set.
---
 gcc/config/i386/i386.c |   17 ++++++++++++++---
 gcc/config/i386/i386.h |    4 ++++
 2 files changed, 18 insertions(+), 3 deletions(-)

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 7b266b9..b50d349 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -2088,7 +2088,16 @@ static unsigned int initial_ix86_tune_features[X86_TUNE_LAST] = {
   /* X86_SOFTARE_PREFETCHING_BENEFICIAL: Enable software prefetching
      at -O3.  For the moment, the prefetching seems badly tuned for Intel
      chips.  */
-  m_K6_GEODE | m_AMD_MULTIPLE
+  m_K6_GEODE | m_AMD_MULTIPLE,
+
+  /* X86_TUNE_AVX256_SPLIT_UNALIGNED_LOAD_OPTIMAL: Enable splitting 256-bit
+     unaligned load.  It hurts the performance on Bulldozer. We need to
+     re-tune the generic options for current cpus!  */
+  m_COREI7 | m_GENERIC,
+
+  /* X86_TUNE_AVX256_SPLIT_UNALIGNED_STORE_OPTIMAL: Enable splitting 256-bit
+     unaligned store.  */
+  m_COREI7 | m_BDVER1 | m_GENERIC
 };
 
 /* Feature tests against the various architecture variations.  */
@@ -4194,9 +4203,11 @@ ix86_option_override_internal (bool main_args_p)
 	  if (flag_expensive_optimizations
 	      && !(target_flags_explicit & MASK_VZEROUPPER))
 	    target_flags |= MASK_VZEROUPPER;
-	  if (!(target_flags_explicit & MASK_AVX256_SPLIT_UNALIGNED_LOAD))
+	  if (TARGET_AVX256_SPLIT_UNALIGNED_LOAD_OPTIMAL
+	      && !(target_flags_explicit & MASK_AVX256_SPLIT_UNALIGNED_LOAD))
 	    target_flags |= MASK_AVX256_SPLIT_UNALIGNED_LOAD;
-	  if (!(target_flags_explicit & MASK_AVX256_SPLIT_UNALIGNED_STORE))
+	  if (TARGET_AVX256_SPLIT_UNALIGNED_STORE_OPTIMAL
+	      && !(target_flags_explicit & MASK_AVX256_SPLIT_UNALIGNED_STORE))
 	    target_flags |= MASK_AVX256_SPLIT_UNALIGNED_STORE;
 	}
     }
diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index 8badcbb..b6e5570 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -312,6 +312,8 @@ enum ix86_tune_indices {
   X86_TUNE_OPT_AGU,
   X86_TUNE_VECTORIZE_DOUBLE,
   X86_TUNE_SOFTWARE_PREFETCHING_BENEFICIAL,
+  X86_TUNE_AVX256_SPLIT_UNALIGNED_LOAD_OPTIMAL,
+  X86_TUNE_AVX256_SPLIT_UNALIGNED_STORE_OPTIMAL,
 
   X86_TUNE_LAST
 };
@@ -410,6 +412,8 @@ extern unsigned char ix86_tune_features[X86_TUNE_LAST];
 	ix86_tune_features[X86_TUNE_VECTORIZE_DOUBLE]
 #define TARGET_SOFTWARE_PREFETCHING_BENEFICIAL \
 	ix86_tune_features[X86_TUNE_SOFTWARE_PREFETCHING_BENEFICIAL]
+#define TARGET_AVX256_SPLIT_UNALIGNED_LOAD_OPTIMAL \
+	ix86_tune_features[X86_TUNE_AVX256_SPLIT_UNALIGNED_LOAD_OPTIMAL]
 
 /* Feature tests against the various architecture variations.  */
 enum ix86_arch_indices {
-- 
1.7.0.4
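
The table-driven gating in this revision can be modelled in a few lines of
standalone C. The sketch below is not GCC code: the M_* bit values, the
shortened TUNE_* names and the set_tune_features helper are assumptions made
for illustration, and only the two masks are taken from the patch. It shows
how one mask per feature, stored in enum order, is reduced to a 0/1 flag for
the processor being tuned for.

```c
/* Hypothetical processor bits.  The real m_* masks in i386.c are derived
   from the processor enumeration and have different values.  */
enum { M_COREI7 = 1u << 0, M_BDVER1 = 1u << 1, M_GENERIC = 1u << 2 };

/* Shortened stand-ins for the two new ix86_tune_indices entries.  */
enum tune_index {
  TUNE_AVX256_SPLIT_UNALIGNED_LOAD_OPTIMAL,
  TUNE_AVX256_SPLIT_UNALIGNED_STORE_OPTIMAL,
  TUNE_LAST
};

/* One mask per feature, in enum order; the values mirror the patch.  */
static const unsigned int initial_tune_features[TUNE_LAST] = {
  M_COREI7 | M_GENERIC,             /* split unaligned 256-bit loads */
  M_COREI7 | M_BDVER1 | M_GENERIC   /* split unaligned 256-bit stores */
};

/* Per-feature flags for the currently selected tuning target.  */
static unsigned char tune_features[TUNE_LAST];

/* Reduce every mask to a 0/1 flag for the processor in TUNE_MASK.  */
static void
set_tune_features (unsigned int tune_mask)
{
  for (int i = 0; i < TUNE_LAST; i++)
    tune_features[i] = !!(initial_tune_features[i] & tune_mask);
}
```

Under these assumptions, calling set_tune_features (M_BDVER1) leaves the load
flag clear and the store flag set, which is the behaviour the subject line
asks for on bdver1.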


* Re: [PATCH, PR 49089] Don't split AVX256 unaligned loads by default on bdver1 and generic
  2011-06-17 22:49               ` Fang, Changpeng
@ 2011-06-17 23:12                 ` H.J. Lu
  2011-06-20 18:44                   ` Fang, Changpeng
  0 siblings, 1 reply; 22+ messages in thread
From: H.J. Lu @ 2011-06-17 23:12 UTC (permalink / raw)
  To: Fang, Changpeng; +Cc: Richard Guenther, gcc-patches

On Fri, Jun 17, 2011 at 3:18 PM, Fang, Changpeng <Changpeng.Fang@amd.com> wrote:
> Hi,
>
>  I added AVX256_SPLIT_UNALIGNED_STORE to ix86_tune_indices
> and put m_COREI7, m_BDVER1 and m_GENERIC as the targets that
> enable it.
>
> Is this OK?

Can you do something similar to how MASK_ACCUMULATE_OUTGOING_ARGS
is handled?

Thanks.

H.J.

* RE: [PATCH, PR 49089] Don't split AVX256 unaligned loads by default on bdver1 and generic
  2011-06-17 23:12                 ` H.J. Lu
@ 2011-06-20 18:44                   ` Fang, Changpeng
  2011-06-20 18:51                     ` Uros Bizjak
  0 siblings, 1 reply; 22+ messages in thread
From: Fang, Changpeng @ 2011-06-20 18:44 UTC (permalink / raw)
  To: H.J. Lu, gcc-patches; +Cc: hubicka, ubizjak, rguenther

Hi,

  I modified the patch as H.J. suggested (patch attached).

Is it OK to commit to trunk now?

Thanks,

Changpeng



[-- Attachment #2: 0002-pr49089-enable-avx256-splitting-unaligned-load-store.patch --]
[-- Type: text/x-patch; name="0002-pr49089-enable-avx256-splitting-unaligned-load-store.patch", Size: 2027 bytes --]

From 50310fc367348b406fc88d54c3ab54d1a304ad52 Mon Sep 17 00:00:00 2001
From: Changpeng Fang <chfang@huainan.(none)>
Date: Mon, 13 Jun 2011 13:13:32 -0700
Subject: [PATCH 2/2] pr49089: enable avx256 splitting unaligned load/store only when beneficial

	* config/i386/i386.c (x86_avx256_split_unaligned_load): New definition.
	  (x86_avx256_split_unaligned_store): New definition.
	  (ix86_option_override_internal): Enable avx256 unaligned load(store)
	  splitting only when x86_avx256_split_unaligned_load(store) is set.
---
 gcc/config/i386/i386.c |   12 ++++++++++--
 1 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 7b266b9..3bc0b53 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -2121,6 +2121,12 @@ static const unsigned int x86_arch_always_fancy_math_387
   = m_PENT | m_ATOM | m_PPRO | m_AMD_MULTIPLE | m_PENT4
     | m_NOCONA | m_CORE2I7 | m_GENERIC;
 
+static const unsigned int x86_avx256_split_unaligned_load
+  = m_COREI7 | m_GENERIC;
+
+static const unsigned int x86_avx256_split_unaligned_store
+  = m_COREI7 | m_BDVER1 | m_GENERIC;
+
 /* In case the average insn count for single function invocation is
    lower than this constant, emit fast (but longer) prologue and
    epilogue code.  */
@@ -4194,9 +4200,11 @@ ix86_option_override_internal (bool main_args_p)
 	  if (flag_expensive_optimizations
 	      && !(target_flags_explicit & MASK_VZEROUPPER))
 	    target_flags |= MASK_VZEROUPPER;
-	  if (!(target_flags_explicit & MASK_AVX256_SPLIT_UNALIGNED_LOAD))
+	  if ((x86_avx256_split_unaligned_load & ix86_tune_mask)
+	      && !(target_flags_explicit & MASK_AVX256_SPLIT_UNALIGNED_LOAD))
 	    target_flags |= MASK_AVX256_SPLIT_UNALIGNED_LOAD;
-	  if (!(target_flags_explicit & MASK_AVX256_SPLIT_UNALIGNED_STORE))
+	  if ((x86_avx256_split_unaligned_store & ix86_tune_mask)
+	      && !(target_flags_explicit & MASK_AVX256_SPLIT_UNALIGNED_STORE))
 	    target_flags |= MASK_AVX256_SPLIT_UNALIGNED_STORE;
 	}
     }
-- 
1.7.0.4
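
The decision made in ix86_option_override_internal can be modelled as a small
pure function. This is a sketch under stated assumptions, not GCC code: the
M_* and MASK_* bit values and the apply_split_defaults helper are
hypothetical, while the two processor masks and the shape of the two
conditions follow the patch. A split is enabled by default only when the
tuning target is in the mask and the user did not mention the corresponding
-m option explicitly.

```c
/* Hypothetical processor bits; the real m_* masks have other values.  */
enum { M_COREI7 = 1u << 0, M_BDVER1 = 1u << 1, M_GENERIC = 1u << 2 };

/* Hypothetical stand-ins for the MASK_AVX256_SPLIT_UNALIGNED_* bits.  */
enum { MASK_SPLIT_LOAD = 1u << 0, MASK_SPLIT_STORE = 1u << 1 };

/* Processors for which splitting is the default, as in the patch.  */
static const unsigned int split_unaligned_load = M_COREI7 | M_GENERIC;
static const unsigned int split_unaligned_store
  = M_COREI7 | M_BDVER1 | M_GENERIC;

/* FLAGS holds the option bits as parsed from the command line and
   EXPLICIT_FLAGS marks the bits the user set or cleared explicitly.
   Returns FLAGS with the per-processor defaults applied.  */
static unsigned int
apply_split_defaults (unsigned int tune_mask, unsigned int flags,
                      unsigned int explicit_flags)
{
  if ((split_unaligned_load & tune_mask)
      && !(explicit_flags & MASK_SPLIT_LOAD))
    flags |= MASK_SPLIT_LOAD;
  if ((split_unaligned_store & tune_mask)
      && !(explicit_flags & MASK_SPLIT_STORE))
    flags |= MASK_SPLIT_STORE;
  return flags;
}
```

In this model, tuning for bdver1 with no explicit options yields only the
store split, and an explicit choice for the load split is left untouched
whichever processor is selected. That matches the wish expressed earlier in
the thread to keep the -m option as a way to turn splitting off by hand.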


* Re: [PATCH, PR 49089] Don't split AVX256 unaligned loads by default on bdver1 and generic
  2011-06-20 18:44                   ` Fang, Changpeng
@ 2011-06-20 18:51                     ` Uros Bizjak
  2011-06-20 22:17                       ` Fang, Changpeng
  0 siblings, 1 reply; 22+ messages in thread
From: Uros Bizjak @ 2011-06-20 18:51 UTC (permalink / raw)
  To: Fang, Changpeng; +Cc: H.J. Lu, gcc-patches, hubicka, rguenther

On Mon, Jun 20, 2011 at 8:03 PM, Fang, Changpeng <Changpeng.Fang@amd.com> wrote:

>  I modified the patch as H.J. suggested (patch attached).
>
> Is it OK to commit to trunk now?

Yes, this is OK for trunk.

Thanks,
Uros.

* RE: [PATCH, PR 49089] Don't split AVX256 unaligned loads by default on bdver1 and generic
  2011-06-20 18:51                     ` Uros Bizjak
@ 2011-06-20 22:17                       ` Fang, Changpeng
  2011-06-29 16:19                         ` Eric Botcazou
  0 siblings, 1 reply; 22+ messages in thread
From: Fang, Changpeng @ 2011-06-20 22:17 UTC (permalink / raw)
  To: Uros Bizjak; +Cc: H.J. Lu, gcc-patches, hubicka, rguenther

Thanks,
Patch has been committed to trunk as revision 175230.

Changpeng


* Re: [PATCH, PR 49089] Don't split AVX256 unaligned loads by default on bdver1 and generic
  2011-06-20 22:17                       ` Fang, Changpeng
@ 2011-06-29 16:19                         ` Eric Botcazou
  0 siblings, 0 replies; 22+ messages in thread
From: Eric Botcazou @ 2011-06-29 16:19 UTC (permalink / raw)
  To: Fang, Changpeng; +Cc: gcc-patches, Uros Bizjak, H.J. Lu, hubicka, rguenther

> Thanks,

Note that there is no "i386" component in Bugzilla, only a "target" so this 
should have been PR target/49089.  The end result is that there are no xrefs in 
the PR, which is still open btw.  So please add the xrefs to the commits in the 
PR manually and close it if you are done with it.

-- 
Eric Botcazou
