public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH] Enabling Software Prefetching by Default at -O3
@ 2010-06-15 22:21 Fang, Changpeng
  2010-06-15 22:24 ` H.J. Lu
                   ` (2 more replies)
  0 siblings, 3 replies; 31+ messages in thread
From: Fang, Changpeng @ 2010-06-15 22:21 UTC (permalink / raw)
  To: gcc-patches, rguenther; +Cc: sebpop, Zdenek Dvorak, Fang, Changpeng

[-- Attachment #1: Type: text/plain, Size: 1575 bytes --]


Hi,

This patch serves as a proposal to turn on software prefetching by default at -O3.

Software prefetching pass has been introduced into gcc for a long time. It has been
observed that the prefetch pass is quite stable now and can noticeably improve
program performance where locality is a concern. As a result, we think it is time
to enable it under -O3 so that we can get the benefit out of it.

We have collected the data that shows the impact of prefetch on cpu2006 performance
 under -O3 with gcc 4.6. There is a ~2% improvement on both integer and floating programs 
and there is no apparent degradation. Programs listed below are those that have at least
(+/-)1% variation:
434.zeusmp (+1.77%), 436.cactusADM (+2.4%), 450.soplex (+1.28%), 
459.GemsFDTD (+5.48%), 470.lbm (+31.84%), 482.sphnix3 (+1.01%), 
458.sjeng (-1.27%), 462.libquantum (+19.23%). 

The patch passed bootstrapping with -O3 -g.  We have observed 2 failues in regression tests:
(1) gcc.dg/torture/stackalign/setjmp-1.c  -O3 -fomit-frame-pointer  (internal compiler error)
This is bug 44503. It is a CFG problem associate with _builtin_prefetch and we have had
a fix for it.
(2) gcc.dg/tree-ssa/loop-19.c scan-tree-dump-times optimized "MEM.(base: &|symbol: )a," 2
With prefetching turned on, the memory count is different. We can easily fix the test case after
this patch is accepted.

We realize that prefetch in gcc still has room for improvement. But we think it is good enough
now to be turned on.

Is it  acceptable to commit this patch?

Thanks,

Changpeng

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0002-Enable-fprefetch-loop-arrays-under-O3.patch --]
[-- Type: text/x-patch; name="0002-Enable-fprefetch-loop-arrays-under-O3.patch", Size: 1987 bytes --]

From 50ef9b1dd700ace9854f88814488b7807fcbae1c Mon Sep 17 00:00:00 2001
From: Changpeng Fang <chfang@pathscale.(none)>
Date: Thu, 10 Jun 2010 14:52:15 -0700
Subject: [PATCH 2/2] Enable -fprefetch-loop-arrays under -O3

	*opts.c (decode_options): Turn on flag_prefetch_loop_arrays
	under -O3.

	*doc/invoke.texi(@item -O3): Say that -O3 includes -fprefetch-loop-arrays.
	(@item -fprefetch-loop-arrays): Say that -fprefetch-loop-arrays
	is enabled by default under -O3.
---
 gcc/doc/invoke.texi |    6 ++++--
 gcc/opts.c          |    1 +
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index d8c0c22..1c28a91 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -5883,7 +5883,8 @@ invoking @option{-O2} on programs that use computed gotos.
 Optimize yet more.  @option{-O3} turns on all optimizations specified
 by @option{-O2} and also turns on the @option{-finline-functions},
 @option{-funswitch-loops}, @option{-fpredictive-commoning},
-@option{-fgcse-after-reload} and @option{-ftree-vectorize} options.
+@option{-fgcse-after-reload}, @option{-ftree-vectorize} and
+@option{-fprefetch-loop-arrays} options.
 
 @item -O0
 @opindex O0
@@ -7037,7 +7038,8 @@ memory to improve the performance of loops that access large arrays.
 This option may generate better or worse code; results are highly
 dependent on the structure of loops within the source code.
 
-Disabled at level @option{-Os}.
+This flag is enabled by default at @option{-O3} and disabled at
+level @option{-Os}.
 
 @item -fno-peephole
 @itemx -fno-peephole2
diff --git a/gcc/opts.c b/gcc/opts.c
index 8699ec3..7814341 100644
--- a/gcc/opts.c
+++ b/gcc/opts.c
@@ -951,6 +951,7 @@ decode_options (unsigned int argc, const char **argv)
   flag_unswitch_loops = opt3;
   flag_gcse_after_reload = opt3;
   flag_tree_vectorize = opt3;
+  flag_prefetch_loop_arrays = opt3;
   flag_ipa_cp_clone = opt3;
   if (flag_ipa_cp_clone)
     flag_ipa_cp = 1;
-- 
1.6.3.3


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] Enabling Software Prefetching by Default at -O3
  2010-06-15 22:21 [PATCH] Enabling Software Prefetching by Default at -O3 Fang, Changpeng
@ 2010-06-15 22:24 ` H.J. Lu
  2010-06-15 22:28   ` Fang, Changpeng
  2010-06-18  9:13 ` [PATCH] Enabling Software Prefetching by Default at -O3 Andreas Krebbel
  2010-06-18  9:26 ` Andreas Krebbel
  2 siblings, 1 reply; 31+ messages in thread
From: H.J. Lu @ 2010-06-15 22:24 UTC (permalink / raw)
  To: Fang, Changpeng; +Cc: gcc-patches, rguenther, sebpop, Zdenek Dvorak

On Tue, Jun 15, 2010 at 2:34 PM, Fang, Changpeng <Changpeng.Fang@amd.com> wrote:
>
> Hi,
>
> This patch serves as a proposal to turn on software prefetching by default at -O3.
>
> Software prefetching pass has been introduced into gcc for a long time. It has been
> observed that the prefetch pass is quite stable now and can noticeably improve
> program performance where locality is a concern. As a result, we think it is time
> to enable it under -O3 so that we can get the benefit out of it.
>
> We have collected the data that shows the impact of prefetch on cpu2006 performance
>  under -O3 with gcc 4.6. There is a ~2% improvement on both integer and floating programs
> and there is no apparent degradation. Programs listed below are those that have at least
> (+/-)1% variation:
> 434.zeusmp (+1.77%), 436.cactusADM (+2.4%), 450.soplex (+1.28%),
> 459.GemsFDTD (+5.48%), 470.lbm (+31.84%), 482.sphnix3 (+1.01%),
> 458.sjeng (-1.27%), 462.libquantum (+19.23%).
>
> The patch passed bootstrapping with -O3 -g.  We have observed 2 failues in regression tests:
> (1) gcc.dg/torture/stackalign/setjmp-1.c  -O3 -fomit-frame-pointer  (internal compiler error)
> This is bug 44503. It is a CFG problem associate with _builtin_prefetch and we have had
> a fix for it.
> (2) gcc.dg/tree-ssa/loop-19.c scan-tree-dump-times optimized "MEM.(base: &|symbol: )a," 2
> With prefetching turned on, the memory count is different. We can easily fix the test case after
> this patch is accepted.
>
> We realize that prefetch in gcc still has room for improvement. But we think it is good enough
> now to be turned on.
>
> Is it  acceptable to commit this patch?
>

Have you measured performance impact on Intel Core i7?


-- 
H.J.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* RE: [PATCH] Enabling Software Prefetching by Default at -O3
  2010-06-15 22:24 ` H.J. Lu
@ 2010-06-15 22:28   ` Fang, Changpeng
  2010-06-15 22:44     ` H.J. Lu
  0 siblings, 1 reply; 31+ messages in thread
From: Fang, Changpeng @ 2010-06-15 22:28 UTC (permalink / raw)
  To: H.J. Lu; +Cc: gcc-patches, rguenther, sebpop, Zdenek Dvorak

>Have you measured performance impact on Intel Core i7?

Noy yet. I should have mentioned that the performance data I posted in based
on the test on amd-linux64 system and gcc 4.6 r160766. (which includes my
last change to fix tonto regression).

Thanks,

Changpeng


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] Enabling Software Prefetching by Default at -O3
  2010-06-15 22:28   ` Fang, Changpeng
@ 2010-06-15 22:44     ` H.J. Lu
  2010-06-15 22:47       ` Fang, Changpeng
  0 siblings, 1 reply; 31+ messages in thread
From: H.J. Lu @ 2010-06-15 22:44 UTC (permalink / raw)
  To: Fang, Changpeng; +Cc: gcc-patches, rguenther, sebpop, Zdenek Dvorak

On Tue, Jun 15, 2010 at 3:10 PM, Fang, Changpeng <Changpeng.Fang@amd.com> wrote:
>>Have you measured performance impact on Intel Core i7?
>
> Noy yet. I should have mentioned that the performance data I posted in based
> on the test on amd-linux64 system and gcc 4.6 r160766. (which includes my
> last change to fix tonto regression).
>

Is that rate or speed? I would like to see numbers on Intel Core i7 if you are
proposing enable it by default for x86.

Thanks.

-- 
H.J.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* RE: [PATCH] Enabling Software Prefetching by Default at -O3
  2010-06-15 22:44     ` H.J. Lu
@ 2010-06-15 22:47       ` Fang, Changpeng
  2010-06-15 23:42         ` Mark Mitchell
  0 siblings, 1 reply; 31+ messages in thread
From: Fang, Changpeng @ 2010-06-15 22:47 UTC (permalink / raw)
  To: H.J. Lu; +Cc: gcc-patches, rguenther, sebpop, Zdenek Dvorak



>On Tue, Jun 15, 2010 at 3:10 PM, Fang, Changpeng <Changpeng.Fang@amd.com> wrote:
>>>Have you measured performance impact on Intel Core i7?
>>
>> Noy yet. I should have mentioned that the performance data I posted in based
>> on the test on amd-linux64 system and gcc 4.6 r160766. (which includes my
>> last change to fix tonto regression).
>>

>Is that rate or speed? I would like to see numbers on Intel Core i7 if you are
>proposing enable it by default for x86.

 Speed run. Hopefully prefetching could benefit all targets that support sw prefetch.

Thanks,

Changpeng

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] Enabling Software Prefetching by Default at -O3
  2010-06-15 22:47       ` Fang, Changpeng
@ 2010-06-15 23:42         ` Mark Mitchell
  2010-06-16  0:30           ` H.J. Lu
  0 siblings, 1 reply; 31+ messages in thread
From: Mark Mitchell @ 2010-06-15 23:42 UTC (permalink / raw)
  To: Fang, Changpeng
  Cc: H.J. Lu, gcc-patches, rguenther, sebpop, Zdenek Dvorak, Maxim Kuvyrkov

Fang, Changpeng wrote:

>> Is that rate or speed? I would like to see numbers on Intel Core i7 if you are
>> proposing enable it by default for x86.

HJ, are you in position to test this on Core i7?  Since you work at
Intel, it seems like it might be easier for you to do it than for the
folks at AMD to do it?

Maxim, is this something that you could easily test on some of our
embedded hardware, assuming that the machine descriptions define the
relevant prefetch instructions?  For example, I think that ARM, MIPS,
and Power all have theses instructions in the ISA, and we could look at
CoreMark numbers to at least convince ourselves that this does no harm.

Thanks,

-- 
Mark Mitchell
CodeSourcery
mark@codesourcery.com
(650) 331-3385 x713

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] Enabling Software Prefetching by Default at -O3
  2010-06-15 23:42         ` Mark Mitchell
@ 2010-06-16  0:30           ` H.J. Lu
  2010-06-16  3:43             ` Mark Mitchell
  2010-06-19  0:30             ` H.J. Lu
  0 siblings, 2 replies; 31+ messages in thread
From: H.J. Lu @ 2010-06-16  0:30 UTC (permalink / raw)
  To: Mark Mitchell
  Cc: Fang, Changpeng, gcc-patches, rguenther, sebpop, Zdenek Dvorak,
	Maxim Kuvyrkov

On Tue, Jun 15, 2010 at 3:47 PM, Mark Mitchell <mark@codesourcery.com> wrote:
> Fang, Changpeng wrote:
>
>>> Is that rate or speed? I would like to see numbers on Intel Core i7 if you are
>>> proposing enable it by default for x86.
>
> HJ, are you in position to test this on Core i7?  Since you work at
> Intel, it seems like it might be easier for you to do it than for the
> folks at AMD to do it?

We never tried software prefetch since Intel processors rarely need
it.  We will try it. It will take some times.


-- 
H.J.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] Enabling Software Prefetching by Default at -O3
  2010-06-16  0:30           ` H.J. Lu
@ 2010-06-16  3:43             ` Mark Mitchell
  2010-06-18  8:51               ` Christian Borntraeger
  2010-06-19  0:30             ` H.J. Lu
  1 sibling, 1 reply; 31+ messages in thread
From: Mark Mitchell @ 2010-06-16  3:43 UTC (permalink / raw)
  To: H.J. Lu
  Cc: Fang, Changpeng, gcc-patches, rguenther, sebpop, Zdenek Dvorak,
	Maxim Kuvyrkov

H.J. Lu wrote:

> We never tried software prefetch since Intel processors rarely need
> it.  We will try it. It will take some times.

OK.

The other thing we can certainly do, once we get some input about how
well it does on various architectures, is to enable it conditionally.
There is no reason that -O3 has to enable the same things on all
architectures.  If prefetching is a win on AMD CPUs, then -O3 should
enable it on AMD CPUs -- but if it hurts Intel CPUs, or MIPS CPUs, or
whatever, then it shouldn't be enabled on those architectures.

So, let's see how generalizable Changpeng's numbers are to other
architectures, and then we can decide where the setting goes.  I think
there's no question from the numbers that -O3 should enable this
optimization on AMD CPUs.

Thanks,

-- 
Mark Mitchell
CodeSourcery
mark@codesourcery.com
(650) 331-3385 x713

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] Enabling Software Prefetching by Default at -O3
  2010-06-16  3:43             ` Mark Mitchell
@ 2010-06-18  8:51               ` Christian Borntraeger
  2010-06-18 18:00                 ` Fang, Changpeng
  0 siblings, 1 reply; 31+ messages in thread
From: Christian Borntraeger @ 2010-06-18  8:51 UTC (permalink / raw)
  To: gcc-patches
  Cc: Mark Mitchell, H.J. Lu, Fang, Changpeng, rguenther, sebpop,
	Zdenek Dvorak, Maxim Kuvyrkov

Am Mittwoch 16 Juni 2010, 01:03:20 schrieb Mark Mitchell:
> H.J. Lu wrote:
> 
> > We never tried software prefetch since Intel processors rarely need
> > it.  We will try it. It will take some times.
> 
> OK.
> 
> The other thing we can certainly do, once we get some input about how
> well it does on various architectures, is to enable it conditionally.
> There is no reason that -O3 has to enable the same things on all
> architectures.  If prefetching is a win on AMD CPUs, then -O3 should
> enable it on AMD CPUs -- but if it hurts Intel CPUs, or MIPS CPUs, or
> whatever, then it shouldn't be enabled on those architectures.
> 
> So, let's see how generalizable Changpeng's numbers are to other
> architectures, and then we can decide where the setting goes.  I think
> there's no question from the numbers that -O3 should enable this
> optimization on AMD CPUs.

We have also seen an overall win (around +1% for int and float) with
-fprefetch-loop-arrays on s390 and activated it on O3 (see config/s390/s390.c
override_options) for 4.6.
So we (s390) are fine with enabling loop prefetching in general for O3.

Changpeng, can you confirm that -O3 -fno-prefetch-loop-arrays does indeed
deactivate prefetching?

Christian

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] Enabling Software Prefetching by Default at -O3
  2010-06-15 22:21 [PATCH] Enabling Software Prefetching by Default at -O3 Fang, Changpeng
  2010-06-15 22:24 ` H.J. Lu
@ 2010-06-18  9:13 ` Andreas Krebbel
  2010-06-18  9:26 ` Andreas Krebbel
  2 siblings, 0 replies; 31+ messages in thread
From: Andreas Krebbel @ 2010-06-18  9:13 UTC (permalink / raw)
  To: Fang, Changpeng; +Cc: gcc-patches, borntraeger

Hi,

> This patch serves as a proposal to turn on software prefetching by default at -O3.

On S/390 I ran into a problem with the runtime of the prefetching pass when using
aggressive loop unrolling.  There is an algorithm with quadratic runtime regarding the
memory references which causes a very long compile time.  Christian already opened a
bugzilla for that (PR 44576). It probably would make sense to have a closer look at this
before enabling the pass by default. Perhaps we could simply limit the miss rate
computation to a certain amount of memory references?

Besides of that enabling the prefetching pass by default is ok with me. I think it would
help getting more people into improving it.

Bye,

-Andreas-

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] Enabling Software Prefetching by Default at -O3
  2010-06-15 22:21 [PATCH] Enabling Software Prefetching by Default at -O3 Fang, Changpeng
  2010-06-15 22:24 ` H.J. Lu
  2010-06-18  9:13 ` [PATCH] Enabling Software Prefetching by Default at -O3 Andreas Krebbel
@ 2010-06-18  9:26 ` Andreas Krebbel
  2 siblings, 0 replies; 31+ messages in thread
From: Andreas Krebbel @ 2010-06-18  9:26 UTC (permalink / raw)
  To: Fang, Changpeng; +Cc: gcc-patches

Hi,

> This patch serves as a proposal to turn on software prefetching by default at -O3.

We probably also should remove these warnings if -fprefetch-loop-arrays hasn't been
specified explicitly:

#ifndef HAVE_prefetch
  if (flag_prefetch_loop_arrays)
    {
      warning (0, "-fprefetch-loop-arrays not supported for this target");
      flag_prefetch_loop_arrays = 0;
    }
#else
  if (flag_prefetch_loop_arrays && !HAVE_prefetch)
    {
      warning (0, "-fprefetch-loop-arrays not supported for this target (try -march
switches)");
      flag_prefetch_loop_arrays = 0;
    }
#endif

Bye,

-Andreas-

^ permalink raw reply	[flat|nested] 31+ messages in thread

* RE: [PATCH] Enabling Software Prefetching by Default at -O3
  2010-06-18  8:51               ` Christian Borntraeger
@ 2010-06-18 18:00                 ` Fang, Changpeng
  0 siblings, 0 replies; 31+ messages in thread
From: Fang, Changpeng @ 2010-06-18 18:00 UTC (permalink / raw)
  To: Christian Borntraeger, gcc-patches
  Cc: Mark Mitchell, H.J. Lu, rguenther, sebpop, Zdenek Dvorak, Maxim Kuvyrkov


________________________________________
>From: Christian Borntraeger [borntraeger@de.ibm.com]

>We have also seen an overall win (around +1% for int and float) with
>-fprefetch-loop-arrays on s390 and activated it on O3 (see config/s390/s390.c
>override_options) for 4.6.
>So we (s390) are fine with enabling loop prefetching in general for O3.

>Changpeng, can you confirm that -O3 -fno-prefetch-loop-arrays does indeed
>deactivate prefetching?

Yes. It should be and I have verified that. Thanks.

Changpeng


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] Enabling Software Prefetching by Default at -O3
  2010-06-16  0:30           ` H.J. Lu
  2010-06-16  3:43             ` Mark Mitchell
@ 2010-06-19  0:30             ` H.J. Lu
  2010-06-20  0:11               ` Christian Borntraeger
  1 sibling, 1 reply; 31+ messages in thread
From: H.J. Lu @ 2010-06-19  0:30 UTC (permalink / raw)
  To: Mark Mitchell
  Cc: Fang, Changpeng, gcc-patches, rguenther, sebpop, Zdenek Dvorak,
	Maxim Kuvyrkov

On Tue, Jun 15, 2010 at 3:55 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Tue, Jun 15, 2010 at 3:47 PM, Mark Mitchell <mark@codesourcery.com> wrote:
>> Fang, Changpeng wrote:
>>
>>>> Is that rate or speed? I would like to see numbers on Intel Core i7 if you are
>>>> proposing enable it by default for x86.
>>
>> HJ, are you in position to test this on Core i7?  Since you work at
>> Intel, it seems like it might be easier for you to do it than for the
>> folks at AMD to do it?
>
> We never tried software prefetch since Intel processors rarely need
> it.  We will try it. It will take some times.
>

Here are what we got on Intel Core i7 64bit with -O3 vs -O3
-fprefetch-loop-arrays:

400.perlbench 			 -0.369004%
401.bzip2 			 -3%
403.gcc 			 -1.5748%
429.mcf 			 0.784314%
445.gobmk 			 -1.28755%
456.hmmer 			 -1.67364%
458.sjeng 			 0%
462.libquantum 			 2.57827%
464.h264ref 			 -0.806452%
471.omnetpp 			 -1.51515%
473.astar 			 -0.581395%
483.xalancbmk 			 0%
SPECint(R)_base2006 			 -0.766284%
410.bwaves 			 -1.27796%
416.gamess 			 -1.2605%
433.milc 			 0%
434.zeusmp 			 1.24481%
435.gromacs 			 -0.478469%
436.cactusADM 			 -5.07813%
437.leslie3d 			 -1.10294%
444.namd 			 0%
450.soplex 			 0.628931%
453.povray 			 0.392157%
454.calculix 			 0%
459.GemsFDTD 			 -1.51515%
465.tonto 			 0%
470.lbm 			 -1.16279%
481.wrf 			 -0.442478%
482.sphinx3 			 2.51397%
SPECfp(R)_base2006 			 -0.769231%

It doesn't help Intel Core i7.  I think it should be enabled
with -mtune=XXX on x86 where prefetch improves
performance on XXX.


-- 
H.J.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] Enabling Software Prefetching by Default at -O3
  2010-06-19  0:30             ` H.J. Lu
@ 2010-06-20  0:11               ` Christian Borntraeger
  2010-06-20  2:34                 ` Mark Mitchell
  0 siblings, 1 reply; 31+ messages in thread
From: Christian Borntraeger @ 2010-06-20  0:11 UTC (permalink / raw)
  To: gcc-patches
  Cc: H.J. Lu, Mark Mitchell, Fang, Changpeng, rguenther, sebpop,
	Zdenek Dvorak, Maxim Kuvyrkov

Am Samstag 19 Juni 2010, 00:07:07 schrieb H.J. Lu:
> > We never tried software prefetch since Intel processors rarely need
> > it.  We will try it. It will take some times.
> >
> 
> Here are what we got on Intel Core i7 64bit with -O3 vs -O3
> -fprefetch-loop-arrays:
> 
> 400.perlbench 			 -0.369004%
> 401.bzip2 			 -3%
> 403.gcc 			 -1.5748%
> 429.mcf 			 0.784314%
> 445.gobmk 			 -1.28755%
> 456.hmmer 			 -1.67364%
> 458.sjeng 			 0%
> 462.libquantum 			 2.57827%
> 464.h264ref 			 -0.806452%
> 471.omnetpp 			 -1.51515%
> 473.astar 			 -0.581395%
> 483.xalancbmk 			 0%
> SPECint(R)_base2006 			 -0.766284%
> 410.bwaves 			 -1.27796%
> 416.gamess 			 -1.2605%
> 433.milc 			 0%
> 434.zeusmp 			 1.24481%
> 435.gromacs 			 -0.478469%
> 436.cactusADM 			 -5.07813%
> 437.leslie3d 			 -1.10294%
> 444.namd 			 0%
> 450.soplex 			 0.628931%
> 453.povray 			 0.392157%
> 454.calculix 			 0%
> 459.GemsFDTD 			 -1.51515%
> 465.tonto 			 0%
> 470.lbm 			 -1.16279%
> 481.wrf 			 -0.442478%
> 482.sphinx3 			 2.51397%
> SPECfp(R)_base2006 			 -0.769231%
> 
> It doesn't help Intel Core i7.  I think it should be enabled
> with -mtune=XXX on x86 where prefetch improves
> performance on XXX.

It also might be worth to investigate if overriding the parameters per
-mtune=XXX results in an overall win for -fprefetch-loop-arrays. We did
that on s390 since the default values were not ideal:
e.g. we have several of these
[...]
  if (!PARAM_SET_P (PARAM_PREFETCH_MIN_INSN_TO_MEM_RATIO))
    set_param_value ("prefetch-min-insn-to-mem-ratio", 2);
  if (!PARAM_SET_P (PARAM_SIMULTANEOUS_PREFETCHES))
    set_param_value ("simultaneous-prefetches", 6);
[...]
in override_options.

e.g. if software prefetch is expensive you could make it happen less often
for core i7 or vice versa.

Christian

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] Enabling Software Prefetching by Default at -O3
  2010-06-20  0:11               ` Christian Borntraeger
@ 2010-06-20  2:34                 ` Mark Mitchell
  2010-06-20  4:29                   ` Andi Kleen
                                     ` (2 more replies)
  0 siblings, 3 replies; 31+ messages in thread
From: Mark Mitchell @ 2010-06-20  2:34 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: gcc-patches, H.J. Lu, Fang, Changpeng, rguenther, sebpop,
	Zdenek Dvorak, Maxim Kuvyrkov

Christian Borntraeger wrote:

> It also might be worth to investigate if overriding the parameters per
> -mtune=XXX results in an overall win for -fprefetch-loop-arrays. We did
> that on s390 since the default values were not ideal

Yes, that might be a good idea for i7.

But, in the meantime, I think we should get a version of the patch that
turns on prefetching on AMD CPUs with -O3.  There's no reason to demand
consistency for all CPUs and it clearly benefits the AMD CPUs.
Changpeng, would you please submit a patch that activates this
optimization only with tuning for AMD CPUs?

Thanks,

-- 
Mark Mitchell
CodeSourcery
mark@codesourcery.com
(650) 331-3385 x713

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] Enabling Software Prefetching by Default at -O3
  2010-06-20  2:34                 ` Mark Mitchell
@ 2010-06-20  4:29                   ` Andi Kleen
  2010-06-22 23:40                   ` Fang, Changpeng
  2010-06-24 23:37                   ` Fang, Changpeng
  2 siblings, 0 replies; 31+ messages in thread
From: Andi Kleen @ 2010-06-20  4:29 UTC (permalink / raw)
  To: Mark Mitchell
  Cc: Christian Borntraeger, gcc-patches, H.J. Lu, Fang, Changpeng,
	rguenther, sebpop, Zdenek Dvorak, Maxim Kuvyrkov

Mark Mitchell <mark@codesourcery.com> writes:
>
> Yes, that might be a good idea for i7.
>
> But, in the meantime, I think we should get a version of the patch that
> turns on prefetching on AMD CPUs with -O3.  There's no reason to demand
> consistency for all CPUs and it clearly benefits the AMD CPUs.
> Changpeng, would you please submit a patch that activates this
> optimization only with tuning for AMD CPUs?

On i7 it will be likely still a win if hardware prefetch is disabled
in the BIOS. Most BIOS have a setting like this.

Unfortunately this cannot be easily detected by a program.

-Andi
-- 
ak@linux.intel.com -- Speaking for myself only.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* RE: [PATCH] Enabling Software Prefetching by Default at -O3
  2010-06-20  2:34                 ` Mark Mitchell
  2010-06-20  4:29                   ` Andi Kleen
@ 2010-06-22 23:40                   ` Fang, Changpeng
  2010-06-23  6:21                     ` Christian Borntraeger
  2010-06-24 23:37                   ` Fang, Changpeng
  2 siblings, 1 reply; 31+ messages in thread
From: Fang, Changpeng @ 2010-06-22 23:40 UTC (permalink / raw)
  To: Mark Mitchell, Christian Borntraeger
  Cc: gcc-patches, H.J. Lu, rguenther, sebpop, Zdenek Dvorak, Maxim Kuvyrkov

Hi, Mark:

>> It also might be worth to investigate if overriding the parameters per
>> -mtune=XXX results in an overall win for -fprefetch-loop-arrays. We did
>> that on s390 since the default values were not ideal

>Yes, that might be a good idea for i7.

>But, in the meantime, I think we should get a version of the patch that
>turns on prefetching on AMD CPUs with -O3.  There's no reason to demand
>consistency for all CPUs and it clearly benefits the AMD CPUs.
>Changpeng, would you please submit a patch that activates this
>optimization only with tuning for AMD CPUs?

It seems I have a problem in setting prefetching default at -O3 only for AMD CPUs.
When we know it is AMD CPUS after the command line options have been parsed,
we don't know whether -fno-prefetch-loop-arrays has been specified in the command line
or not. If we turn on prefetching at -O3, then we could not explicitly turn it off if we want.

What I want is something like:
 if  (!OPTION_SET_P (flag_prefetch_loop_arrays))
   flag_prefetch_loop_arrays = 1;

An alternative way is parsing -m options before other options.

What do you think?

Thanks,

Changpeng

 

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] Enabling Software Prefetching by Default at -O3
  2010-06-22 23:40                   ` Fang, Changpeng
@ 2010-06-23  6:21                     ` Christian Borntraeger
  2010-06-23  9:26                       ` Richard Guenther
  0 siblings, 1 reply; 31+ messages in thread
From: Christian Borntraeger @ 2010-06-23  6:21 UTC (permalink / raw)
  To: Fang, Changpeng
  Cc: Mark Mitchell, gcc-patches, H.J. Lu, rguenther, sebpop,
	Zdenek Dvorak, Maxim Kuvyrkov

Am Mittwoch 23 Juni 2010, 00:05:57 schrieb Fang, Changpeng:
> Hi, Mark:
> 
> >> It also might be worth to investigate if overriding the parameters per
> >> -mtune=XXX results in an overall win for -fprefetch-loop-arrays. We did
> >> that on s390 since the default values were not ideal
> 
> >Yes, that might be a good idea for i7.
> 
> >But, in the meantime, I think we should get a version of the patch that
> >turns on prefetching on AMD CPUs with -O3.  There's no reason to demand
> >consistency for all CPUs and it clearly benefits the AMD CPUs.
> >Changpeng, would you please submit a patch that activates this
> >optimization only with tuning for AMD CPUs?
> 
> It seems I have a problem in setting prefetching default at -O3 only for AMD CPUs.
> When we know it is AMD CPUS after the command line options have been parsed,
> we don't know whether -fno-prefetch-loop-arrays has been specified in the command line
> or not. If we turn on prefetching at -O3, then we could not explicitly turn it off if we want.
> 
> What I want is something like:
>  if  (!OPTION_SET_P (flag_prefetch_loop_arrays))
>    flag_prefetch_loop_arrays = 1;

I think having an OPTION_SET_P  (or maybe name that FLAG_SET_P) makes a lot of sense and
would match the PARAM_SET_P way of doing things.

Christian

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] Enabling Software Prefetching by Default at -O3
  2010-06-23  6:21                     ` Christian Borntraeger
@ 2010-06-23  9:26                       ` Richard Guenther
  2010-06-23 12:02                         ` Joseph S. Myers
  2010-06-23 12:30                         ` Christian Borntraeger
  0 siblings, 2 replies; 31+ messages in thread
From: Richard Guenther @ 2010-06-23  9:26 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: Fang, Changpeng, Mark Mitchell, gcc-patches, H.J. Lu, sebpop,
	Zdenek Dvorak, Maxim Kuvyrkov

On Wed, 23 Jun 2010, Christian Borntraeger wrote:

> Am Mittwoch 23 Juni 2010, 00:05:57 schrieb Fang, Changpeng:
> > Hi, Mark:
> > 
> > >> It also might be worth to investigate if overriding the parameters per
> > >> -mtune=XXX results in an overall win for -fprefetch-loop-arrays. We did
> > >> that on s390 since the default values were not ideal
> > 
> > >Yes, that might be a good idea for i7.
> > 
> > >But, in the meantime, I think we should get a version of the patch that
> > >turns on prefetching on AMD CPUs with -O3.  There's no reason to demand
> > >consistency for all CPUs and it clearly benefits the AMD CPUs.
> > >Changpeng, would you please submit a patch that activates this
> > >optimization only with tuning for AMD CPUs?
> > 
> > It seems I have a problem in setting prefetching default at -O3 only for AMD CPUs.
> > When we know it is AMD CPUS after the command line options have been parsed,
> > we don't know whether -fno-prefetch-loop-arrays has been specified in the command line
> > or not. If we turn on prefetching at -O3, then we could not explicitly turn it off if we want.
> > 
> > What I want is something like:
> >  if  (!OPTION_SET_P (flag_prefetch_loop_arrays))
> >    flag_prefetch_loop_arrays = 1;
> 
> I think having an OPTION_SET_P  (or maybe name that FLAG_SET_P) makes a lot of sense and
> would match the PARAM_SET_P way of doing things.

We use tri-states for this elsewhere in the compiler.

Richard.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] Enabling Software Prefetching by Default at -O3
  2010-06-23  9:26                       ` Richard Guenther
@ 2010-06-23 12:02                         ` Joseph S. Myers
  2010-06-23 12:30                         ` Christian Borntraeger
  1 sibling, 0 replies; 31+ messages in thread
From: Joseph S. Myers @ 2010-06-23 12:02 UTC (permalink / raw)
  To: Richard Guenther
  Cc: Christian Borntraeger, Fang, Changpeng, Mark Mitchell,
	gcc-patches, H.J. Lu, sebpop, Zdenek Dvorak, Maxim Kuvyrkov

On Wed, 23 Jun 2010, Richard Guenther wrote:

> > > What I want is something like:
> > >  if  (!OPTION_SET_P (flag_prefetch_loop_arrays))
> > >    flag_prefetch_loop_arrays = 1;
> > 
> > I think having an OPTION_SET_P  (or maybe name that FLAG_SET_P) makes a lot of sense and
> > would match the PARAM_SET_P way of doing things.
> 
> We use tri-states for this elsewhere in the compiler.

We use a mixture of tri-states and separate *_set variables.  When my 
patch series for option handling and multilib selection reaches the point 
of an options structure replacing separate variables for each option, I 
expect to use a separate such structure to record which options have been 
set explicitly.  That is:

  if (!global_options_explicit.opt.prefetch_loop_arrays)
    global_options.opt.prefetch_loop_arrays = 1;

(but hopefully using pointers to the relevant structures rather than 
hardcoding global_options and global_options_explicit).

-- 
Joseph S. Myers
joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] Enabling Software Prefetching by Default at -O3
  2010-06-23  9:26                       ` Richard Guenther
  2010-06-23 12:02                         ` Joseph S. Myers
@ 2010-06-23 12:30                         ` Christian Borntraeger
  2010-06-23 13:13                           ` Manuel López-Ibáñez
  1 sibling, 1 reply; 31+ messages in thread
From: Christian Borntraeger @ 2010-06-23 12:30 UTC (permalink / raw)
  To: Richard Guenther
  Cc: Fang, Changpeng, Mark Mitchell, gcc-patches, H.J. Lu, sebpop,
	Zdenek Dvorak, Maxim Kuvyrkov

Am Mittwoch 23 Juni 2010, 10:50:35 schrieb Richard Guenther:
> > > What I want is something like:
> > >  if  (!OPTION_SET_P (flag_prefetch_loop_arrays))
> > >    flag_prefetch_loop_arrays = 1;
> > 
> > I think having an OPTION_SET_P  (or maybe name that FLAG_SET_P) makes a lot of sense and
> > would match the PARAM_SET_P way of doing things.
> 
> We use tri-states for this elsewhere in the compiler.

So something like the following unfinished patch 
would be the (current) way to go?

Christian

Index: gcc/common.opt
===================================================================
*** gcc/common.opt.orig
--- gcc/common.opt
*************** Common Report Var(flag_predictive_common
*** 945,951 ****
  Run predictive commoning optimization.
  
  fprefetch-loop-arrays
! Common Report Var(flag_prefetch_loop_arrays) Optimization
  Generate prefetch instructions, if available, for arrays in loops
  
  fprofile
--- 945,951 ----
  Run predictive commoning optimization.
  
  fprefetch-loop-arrays
! Common Report Var(flag_prefetch_loop_arrays) Init(2) Optimization
  Generate prefetch instructions, if available, for arrays in loops
  
  fprofile
Index: gcc/config/s390/s390.c
===================================================================
*** gcc/config/s390/s390.c.orig
--- gcc/config/s390/s390.c
*************** override_options (void)
*** 1676,1682 ****
  
    /* This cannot reside in optimization_options since HAVE_prefetch
       requires the arch flags to be evaluated already.  */
!   if (HAVE_prefetch && optimize >= 3)
      flag_prefetch_loop_arrays = 1;
  }
  
--- 1676,1682 ----
  
    /* This cannot reside in optimization_options since HAVE_prefetch
       requires the arch flags to be evaluated already.  */
!   if (HAVE_prefetch && optimize >= 3 && flag_prefetch_loop_arrays == 2)
      flag_prefetch_loop_arrays = 1;
  }
  
Index: gcc/toplev.c
===================================================================
*** gcc/toplev.c.orig
--- gcc/toplev.c
*************** process_options (void)
*** 2048,2053 ****
--- 2048,2056 ----
      }
  
  #ifndef HAVE_prefetch
+   if (flag_prefetch_loop_arrays == 2)
+     flag_prefetch_loop_arrays = 0;
+ 
    if (flag_prefetch_loop_arrays)
      {
        warning (0, "-fprefetch-loop-arrays not supported for this target");

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] Enabling Software Prefetching by Default at -O3
  2010-06-23 12:30                         ` Christian Borntraeger
@ 2010-06-23 13:13                           ` Manuel López-Ibáñez
  0 siblings, 0 replies; 31+ messages in thread
From: Manuel López-Ibáñez @ 2010-06-23 13:13 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: Richard Guenther, Fang, Changpeng, Mark Mitchell, gcc-patches,
	H.J. Lu, sebpop, Zdenek Dvorak, Maxim Kuvyrkov

>
>  fprefetch-loop-arrays
> ! Common Report Var(flag_prefetch_loop_arrays) Init(2) Optimization

For consistency with other options, I would suggest to use Init(-1)
for an "uninitialized" option. As Joseph said, this should be done
differently in the future, but for now using -1 is the usual setting.
I didn't understand how (or if) Joseph will implement the framework to
set default values. As I understand it, Init(X) should only set the
value X to the Var if it is not set by anything else before. Then, we
can move the default values back to the *.opt files. Otherwise, such a
patch would be an improvement.

Cheers,

Manuel.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* RE: [PATCH] Enabling Software Prefetching by Default at -O3
  2010-06-20  2:34                 ` Mark Mitchell
  2010-06-20  4:29                   ` Andi Kleen
  2010-06-22 23:40                   ` Fang, Changpeng
@ 2010-06-24 23:37                   ` Fang, Changpeng
  2010-06-25 11:48                     ` Richard Guenther
  2010-06-25 12:29                     ` Christian Borntraeger
  2 siblings, 2 replies; 31+ messages in thread
From: Fang, Changpeng @ 2010-06-24 23:37 UTC (permalink / raw)
  To: Mark Mitchell, Christian Borntraeger
  Cc: gcc-patches, H.J. Lu, rguenther, sebpop, Zdenek Dvorak, Maxim Kuvyrkov

[-- Attachment #1: Type: text/plain, Size: 1389 bytes --]

Hi, 

Attached is the version of the patch that turns prefetching on at -O3 for AMD cpus
only. As discussed elsewhere in this thread, we use tri-state for -fprefetch-loop-arrays.
If this flag is not explicitly set,  (for -O3) we turn it on in gcc/config/i386/i386.c 
(override_options).

Is this OK to commit now?

Thanks, 

Changpeng

________________________________________
From: Mark Mitchell [mark@codesourcery.com]
Sent: Saturday, June 19, 2010 3:05 PM
To: Christian Borntraeger
Cc: gcc-patches@gcc.gnu.org; H.J. Lu; Fang, Changpeng; rguenther@suse.de; sebpop@gmail.com; Zdenek Dvorak; Maxim Kuvyrkov
Subject: Re: [PATCH] Enabling Software Prefetching by Default at -O3

Christian Borntraeger wrote:

> It also might be worth to investigate if overriding the parameters per
> -mtune=XXX results in an overall win for -fprefetch-loop-arrays. We did
> that on s390 since the default values were not ideal

Yes, that might be a good idea for i7.

But, in the meantime, I think we should get a version of the patch that
turns on prefetching on AMD CPUs with -O3.  There's no reason to demand
consistency for all CPUs and it clearly benefits the AMD CPUs.
Changpeng, would you please submit a patch that activates this
optimization only with tuning for AMD CPUs?

Thanks,

--
Mark Mitchell
CodeSourcery
mark@codesourcery.com
(650) 331-3385 x713


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0003-Enable-prefetching-at-O3-for-AMD-cpus.patch --]
[-- Type: text/x-patch; name="0003-Enable-prefetching-at-O3-for-AMD-cpus.patch", Size: 4248 bytes --]

From 7f48bc625b0e451dd8c05a3a3cc20f68dcaa695c Mon Sep 17 00:00:00 2001
From: Changpeng Fang <chfang@pathscale.(none)>
Date: Wed, 23 Jun 2010 17:05:59 -0700
Subject: [PATCH 3/3] Enable prefetching at -O3 for AMD cpus

	* gcc/common.opt (fprefetch-loop-arrays): Re-define
	-fprefetch-loop-arrays as a tri-state option with the
	initial value of -1.
	* gcc/tree-ssa-loop.c (gate_tree_ssa_loop_prefetch): Invoke
	prefetch pass only when flag_prefetch_loop_arrays > 0.
	* gcc/toplev.c (process_options): Note that, with tri-states,
	flag_prefetch_loop_arrays>0 means prefetching is enabled.
	* gcc/config/i386/i386.c (override_options): Enable prefetching
	at -O3 for a set of CPUs that sw prefetching is helpful.
	(software_prefetching_beneficial_p): New.  Return TRUE if
	software prefetching is beneficial for the given CPU.
---
 gcc/common.opt         |    2 +-
 gcc/config/i386/i386.c |   27 +++++++++++++++++++++++++++
 gcc/toplev.c           |    6 +++---
 gcc/tree-ssa-loop.c    |    2 +-
 4 files changed, 32 insertions(+), 5 deletions(-)

diff --git a/gcc/common.opt b/gcc/common.opt
index 4904481..74fbd1d 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -937,7 +937,7 @@ Common Report Var(flag_predictive_commoning) Optimization
 Run predictive commoning optimization.
 
 fprefetch-loop-arrays
-Common Report Var(flag_prefetch_loop_arrays) Optimization
+Common Report Var(flag_prefetch_loop_arrays) Init(-1) Optimization
 Generate prefetch instructions, if available, for arrays in loops
 
 fprofile
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 2a46f89..605e57b 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -2691,6 +2691,26 @@ ix86_target_string (int isa, int flags, const char *arch, const char *tune,
   return ret;
 }
 
+/* Return TRUE if software prefetching is beneficial for the
+   given CPU. */
+
+static bool
+software_prefetching_beneficial_p (void)
+{
+  switch (ix86_tune)
+    {
+    case PROCESSOR_GEODE:
+    case PROCESSOR_K6:
+    case PROCESSOR_ATHLON:
+    case PROCESSOR_K8:
+    case PROCESSOR_AMDFAM10:
+      return true;
+
+    default:
+      return false;
+    }
+}
+
 /* Function that is callable from the debugger to print the current
    options.  */
 void
@@ -3531,6 +3551,13 @@ override_options (bool main_args_p)
   if (!PARAM_SET_P (PARAM_L2_CACHE_SIZE))
     set_param_value ("l2-cache-size", ix86_cost->l2_cache_size);
 
+  /* Enable sw prefetching at -O3 for CPUS that prefetching is helpful.  */
+  if (flag_prefetch_loop_arrays < 0
+      && HAVE_prefetch
+      && optimize >= 3
+      && software_prefetching_beneficial_p())
+    flag_prefetch_loop_arrays = 1;
+
   /* If using typedef char *va_list, signal that __builtin_va_start (&ap, 0)
      can be optimized to ap = __builtin_next_arg (0).  */
   if (!TARGET_64BIT)
diff --git a/gcc/toplev.c b/gcc/toplev.c
index ff4c850..369820b 100644
--- a/gcc/toplev.c
+++ b/gcc/toplev.c
@@ -2078,13 +2078,13 @@ process_options (void)
     }
 
 #ifndef HAVE_prefetch
-  if (flag_prefetch_loop_arrays)
+  if (flag_prefetch_loop_arrays > 0)
     {
       warning (0, "-fprefetch-loop-arrays not supported for this target");
       flag_prefetch_loop_arrays = 0;
     }
 #else
-  if (flag_prefetch_loop_arrays && !HAVE_prefetch)
+  if (flag_prefetch_loop_arrays > 0 && !HAVE_prefetch)
     {
       warning (0, "-fprefetch-loop-arrays not supported for this target (try -march switches)");
       flag_prefetch_loop_arrays = 0;
@@ -2093,7 +2093,7 @@ process_options (void)
 
   /* This combination of options isn't handled for i386 targets and doesn't
      make much sense anyway, so don't allow it.  */
-  if (flag_prefetch_loop_arrays && optimize_size)
+  if (flag_prefetch_loop_arrays > 0 && optimize_size)
     {
       warning (0, "-fprefetch-loop-arrays is not supported with -Os");
       flag_prefetch_loop_arrays = 0;
diff --git a/gcc/tree-ssa-loop.c b/gcc/tree-ssa-loop.c
index 344cfa8..c9c5bbd 100644
--- a/gcc/tree-ssa-loop.c
+++ b/gcc/tree-ssa-loop.c
@@ -600,7 +600,7 @@ tree_ssa_loop_prefetch (void)
 static bool
 gate_tree_ssa_loop_prefetch (void)
 {
-  return flag_prefetch_loop_arrays != 0;
+  return flag_prefetch_loop_arrays > 0;
 }
 
 struct gimple_opt_pass pass_loop_prefetch =
-- 
1.6.3.3


^ permalink raw reply	[flat|nested] 31+ messages in thread

* RE: [PATCH] Enabling Software Prefetching by Default at -O3
  2010-06-24 23:37                   ` Fang, Changpeng
@ 2010-06-25 11:48                     ` Richard Guenther
  2010-06-25 20:16                       ` Sebastian Pop
  2010-06-25 12:29                     ` Christian Borntraeger
  1 sibling, 1 reply; 31+ messages in thread
From: Richard Guenther @ 2010-06-25 11:48 UTC (permalink / raw)
  To: Fang, Changpeng
  Cc: Mark Mitchell, Christian Borntraeger, gcc-patches, H.J. Lu,
	sebpop, Zdenek Dvorak, Maxim Kuvyrkov

On Thu, 24 Jun 2010, Fang, Changpeng wrote:

> Hi, 
> 
> Attached is the version of the patch that turns prefetching on at -O3 for AMD cpus
> only. As discussed elsewhere in this thread, we use tri-state for -fprefetch-loop-arrays.
> If this flag is not explicitly set,  (for -O3) we turn it on in gcc/config/i386/i386.c 
> (override_options).
> 
> Is this OK to commit now?

Ok.

Thanks,
Richard.

> 
> Thanks, 
> 
> Changpeng
> 
> ________________________________________
> From: Mark Mitchell [mark@codesourcery.com]
> Sent: Saturday, June 19, 2010 3:05 PM
> To: Christian Borntraeger
> Cc: gcc-patches@gcc.gnu.org; H.J. Lu; Fang, Changpeng; rguenther@suse.de; sebpop@gmail.com; Zdenek Dvorak; Maxim Kuvyrkov
> Subject: Re: [PATCH] Enabling Software Prefetching by Default at -O3
> 
> Christian Borntraeger wrote:
> 
> > It also might be worth to investigate if overriding the parameters per
> > -mtune=XXX results in an overall win for -fprefetch-loop-arrays. We did
> > that on s390 since the default values were not ideal
> 
> Yes, that might be a good idea for i7.
> 
> But, in the meantime, I think we should get a version of the patch that
> turns on prefetching on AMD CPUs with -O3.  There's no reason to demand
> consistency for all CPUs and it clearly benefits the AMD CPUs.
> Changpeng, would you please submit a patch that activates this
> optimization only with tuning for AMD CPUs?
> 
> Thanks,
> 
> --
> Mark Mitchell
> CodeSourcery
> mark@codesourcery.com
> (650) 331-3385 x713
> 
> 

-- 
Richard Guenther <rguenther@suse.de>
Novell / SUSE Labs
SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746 - GF: Markus Rex

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] Enabling Software Prefetching by Default at -O3
  2010-06-24 23:37                   ` Fang, Changpeng
  2010-06-25 11:48                     ` Richard Guenther
@ 2010-06-25 12:29                     ` Christian Borntraeger
  2010-07-02 12:46                       ` [PATCH] Enabling Software Prefetching by Default@-O3 Ulrich Weigand
  1 sibling, 1 reply; 31+ messages in thread
From: Christian Borntraeger @ 2010-06-25 12:29 UTC (permalink / raw)
  To: Fang, Changpeng, gcc-patches; +Cc: Andreas Krebbel, uweigand

[-- Attachment #1: Type: Text/Plain, Size: 612 bytes --]

Am Donnerstag 24 Juni 2010, 23:18:28 schrieb Fang, Changpeng:
> Hi, 
> 
> Attached is the version of the patch that turns prefetching on at -O3 for AMD cpus
> only. As discussed elsewhere in this thread, we use tri-state for -fprefetch-loop-arrays.
> If this flag is not explicitly set,  (for -O3) we turn it on in gcc/config/i386/i386.c 
> (override_options).
> 
> Is this OK to commit now?

Looks good. As soon as this is committed, somebody  should commit the following fix
that makes fno-prefetch-loop-arrays work again on s390.

Andreas, Ulrich. Can you have a look and apply after Changpengs patch if ok.


[-- Attachment #2: fixups390.patch --]
[-- Type: text/x-patch, Size: 1185 bytes --]

2010-06-25  Christian Borntraeger  <borntraeger@de.ibm.com>
 
	* config/s390/s390.c (override_options): Adopt prefetching
        at -O3 to handle flag_prefetch_loop_arrays as a tristate.

Index: gcc/config/s390/s390.c
===================================================================
*** gcc/config/s390/s390.c.orig
--- gcc/config/s390/s390.c
*************** override_options (void)
*** 1675,1683 ****
      set_param_value ("simultaneous-prefetches", 6);
  
    /* This cannot reside in optimization_options since HAVE_prefetch
!      requires the arch flags to be evaluated already.  */
!   if (HAVE_prefetch && optimize >= 3)
!     flag_prefetch_loop_arrays = 1;
  }
  
  /* Map for smallest class containing reg regno.  */
--- 1675,1684 ----
      set_param_value ("simultaneous-prefetches", 6);
  
    /* This cannot reside in optimization_options since HAVE_prefetch
!      requires the arch flags to be evaluated already.  Since prefetching
!      is benefitial on s390, we enable it if available.  */
!   if (flag_prefetch_loop_arrays < 0 && HAVE_prefetch && optimize >= 3)
!       flag_prefetch_loop_arrays = 1;
  }
  
  /* Map for smallest class containing reg regno.  */

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] Enabling Software Prefetching by Default at -O3
  2010-06-25 11:48                     ` Richard Guenther
@ 2010-06-25 20:16                       ` Sebastian Pop
  2010-06-26 17:01                         ` Richard Guenther
  0 siblings, 1 reply; 31+ messages in thread
From: Sebastian Pop @ 2010-06-25 20:16 UTC (permalink / raw)
  To: Richard Guenther
  Cc: Fang, Changpeng, Mark Mitchell, Christian Borntraeger,
	gcc-patches, H.J. Lu, Zdenek Dvorak, Maxim Kuvyrkov

On Fri, Jun 25, 2010 at 04:37, Richard Guenther <rguenther@suse.de> wrote:
> On Thu, 24 Jun 2010, Fang, Changpeng wrote:
>
>> Hi,
>>
>> Attached is the version of the patch that turns prefetching on at -O3 for AMD cpus
>> only. As discussed elsewhere in this thread, we use tri-state for -fprefetch-loop-arrays.
>> If this flag is not explicitly set,  (for -O3) we turn it on in gcc/config/i386/i386.c
>> (override_options).
>>
>> Is this OK to commit now?
>
> Ok.
>

Committed r161391.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] Enabling Software Prefetching by Default at -O3
  2010-06-25 20:16                       ` Sebastian Pop
@ 2010-06-26 17:01                         ` Richard Guenther
  2010-06-26 17:30                           ` Richard Guenther
  2010-06-26 17:38                           ` Jan Hubicka
  0 siblings, 2 replies; 31+ messages in thread
From: Richard Guenther @ 2010-06-26 17:01 UTC (permalink / raw)
  To: Sebastian Pop
  Cc: Richard Guenther, Fang, Changpeng, Mark Mitchell,
	Christian Borntraeger, gcc-patches, H.J. Lu, Zdenek Dvorak,
	Maxim Kuvyrkov

On Fri, Jun 25, 2010 at 8:25 PM, Sebastian Pop <sebpop@gmail.com> wrote:
> On Fri, Jun 25, 2010 at 04:37, Richard Guenther <rguenther@suse.de> wrote:
>> On Thu, 24 Jun 2010, Fang, Changpeng wrote:
>>
>>> Hi,
>>>
>>> Attached is the version of the patch that turns prefetching on at -O3 for AMD cpus
>>> only. As discussed elsewhere in this thread, we use tri-state for -fprefetch-loop-arrays.
>>> If this flag is not explicitly set,  (for -O3) we turn it on in gcc/config/i386/i386.c
>>> (override_options).
>>>
>>> Is this OK to commit now?
>>
>> Ok.
>>
>
> Committed r161391.

This may have regressed scimark sparse matmult by 20% on AMD Fam8
at -O3 -ffast-math -funroll-loops -march=native.

Richard.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] Enabling Software Prefetching by Default at -O3
  2010-06-26 17:01                         ` Richard Guenther
@ 2010-06-26 17:30                           ` Richard Guenther
  2010-06-26 17:38                           ` Jan Hubicka
  1 sibling, 0 replies; 31+ messages in thread
From: Richard Guenther @ 2010-06-26 17:30 UTC (permalink / raw)
  To: Sebastian Pop
  Cc: Richard Guenther, Fang, Changpeng, Mark Mitchell,
	Christian Borntraeger, gcc-patches, H.J. Lu, Zdenek Dvorak,
	Maxim Kuvyrkov

On Sat, Jun 26, 2010 at 4:21 PM, Richard Guenther
<richard.guenther@gmail.com> wrote:
> On Fri, Jun 25, 2010 at 8:25 PM, Sebastian Pop <sebpop@gmail.com> wrote:
>> On Fri, Jun 25, 2010 at 04:37, Richard Guenther <rguenther@suse.de> wrote:
>>> On Thu, 24 Jun 2010, Fang, Changpeng wrote:
>>>
>>>> Hi,
>>>>
>>>> Attached is the version of the patch that turns prefetching on at -O3 for AMD cpus
>>>> only. As discussed elsewhere in this thread, we use tri-state for -fprefetch-loop-arrays.
>>>> If this flag is not explicitly set,  (for -O3) we turn it on in gcc/config/i386/i386.c
>>>> (override_options).
>>>>
>>>> Is this OK to commit now?
>>>
>>> Ok.
>>>
>>
>> Committed r161391.
>
> This may have regressed scimark sparse matmult by 20% on AMD Fam8
> at -O3 -ffast-math -funroll-loops -march=native.

It also looks like prefetching has a very negative impact on SPEC CPU 2000 FP
in 32bit mode: http://gcc.opensuse.org/SPEC/CFP/sb-frescobaldi.suse.de-head-64-32o-32bit/recent.html
(that's Fam10).

Richard.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] Enabling Software Prefetching by Default at -O3
  2010-06-26 17:01                         ` Richard Guenther
  2010-06-26 17:30                           ` Richard Guenther
@ 2010-06-26 17:38                           ` Jan Hubicka
  2010-06-26 18:23                             ` Jan Hubicka
  1 sibling, 1 reply; 31+ messages in thread
From: Jan Hubicka @ 2010-06-26 17:38 UTC (permalink / raw)
  To: Richard Guenther
  Cc: Sebastian Pop, Richard Guenther, Fang, Changpeng, Mark Mitchell,
	Christian Borntraeger, gcc-patches, H.J. Lu, Zdenek Dvorak,
	Maxim Kuvyrkov

> On Fri, Jun 25, 2010 at 8:25 PM, Sebastian Pop <sebpop@gmail.com> wrote:
> > On Fri, Jun 25, 2010 at 04:37, Richard Guenther <rguenther@suse.de> wrote:
> >> On Thu, 24 Jun 2010, Fang, Changpeng wrote:
> >>
> >>> Hi,
> >>>
> >>> Attached is the version of the patch that turns prefetching on at -O3 for AMD cpus
> >>> only. As discussed elsewhere in this thread, we use tri-state for -fprefetch-loop-arrays.
> >>> If this flag is not explicitly set,  (for -O3) we turn it on in gcc/config/i386/i386.c
> >>> (override_options).
> >>>
> >>> Is this OK to commit now?
> >>
> >> Ok.
> >>
> >
> > Committed r161391.
> 
> This may have regressed scimark sparse matmult by 20% on AMD Fam8
> at -O3 -ffast-math -funroll-loops -march=native.
... and also about 2.6% SPECFP regression along with 8% code size growth in 32bit mode
http://gcc.opensuse.org/SPEC/CFP/sb-frescobaldi.suse.de-head-64-32o-32bit/recent.html
http://gcc.opensuse.org/SPEC/CFP/sb-frescobaldi.suse.de-head-64-32o-32bit/size.html

There was other changes today that might've caused the trouble.  In particular
my partial inlinig, Sebastians's ifcvt, Martin's ipa-cp improvements and Bernd's RA
changes.

Frescobaldi and Vangelish both test scimark and Frescobaldi does not show the regression
yet, tought it picked changes including my ipa-split code.  We might conclude that it
must be later patch that would leave Sebastian's and Martin's changes.  Those two seem
however unlikely candidates for such a flat differences everywhere IMO. 

I tested partial inlining in the form comitted to mainline on Frescobaldi on SPECint2000
and C++ benchmarks couple days ago and it dod not show the regressions, however I did
not test 32bit.  The change is quite target independent so I would be surprised if it
made 32bit code a lot worse and did not affect 64bit.

Some changes are also visible at our Itanium testers.

Hopefully we can resolve the slowdowns.  I also think that the change should be documented
in changes.html as well as we should enable prefetching with -fprofile-use
(and ensure that prefetcher uses profile info to figure out if loop has resonable trip
count for prefetching and is hot)

Honza
> 
> Richard.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] Enabling Software Prefetching by Default at -O3
  2010-06-26 17:38                           ` Jan Hubicka
@ 2010-06-26 18:23                             ` Jan Hubicka
  0 siblings, 0 replies; 31+ messages in thread
From: Jan Hubicka @ 2010-06-26 18:23 UTC (permalink / raw)
  To: Jan Hubicka
  Cc: Richard Guenther, Sebastian Pop, Richard Guenther, Fang,
	Changpeng, Mark Mitchell, Christian Borntraeger, gcc-patches,
	H.J. Lu, Zdenek Dvorak, Maxim Kuvyrkov

> > On Fri, Jun 25, 2010 at 8:25 PM, Sebastian Pop <sebpop@gmail.com> wrote:
> > > On Fri, Jun 25, 2010 at 04:37, Richard Guenther <rguenther@suse.de> wrote:
> > >> On Thu, 24 Jun 2010, Fang, Changpeng wrote:
> > >>
> > >>> Hi,
> > >>>
> > >>> Attached is the version of the patch that turns prefetching on at -O3 for AMD cpus
> > >>> only. As discussed elsewhere in this thread, we use tri-state for -fprefetch-loop-arrays.
> > >>> If this flag is not explicitly set,  (for -O3) we turn it on in gcc/config/i386/i386.c
> > >>> (override_options).
> > >>>
> > >>> Is this OK to commit now?
> > >>
> > >> Ok.
> > >>
> > >
> > > Committed r161391.
> > 
> > This may have regressed scimark sparse matmult by 20% on AMD Fam8
> > at -O3 -ffast-math -funroll-loops -march=native.
> ... and also about 2.6% SPECFP regression along with 8% code size growth in 32bit mode
> http://gcc.opensuse.org/SPEC/CFP/sb-frescobaldi.suse.de-head-64-32o-32bit/recent.html
> http://gcc.opensuse.org/SPEC/CFP/sb-frescobaldi.suse.de-head-64-32o-32bit/size.html

I think the reason why we see regression in 32bit run only is that 32bit run use -march=native
while 64bit run uses default arch.  So it indeed looks like prefetching change.

Honza

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] Enabling Software Prefetching by Default@-O3
  2010-06-25 12:29                     ` Christian Borntraeger
@ 2010-07-02 12:46                       ` Ulrich Weigand
  0 siblings, 0 replies; 31+ messages in thread
From: Ulrich Weigand @ 2010-07-02 12:46 UTC (permalink / raw)
  To: Christian Borntraeger; +Cc: Fang Changpeng, gcc-patches, Andreas Krebbel

Christian Borntraeger wrote:

> Looks good. As soon as this is committed, somebody  should commit the following fix
> that makes fno-prefetch-loop-arrays work again on s390.
> 
> Andreas, Ulrich. Can you have a look and apply after Changpengs patch if ok.

> 	* config/s390/s390.c (override_options): Adopt prefetching
>         at -O3 to handle flag_prefetch_loop_arrays as a tristate.

I've checked this in now ...

>     /* This cannot reside in optimization_options since HAVE_prefetch
> !      requires the arch flags to be evaluated already.  Since prefetching
> !      is benefitial on s390, we enable it if available.  */

... with this typo fixed: beneficial.

Thanks,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU Toolchain for Linux on System z and Cell BE
  Ulrich.Weigand@de.ibm.com

^ permalink raw reply	[flat|nested] 31+ messages in thread

end of thread, other threads:[~2010-07-02 12:46 UTC | newest]

Thread overview: 31+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-06-15 22:21 [PATCH] Enabling Software Prefetching by Default at -O3 Fang, Changpeng
2010-06-15 22:24 ` H.J. Lu
2010-06-15 22:28   ` Fang, Changpeng
2010-06-15 22:44     ` H.J. Lu
2010-06-15 22:47       ` Fang, Changpeng
2010-06-15 23:42         ` Mark Mitchell
2010-06-16  0:30           ` H.J. Lu
2010-06-16  3:43             ` Mark Mitchell
2010-06-18  8:51               ` Christian Borntraeger
2010-06-18 18:00                 ` Fang, Changpeng
2010-06-19  0:30             ` H.J. Lu
2010-06-20  0:11               ` Christian Borntraeger
2010-06-20  2:34                 ` Mark Mitchell
2010-06-20  4:29                   ` Andi Kleen
2010-06-22 23:40                   ` Fang, Changpeng
2010-06-23  6:21                     ` Christian Borntraeger
2010-06-23  9:26                       ` Richard Guenther
2010-06-23 12:02                         ` Joseph S. Myers
2010-06-23 12:30                         ` Christian Borntraeger
2010-06-23 13:13                           ` Manuel López-Ibáñez
2010-06-24 23:37                   ` Fang, Changpeng
2010-06-25 11:48                     ` Richard Guenther
2010-06-25 20:16                       ` Sebastian Pop
2010-06-26 17:01                         ` Richard Guenther
2010-06-26 17:30                           ` Richard Guenther
2010-06-26 17:38                           ` Jan Hubicka
2010-06-26 18:23                             ` Jan Hubicka
2010-06-25 12:29                     ` Christian Borntraeger
2010-07-02 12:46                       ` [PATCH] Enabling Software Prefetching by Default@-O3 Ulrich Weigand
2010-06-18  9:13 ` [PATCH] Enabling Software Prefetching by Default at -O3 Andreas Krebbel
2010-06-18  9:26 ` Andreas Krebbel

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).