Performance optimizations for Intel Core 2 and Core i7 processors

public inbox for gcc-patches@gcc.gnu.org

* Performance optimizations for Intel Core 2 and Core i7 processors
From: Maxim Kuvyrkov @ 2010-05-17  7:42 UTC (permalink / raw)
  To: GCC, gcc-patches; +Cc: H.J. Lu

CodeSourcery is working on improving performance for Intel's Core 2 and 
Core i7 families of processors.

CodeSourcery plans to add support for unaligned vector instructions, to 
provide fine-tuned scheduling support and to update instruction 
selection and instruction cost models for Core i7 and Core 2 families of 
processors.

As usual, CodeSourcery will be contributing its work to GCC.  Currently, 
our target is the end of GCC 4.6 Stage1.

If your favorite benchmark significantly under-performs on Core 2 or 
Core i7 CPUs, don't hesitate to ask us to take a look at it.

We appreciate Intel sponsoring this project.

Thank you,

-- 
Maxim Kuvyrkov
CodeSourcery
maxim@codesourcery.com
(650) 331-3385 x724

* Re: Performance optimizations for Intel Core 2 and Core i7 processors
From: Steven Bosscher @ 2010-05-20 12:17 UTC (permalink / raw)
  To: Maxim Kuvyrkov; +Cc: GCC, gcc-patches, H.J. Lu

On Mon, May 17, 2010 at 8:44 AM, Maxim Kuvyrkov <maxim@codesourcery.com> wrote:
> CodeSourcery is working on improving performance for Intel's Core 2 and Core
> i7 families of processors.
>
> CodeSourcery plans to add support for unaligned vector instructions, to
> provide fine-tuned scheduling support and to update instruction selection
> and instruction cost models for Core i7 and Core 2 families of processors.
>
> As usual, CodeSourcery will be contributing its work to GCC.  Currently, our
> target is the end of GCC 4.6 Stage1.
>
> If your favorite benchmark significantly under-performs on Core 2 or Core i7
> CPUs, don't hesitate asking us to take a look at it.

I'd like to ask you to look at ffmpeg (missed core2 vectorization
opportunities), polyhedron (PR34501, like, duh! :-), and Apache
benchmark (-mtune=core2 results in lower scores).

You could check overall effects on an openly available benchmark suite
such as http://www.phoronix-test-suite.com/

Good luck with this project, it'll be great when -mtune=core2 actually
improves performance rather than degrading it!

Ciao!
Steven

* Re: Performance optimizations for Intel Core 2 and Core i7 processors
From: Maxim Kuvyrkov @ 2010-05-20 12:20 UTC (permalink / raw)
  To: Steven Bosscher; +Cc: GCC, gcc-patches, H.J. Lu

On 5/20/10 4:04 PM, Steven Bosscher wrote:
> On Mon, May 17, 2010 at 8:44 AM, Maxim Kuvyrkov <maxim@codesourcery.com> wrote:
>> CodeSourcery is working on improving performance for Intel's Core 2 and Core
>> i7 families of processors.
>>
>> CodeSourcery plans to add support for unaligned vector instructions, to
>> provide fine-tuned scheduling support and to update instruction selection
>> and instruction cost models for Core i7 and Core 2 families of processors.
>>
>> As usual, CodeSourcery will be contributing its work to GCC.  Currently, our
>> target is the end of GCC 4.6 Stage1.
>>
>> If your favorite benchmark significantly under-performs on Core 2 or Core i7
>> CPUs, don't hesitate asking us to take a look at it.
>
> I'd like to ask you to look at ffmpeg (missed core2 vectorization
> opportunities), polyhedron (PR34501, like, duh! :-), and Apache
> benchmark (-mtune=core2 results in lower scores).
>
> You could check overall effects on an openly available benchmark suite
> such as http://www.phoronix-test-suite.com/

Thank you for the pointers!

-- 
Maxim Kuvyrkov
CodeSourcery
maxim@codesourcery.com
(650) 331-3385 x724

* Re: Performance optimizations for Intel Core 2 and Core i7 processors
From: Vladimir N. Makarov @ 2010-05-21 17:19 UTC (permalink / raw)
  To: Maxim Kuvyrkov; +Cc: GCC, gcc-patches, H.J. Lu

[-- Attachment #1: Type: text/plain, Size: 2459 bytes --]

On 05/17/2010 02:44 AM, Maxim Kuvyrkov wrote:
> CodeSourcery is working on improving performance for Intel's Core 2 
> and Core i7 families of processors.
>
> CodeSourcery plans to add support for unaligned vector instructions, 
> to provide fine-tuned scheduling support and to update instruction 
> selection and instruction cost models for Core i7 and Core 2 families 
> of processors.
>
> As usual, CodeSourcery will be contributing its work to GCC.  
> Currently, our target is the end of GCC 4.6 Stage1.
>
> If your favorite benchmark significantly under-performs on Core 2 or 
> Core i7 CPUs, don't hesitate asking us to take a look at it.

What I have seen is people complaining about -mtune=core2 on Polyhedron:

http://gcc.gnu.org/ml/gcc-patches/2010-02/msg01272.html

The biggest complaint concerned mdbx (about 16%).

   I analyzed the benchmark about two months ago, and the attached patch
solves the problem.  The new parameters are close to the ones in Intel's
documentation (originally I used other values because at that time they
worked better for SPEC2000).

   The patch uses 4 for FADD/FSUB instead of the 3 given in Intel's
documentation.  The reason for this is related to the branch cost.
Different code is generated when <latency of FADD> = branch cost and
when <latency of FADD> = branch cost + 1.  This difference is very
important for mdbx, which has hot-spot code like

if (...) a = b + c;
if (...) ...;
if (...) ...;

where the code after each <if> is rarely executed, so it is important
not to use conditional insns there.  If the branch cost is decreased,
overall performance gets worse on other benchmarks.  Changing the
current code generation model (to force the code currently generated
for <latency of FADD> = branch cost + 1) is not safe because it might
affect other targets.

   Sometimes I have the feeling that choosing parameters is close to black
magic: some benchmarks get better, others get worse.  IMHO, one reason
for this is a code generation model that is not fully adequate.  It
would be nice to fix it, but as I wrote, it is too big a project because
it affects other targets.

   Also I think it is important to have a pipeline description for
Core2/Core i7, or at least to use the generic one.

   I think with this patch and the right pipeline description you can
achieve better performance on most benchmarks than with generic tuning.

   I am glad that Intel is sponsoring this project.  I hope these notes will
be helpful to you, Maxim.  Good luck with this project.

[-- Attachment #2: core2.patch --]
[-- Type: text/plain, Size: 2013 bytes --]

Index: gcc/config/i386/i386.c
===================================================================
--- gcc/config/i386/i386.c	(revision 157950)
+++ gcc/config/i386/i386.c	(working copy)
@@ -975,15 +975,15 @@
    COSTS_N_INSNS (3),			/*                               DI */
    COSTS_N_INSNS (3)},			/*                               other */
   0,					/* cost of multiply per each bit set */
-  {COSTS_N_INSNS (22),			/* cost of a divide/mod for QI */
-   COSTS_N_INSNS (22),			/*                          HI */
-   COSTS_N_INSNS (22),			/*                          SI */
+  {COSTS_N_INSNS (14),			/* cost of a divide/mod for QI */
+   COSTS_N_INSNS (15),			/*                          HI */
+   COSTS_N_INSNS (17),			/*                          SI */
    COSTS_N_INSNS (22),			/*                          DI */
    COSTS_N_INSNS (22)},			/*                          other */
   COSTS_N_INSNS (1),			/* cost of movsx */
   COSTS_N_INSNS (1),			/* cost of movzx */
   8,					/* "large" insn */
-  16,					/* MOVE_RATIO */
+  17,					/* MOVE_RATIO */
   2,					/* cost for loading QImode using movzbl */
   {6, 6, 6},				/* cost of loading integer registers
 					   in QImode, HImode and SImode.
@@ -1010,12 +1010,12 @@
   128,					/* size of prefetch block */
   8,					/* number of parallel prefetches */
   3,					/* Branch cost */
-  COSTS_N_INSNS (3),			/* cost of FADD and FSUB insns.  */
+  COSTS_N_INSNS (4),			/* cost of FADD and FSUB insns.  */
   COSTS_N_INSNS (5),			/* cost of FMUL instruction.  */
-  COSTS_N_INSNS (32),			/* cost of FDIV instruction.  */
+  COSTS_N_INSNS (6),			/* cost of FDIV instruction.  */
   COSTS_N_INSNS (1),			/* cost of FABS instruction.  */
   COSTS_N_INSNS (1),			/* cost of FCHS instruction.  */
-  COSTS_N_INSNS (58),			/* cost of FSQRT instruction.  */
+  COSTS_N_INSNS (6),			/* cost of FSQRT instruction.  */
   {{libcall, {{11, loop}, {-1, rep_prefix_4_byte}}},
    {libcall, {{32, loop}, {64, rep_prefix_4_byte},
 	      {8192, rep_prefix_8_byte}, {-1, libcall}}}},

* Re: Performance optimizations for Intel Core 2 and Core i7 processors
From: Maxim Kuvyrkov @ 2010-05-26 15:26 UTC (permalink / raw)
  To: Vladimir N. Makarov; +Cc: GCC, gcc-patches, H.J. Lu

On 5/21/10 9:06 PM, Vladimir N. Makarov wrote:
> On 05/17/2010 02:44 AM, Maxim Kuvyrkov wrote:
...
>> If your favorite benchmark significantly under-performs on Core 2 or
>> Core i7 CPUs, don't hesitate asking us to take a look at it.
> What I saw is people complaining about -mtune=core2 for polyhedron
>
> http://gcc.gnu.org/ml/gcc-patches/2010-02/msg01272.html
>
> The biggest complaint was on mdbx (about 16%).

Thank you for the pointers and analysis!

...
> Also I think it is important to have a pipeline description for
> Core2/Core i7 or at least to use it from generic.

Right.  We will be adding a pipeline description for Core 2/i7.

Thank you,

-- 
Maxim Kuvyrkov
CodeSourcery
maxim@codesourcery.com
(650) 331-3385 x724

* Re: Performance optimizations for Intel Core 2 and Core i7 processors
From: Jack Howarth @ 2010-08-13 19:50 UTC (permalink / raw)
  To: Maxim Kuvyrkov; +Cc: GCC, gcc-patches, H.J. Lu

On Mon, May 17, 2010 at 10:44:57AM +0400, Maxim Kuvyrkov wrote:
> CodeSourcery is working on improving performance for Intel's Core 2 and  
> Core i7 families of processors.
>
> CodeSourcery plans to add support for unaligned vector instructions, to  
> provide fine-tuned scheduling support and to update instruction  
> selection and instruction cost models for Core i7 and Core 2 families of  
> processors.
>
> As usual, CodeSourcery will be contributing its work to GCC.  Currently,  
> our target is the end of GCC 4.6 Stage1.
>
> If your favorite benchmark significantly under-performs on Core 2 or  
> Core i7 CPUs, don't hesitate asking us to take a look at it.
>
> We appreciate Intel sponsoring this project.

Maxim,
    Do you have any updates on the progress of this project? Since
it has been proposed to default Intel Darwin to -mtune=core2, it
would be very helpful to be able to test (benchmark) any proposed
changes on x86_64-apple-darwin10 with gcc trunk. Thanks in advance.
           Jack

>
>
> Thank you,
>
> -- 
> Maxim Kuvyrkov
> CodeSourcery
> maxim@codesourcery.com
> (650) 331-3385 x724

* Re: Performance optimizations for Intel Core 2 and Core i7 processors
From: Maxim Kuvyrkov @ 2010-08-17 13:33 UTC (permalink / raw)
  To: Jack Howarth; +Cc: GCC, gcc-patches, H.J. Lu

On 8/13/10 11:40 PM, Jack Howarth wrote:
> On Mon, May 17, 2010 at 10:44:57AM +0400, Maxim Kuvyrkov wrote:
>> CodeSourcery is working on improving performance for Intel's Core 2 and
>> Core i7 families of processors.
>>
>> CodeSourcery plans to add support for unaligned vector instructions, to
>> provide fine-tuned scheduling support and to update instruction
>> selection and instruction cost models for Core i7 and Core 2 families of
>> processors.
>>
>> As usual, CodeSourcery will be contributing its work to GCC.  Currently,
>> our target is the end of GCC 4.6 Stage1.
>>
>> If your favorite benchmark significantly under-performs on Core 2 or
>> Core i7 CPUs, don't hesitate asking us to take a look at it.
>>
>> We appreciate Intel sponsoring this project.
>
> Maxim,
>      Do you have any updates on the progress of this project? Since
> it has been proposed to default intel darwin to -mtune=core2, it
> would be very helpful to be able to test (benchmark) any proposed
> changes on x86_64-apple-darwin10 with gcc trunk. Thanks in advance.

Jack,

We will start posting patches very soon.  Bernd Schmidt has almost 
finished the pipeline model for Core 2/i7, so that will be the first 
piece of work we'll post for upstream review.

Regards,

-- 
Maxim Kuvyrkov
CodeSourcery
maxim@codesourcery.com
(650) 331-3385 x724
