public inbox for gcc-patches@gcc.gnu.org
* Update x86-tune-costs.h for znver2
@ 2019-07-23  9:34 Jan Hubicka
  2019-07-30  8:10 ` Jan Hubicka
  0 siblings, 1 reply; 6+ messages in thread
From: Jan Hubicka @ 2019-07-23  9:34 UTC (permalink / raw)
  To: gcc-patches

Hi,
this patch updates the znver2 costs to match reality.  In particular, we
re-benchmarked the memcpy strategies, and it looks like glibc now wins even
for relatively small blocks.
Moreover, I updated the move costs to reflect that znver2 has 256-bit vector
paths and faster multiplication.

Bootstrapped/regtested x86_64-linux, committed.

Honza

	* x86-tune-costs.h (znver2_memcpy, znver2_memset): Update.
	(znver2_cost): Update 256-bit SSE costs and multiplication.
Index: config/i386/x86-tune-costs.h
===================================================================
--- config/i386/x86-tune-costs.h	(revision 273727)
+++ config/i386/x86-tune-costs.h	(working copy)
@@ -1279,12 +1279,12 @@ struct processor_costs znver1_cost = {
 static stringop_algs znver2_memcpy[2] = {
   {libcall, {{6, loop, false}, {14, unrolled_loop, false},
 	     {-1, rep_prefix_4_byte, false}}},
-  {libcall, {{16, loop, false}, {8192, rep_prefix_8_byte, false},
+  {libcall, {{16, loop, false}, {64, rep_prefix_4_byte, false},
 	     {-1, libcall, false}}}};
 static stringop_algs znver2_memset[2] = {
   {libcall, {{8, loop, false}, {24, unrolled_loop, false},
 	     {2048, rep_prefix_4_byte, false}, {-1, libcall, false}}},
-  {libcall, {{48, unrolled_loop, false}, {8192, rep_prefix_8_byte, false},
+  {libcall, {{24, rep_prefix_4_byte, false}, {128, rep_prefix_8_byte, false},
 	     {-1, libcall, false}}}};
 
 struct processor_costs znver2_cost = {
@@ -1335,11 +1335,11 @@ struct processor_costs znver2_cost = {
 					   in SImode and DImode.  */
   {8, 8},				/* cost of storing MMX registers
 					   in SImode and DImode.  */
-  2, 3, 6,				/* cost of moving XMM,YMM,ZMM
+  2, 2, 3,				/* cost of moving XMM,YMM,ZMM
 					   register.  */
-  {6, 6, 6, 10, 20},			/* cost of loading SSE registers
+  {6, 6, 6, 6, 12},			/* cost of loading SSE registers
 					   in 32,64,128,256 and 512-bit.  */
-  {6, 6, 6, 10, 20},			/* cost of unaligned loads.  */
+  {6, 6, 6, 6, 12},			/* cost of unaligned loads.  */
   {8, 8, 8, 8, 16},			/* cost of storing SSE registers
 					   in 32,64,128,256 and 512-bit.  */
   {8, 8, 8, 8, 16},			/* cost of unaligned stores.  */
@@ -1372,7 +1372,7 @@ struct processor_costs znver2_cost = {
   COSTS_N_INSNS (1),			/* cost of cheap SSE instruction.  */
   COSTS_N_INSNS (3),			/* cost of ADDSS/SD SUBSS/SD insns.  */
   COSTS_N_INSNS (3),			/* cost of MULSS instruction.  */
-  COSTS_N_INSNS (4),			/* cost of MULSD instruction.  */
+  COSTS_N_INSNS (3),			/* cost of MULSD instruction.  */
   COSTS_N_INSNS (5),			/* cost of FMA SS instruction.  */
   COSTS_N_INSNS (5),			/* cost of FMA SD instruction.  */
   COSTS_N_INSNS (10),			/* cost of DIVSS instruction.  */
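
As a reading aid for the stringop tables in the patch above, here is a
minimal sketch of how such a table can be interpreted: entries are scanned
in order, and the first one whose size limit covers the block (or whose
limit is -1, meaning "no limit") selects the algorithm.  This is a
simplified model, not GCC's actual selection code; the type and function
names (alg, strategy, algs, pick_alg) are hypothetical, and the sketch
assumes the second table entry is the 64-bit one.

```c
/* Simplified stand-ins for GCC's stringop types.  All names here are
   illustrative assumptions, not the identifiers GCC uses.  */
enum alg {
  ALG_LIBCALL,
  ALG_LOOP,
  ALG_UNROLLED_LOOP,
  ALG_REP_PREFIX_4_BYTE,
  ALG_REP_PREFIX_8_BYTE
};

struct strategy {
  long max;       /* largest block size this entry covers; -1 = no limit */
  enum alg alg;   /* algorithm to use up to that size */
};

struct algs {
  enum alg unknown_size;      /* used when the size is not known */
  struct strategy size[4];    /* scanned in order */
};

/* Second (assumed 64-bit) znver2_memcpy entry before the patch.  */
const struct algs znver2_memcpy64_old = {
  ALG_LIBCALL,
  {{16, ALG_LOOP}, {8192, ALG_REP_PREFIX_8_BYTE}, {-1, ALG_LIBCALL}}
};

/* The same entry after the patch.  */
const struct algs znver2_memcpy64_new = {
  ALG_LIBCALL,
  {{16, ALG_LOOP}, {64, ALG_REP_PREFIX_4_BYTE}, {-1, ALG_LIBCALL}}
};

/* Pick the algorithm for a block of SIZE bytes; a negative SIZE means
   the size is unknown at compile time.  */
enum alg pick_alg(const struct algs *table, long size)
{
  if (size < 0)
    return table->unknown_size;
  for (int i = 0; i < 4; i++)
    if (table->size[i].max == -1 || size <= table->size[i].max)
      return table->size[i].alg;
  return ALG_LIBCALL;   /* no entry matched: fall back to the library */
}
```

Under this model a 256-byte copy maps to rep_prefix_8_byte with the old
table and to libcall with the new one, which matches the observation that
glibc now wins even for relatively small blocks.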


* Re: Update x86-tune-costs.h for znver2
  2019-07-23  9:34 Update x86-tune-costs.h for znver2 Jan Hubicka
@ 2019-07-30  8:10 ` Jan Hubicka
  2019-07-30  8:44   ` Richard Biener
  0 siblings, 1 reply; 6+ messages in thread
From: Jan Hubicka @ 2019-07-30  8:10 UTC (permalink / raw)
  To: gcc-patches

> Hi,
> this patch updates the znver2 costs to match reality.  In particular, we
> re-benchmarked the memcpy strategies, and it looks like glibc now wins even
> for relatively small blocks.
> Moreover, I updated the move costs to reflect that znver2 has 256-bit vector
> paths and faster multiplication.
> 
> Bootstrapped/regtested x86_64-linux, committed.
> 
> Honza
> 
> 	* x86-tune-costs.h (znver2_memcpy, znver2_memset): Update.
> 	(znver2_cost): Update 256-bit SSE costs and multiplication.

Hello,
I have now backported the patch to the GCC 9 branch.

Honza
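
For readers comparing the old and new znver2 numbers, the sketch below
tabulates the scalar entries the patch changes.  COSTS_N_INSNS is
reproduced here as defined in GCC's rtl.h (one instruction equals 4 cost
units); the cost_change struct and the cost_saving helper are hypothetical
names used only for illustration.  The move and load entries are raw table
values and, as an assumption of this sketch, are not on the COSTS_N_INSNS
scale.

```c
/* Same scaling as GCC's rtl.h: one "instruction" is 4 cost units.  */
#define COSTS_N_INSNS(N) ((N) * 4)

/* Hypothetical record of one changed cost entry.  */
struct cost_change {
  const char *what;
  int before;
  int after;
};

/* Values copied from the znver2_cost hunks of the patch.  */
const struct cost_change znver2_changes[] = {
  {"MULSD",             COSTS_N_INSNS (4), COSTS_N_INSNS (3)},
  {"YMM register move", 3,  2},
  {"ZMM register move", 6,  3},
  {"256-bit SSE load",  10, 6},
  {"512-bit SSE load",  20, 12},
};

/* Number of cost units saved by the change.  */
int cost_saving(const struct cost_change *c)
{
  return c->before - c->after;
}
```

With this scaling the MULSD entry drops from 16 to 12 units, putting it on
par with MULSS, while 256-bit loads now cost the same as 128-bit ones.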


* Re: Update x86-tune-costs.h for znver2
  2019-07-30  8:10 ` Jan Hubicka
@ 2019-07-30  8:44   ` Richard Biener
  2019-07-30  9:53     ` Jan Hubicka
  0 siblings, 1 reply; 6+ messages in thread
From: Richard Biener @ 2019-07-30  8:44 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: GCC Patches

On Tue, Jul 30, 2019 at 10:09 AM Jan Hubicka <hubicka@ucw.cz> wrote:
>
> > Hi,
> > this patch updates the znver2 costs to match reality.  In particular, we
> > re-benchmarked the memcpy strategies, and it looks like glibc now wins even
> > for relatively small blocks.
> > Moreover, I updated the move costs to reflect that znver2 has 256-bit vector
> > paths and faster multiplication.
> >
> > Bootstrapped/regtested x86_64-linux, committed.
> >
> > Honza
> >
> >       * x86-tune-costs.h (znver2_memcpy, znver2_memset): Update.
> >       (znver2_cost): Update 256-bit SSE costs and multiplication.
>
> Hello,
> I have now backported the patch to the GCC 9 branch.

Thanks - can you please update changes.html for it in the 9.2 section?

Richard.



* Re: Update x86-tune-costs.h for znver2
  2019-07-30  8:44   ` Richard Biener
@ 2019-07-30  9:53     ` Jan Hubicka
  2019-07-30  9:58       ` Richard Biener
  0 siblings, 1 reply; 6+ messages in thread
From: Jan Hubicka @ 2019-07-30  9:53 UTC (permalink / raw)
  To: Richard Biener; +Cc: GCC Patches

> On Tue, Jul 30, 2019 at 10:09 AM Jan Hubicka <hubicka@ucw.cz> wrote:
> >
> > > Hi,
> > > this patch updates the znver2 costs to match reality.  In particular, we
> > > re-benchmarked the memcpy strategies, and it looks like glibc now wins even
> > > for relatively small blocks.
> > > Moreover, I updated the move costs to reflect that znver2 has 256-bit vector
> > > paths and faster multiplication.
> > >
> > > Bootstrapped/regtested x86_64-linux, committed.
> > >
> > > Honza
> > >
> > >       * x86-tune-costs.h (znver2_memcpy, znver2_memset): Update.
> > >       (znver2_cost): Update 256-bit SSE costs and multiplication.
> >
> > Hello,
> > I have now backported the patch to the GCC 9 branch.
> 
> Thanks - can you please update changes.html for it in the 9.2 section?

There seems to be no GCC 9.2 section yet.

Index: gcc-9/changes.html
===================================================================
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-9/changes.html,v
retrieving revision 1.72
diff -c -3 -p -r1.72 changes.html
*** gcc-9/changes.html	12 Jul 2019 15:55:50 -0000	1.72
--- gcc-9/changes.html	30 Jul 2019 09:32:17 -0000
*************** complete (that is, it is possible that s
*** 1095,1099 ****
--- 1095,1105 ----
  are not listed here).</p>
  
  <!-- .................................................................. -->
+ <h2 id="GCC9.2">GCC 9.2</h2>
+ <ul>
+   <li>IA-32/x86-64 backend tuning for <code>znver2</code> was improved based on benchmarks on real hardware.</li>
+ </ul>
+ 
+ <!-- .................................................................. -->
  </body>
  </html>


* Re: Update x86-tune-costs.h for znver2
  2019-07-30  9:53     ` Jan Hubicka
@ 2019-07-30  9:58       ` Richard Biener
  2019-08-18 10:36         ` Gerald Pfeifer
  0 siblings, 1 reply; 6+ messages in thread
From: Richard Biener @ 2019-07-30  9:58 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: GCC Patches

On Tue, Jul 30, 2019 at 11:33 AM Jan Hubicka <hubicka@ucw.cz> wrote:
>
> > On Tue, Jul 30, 2019 at 10:09 AM Jan Hubicka <hubicka@ucw.cz> wrote:
> > >
> > > > Hi,
> > > > this patch updates the znver2 costs to match reality.  In particular, we
> > > > re-benchmarked the memcpy strategies, and it looks like glibc now wins even
> > > > for relatively small blocks.
> > > > Moreover, I updated the move costs to reflect that znver2 has 256-bit vector
> > > > paths and faster multiplication.
> > > >
> > > > Bootstrapped/regtested x86_64-linux, committed.
> > > >
> > > > Honza
> > > >
> > > >       * x86-tune-costs.h (znver2_memcpy, znver2_memset): Update.
> > > >       (znver2_cost): Update 256-bit SSE costs and multiplication.
> > >
> > > Hello,
> > > I have now backported the patch to the GCC 9 branch.
> >
> > Thanks - can you please update changes.html for it in the 9.2 section?
>
> There seems to be no GCC 9.2 section yet.

Yes.  Looks good to me btw.

Richard.



* Re: Update x86-tune-costs.h for znver2
  2019-07-30  9:58       ` Richard Biener
@ 2019-08-18 10:36         ` Gerald Pfeifer
  0 siblings, 0 replies; 6+ messages in thread
From: Gerald Pfeifer @ 2019-08-18 10:36 UTC (permalink / raw)
  To: Jan Hubicka, Richard Biener; +Cc: gcc-patches

On Tue, 30 Jul 2019, Jan Hubicka wrote:
>> Thanks - can you please update changes.html for it in the 9.2 section?
> There seems to be no GCC 9.2 section yet.

I see one now. 

On Tue, 30 Jul 2019, Richard Biener wrote:
> Yes.  Looks good to me btw.

Same here.  (I would have taken Richard's note as approval, though
as maintainer over that area you don't even need any.)

To remove any doubt, though: okay, thanks. :-)

Gerald
