public inbox for gcc-patches@gcc.gnu.org
From: Jan Hubicka <hubicka@ucw.cz>
To: gcc-patches@gcc.gnu.org
Subject: generic retuning part 1 - x86-tune-costs update
Date: Thu, 30 Nov 2017 09:54:00 -0000
Message-ID: <20171130094030.GA2770@kam.mff.cuni.cz>

Hi,
this patch makes the costs in generic match modern chips (core, haswell,
bulldozer and zen) better.  The only important change is to drop the cost of
unaligned loads and stores, because all modern chips handle them well.  This
stops the vectorizer from peeling for alignment and saves a lot of code size
without sacrificing performance.
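
To illustrate what peeling for alignment means, here is a minimal sketch.
It is not part of the patch: peel_iterations and scale_add are hypothetical
names, and the helper assumes the pointer is already aligned to the element
size.  It only models how many scalar iterations a peeled prologue would run
before the vector loop body could use aligned accesses.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: number of scalar iterations a peeled prologue
   would run before P reaches VEC_BYTES alignment.  Assumes P is already
   aligned to ELT_SIZE.  */
static size_t
peel_iterations (const void *p, size_t elt_size, size_t vec_bytes)
{
  size_t misalign = (uintptr_t) p % vec_bytes;
  return misalign ? (vec_bytes - misalign) / elt_size : 0;
}

/* A loop of the kind the vectorizer handles.  When unaligned accesses
   are modelled as expensive, the compiler may emit a scalar prologue in
   front of the vector loop so that one of the accesses becomes aligned.
   When aligned and unaligned accesses cost the same, that prologue buys
   nothing and only adds code.  */
void
scale_add (float *restrict dst, const float *restrict src, float k, size_t n)
{
  for (size_t i = 0; i < n; i++)
    dst[i] += k * src[i];
}
```

For a float array that starts 4 bytes past a 16-byte boundary, the helper
reports three peeled iterations; for an aligned array it reports none.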

I have benchmarked it on zen and skylake and it is a small but almost
consistent win in performance too.  A notable regression is fma3d, which
regresses by approx. 5% on zen.  This is the case with native tuning as well,
so I will look into it incrementally.

Bootstrapped/regtested x86_64-linux, committed.

Honza

	PR target/81616
	* x86-tune-costs.h (generic_cost): Revise for modern CPUs.
	* gcc.target/i386/l_fma_double_1.c: Update count of fma instructions.
	* gcc.target/i386/l_fma_double_2.c: Update count of fma instructions.
	* gcc.target/i386/l_fma_double_3.c: Update count of fma instructions.
	* gcc.target/i386/l_fma_double_4.c: Update count of fma instructions.
	* gcc.target/i386/l_fma_double_5.c: Update count of fma instructions.
	* gcc.target/i386/l_fma_double_6.c: Update count of fma instructions.
	* gcc.target/i386/l_fma_float_1.c: Update count of fma instructions.
	* gcc.target/i386/l_fma_float_2.c: Update count of fma instructions.
	* gcc.target/i386/l_fma_float_3.c: Update count of fma instructions.
	* gcc.target/i386/l_fma_float_4.c: Update count of fma instructions.
	* gcc.target/i386/l_fma_float_5.c: Update count of fma instructions.
	* gcc.target/i386/l_fma_float_6.c: Update count of fma instructions.
Index: config/i386/x86-tune-costs.h
===================================================================
--- config/i386/x86-tune-costs.h	(revision 255252)
+++ config/i386/x86-tune-costs.h	(working copy)
@@ -2243,11 +2243,11 @@ struct processor_costs generic_cost = {
    COSTS_N_INSNS (4),			/*				 HI */
    COSTS_N_INSNS (3),			/*				 SI */
    COSTS_N_INSNS (4),			/*				 DI */
-   COSTS_N_INSNS (2)},			/*			      other */
+   COSTS_N_INSNS (4)},			/*			      other */
   0,					/* cost of multiply per each bit set */
-  {COSTS_N_INSNS (18),			/* cost of a divide/mod for QI */
-   COSTS_N_INSNS (26),			/*			    HI */
-   COSTS_N_INSNS (42),			/*			    SI */
+  {COSTS_N_INSNS (16),			/* cost of a divide/mod for QI */
+   COSTS_N_INSNS (22),			/*			    HI */
+   COSTS_N_INSNS (30),			/*			    SI */
    COSTS_N_INSNS (74),			/*			    DI */
    COSTS_N_INSNS (74)},			/*			    other */
   COSTS_N_INSNS (1),			/* cost of movsx */
@@ -2275,13 +2275,13 @@ struct processor_costs generic_cost = {
   2, 3, 4,				/* cost of moving XMM,YMM,ZMM register */
   {6, 6, 6, 10, 15},			/* cost of loading SSE registers
 					   in 32,64,128,256 and 512-bit */
-  {10, 10, 10, 15, 20},			/* cost of unaligned loads.  */
+  {6, 6, 6, 10, 15},			/* cost of unaligned loads.  */
   {6, 6, 6, 10, 15},			/* cost of storing SSE registers
 					   in 32,64,128,256 and 512-bit */
-  {10, 10, 10, 15, 20},			/* cost of unaligned storess.  */
-  20, 20,				/* SSE->integer and integer->SSE moves */
-  6, 6,					/* Gather load static, per_elt.  */
-  6, 6,					/* Gather store static, per_elt.  */
+  {6, 6, 6, 10, 15},			/* cost of unaligned storess.  */
+  6, 6,					/* SSE->integer and integer->SSE moves */
+  18, 6,				/* Gather load static, per_elt.  */
+  18, 6,				/* Gather store static, per_elt.  */
   32,					/* size of l1 cache.  */
   512,					/* size of l2 cache.  */
   64,					/* size of prefetch block */
@@ -2290,11 +2290,11 @@ struct processor_costs generic_cost = {
      value is increased to perhaps more appropriate value of 5.  */
   3,					/* Branch cost */
   COSTS_N_INSNS (3),			/* cost of FADD and FSUB insns.  */
-  COSTS_N_INSNS (3),			/* cost of FMUL instruction.  */
+  COSTS_N_INSNS (5),			/* cost of FMUL instruction.  */
   COSTS_N_INSNS (20),			/* cost of FDIV instruction.  */
   COSTS_N_INSNS (1),			/* cost of FABS instruction.  */
   COSTS_N_INSNS (1),			/* cost of FCHS instruction.  */
-  COSTS_N_INSNS (40),			/* cost of FSQRT instruction.  */
+  COSTS_N_INSNS (20),			/* cost of FSQRT instruction.  */
 
   COSTS_N_INSNS (1),			/* cost of cheap SSE instruction.  */
   COSTS_N_INSNS (3),			/* cost of ADDSS/SD SUBSS/SD insns.  */
@@ -2306,7 +2306,7 @@ struct processor_costs generic_cost = {
   COSTS_N_INSNS (32),			/* cost of DIVSD instruction.  */
   COSTS_N_INSNS (30),			/* cost of SQRTSS instruction.  */
   COSTS_N_INSNS (58),			/* cost of SQRTSD instruction.  */
-  1, 2, 1, 1,				/* reassoc int, fp, vec_int, vec_fp.  */
+  1, 4, 3, 3,				/* reassoc int, fp, vec_int, vec_fp.  */
   generic_memcpy,
   generic_memset,
   COSTS_N_INSNS (3),			/* cond_taken_branch_cost.  */
Index: testsuite/gcc.target/i386/l_fma_double_1.c
===================================================================
--- testsuite/gcc.target/i386/l_fma_double_1.c	(revision 255252)
+++ testsuite/gcc.target/i386/l_fma_double_1.c	(working copy)
@@ -13,7 +13,7 @@ typedef double adouble __attribute__((al
 /* { dg-final { scan-assembler-times "vfmsub\[123\]+pd" 8 } } */
 /* { dg-final { scan-assembler-times "vfnmadd\[123\]+pd" 8 } } */
 /* { dg-final { scan-assembler-times "vfnmsub\[123\]+pd" 8 } } */
-/* { dg-final { scan-assembler-times "vfmadd\[123\]+sd" 56 } } */
-/* { dg-final { scan-assembler-times "vfmsub\[123\]+sd" 56 } } */
-/* { dg-final { scan-assembler-times "vfnmadd\[123\]+sd" 56 } } */
-/* { dg-final { scan-assembler-times "vfnmsub\[123\]+sd" 56 } } */
+/* { dg-final { scan-assembler-times "vfmadd\[123\]+sd" 32 } } */
+/* { dg-final { scan-assembler-times "vfmsub\[123\]+sd" 32 } } */
+/* { dg-final { scan-assembler-times "vfnmadd\[123\]+sd" 32 } } */
+/* { dg-final { scan-assembler-times "vfnmsub\[123\]+sd" 32 } } */
Index: testsuite/gcc.target/i386/l_fma_double_2.c
===================================================================
--- testsuite/gcc.target/i386/l_fma_double_2.c	(revision 255252)
+++ testsuite/gcc.target/i386/l_fma_double_2.c	(working copy)
@@ -13,7 +13,7 @@ typedef double adouble __attribute__((al
 /* { dg-final { scan-assembler-times "vfmsub\[123\]+pd" 8 } } */
 /* { dg-final { scan-assembler-times "vfnmadd\[123\]+pd" 8 } } */
 /* { dg-final { scan-assembler-times "vfnmsub\[123\]+pd" 8 } } */
-/* { dg-final { scan-assembler-times "vfmadd\[123\]+sd" 56 } } */
-/* { dg-final { scan-assembler-times "vfmsub\[123\]+sd" 56 } } */
-/* { dg-final { scan-assembler-times "vfnmadd\[123\]+sd" 56 } } */
-/* { dg-final { scan-assembler-times "vfnmsub\[123\]+sd" 56 } } */
+/* { dg-final { scan-assembler-times "vfmadd\[123\]+sd" 32 } } */
+/* { dg-final { scan-assembler-times "vfmsub\[123\]+sd" 32 } } */
+/* { dg-final { scan-assembler-times "vfnmadd\[123\]+sd" 32 } } */
+/* { dg-final { scan-assembler-times "vfnmsub\[123\]+sd" 32 } } */
Index: testsuite/gcc.target/i386/l_fma_double_3.c
===================================================================
--- testsuite/gcc.target/i386/l_fma_double_3.c	(revision 255252)
+++ testsuite/gcc.target/i386/l_fma_double_3.c	(working copy)
@@ -13,7 +13,7 @@ typedef double adouble __attribute__((al
 /* { dg-final { scan-assembler-times "vfmsub\[123\]+pd" 8 } } */
 /* { dg-final { scan-assembler-times "vfnmadd\[123\]+pd" 8 } } */
 /* { dg-final { scan-assembler-times "vfnmsub\[123\]+pd" 8 } } */
-/* { dg-final { scan-assembler-times "vfmadd\[123\]+sd" 56 } } */
-/* { dg-final { scan-assembler-times "vfmsub\[123\]+sd" 56 } } */
-/* { dg-final { scan-assembler-times "vfnmadd\[123\]+sd" 56 } } */
-/* { dg-final { scan-assembler-times "vfnmsub\[123\]+sd" 56 } } */
+/* { dg-final { scan-assembler-times "vfmadd\[123\]+sd" 32 } } */
+/* { dg-final { scan-assembler-times "vfmsub\[123\]+sd" 32 } } */
+/* { dg-final { scan-assembler-times "vfnmadd\[123\]+sd" 32 } } */
+/* { dg-final { scan-assembler-times "vfnmsub\[123\]+sd" 32 } } */
Index: testsuite/gcc.target/i386/l_fma_double_4.c
===================================================================
--- testsuite/gcc.target/i386/l_fma_double_4.c	(revision 255252)
+++ testsuite/gcc.target/i386/l_fma_double_4.c	(working copy)
@@ -13,7 +13,7 @@ typedef double adouble __attribute__((al
 /* { dg-final { scan-assembler-times "vfmsub\[123\]+pd" 8 } } */
 /* { dg-final { scan-assembler-times "vfnmadd\[123\]+pd" 8 } } */
 /* { dg-final { scan-assembler-times "vfnmsub\[123\]+pd" 8 } } */
-/* { dg-final { scan-assembler-times "vfmadd\[123\]+sd" 56 } } */
-/* { dg-final { scan-assembler-times "vfmsub\[123\]+sd" 56 } } */
-/* { dg-final { scan-assembler-times "vfnmadd\[123\]+sd" 56 } } */
-/* { dg-final { scan-assembler-times "vfnmsub\[123\]+sd" 56 } } */
+/* { dg-final { scan-assembler-times "vfmadd\[123\]+sd" 32 } } */
+/* { dg-final { scan-assembler-times "vfmsub\[123\]+sd" 32 } } */
+/* { dg-final { scan-assembler-times "vfnmadd\[123\]+sd" 32 } } */
+/* { dg-final { scan-assembler-times "vfnmsub\[123\]+sd" 32 } } */
Index: testsuite/gcc.target/i386/l_fma_double_5.c
===================================================================
--- testsuite/gcc.target/i386/l_fma_double_5.c	(revision 255252)
+++ testsuite/gcc.target/i386/l_fma_double_5.c	(working copy)
@@ -13,7 +13,7 @@ typedef double adouble __attribute__((al
 /* { dg-final { scan-assembler-times "vfmsub\[123\]+pd" 8 } } */
 /* { dg-final { scan-assembler-times "vfnmadd\[123\]+pd" 8 } } */
 /* { dg-final { scan-assembler-times "vfnmsub\[123\]+pd" 8 } } */
-/* { dg-final { scan-assembler-times "vfmadd\[123\]+sd" 56 } } */
-/* { dg-final { scan-assembler-times "vfmsub\[123\]+sd" 56 } } */
-/* { dg-final { scan-assembler-times "vfnmadd\[123\]+sd" 56 } } */
-/* { dg-final { scan-assembler-times "vfnmsub\[123\]+sd" 56 } } */
+/* { dg-final { scan-assembler-times "vfmadd\[123\]+sd" 32 } } */
+/* { dg-final { scan-assembler-times "vfmsub\[123\]+sd" 32 } } */
+/* { dg-final { scan-assembler-times "vfnmadd\[123\]+sd" 32 } } */
+/* { dg-final { scan-assembler-times "vfnmsub\[123\]+sd" 32 } } */
Index: testsuite/gcc.target/i386/l_fma_double_6.c
===================================================================
--- testsuite/gcc.target/i386/l_fma_double_6.c	(revision 255252)
+++ testsuite/gcc.target/i386/l_fma_double_6.c	(working copy)
@@ -13,7 +13,7 @@ typedef double adouble __attribute__((al
 /* { dg-final { scan-assembler-times "vfmsub\[123\]+pd" 8 } } */
 /* { dg-final { scan-assembler-times "vfnmadd\[123\]+pd" 8 } } */
 /* { dg-final { scan-assembler-times "vfnmsub\[123\]+pd" 8 } } */
-/* { dg-final { scan-assembler-times "vfmadd\[123\]+sd" 56 } } */
-/* { dg-final { scan-assembler-times "vfmsub\[123\]+sd" 56 } } */
-/* { dg-final { scan-assembler-times "vfnmadd\[123\]+sd" 56 } } */
-/* { dg-final { scan-assembler-times "vfnmsub\[123\]+sd" 56 } } */
+/* { dg-final { scan-assembler-times "vfmadd\[123\]+sd" 32 } } */
+/* { dg-final { scan-assembler-times "vfmsub\[123\]+sd" 32 } } */
+/* { dg-final { scan-assembler-times "vfnmadd\[123\]+sd" 32 } } */
+/* { dg-final { scan-assembler-times "vfnmsub\[123\]+sd" 32 } } */
Index: testsuite/gcc.target/i386/l_fma_float_1.c
===================================================================
--- testsuite/gcc.target/i386/l_fma_float_1.c	(revision 255252)
+++ testsuite/gcc.target/i386/l_fma_float_1.c	(working copy)
@@ -12,7 +12,7 @@
 /* { dg-final { scan-assembler-times "vfmsub\[123\]+ps" 8 } } */
 /* { dg-final { scan-assembler-times "vfnmadd\[123\]+ps" 8 } } */
 /* { dg-final { scan-assembler-times "vfnmsub\[123\]+ps" 8 } } */
-/* { dg-final { scan-assembler-times "vfmadd\[123\]+ss" 120 } } */
-/* { dg-final { scan-assembler-times "vfmsub\[123\]+ss" 120 } } */
-/* { dg-final { scan-assembler-times "vfnmadd\[123\]+ss" 120 } } */
-/* { dg-final { scan-assembler-times "vfnmsub\[123\]+ss" 120 } } */
+/* { dg-final { scan-assembler-times "vfmadd\[123\]+ss" 64 } } */
+/* { dg-final { scan-assembler-times "vfmsub\[123\]+ss" 64 } } */
+/* { dg-final { scan-assembler-times "vfnmadd\[123\]+ss" 64 } } */
+/* { dg-final { scan-assembler-times "vfnmsub\[123\]+ss" 64 } } */
Index: testsuite/gcc.target/i386/l_fma_float_2.c
===================================================================
--- testsuite/gcc.target/i386/l_fma_float_2.c	(revision 255252)
+++ testsuite/gcc.target/i386/l_fma_float_2.c	(working copy)
@@ -12,7 +12,7 @@
 /* { dg-final { scan-assembler-times "vfmsub\[123\]+ps" 8 } } */
 /* { dg-final { scan-assembler-times "vfnmadd\[123\]+ps" 8 } } */
 /* { dg-final { scan-assembler-times "vfnmsub\[123\]+ps" 8 } } */
-/* { dg-final { scan-assembler-times "vfmadd\[123\]+ss" 120 } } */
-/* { dg-final { scan-assembler-times "vfmsub\[123\]+ss" 120 } } */
-/* { dg-final { scan-assembler-times "vfnmadd\[123\]+ss" 120 } } */
-/* { dg-final { scan-assembler-times "vfnmsub\[123\]+ss" 120 } } */
+/* { dg-final { scan-assembler-times "vfmadd\[123\]+ss" 64 } } */
+/* { dg-final { scan-assembler-times "vfmsub\[123\]+ss" 64 } } */
+/* { dg-final { scan-assembler-times "vfnmadd\[123\]+ss" 64 } } */
+/* { dg-final { scan-assembler-times "vfnmsub\[123\]+ss" 64 } } */
Index: testsuite/gcc.target/i386/l_fma_float_3.c
===================================================================
--- testsuite/gcc.target/i386/l_fma_float_3.c	(revision 255252)
+++ testsuite/gcc.target/i386/l_fma_float_3.c	(working copy)
@@ -12,7 +12,7 @@
 /* { dg-final { scan-assembler-times "vfmsub\[123\]+ps" 8 } } */
 /* { dg-final { scan-assembler-times "vfnmadd\[123\]+ps" 8 } } */
 /* { dg-final { scan-assembler-times "vfnmsub\[123\]+ps" 8 } } */
-/* { dg-final { scan-assembler-times "vfmadd\[123\]+ss" 120 } } */
-/* { dg-final { scan-assembler-times "vfmsub\[123\]+ss" 120 } } */
-/* { dg-final { scan-assembler-times "vfnmadd\[123\]+ss" 120 } } */
-/* { dg-final { scan-assembler-times "vfnmsub\[123\]+ss" 120 } } */
+/* { dg-final { scan-assembler-times "vfmadd\[123\]+ss" 64 } } */
+/* { dg-final { scan-assembler-times "vfmsub\[123\]+ss" 64 } } */
+/* { dg-final { scan-assembler-times "vfnmadd\[123\]+ss" 64 } } */
+/* { dg-final { scan-assembler-times "vfnmsub\[123\]+ss" 64 } } */
Index: testsuite/gcc.target/i386/l_fma_float_4.c
===================================================================
--- testsuite/gcc.target/i386/l_fma_float_4.c	(revision 255252)
+++ testsuite/gcc.target/i386/l_fma_float_4.c	(working copy)
@@ -12,7 +12,7 @@
 /* { dg-final { scan-assembler-times "vfmsub\[123\]+ps" 8 } } */
 /* { dg-final { scan-assembler-times "vfnmadd\[123\]+ps" 8 } } */
 /* { dg-final { scan-assembler-times "vfnmsub\[123\]+ps" 8 } } */
-/* { dg-final { scan-assembler-times "vfmadd\[123\]+ss" 120 } } */
-/* { dg-final { scan-assembler-times "vfmsub\[123\]+ss" 120 } } */
-/* { dg-final { scan-assembler-times "vfnmadd\[123\]+ss" 120 } } */
-/* { dg-final { scan-assembler-times "vfnmsub\[123\]+ss" 120 } } */
+/* { dg-final { scan-assembler-times "vfmadd\[123\]+ss" 64 } } */
+/* { dg-final { scan-assembler-times "vfmsub\[123\]+ss" 64 } } */
+/* { dg-final { scan-assembler-times "vfnmadd\[123\]+ss" 64 } } */
+/* { dg-final { scan-assembler-times "vfnmsub\[123\]+ss" 64 } } */
Index: testsuite/gcc.target/i386/l_fma_float_5.c
===================================================================
--- testsuite/gcc.target/i386/l_fma_float_5.c	(revision 255252)
+++ testsuite/gcc.target/i386/l_fma_float_5.c	(working copy)
@@ -12,7 +12,7 @@
 /* { dg-final { scan-assembler-times "vfmsub\[123\]+ps" 8 } } */
 /* { dg-final { scan-assembler-times "vfnmadd\[123\]+ps" 8 } } */
 /* { dg-final { scan-assembler-times "vfnmsub\[123\]+ps" 8 } } */
-/* { dg-final { scan-assembler-times "vfmadd\[123\]+ss" 120 } } */
-/* { dg-final { scan-assembler-times "vfmsub\[123\]+ss" 120 } } */
-/* { dg-final { scan-assembler-times "vfnmadd\[123\]+ss" 120 } } */
-/* { dg-final { scan-assembler-times "vfnmsub\[123\]+ss" 120 } } */
+/* { dg-final { scan-assembler-times "vfmadd\[123\]+ss" 64 } } */
+/* { dg-final { scan-assembler-times "vfmsub\[123\]+ss" 64 } } */
+/* { dg-final { scan-assembler-times "vfnmadd\[123\]+ss" 64 } } */
+/* { dg-final { scan-assembler-times "vfnmsub\[123\]+ss" 64 } } */
Index: testsuite/gcc.target/i386/l_fma_float_6.c
===================================================================
--- testsuite/gcc.target/i386/l_fma_float_6.c	(revision 255252)
+++ testsuite/gcc.target/i386/l_fma_float_6.c	(working copy)
@@ -12,7 +12,7 @@
 /* { dg-final { scan-assembler-times "vfmsub\[123\]+ps" 8 } } */
 /* { dg-final { scan-assembler-times "vfnmadd\[123\]+ps" 8 } } */
 /* { dg-final { scan-assembler-times "vfnmsub\[123\]+ps" 8 } } */
-/* { dg-final { scan-assembler-times "vfmadd\[123\]+ss" 120 } } */
-/* { dg-final { scan-assembler-times "vfmsub\[123\]+ss" 120 } } */
-/* { dg-final { scan-assembler-times "vfnmadd\[123\]+ss" 120 } } */
-/* { dg-final { scan-assembler-times "vfnmsub\[123\]+ss" 120 } } */
+/* { dg-final { scan-assembler-times "vfmadd\[123\]+ss" 64 } } */
+/* { dg-final { scan-assembler-times "vfmsub\[123\]+ss" 64 } } */
+/* { dg-final { scan-assembler-times "vfnmadd\[123\]+ss" 64 } } */
+/* { dg-final { scan-assembler-times "vfnmsub\[123\]+ss" 64 } } */
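
For reference, the unaligned load change in the generic_cost hunk of this
patch can be restated as a small C sketch.  The three arrays copy the values
from that hunk; unaligned_penalty is a hypothetical helper for illustration
and not a GCC function.

```c
/* Values copied from the generic_cost hunk.  Index 0..4 corresponds to
   32, 64, 128, 256 and 512-bit SSE loads.  */
static const int aligned_load[5]       = {6, 6, 6, 10, 15};
static const int unaligned_load_old[5] = {10, 10, 10, 15, 20};
static const int unaligned_load_new[5] = {6, 6, 6, 10, 15};

/* Hypothetical helper: extra cost the table charges for an unaligned
   load of the given width index, relative to an aligned load.  */
static int
unaligned_penalty (const int *unaligned, int idx)
{
  return unaligned[idx] - aligned_load[idx];
}
```

With the old table the penalty is 4 for loads up to 128 bits and 5 for 256
and 512-bit loads; with the new table it is 0 at every width, so the cost
model no longer sees a benefit in aligning the access first.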
