public inbox for gcc-patches@gcc.gnu.org
* generic retuning part 1 - x86-tune-costs update
@ 2017-11-30  9:54 Jan Hubicka
  2017-11-30 11:03 ` Richard Biener
  0 siblings, 1 reply; 5+ messages in thread
From: Jan Hubicka @ 2017-11-30  9:54 UTC
  To: gcc-patches

Hi,
this patch makes the costs in generic match modern chips better (core, haswell,
bulldozer and zen).  The only important change is to drop the cost of unaligned
loads and stores, because all modern chips handle them well.  This makes the
vectorizer not peel for alignment and saves a lot of code size without
sacrificing performance.
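
For readers unfamiliar with the term, here is a rough sketch of what "peeling
for alignment" amounts to.  It is an illustration only and not code from GCC or
from this patch: the helper names peel_iterations and scale_peeled are
hypothetical, and the 16-byte boundary and the 4-element inner loop merely stand
in for a 128-bit vector operation.  A scalar prologue runs until the pointer
reaches the boundary, the main body then works on aligned chunks, and an
epilogue handles the leftovers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: number of scalar iterations a prologue must run
   before ADDR reaches an ALIGN-byte boundary, stepping ELT_SIZE bytes per
   iteration.  ALIGN must be a power of two and a multiple of ELT_SIZE.  */
static size_t
peel_iterations (uintptr_t addr, size_t elt_size, size_t align)
{
  assert (align != 0 && (align & (align - 1)) == 0);
  assert (elt_size != 0 && align % elt_size == 0);
  size_t misalign = addr & (align - 1);
  /* If the misalignment is not a multiple of the element size, no number
     of iterations can reach the boundary.  */
  assert (misalign % elt_size == 0);
  return misalign == 0 ? 0 : (align - misalign) / elt_size;
}

/* Doubles N elements using the loop layout that peeling produces.  */
static void
scale_peeled (int32_t *a, size_t n)
{
  size_t peel = peel_iterations ((uintptr_t) a, sizeof *a, 16);
  if (peel > n)
    peel = n;

  size_t i = 0;
  for (; i < peel; i++)		/* prologue: scalar, until A + I is aligned */
    a[i] *= 2;
  for (; i + 4 <= n; i += 4)	/* main body: stands in for one vector op */
    for (size_t j = 0; j < 4; j++)
      a[i + j] *= 2;
  for (; i < n; i++)		/* epilogue: remaining elements */
    a[i] *= 2;
}
```

The prologue and epilogue are extra copies of the loop body, which is where the
code size goes.  When unaligned accesses are costed the same as aligned ones,
the prologue no longer pays for itself in the cost model.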

I have benchmarked it on zen and skylake and it is a small but almost consistent
win in performance too.  A notable regression is fma3d, which regresses by
approx. 5% on zen.  The same regression shows up with native tuning, so I will
look into it incrementally.

Bootstrapped/regtested x86_64-linux, committed.

Honza

	PR target/81616
	* x86-tune-costs.h (generic_cost): Revise for modern CPUs.
	* gcc.target/i386/l_fma_double_1.c: Update count of fma instructions.
	* gcc.target/i386/l_fma_double_2.c: Update count of fma instructions.
	* gcc.target/i386/l_fma_double_3.c: Update count of fma instructions.
	* gcc.target/i386/l_fma_double_4.c: Update count of fma instructions.
	* gcc.target/i386/l_fma_double_5.c: Update count of fma instructions.
	* gcc.target/i386/l_fma_double_6.c: Update count of fma instructions.
	* gcc.target/i386/l_fma_float_1.c: Update count of fma instructions.
	* gcc.target/i386/l_fma_float_2.c: Update count of fma instructions.
	* gcc.target/i386/l_fma_float_3.c: Update count of fma instructions.
	* gcc.target/i386/l_fma_float_4.c: Update count of fma instructions.
	* gcc.target/i386/l_fma_float_5.c: Update count of fma instructions.
	* gcc.target/i386/l_fma_float_6.c: Update count of fma instructions.
Index: config/i386/x86-tune-costs.h
===================================================================
--- config/i386/x86-tune-costs.h	(revision 255252)
+++ config/i386/x86-tune-costs.h	(working copy)
@@ -2243,11 +2243,11 @@ struct processor_costs generic_cost = {
    COSTS_N_INSNS (4),			/*				 HI */
    COSTS_N_INSNS (3),			/*				 SI */
    COSTS_N_INSNS (4),			/*				 DI */
-   COSTS_N_INSNS (2)},			/*			      other */
+   COSTS_N_INSNS (4)},			/*			      other */
   0,					/* cost of multiply per each bit set */
-  {COSTS_N_INSNS (18),			/* cost of a divide/mod for QI */
-   COSTS_N_INSNS (26),			/*			    HI */
-   COSTS_N_INSNS (42),			/*			    SI */
+  {COSTS_N_INSNS (16),			/* cost of a divide/mod for QI */
+   COSTS_N_INSNS (22),			/*			    HI */
+   COSTS_N_INSNS (30),			/*			    SI */
    COSTS_N_INSNS (74),			/*			    DI */
    COSTS_N_INSNS (74)},			/*			    other */
   COSTS_N_INSNS (1),			/* cost of movsx */
@@ -2275,13 +2275,13 @@ struct processor_costs generic_cost = {
   2, 3, 4,				/* cost of moving XMM,YMM,ZMM register */
   {6, 6, 6, 10, 15},			/* cost of loading SSE registers
 					   in 32,64,128,256 and 512-bit */
-  {10, 10, 10, 15, 20},			/* cost of unaligned loads.  */
+  {6, 6, 6, 10, 15},			/* cost of unaligned loads.  */
   {6, 6, 6, 10, 15},			/* cost of storing SSE registers
 					   in 32,64,128,256 and 512-bit */
-  {10, 10, 10, 15, 20},			/* cost of unaligned storess.  */
-  20, 20,				/* SSE->integer and integer->SSE moves */
-  6, 6,					/* Gather load static, per_elt.  */
-  6, 6,					/* Gather store static, per_elt.  */
+  {6, 6, 6, 10, 15},			/* cost of unaligned storess.  */
+  6, 6,					/* SSE->integer and integer->SSE moves */
+  18, 6,				/* Gather load static, per_elt.  */
+  18, 6,				/* Gather store static, per_elt.  */
   32,					/* size of l1 cache.  */
   512,					/* size of l2 cache.  */
   64,					/* size of prefetch block */
@@ -2290,11 +2290,11 @@ struct processor_costs generic_cost = {
      value is increased to perhaps more appropriate value of 5.  */
   3,					/* Branch cost */
   COSTS_N_INSNS (3),			/* cost of FADD and FSUB insns.  */
-  COSTS_N_INSNS (3),			/* cost of FMUL instruction.  */
+  COSTS_N_INSNS (5),			/* cost of FMUL instruction.  */
   COSTS_N_INSNS (20),			/* cost of FDIV instruction.  */
   COSTS_N_INSNS (1),			/* cost of FABS instruction.  */
   COSTS_N_INSNS (1),			/* cost of FCHS instruction.  */
-  COSTS_N_INSNS (40),			/* cost of FSQRT instruction.  */
+  COSTS_N_INSNS (20),			/* cost of FSQRT instruction.  */
 
   COSTS_N_INSNS (1),			/* cost of cheap SSE instruction.  */
   COSTS_N_INSNS (3),			/* cost of ADDSS/SD SUBSS/SD insns.  */
@@ -2306,7 +2306,7 @@ struct processor_costs generic_cost = {
   COSTS_N_INSNS (32),			/* cost of DIVSD instruction.  */
   COSTS_N_INSNS (30),			/* cost of SQRTSS instruction.  */
   COSTS_N_INSNS (58),			/* cost of SQRTSD instruction.  */
-  1, 2, 1, 1,				/* reassoc int, fp, vec_int, vec_fp.  */
+  1, 4, 3, 3,				/* reassoc int, fp, vec_int, vec_fp.  */
   generic_memcpy,
   generic_memset,
   COSTS_N_INSNS (3),			/* cond_taken_branch_cost.  */
Index: testsuite/gcc.target/i386/l_fma_double_1.c
===================================================================
--- testsuite/gcc.target/i386/l_fma_double_1.c	(revision 255252)
+++ testsuite/gcc.target/i386/l_fma_double_1.c	(working copy)
@@ -13,7 +13,7 @@ typedef double adouble __attribute__((al
 /* { dg-final { scan-assembler-times "vfmsub\[123\]+pd" 8 } } */
 /* { dg-final { scan-assembler-times "vfnmadd\[123\]+pd" 8 } } */
 /* { dg-final { scan-assembler-times "vfnmsub\[123\]+pd" 8 } } */
-/* { dg-final { scan-assembler-times "vfmadd\[123\]+sd" 56 } } */
-/* { dg-final { scan-assembler-times "vfmsub\[123\]+sd" 56 } } */
-/* { dg-final { scan-assembler-times "vfnmadd\[123\]+sd" 56 } } */
-/* { dg-final { scan-assembler-times "vfnmsub\[123\]+sd" 56 } } */
+/* { dg-final { scan-assembler-times "vfmadd\[123\]+sd" 32 } } */
+/* { dg-final { scan-assembler-times "vfmsub\[123\]+sd" 32 } } */
+/* { dg-final { scan-assembler-times "vfnmadd\[123\]+sd" 32 } } */
+/* { dg-final { scan-assembler-times "vfnmsub\[123\]+sd" 32 } } */
Index: testsuite/gcc.target/i386/l_fma_double_2.c
===================================================================
--- testsuite/gcc.target/i386/l_fma_double_2.c	(revision 255252)
+++ testsuite/gcc.target/i386/l_fma_double_2.c	(working copy)
@@ -13,7 +13,7 @@ typedef double adouble __attribute__((al
 /* { dg-final { scan-assembler-times "vfmsub\[123\]+pd" 8 } } */
 /* { dg-final { scan-assembler-times "vfnmadd\[123\]+pd" 8 } } */
 /* { dg-final { scan-assembler-times "vfnmsub\[123\]+pd" 8 } } */
-/* { dg-final { scan-assembler-times "vfmadd\[123\]+sd" 56 } } */
-/* { dg-final { scan-assembler-times "vfmsub\[123\]+sd" 56 } } */
-/* { dg-final { scan-assembler-times "vfnmadd\[123\]+sd" 56 } } */
-/* { dg-final { scan-assembler-times "vfnmsub\[123\]+sd" 56 } } */
+/* { dg-final { scan-assembler-times "vfmadd\[123\]+sd" 32 } } */
+/* { dg-final { scan-assembler-times "vfmsub\[123\]+sd" 32 } } */
+/* { dg-final { scan-assembler-times "vfnmadd\[123\]+sd" 32 } } */
+/* { dg-final { scan-assembler-times "vfnmsub\[123\]+sd" 32 } } */
Index: testsuite/gcc.target/i386/l_fma_double_3.c
===================================================================
--- testsuite/gcc.target/i386/l_fma_double_3.c	(revision 255252)
+++ testsuite/gcc.target/i386/l_fma_double_3.c	(working copy)
@@ -13,7 +13,7 @@ typedef double adouble __attribute__((al
 /* { dg-final { scan-assembler-times "vfmsub\[123\]+pd" 8 } } */
 /* { dg-final { scan-assembler-times "vfnmadd\[123\]+pd" 8 } } */
 /* { dg-final { scan-assembler-times "vfnmsub\[123\]+pd" 8 } } */
-/* { dg-final { scan-assembler-times "vfmadd\[123\]+sd" 56 } } */
-/* { dg-final { scan-assembler-times "vfmsub\[123\]+sd" 56 } } */
-/* { dg-final { scan-assembler-times "vfnmadd\[123\]+sd" 56 } } */
-/* { dg-final { scan-assembler-times "vfnmsub\[123\]+sd" 56 } } */
+/* { dg-final { scan-assembler-times "vfmadd\[123\]+sd" 32 } } */
+/* { dg-final { scan-assembler-times "vfmsub\[123\]+sd" 32 } } */
+/* { dg-final { scan-assembler-times "vfnmadd\[123\]+sd" 32 } } */
+/* { dg-final { scan-assembler-times "vfnmsub\[123\]+sd" 32 } } */
Index: testsuite/gcc.target/i386/l_fma_double_4.c
===================================================================
--- testsuite/gcc.target/i386/l_fma_double_4.c	(revision 255252)
+++ testsuite/gcc.target/i386/l_fma_double_4.c	(working copy)
@@ -13,7 +13,7 @@ typedef double adouble __attribute__((al
 /* { dg-final { scan-assembler-times "vfmsub\[123\]+pd" 8 } } */
 /* { dg-final { scan-assembler-times "vfnmadd\[123\]+pd" 8 } } */
 /* { dg-final { scan-assembler-times "vfnmsub\[123\]+pd" 8 } } */
-/* { dg-final { scan-assembler-times "vfmadd\[123\]+sd" 56 } } */
-/* { dg-final { scan-assembler-times "vfmsub\[123\]+sd" 56 } } */
-/* { dg-final { scan-assembler-times "vfnmadd\[123\]+sd" 56 } } */
-/* { dg-final { scan-assembler-times "vfnmsub\[123\]+sd" 56 } } */
+/* { dg-final { scan-assembler-times "vfmadd\[123\]+sd" 32 } } */
+/* { dg-final { scan-assembler-times "vfmsub\[123\]+sd" 32 } } */
+/* { dg-final { scan-assembler-times "vfnmadd\[123\]+sd" 32 } } */
+/* { dg-final { scan-assembler-times "vfnmsub\[123\]+sd" 32 } } */
Index: testsuite/gcc.target/i386/l_fma_double_5.c
===================================================================
--- testsuite/gcc.target/i386/l_fma_double_5.c	(revision 255252)
+++ testsuite/gcc.target/i386/l_fma_double_5.c	(working copy)
@@ -13,7 +13,7 @@ typedef double adouble __attribute__((al
 /* { dg-final { scan-assembler-times "vfmsub\[123\]+pd" 8 } } */
 /* { dg-final { scan-assembler-times "vfnmadd\[123\]+pd" 8 } } */
 /* { dg-final { scan-assembler-times "vfnmsub\[123\]+pd" 8 } } */
-/* { dg-final { scan-assembler-times "vfmadd\[123\]+sd" 56 } } */
-/* { dg-final { scan-assembler-times "vfmsub\[123\]+sd" 56 } } */
-/* { dg-final { scan-assembler-times "vfnmadd\[123\]+sd" 56 } } */
-/* { dg-final { scan-assembler-times "vfnmsub\[123\]+sd" 56 } } */
+/* { dg-final { scan-assembler-times "vfmadd\[123\]+sd" 32 } } */
+/* { dg-final { scan-assembler-times "vfmsub\[123\]+sd" 32 } } */
+/* { dg-final { scan-assembler-times "vfnmadd\[123\]+sd" 32 } } */
+/* { dg-final { scan-assembler-times "vfnmsub\[123\]+sd" 32 } } */
Index: testsuite/gcc.target/i386/l_fma_double_6.c
===================================================================
--- testsuite/gcc.target/i386/l_fma_double_6.c	(revision 255252)
+++ testsuite/gcc.target/i386/l_fma_double_6.c	(working copy)
@@ -13,7 +13,7 @@ typedef double adouble __attribute__((al
 /* { dg-final { scan-assembler-times "vfmsub\[123\]+pd" 8 } } */
 /* { dg-final { scan-assembler-times "vfnmadd\[123\]+pd" 8 } } */
 /* { dg-final { scan-assembler-times "vfnmsub\[123\]+pd" 8 } } */
-/* { dg-final { scan-assembler-times "vfmadd\[123\]+sd" 56 } } */
-/* { dg-final { scan-assembler-times "vfmsub\[123\]+sd" 56 } } */
-/* { dg-final { scan-assembler-times "vfnmadd\[123\]+sd" 56 } } */
-/* { dg-final { scan-assembler-times "vfnmsub\[123\]+sd" 56 } } */
+/* { dg-final { scan-assembler-times "vfmadd\[123\]+sd" 32 } } */
+/* { dg-final { scan-assembler-times "vfmsub\[123\]+sd" 32 } } */
+/* { dg-final { scan-assembler-times "vfnmadd\[123\]+sd" 32 } } */
+/* { dg-final { scan-assembler-times "vfnmsub\[123\]+sd" 32 } } */
Index: testsuite/gcc.target/i386/l_fma_float_1.c
===================================================================
--- testsuite/gcc.target/i386/l_fma_float_1.c	(revision 255252)
+++ testsuite/gcc.target/i386/l_fma_float_1.c	(working copy)
@@ -12,7 +12,7 @@
 /* { dg-final { scan-assembler-times "vfmsub\[123\]+ps" 8 } } */
 /* { dg-final { scan-assembler-times "vfnmadd\[123\]+ps" 8 } } */
 /* { dg-final { scan-assembler-times "vfnmsub\[123\]+ps" 8 } } */
-/* { dg-final { scan-assembler-times "vfmadd\[123\]+ss" 120 } } */
-/* { dg-final { scan-assembler-times "vfmsub\[123\]+ss" 120 } } */
-/* { dg-final { scan-assembler-times "vfnmadd\[123\]+ss" 120 } } */
-/* { dg-final { scan-assembler-times "vfnmsub\[123\]+ss" 120 } } */
+/* { dg-final { scan-assembler-times "vfmadd\[123\]+ss" 64 } } */
+/* { dg-final { scan-assembler-times "vfmsub\[123\]+ss" 64 } } */
+/* { dg-final { scan-assembler-times "vfnmadd\[123\]+ss" 64 } } */
+/* { dg-final { scan-assembler-times "vfnmsub\[123\]+ss" 64 } } */
Index: testsuite/gcc.target/i386/l_fma_float_2.c
===================================================================
--- testsuite/gcc.target/i386/l_fma_float_2.c	(revision 255252)
+++ testsuite/gcc.target/i386/l_fma_float_2.c	(working copy)
@@ -12,7 +12,7 @@
 /* { dg-final { scan-assembler-times "vfmsub\[123\]+ps" 8 } } */
 /* { dg-final { scan-assembler-times "vfnmadd\[123\]+ps" 8 } } */
 /* { dg-final { scan-assembler-times "vfnmsub\[123\]+ps" 8 } } */
-/* { dg-final { scan-assembler-times "vfmadd\[123\]+ss" 120 } } */
-/* { dg-final { scan-assembler-times "vfmsub\[123\]+ss" 120 } } */
-/* { dg-final { scan-assembler-times "vfnmadd\[123\]+ss" 120 } } */
-/* { dg-final { scan-assembler-times "vfnmsub\[123\]+ss" 120 } } */
+/* { dg-final { scan-assembler-times "vfmadd\[123\]+ss" 64 } } */
+/* { dg-final { scan-assembler-times "vfmsub\[123\]+ss" 64 } } */
+/* { dg-final { scan-assembler-times "vfnmadd\[123\]+ss" 64 } } */
+/* { dg-final { scan-assembler-times "vfnmsub\[123\]+ss" 64 } } */
Index: testsuite/gcc.target/i386/l_fma_float_3.c
===================================================================
--- testsuite/gcc.target/i386/l_fma_float_3.c	(revision 255252)
+++ testsuite/gcc.target/i386/l_fma_float_3.c	(working copy)
@@ -12,7 +12,7 @@
 /* { dg-final { scan-assembler-times "vfmsub\[123\]+ps" 8 } } */
 /* { dg-final { scan-assembler-times "vfnmadd\[123\]+ps" 8 } } */
 /* { dg-final { scan-assembler-times "vfnmsub\[123\]+ps" 8 } } */
-/* { dg-final { scan-assembler-times "vfmadd\[123\]+ss" 120 } } */
-/* { dg-final { scan-assembler-times "vfmsub\[123\]+ss" 120 } } */
-/* { dg-final { scan-assembler-times "vfnmadd\[123\]+ss" 120 } } */
-/* { dg-final { scan-assembler-times "vfnmsub\[123\]+ss" 120 } } */
+/* { dg-final { scan-assembler-times "vfmadd\[123\]+ss" 64 } } */
+/* { dg-final { scan-assembler-times "vfmsub\[123\]+ss" 64 } } */
+/* { dg-final { scan-assembler-times "vfnmadd\[123\]+ss" 64 } } */
+/* { dg-final { scan-assembler-times "vfnmsub\[123\]+ss" 64 } } */
Index: testsuite/gcc.target/i386/l_fma_float_4.c
===================================================================
--- testsuite/gcc.target/i386/l_fma_float_4.c	(revision 255252)
+++ testsuite/gcc.target/i386/l_fma_float_4.c	(working copy)
@@ -12,7 +12,7 @@
 /* { dg-final { scan-assembler-times "vfmsub\[123\]+ps" 8 } } */
 /* { dg-final { scan-assembler-times "vfnmadd\[123\]+ps" 8 } } */
 /* { dg-final { scan-assembler-times "vfnmsub\[123\]+ps" 8 } } */
-/* { dg-final { scan-assembler-times "vfmadd\[123\]+ss" 120 } } */
-/* { dg-final { scan-assembler-times "vfmsub\[123\]+ss" 120 } } */
-/* { dg-final { scan-assembler-times "vfnmadd\[123\]+ss" 120 } } */
-/* { dg-final { scan-assembler-times "vfnmsub\[123\]+ss" 120 } } */
+/* { dg-final { scan-assembler-times "vfmadd\[123\]+ss" 64 } } */
+/* { dg-final { scan-assembler-times "vfmsub\[123\]+ss" 64 } } */
+/* { dg-final { scan-assembler-times "vfnmadd\[123\]+ss" 64 } } */
+/* { dg-final { scan-assembler-times "vfnmsub\[123\]+ss" 64 } } */
Index: testsuite/gcc.target/i386/l_fma_float_5.c
===================================================================
--- testsuite/gcc.target/i386/l_fma_float_5.c	(revision 255252)
+++ testsuite/gcc.target/i386/l_fma_float_5.c	(working copy)
@@ -12,7 +12,7 @@
 /* { dg-final { scan-assembler-times "vfmsub\[123\]+ps" 8 } } */
 /* { dg-final { scan-assembler-times "vfnmadd\[123\]+ps" 8 } } */
 /* { dg-final { scan-assembler-times "vfnmsub\[123\]+ps" 8 } } */
-/* { dg-final { scan-assembler-times "vfmadd\[123\]+ss" 120 } } */
-/* { dg-final { scan-assembler-times "vfmsub\[123\]+ss" 120 } } */
-/* { dg-final { scan-assembler-times "vfnmadd\[123\]+ss" 120 } } */
-/* { dg-final { scan-assembler-times "vfnmsub\[123\]+ss" 120 } } */
+/* { dg-final { scan-assembler-times "vfmadd\[123\]+ss" 64 } } */
+/* { dg-final { scan-assembler-times "vfmsub\[123\]+ss" 64 } } */
+/* { dg-final { scan-assembler-times "vfnmadd\[123\]+ss" 64 } } */
+/* { dg-final { scan-assembler-times "vfnmsub\[123\]+ss" 64 } } */
Index: testsuite/gcc.target/i386/l_fma_float_6.c
===================================================================
--- testsuite/gcc.target/i386/l_fma_float_6.c	(revision 255252)
+++ testsuite/gcc.target/i386/l_fma_float_6.c	(working copy)
@@ -12,7 +12,7 @@
 /* { dg-final { scan-assembler-times "vfmsub\[123\]+ps" 8 } } */
 /* { dg-final { scan-assembler-times "vfnmadd\[123\]+ps" 8 } } */
 /* { dg-final { scan-assembler-times "vfnmsub\[123\]+ps" 8 } } */
-/* { dg-final { scan-assembler-times "vfmadd\[123\]+ss" 120 } } */
-/* { dg-final { scan-assembler-times "vfmsub\[123\]+ss" 120 } } */
-/* { dg-final { scan-assembler-times "vfnmadd\[123\]+ss" 120 } } */
-/* { dg-final { scan-assembler-times "vfnmsub\[123\]+ss" 120 } } */
+/* { dg-final { scan-assembler-times "vfmadd\[123\]+ss" 64 } } */
+/* { dg-final { scan-assembler-times "vfmsub\[123\]+ss" 64 } } */
+/* { dg-final { scan-assembler-times "vfnmadd\[123\]+ss" 64 } } */
+/* { dg-final { scan-assembler-times "vfnmsub\[123\]+ss" 64 } } */

* Re: generic retuning part 1 - x86-tune-costs update
  2017-11-30  9:54 generic retuning part 1 - x86-tune-costs update Jan Hubicka
@ 2017-11-30 11:03 ` Richard Biener
  2017-11-30 15:09   ` Jan Hubicka
  0 siblings, 1 reply; 5+ messages in thread
From: Richard Biener @ 2017-11-30 11:03 UTC
  To: Jan Hubicka; +Cc: GCC Patches

On Thu, Nov 30, 2017 at 10:40 AM, Jan Hubicka <hubicka@ucw.cz> wrote:
> Hi,
> this patch makes the costs in generic match modern chips better (core, haswell,
> bulldozer and zen).  The only important change is to drop the cost of unaligned
> loads and stores, because all modern chips handle them well.  This makes the
> vectorizer not peel for alignment and saves a lot of code size without
> sacrificing performance.
>
> I have benchmarked it on zen and skylake and it is a small but almost consistent
> win in performance too.  A notable regression is fma3d, which regresses by
> approx. 5% on zen.  The same regression shows up with native tuning, so I will
> look into it incrementally.
>
> Bootstrapped/regtested x86_64-linux, committed.

The question is how we cost such things as store bandwidth, where IIRC
an unaligned store counts as 'two' entries in the pipeline's store buffers.
Likewise, unaligned loads usually still have a penalty.

What changed is that when the loads/stores happen to be aligned,
using the unaligned instruction variant doesn't have a penalty.

So I'm not sure peeling for alignment isn't a win; it just depends more
on the number of memory streams involved.
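
To make the "number of memory streams" point concrete, here is a small sketch.
It is an illustration only and not GCC code: the helper name
streams_aligned_after_peel is hypothetical, and it assumes every stream advances
by the same number of bytes per iteration.  One prologue can be sized to align
only one stream (stream 0 here); every other stream ends up aligned only if it
started with the same misalignment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: size the prologue so that stream 0 becomes
   ALIGN-byte aligned, then count how many of the N_STREAMS addresses are
   aligned once every stream has advanced by that same number of bytes.  */
static size_t
streams_aligned_after_peel (const uintptr_t *addrs, size_t n_streams,
			    size_t elt_size, size_t align)
{
  assert (n_streams > 0);
  assert (align != 0 && (align & (align - 1)) == 0);
  assert (elt_size != 0 && align % elt_size == 0);

  size_t mis0 = addrs[0] & (align - 1);
  assert (mis0 % elt_size == 0);

  /* Bytes the prologue moves each stream forward; zero when stream 0 is
     already aligned.  */
  size_t advance = (align - mis0) & (align - 1);

  size_t aligned = 0;
  for (size_t i = 0; i < n_streams; i++)
    if (((addrs[i] + advance) & (align - 1)) == 0)
      aligned++;
  return aligned;
}
```

With streams at 0x1008, 0x200c and 0x3000 and 16-byte alignment, the prologue
advances each stream by 8 bytes and only the first one ends up aligned, so the
other two still take the unaligned path in the main loop.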

Richard.

> Honza
>
>         PR target/81616
>         * x86-tune-costs.h (generic_cost): Revise for modern CPUs.
>         * gcc.target/i386/l_fma_double_1.c: Update count of fma instructions.
>         * gcc.target/i386/l_fma_double_2.c: Update count of fma instructions.
>         * gcc.target/i386/l_fma_double_3.c: Update count of fma instructions.
>         * gcc.target/i386/l_fma_double_4.c: Update count of fma instructions.
>         * gcc.target/i386/l_fma_double_5.c: Update count of fma instructions.
>         * gcc.target/i386/l_fma_double_6.c: Update count of fma instructions.
>         * gcc.target/i386/l_fma_float_1.c: Update count of fma instructions.
>         * gcc.target/i386/l_fma_float_2.c: Update count of fma instructions.
>         * gcc.target/i386/l_fma_float_3.c: Update count of fma instructions.
>         * gcc.target/i386/l_fma_float_4.c: Update count of fma instructions.
>         * gcc.target/i386/l_fma_float_5.c: Update count of fma instructions.
>         * gcc.target/i386/l_fma_float_6.c: Update count of fma instructions.
> Index: config/i386/x86-tune-costs.h
> ===================================================================
> --- config/i386/x86-tune-costs.h        (revision 255252)
> +++ config/i386/x86-tune-costs.h        (working copy)
> @@ -2243,11 +2243,11 @@ struct processor_costs generic_cost = {
>     COSTS_N_INSNS (4),                  /*                               HI */
>     COSTS_N_INSNS (3),                  /*                               SI */
>     COSTS_N_INSNS (4),                  /*                               DI */
> -   COSTS_N_INSNS (2)},                 /*                            other */
> +   COSTS_N_INSNS (4)},                 /*                            other */
>    0,                                   /* cost of multiply per each bit set */
> -  {COSTS_N_INSNS (18),                 /* cost of a divide/mod for QI */
> -   COSTS_N_INSNS (26),                 /*                          HI */
> -   COSTS_N_INSNS (42),                 /*                          SI */
> +  {COSTS_N_INSNS (16),                 /* cost of a divide/mod for QI */
> +   COSTS_N_INSNS (22),                 /*                          HI */
> +   COSTS_N_INSNS (30),                 /*                          SI */
>     COSTS_N_INSNS (74),                 /*                          DI */
>     COSTS_N_INSNS (74)},                        /*                          other */
>    COSTS_N_INSNS (1),                   /* cost of movsx */
> @@ -2275,13 +2275,13 @@ struct processor_costs generic_cost = {
>    2, 3, 4,                             /* cost of moving XMM,YMM,ZMM register */
>    {6, 6, 6, 10, 15},                   /* cost of loading SSE registers
>                                            in 32,64,128,256 and 512-bit */
> -  {10, 10, 10, 15, 20},                        /* cost of unaligned loads.  */
> +  {6, 6, 6, 10, 15},                   /* cost of unaligned loads.  */
>    {6, 6, 6, 10, 15},                   /* cost of storing SSE registers
>                                            in 32,64,128,256 and 512-bit */
> -  {10, 10, 10, 15, 20},                        /* cost of unaligned storess.  */
> -  20, 20,                              /* SSE->integer and integer->SSE moves */
> -  6, 6,                                        /* Gather load static, per_elt.  */
> -  6, 6,                                        /* Gather store static, per_elt.  */
> +  {6, 6, 6, 10, 15},                   /* cost of unaligned storess.  */
> +  6, 6,                                        /* SSE->integer and integer->SSE moves */
> +  18, 6,                               /* Gather load static, per_elt.  */
> +  18, 6,                               /* Gather store static, per_elt.  */
>    32,                                  /* size of l1 cache.  */
>    512,                                 /* size of l2 cache.  */
>    64,                                  /* size of prefetch block */
> @@ -2290,11 +2290,11 @@ struct processor_costs generic_cost = {
>       value is increased to perhaps more appropriate value of 5.  */
>    3,                                   /* Branch cost */
>    COSTS_N_INSNS (3),                   /* cost of FADD and FSUB insns.  */
> -  COSTS_N_INSNS (3),                   /* cost of FMUL instruction.  */
> +  COSTS_N_INSNS (5),                   /* cost of FMUL instruction.  */
>    COSTS_N_INSNS (20),                  /* cost of FDIV instruction.  */
>    COSTS_N_INSNS (1),                   /* cost of FABS instruction.  */
>    COSTS_N_INSNS (1),                   /* cost of FCHS instruction.  */
> -  COSTS_N_INSNS (40),                  /* cost of FSQRT instruction.  */
> +  COSTS_N_INSNS (20),                  /* cost of FSQRT instruction.  */
>
>    COSTS_N_INSNS (1),                   /* cost of cheap SSE instruction.  */
>    COSTS_N_INSNS (3),                   /* cost of ADDSS/SD SUBSS/SD insns.  */
> @@ -2306,7 +2306,7 @@ struct processor_costs generic_cost = {
>    COSTS_N_INSNS (32),                  /* cost of DIVSD instruction.  */
>    COSTS_N_INSNS (30),                  /* cost of SQRTSS instruction.  */
>    COSTS_N_INSNS (58),                  /* cost of SQRTSD instruction.  */
> -  1, 2, 1, 1,                          /* reassoc int, fp, vec_int, vec_fp.  */
> +  1, 4, 3, 3,                          /* reassoc int, fp, vec_int, vec_fp.  */
>    generic_memcpy,
>    generic_memset,
>    COSTS_N_INSNS (3),                   /* cond_taken_branch_cost.  */
> Index: testsuite/gcc.target/i386/l_fma_double_1.c
> ===================================================================
> --- testsuite/gcc.target/i386/l_fma_double_1.c  (revision 255252)
> +++ testsuite/gcc.target/i386/l_fma_double_1.c  (working copy)
> @@ -13,7 +13,7 @@ typedef double adouble __attribute__((al
>  /* { dg-final { scan-assembler-times "vfmsub\[123\]+pd" 8 } } */
>  /* { dg-final { scan-assembler-times "vfnmadd\[123\]+pd" 8 } } */
>  /* { dg-final { scan-assembler-times "vfnmsub\[123\]+pd" 8 } } */
> -/* { dg-final { scan-assembler-times "vfmadd\[123\]+sd" 56 } } */
> -/* { dg-final { scan-assembler-times "vfmsub\[123\]+sd" 56 } } */
> -/* { dg-final { scan-assembler-times "vfnmadd\[123\]+sd" 56 } } */
> -/* { dg-final { scan-assembler-times "vfnmsub\[123\]+sd" 56 } } */
> +/* { dg-final { scan-assembler-times "vfmadd\[123\]+sd" 32 } } */
> +/* { dg-final { scan-assembler-times "vfmsub\[123\]+sd" 32 } } */
> +/* { dg-final { scan-assembler-times "vfnmadd\[123\]+sd" 32 } } */
> +/* { dg-final { scan-assembler-times "vfnmsub\[123\]+sd" 32 } } */
> Index: testsuite/gcc.target/i386/l_fma_double_2.c
> ===================================================================
> --- testsuite/gcc.target/i386/l_fma_double_2.c  (revision 255252)
> +++ testsuite/gcc.target/i386/l_fma_double_2.c  (working copy)
> @@ -13,7 +13,7 @@ typedef double adouble __attribute__((al
>  /* { dg-final { scan-assembler-times "vfmsub\[123\]+pd" 8 } } */
>  /* { dg-final { scan-assembler-times "vfnmadd\[123\]+pd" 8 } } */
>  /* { dg-final { scan-assembler-times "vfnmsub\[123\]+pd" 8 } } */
> -/* { dg-final { scan-assembler-times "vfmadd\[123\]+sd" 56 } } */
> -/* { dg-final { scan-assembler-times "vfmsub\[123\]+sd" 56 } } */
> -/* { dg-final { scan-assembler-times "vfnmadd\[123\]+sd" 56 } } */
> -/* { dg-final { scan-assembler-times "vfnmsub\[123\]+sd" 56 } } */
> +/* { dg-final { scan-assembler-times "vfmadd\[123\]+sd" 32 } } */
> +/* { dg-final { scan-assembler-times "vfmsub\[123\]+sd" 32 } } */
> +/* { dg-final { scan-assembler-times "vfnmadd\[123\]+sd" 32 } } */
> +/* { dg-final { scan-assembler-times "vfnmsub\[123\]+sd" 32 } } */
> Index: testsuite/gcc.target/i386/l_fma_double_3.c
> ===================================================================
> --- testsuite/gcc.target/i386/l_fma_double_3.c  (revision 255252)
> +++ testsuite/gcc.target/i386/l_fma_double_3.c  (working copy)
> @@ -13,7 +13,7 @@ typedef double adouble __attribute__((al
>  /* { dg-final { scan-assembler-times "vfmsub\[123\]+pd" 8 } } */
>  /* { dg-final { scan-assembler-times "vfnmadd\[123\]+pd" 8 } } */
>  /* { dg-final { scan-assembler-times "vfnmsub\[123\]+pd" 8 } } */
> -/* { dg-final { scan-assembler-times "vfmadd\[123\]+sd" 56 } } */
> -/* { dg-final { scan-assembler-times "vfmsub\[123\]+sd" 56 } } */
> -/* { dg-final { scan-assembler-times "vfnmadd\[123\]+sd" 56 } } */
> -/* { dg-final { scan-assembler-times "vfnmsub\[123\]+sd" 56 } } */
> +/* { dg-final { scan-assembler-times "vfmadd\[123\]+sd" 32 } } */
> +/* { dg-final { scan-assembler-times "vfmsub\[123\]+sd" 32 } } */
> +/* { dg-final { scan-assembler-times "vfnmadd\[123\]+sd" 32 } } */
> +/* { dg-final { scan-assembler-times "vfnmsub\[123\]+sd" 32 } } */
> Index: testsuite/gcc.target/i386/l_fma_double_4.c
> ===================================================================
> --- testsuite/gcc.target/i386/l_fma_double_4.c  (revision 255252)
> +++ testsuite/gcc.target/i386/l_fma_double_4.c  (working copy)
> @@ -13,7 +13,7 @@ typedef double adouble __attribute__((al
>  /* { dg-final { scan-assembler-times "vfmsub\[123\]+pd" 8 } } */
>  /* { dg-final { scan-assembler-times "vfnmadd\[123\]+pd" 8 } } */
>  /* { dg-final { scan-assembler-times "vfnmsub\[123\]+pd" 8 } } */
> -/* { dg-final { scan-assembler-times "vfmadd\[123\]+sd" 56 } } */
> -/* { dg-final { scan-assembler-times "vfmsub\[123\]+sd" 56 } } */
> -/* { dg-final { scan-assembler-times "vfnmadd\[123\]+sd" 56 } } */
> -/* { dg-final { scan-assembler-times "vfnmsub\[123\]+sd" 56 } } */
> +/* { dg-final { scan-assembler-times "vfmadd\[123\]+sd" 32 } } */
> +/* { dg-final { scan-assembler-times "vfmsub\[123\]+sd" 32 } } */
> +/* { dg-final { scan-assembler-times "vfnmadd\[123\]+sd" 32 } } */
> +/* { dg-final { scan-assembler-times "vfnmsub\[123\]+sd" 32 } } */
> Index: testsuite/gcc.target/i386/l_fma_double_5.c
> ===================================================================
> --- testsuite/gcc.target/i386/l_fma_double_5.c  (revision 255252)
> +++ testsuite/gcc.target/i386/l_fma_double_5.c  (working copy)
> @@ -13,7 +13,7 @@ typedef double adouble __attribute__((al
>  /* { dg-final { scan-assembler-times "vfmsub\[123\]+pd" 8 } } */
>  /* { dg-final { scan-assembler-times "vfnmadd\[123\]+pd" 8 } } */
>  /* { dg-final { scan-assembler-times "vfnmsub\[123\]+pd" 8 } } */
> -/* { dg-final { scan-assembler-times "vfmadd\[123\]+sd" 56 } } */
> -/* { dg-final { scan-assembler-times "vfmsub\[123\]+sd" 56 } } */
> -/* { dg-final { scan-assembler-times "vfnmadd\[123\]+sd" 56 } } */
> -/* { dg-final { scan-assembler-times "vfnmsub\[123\]+sd" 56 } } */
> +/* { dg-final { scan-assembler-times "vfmadd\[123\]+sd" 32 } } */
> +/* { dg-final { scan-assembler-times "vfmsub\[123\]+sd" 32 } } */
> +/* { dg-final { scan-assembler-times "vfnmadd\[123\]+sd" 32 } } */
> +/* { dg-final { scan-assembler-times "vfnmsub\[123\]+sd" 32 } } */
> Index: testsuite/gcc.target/i386/l_fma_double_6.c
> ===================================================================
> --- testsuite/gcc.target/i386/l_fma_double_6.c  (revision 255252)
> +++ testsuite/gcc.target/i386/l_fma_double_6.c  (working copy)
> @@ -13,7 +13,7 @@ typedef double adouble __attribute__((al
>  /* { dg-final { scan-assembler-times "vfmsub\[123\]+pd" 8 } } */
>  /* { dg-final { scan-assembler-times "vfnmadd\[123\]+pd" 8 } } */
>  /* { dg-final { scan-assembler-times "vfnmsub\[123\]+pd" 8 } } */
> -/* { dg-final { scan-assembler-times "vfmadd\[123\]+sd" 56 } } */
> -/* { dg-final { scan-assembler-times "vfmsub\[123\]+sd" 56 } } */
> -/* { dg-final { scan-assembler-times "vfnmadd\[123\]+sd" 56 } } */
> -/* { dg-final { scan-assembler-times "vfnmsub\[123\]+sd" 56 } } */
> +/* { dg-final { scan-assembler-times "vfmadd\[123\]+sd" 32 } } */
> +/* { dg-final { scan-assembler-times "vfmsub\[123\]+sd" 32 } } */
> +/* { dg-final { scan-assembler-times "vfnmadd\[123\]+sd" 32 } } */
> +/* { dg-final { scan-assembler-times "vfnmsub\[123\]+sd" 32 } } */
> Index: testsuite/gcc.target/i386/l_fma_float_1.c
> ===================================================================
> --- testsuite/gcc.target/i386/l_fma_float_1.c   (revision 255252)
> +++ testsuite/gcc.target/i386/l_fma_float_1.c   (working copy)
> @@ -12,7 +12,7 @@
>  /* { dg-final { scan-assembler-times "vfmsub\[123\]+ps" 8 } } */
>  /* { dg-final { scan-assembler-times "vfnmadd\[123\]+ps" 8 } } */
>  /* { dg-final { scan-assembler-times "vfnmsub\[123\]+ps" 8 } } */
> -/* { dg-final { scan-assembler-times "vfmadd\[123\]+ss" 120 } } */
> -/* { dg-final { scan-assembler-times "vfmsub\[123\]+ss" 120 } } */
> -/* { dg-final { scan-assembler-times "vfnmadd\[123\]+ss" 120 } } */
> -/* { dg-final { scan-assembler-times "vfnmsub\[123\]+ss" 120 } } */
> +/* { dg-final { scan-assembler-times "vfmadd\[123\]+ss" 64 } } */
> +/* { dg-final { scan-assembler-times "vfmsub\[123\]+ss" 64 } } */
> +/* { dg-final { scan-assembler-times "vfnmadd\[123\]+ss" 64 } } */
> +/* { dg-final { scan-assembler-times "vfnmsub\[123\]+ss" 64 } } */
> Index: testsuite/gcc.target/i386/l_fma_float_2.c
> ===================================================================
> --- testsuite/gcc.target/i386/l_fma_float_2.c   (revision 255252)
> +++ testsuite/gcc.target/i386/l_fma_float_2.c   (working copy)
> @@ -12,7 +12,7 @@
>  /* { dg-final { scan-assembler-times "vfmsub\[123\]+ps" 8 } } */
>  /* { dg-final { scan-assembler-times "vfnmadd\[123\]+ps" 8 } } */
>  /* { dg-final { scan-assembler-times "vfnmsub\[123\]+ps" 8 } } */
> -/* { dg-final { scan-assembler-times "vfmadd\[123\]+ss" 120 } } */
> -/* { dg-final { scan-assembler-times "vfmsub\[123\]+ss" 120 } } */
> -/* { dg-final { scan-assembler-times "vfnmadd\[123\]+ss" 120 } } */
> -/* { dg-final { scan-assembler-times "vfnmsub\[123\]+ss" 120 } } */
> +/* { dg-final { scan-assembler-times "vfmadd\[123\]+ss" 64 } } */
> +/* { dg-final { scan-assembler-times "vfmsub\[123\]+ss" 64 } } */
> +/* { dg-final { scan-assembler-times "vfnmadd\[123\]+ss" 64 } } */
> +/* { dg-final { scan-assembler-times "vfnmsub\[123\]+ss" 64 } } */
> Index: testsuite/gcc.target/i386/l_fma_float_3.c
> ===================================================================
> --- testsuite/gcc.target/i386/l_fma_float_3.c   (revision 255252)
> +++ testsuite/gcc.target/i386/l_fma_float_3.c   (working copy)
> @@ -12,7 +12,7 @@
>  /* { dg-final { scan-assembler-times "vfmsub\[123\]+ps" 8 } } */
>  /* { dg-final { scan-assembler-times "vfnmadd\[123\]+ps" 8 } } */
>  /* { dg-final { scan-assembler-times "vfnmsub\[123\]+ps" 8 } } */
> -/* { dg-final { scan-assembler-times "vfmadd\[123\]+ss" 120 } } */
> -/* { dg-final { scan-assembler-times "vfmsub\[123\]+ss" 120 } } */
> -/* { dg-final { scan-assembler-times "vfnmadd\[123\]+ss" 120 } } */
> -/* { dg-final { scan-assembler-times "vfnmsub\[123\]+ss" 120 } } */
> +/* { dg-final { scan-assembler-times "vfmadd\[123\]+ss" 64 } } */
> +/* { dg-final { scan-assembler-times "vfmsub\[123\]+ss" 64 } } */
> +/* { dg-final { scan-assembler-times "vfnmadd\[123\]+ss" 64 } } */
> +/* { dg-final { scan-assembler-times "vfnmsub\[123\]+ss" 64 } } */
> Index: testsuite/gcc.target/i386/l_fma_float_4.c
> ===================================================================
> --- testsuite/gcc.target/i386/l_fma_float_4.c   (revision 255252)
> +++ testsuite/gcc.target/i386/l_fma_float_4.c   (working copy)
> @@ -12,7 +12,7 @@
>  /* { dg-final { scan-assembler-times "vfmsub\[123\]+ps" 8 } } */
>  /* { dg-final { scan-assembler-times "vfnmadd\[123\]+ps" 8 } } */
>  /* { dg-final { scan-assembler-times "vfnmsub\[123\]+ps" 8 } } */
> -/* { dg-final { scan-assembler-times "vfmadd\[123\]+ss" 120 } } */
> -/* { dg-final { scan-assembler-times "vfmsub\[123\]+ss" 120 } } */
> -/* { dg-final { scan-assembler-times "vfnmadd\[123\]+ss" 120 } } */
> -/* { dg-final { scan-assembler-times "vfnmsub\[123\]+ss" 120 } } */
> +/* { dg-final { scan-assembler-times "vfmadd\[123\]+ss" 64 } } */
> +/* { dg-final { scan-assembler-times "vfmsub\[123\]+ss" 64 } } */
> +/* { dg-final { scan-assembler-times "vfnmadd\[123\]+ss" 64 } } */
> +/* { dg-final { scan-assembler-times "vfnmsub\[123\]+ss" 64 } } */
> Index: testsuite/gcc.target/i386/l_fma_float_5.c
> ===================================================================
> --- testsuite/gcc.target/i386/l_fma_float_5.c   (revision 255252)
> +++ testsuite/gcc.target/i386/l_fma_float_5.c   (working copy)
> @@ -12,7 +12,7 @@
>  /* { dg-final { scan-assembler-times "vfmsub\[123\]+ps" 8 } } */
>  /* { dg-final { scan-assembler-times "vfnmadd\[123\]+ps" 8 } } */
>  /* { dg-final { scan-assembler-times "vfnmsub\[123\]+ps" 8 } } */
> -/* { dg-final { scan-assembler-times "vfmadd\[123\]+ss" 120 } } */
> -/* { dg-final { scan-assembler-times "vfmsub\[123\]+ss" 120 } } */
> -/* { dg-final { scan-assembler-times "vfnmadd\[123\]+ss" 120 } } */
> -/* { dg-final { scan-assembler-times "vfnmsub\[123\]+ss" 120 } } */
> +/* { dg-final { scan-assembler-times "vfmadd\[123\]+ss" 64 } } */
> +/* { dg-final { scan-assembler-times "vfmsub\[123\]+ss" 64 } } */
> +/* { dg-final { scan-assembler-times "vfnmadd\[123\]+ss" 64 } } */
> +/* { dg-final { scan-assembler-times "vfnmsub\[123\]+ss" 64 } } */
> Index: testsuite/gcc.target/i386/l_fma_float_6.c
> ===================================================================
> --- testsuite/gcc.target/i386/l_fma_float_6.c   (revision 255252)
> +++ testsuite/gcc.target/i386/l_fma_float_6.c   (working copy)
> @@ -12,7 +12,7 @@
>  /* { dg-final { scan-assembler-times "vfmsub\[123\]+ps" 8 } } */
>  /* { dg-final { scan-assembler-times "vfnmadd\[123\]+ps" 8 } } */
>  /* { dg-final { scan-assembler-times "vfnmsub\[123\]+ps" 8 } } */
> -/* { dg-final { scan-assembler-times "vfmadd\[123\]+ss" 120 } } */
> -/* { dg-final { scan-assembler-times "vfmsub\[123\]+ss" 120 } } */
> -/* { dg-final { scan-assembler-times "vfnmadd\[123\]+ss" 120 } } */
> -/* { dg-final { scan-assembler-times "vfnmsub\[123\]+ss" 120 } } */
> +/* { dg-final { scan-assembler-times "vfmadd\[123\]+ss" 64 } } */
> +/* { dg-final { scan-assembler-times "vfmsub\[123\]+ss" 64 } } */
> +/* { dg-final { scan-assembler-times "vfnmadd\[123\]+ss" 64 } } */
> +/* { dg-final { scan-assembler-times "vfnmsub\[123\]+ss" 64 } } */


* Re: generic retuning part 1 - x86-tune-costs update
  2017-11-30 11:03 ` Richard Biener
@ 2017-11-30 15:09   ` Jan Hubicka
  2017-11-30 18:20     ` Jan Hubicka
  0 siblings, 1 reply; 5+ messages in thread
From: Jan Hubicka @ 2017-11-30 15:09 UTC (permalink / raw)
  To: Richard Biener; +Cc: GCC Patches

> On Thu, Nov 30, 2017 at 10:40 AM, Jan Hubicka <hubicka@ucw.cz> wrote:
> > Hi,
> > this patch makes the costs in generic match modern chips (core, haswell,
> > bulldozer and zen) better.  The only important change is to drop the cost of
> > unaligned loads and stores because all modern chips handle them well.  This
> > makes the vectorizer not peel for alignment and saves a lot of code size
> > without sacrificing performance.
> >
> > I have benchmarked it on zen and skylake and it is a small but almost
> > consistent win in performance too.  A notable regression is fma3d regressing
> > approx. 5% on zen.  This is the case with native tuning as well, so I will
> > look into it incrementally.
> >
> > Bootstrapped/regtested x86_64-linux, committed.
> 
> The question is how we cost such things as store bandwidth where IIRC
> an unaligned store counts as 'two' entries in the pipeline's store buffers.
> Likewise unaligned loads do usually still have a penalty.
> 
> What changed is that when the loads/stores happen to be aligned
> using the unaligned instruction variant doesn't have a penalty.
> 
> So I'm not sure peeling for alignment isn't a win, it just depends more
> on the number of memory streams involved.

I have benchmarked this quite thoroughly while switching the defaults for Core
and Zen.  Disabling alignment peeling is a pretty consistent win on SPECfp and
SPECint 2000, 2006 and 2017 (the last on Zen only).  I will analyze fma3d.  The
regression did not show with the alignment change alone, so it may be related
to other costs or just bad luck.

For sure, there may be specific loops where alignment wins, but it seems a bad
idea to enable it by default just in case it is needed when it causes a
regression on SPECfp.  Let's see if we can identify them and be more careful
about the alignment decision.
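
To make the discussion concrete, here is a minimal sketch of the kind of loop
where the peeling decision matters.  It is a hypothetical kernel, not taken
from the patch, from SPEC or from fma3d; the name scale_add and its signature
are assumptions for illustration only.  It has one load-only stream and one
load/store stream, and neither pointer is known to be aligned at compile time.

```c
#include <stddef.h>

/* Hypothetical example kernel (not from the patch).
   dst and src carry no alignment guarantee, so the vectorizer must either
   peel scalar iterations until one access is aligned, or emit unaligned
   vector loads and stores for the whole loop.  With cheap unaligned-access
   costs it is expected to prefer the second option, which avoids the
   prologue code.  */
void scale_add(float *restrict dst, const float *restrict src,
               float k, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = dst[i] + k * src[i];
}
```

One way to look at what the vectorizer decided is to compile this with
something like "gcc -O3 -mtune=generic -fopt-info-vec -S" and compare the
assembly before and after the cost change.  Where available,
--param vect-max-peeling-for-alignment=0 should allow comparing against a
build with peeling limited, though the exact effect depends on the GCC
version.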

I also checked that disabling alignment prologues is a win even for string
operations, and there it has been disabled for a couple of releases already
(I think since the initial tuning for Core and Bulldozer came in).
Honza
> 
> Richard.


* Re: generic retuning part 1 - x86-tune-costs update
  2017-11-30 15:09   ` Jan Hubicka
@ 2017-11-30 18:20     ` Jan Hubicka
  2017-11-30 18:43       ` Richard Biener
  0 siblings, 1 reply; 5+ messages in thread
From: Jan Hubicka @ 2017-11-30 18:20 UTC (permalink / raw)
  To: Richard Biener; +Cc: GCC Patches

> 
> I have benchmarked this quite thoroughly while switching the defaults for Core
> and Zen.  Disabling alignment peeling is a pretty consistent win on SPECfp and
> SPECint 2000, 2006 and 2017 (the last on Zen only).  I will analyze fma3d.  The
> regression did not show with the alignment change alone, so it may be related
> to other costs or just bad luck.

I have opened https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83232 for that.
The problem is that SLP gives up on vectorization when it sees vectorized uses
in the same basic block.  Previously there was an alignment prologue, while now
we fully unroll the loop.  The missed SLP introduces a memory mismatch stall.

I am a bit lost on why SLP gives up though.

Honza


* Re: generic retuning part 1 - x86-tune-costs update
  2017-11-30 18:20     ` Jan Hubicka
@ 2017-11-30 18:43       ` Richard Biener
  0 siblings, 0 replies; 5+ messages in thread
From: Richard Biener @ 2017-11-30 18:43 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: GCC Patches

On November 30, 2017 7:14:12 PM GMT+01:00, Jan Hubicka <hubicka@ucw.cz> wrote:
>> 
>> I have benchmarked this quite thoroughly while switching the defaults for Core
>> and Zen.  Disabling alignment peeling is a pretty consistent win on SPECfp and
>> SPECint 2000, 2006 and 2017 (the last on Zen only).  I will analyze fma3d.  The
>> regression did not show with the alignment change alone, so it may be related
>> to other costs or just bad luck.
>
>I have opened https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83232 for that.
>The problem is that SLP gives up on vectorization when it sees vectorized uses
>in the same basic block.  Previously there was an alignment prologue, while now
>we fully unroll the loop.  The missed SLP introduces a memory mismatch stall.
>
>I am a bit lost on why SLP gives up though.

I will have a look - it shouldn't give up so easily. Maybe it's a costing issue. 

Richard. 

>Honza

