public inbox for gcc-patches@gcc.gnu.org
* [PATCH] Retune mask <-> integer cost for non-AVX512 micro-architecture.
@ 2020-09-15  3:00 Hongtao Liu
  2020-09-15  8:17 ` Uros Bizjak
From: Hongtao Liu @ 2020-09-15  3:00 UTC
  To: Uros Bizjak, GCC Patches

[-- Attachment #1: Type: text/plain, Size: 527 bytes --]

Hi:
  This patch avoids spilling GPRs to mask registers on non-AVX512
micro-architectures and fixes the regression in PR96744.

  Bootstrapped and regression-tested on the i386/x86-64 backend.
  No significant performance impact on SPEC2017.
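
  To illustrate why the old 2, 2 entries invited the spill, here is a
minimal sketch in C. It is a toy model, not GCC's actual allocator logic:
the struct, its field names, and the integer load/store values (4, 4) are
hypothetical. Only the mask <-> integer values 2, 2 and 6, 6 are taken
from the generic_cost hunk of the patch.

```c
/* Toy cost table.  Field names are illustrative and do NOT match
   struct processor_costs in x86-tune-costs.h.  */
struct toy_costs
{
  int mask_to_int;	/* mask->integer move */
  int int_to_mask;	/* integer->mask move */
  int int_store;	/* GPR store to memory (assumed value) */
  int int_load;		/* GPR load from memory (assumed value) */
};

/* Mask <-> integer costs as in generic_cost before and after the patch;
   the memory costs are made-up placeholders.  */
const struct toy_costs toy_before = { 2, 2, 4, 4 };
const struct toy_costs toy_after = { 6, 6, 4, 4 };

/* Round trip of parking a GPR value in a mask register and restoring it.  */
int
spill_via_mask (const struct toy_costs *c)
{
  return c->int_to_mask + c->mask_to_int;
}

/* Round trip of an ordinary stack spill: one store plus one reload.  */
int
spill_via_memory (const struct toy_costs *c)
{
  return c->int_store + c->int_load;
}

/* Nonzero when the mask register looks strictly cheaper than memory.  */
int
prefers_mask_spill (const struct toy_costs *c)
{
  return spill_via_mask (c) < spill_via_memory (c);
}
```

  Under these assumed memory costs, the old table makes the mask round
trip look cheaper than a stack spill, and the new one does not.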

gcc/ChangeLog:

        PR target/96744
        * config/i386/x86-tune-costs.h (struct processor_costs):
        Increase mask <-> integer cost for non-AVX512 targets to avoid
        spilling GPRs to mask registers.  Also retune mask <-> integer
        and mask_load/store for skylake_cost.


-- 
BR,
Hongtao

[-- Attachment #2: 0001-Retune-mask-integer-cost-for-non-AVX512-micro-archit.patch --]
[-- Type: text/x-patch, Size: 14000 bytes --]

From 66549572467fe5dc5c4221e7857f3051d4f51554 Mon Sep 17 00:00:00 2001
From: liuhongt <hongtao.liu@intel.com>
Date: Mon, 24 Aug 2020 20:36:52 +0800
Subject: [PATCH] Retune mask <->integer cost for non-AVX512
 micro-architecture.

gcc/ChangeLog:

	PR target/96744
	* config/i386/x86-tune-costs.h (struct processor_costs):
	Increase mask <-> integer cost for non-AVX512 targets to avoid
	spilling GPRs to mask registers.  Also retune mask <-> integer
	and mask_load/store for skylake_cost.
---
 gcc/config/i386/x86-tune-costs.h | 88 ++++++++++++++++----------------
 1 file changed, 44 insertions(+), 44 deletions(-)

diff --git a/gcc/config/i386/x86-tune-costs.h b/gcc/config/i386/x86-tune-costs.h
index a782a9dd9e3..0ad4b28903c 100644
--- a/gcc/config/i386/x86-tune-costs.h
+++ b/gcc/config/i386/x86-tune-costs.h
@@ -58,8 +58,8 @@ struct processor_costs ix86_size_cost = {/* costs for tuning for size */
 					   in 32,64,128,256 and 512-bit */
   {3, 3, 3, 3, 3},			/* cost of storing SSE registers
 					   in 32,64,128,256 and 512-bit */
-  3, 3,					/* SSE->integer and integer->SSE moves */
-  2, 2,				/* mask->integer and integer->mask moves */
+  3, 3,				/* SSE->integer and integer->SSE moves */
+  3, 3,				/* mask->integer and integer->mask moves */
   {2, 2, 2},				/* cost of loading mask register
 					   in QImode, HImode, SImode.  */
   {2, 2, 2},				/* cost if storing mask register
@@ -169,8 +169,8 @@ struct processor_costs i386_cost = {	/* 386 specific costs */
 					   in 32,64,128,256 and 512-bit */
   {4, 8, 16, 32, 64},			/* cost of storing SSE registers
 					   in 32,64,128,256 and 512-bit */
-  3, 3,					/* SSE->integer and integer->SSE moves */
-  2, 2,				/* mask->integer and integer->mask moves */
+  3, 3,				/* SSE->integer and integer->SSE moves */
+  3, 3,				/* mask->integer and integer->mask moves */
   {2, 4, 2},				/* cost of loading mask register
 					   in QImode, HImode, SImode.  */
   {2, 4, 2},				/* cost if storing mask register
@@ -277,8 +277,8 @@ struct processor_costs i486_cost = {	/* 486 specific costs */
 					   in 32,64,128,256 and 512-bit */
   {4, 8, 16, 32, 64},			/* cost of storing SSE registers
 					   in 32,64,128,256 and 512-bit */
-  3, 3,					/* SSE->integer and integer->SSE moves */
-  2, 2,				/* mask->integer and integer->mask moves */
+  3, 3,				/* SSE->integer and integer->SSE moves */
+  3, 3,				/* mask->integer and integer->mask moves */
   {2, 4, 2},				/* cost of loading mask register
 					   in QImode, HImode, SImode.  */
   {2, 4, 2},				/* cost if storing mask register
@@ -387,8 +387,8 @@ struct processor_costs pentium_cost = {
 					   in 32,64,128,256 and 512-bit */
   {4, 8, 16, 32, 64},			/* cost of storing SSE registers
 					   in 32,64,128,256 and 512-bit */
-  3, 3,					/* SSE->integer and integer->SSE moves */
-  2, 2,				/* mask->integer and integer->mask moves */
+  3, 3,				/* SSE->integer and integer->SSE moves */
+  3, 3,				/* mask->integer and integer->mask moves */
   {2, 4, 2},				/* cost of loading mask register
 					   in QImode, HImode, SImode.  */
   {2, 4, 2},				/* cost if storing mask register
@@ -488,8 +488,8 @@ struct processor_costs lakemont_cost = {
 					   in 32,64,128,256 and 512-bit */
   {4, 8, 16, 32, 64},			/* cost of storing SSE registers
 					   in 32,64,128,256 and 512-bit */
-  3, 3,					/* SSE->integer and integer->SSE moves */
-  2, 2,				/* mask->integer and integer->mask moves */
+  3, 3,				/* SSE->integer and integer->SSE moves */
+  3, 3,				/* mask->integer and integer->mask moves */
   {2, 4, 2},				/* cost of loading mask register
 					   in QImode, HImode, SImode.  */
   {2, 4, 2},				/* cost if storing mask register
@@ -604,8 +604,8 @@ struct processor_costs pentiumpro_cost = {
 					   in 32,64,128,256 and 512-bit */
   {4, 8, 16, 32, 64},			/* cost of storing SSE registers
 					   in 32,64,128,256 and 512-bit */
-  3, 3,					/* SSE->integer and integer->SSE moves */
-  2, 2,				/* mask->integer and integer->mask moves */
+  3, 3,				/* SSE->integer and integer->SSE moves */
+  3, 3,				/* mask->integer and integer->mask moves */
   {4, 4, 4},				/* cost of loading mask register
 					   in QImode, HImode, SImode.  */
   {2, 2, 2},				/* cost if storing mask register
@@ -711,8 +711,8 @@ struct processor_costs geode_cost = {
 					   in 32,64,128,256 and 512-bit */
   {2, 2, 8, 16, 32},			/* cost of storing SSE registers
 					   in 32,64,128,256 and 512-bit */
-  6, 6,					/* SSE->integer and integer->SSE moves */
-  2, 2,				/* mask->integer and integer->mask moves */
+  6, 6,				/* SSE->integer and integer->SSE moves */
+  6, 6,				/* mask->integer and integer->mask moves */
   {2, 2, 2},				/* cost of loading mask register
 					   in QImode, HImode, SImode.  */
   {2, 2, 2},				/* cost if storing mask register
@@ -818,8 +818,8 @@ struct processor_costs k6_cost = {
 					   in 32,64,128,256 and 512-bit */
   {2, 2, 8, 16, 32},			/* cost of storing SSE registers
 					   in 32,64,128,256 and 512-bit */
-  6, 6,					/* SSE->integer and integer->SSE moves */
-  2, 2,				/* mask->integer and integer->mask moves */
+  6, 6,				/* SSE->integer and integer->SSE moves */
+  6, 6,				/* mask->integer and integer->mask moves */
   {4, 5, 4},				/* cost of loading mask register
 					   in QImode, HImode, SImode.  */
   {2, 3, 2},				/* cost if storing mask register
@@ -931,8 +931,8 @@ struct processor_costs athlon_cost = {
 					   in 32,64,128,256 and 512-bit */
   {4, 4, 10, 10, 20},			/* cost of storing SSE registers
 					   in 32,64,128,256 and 512-bit */
-  5, 5,					/* SSE->integer and integer->SSE moves */
-  2, 2,				/* mask->integer and integer->mask moves */
+  5, 5,				/* SSE->integer and integer->SSE moves */
+  5, 5,				/* mask->integer and integer->mask moves */
   {3, 4, 3},				/* cost of loading mask register
 					   in QImode, HImode, SImode.  */
   {3, 4, 3},				/* cost if storing mask register
@@ -1046,8 +1046,8 @@ struct processor_costs k8_cost = {
 					   in 32,64,128,256 and 512-bit */
   {4, 4, 10, 10, 20},			/* cost of storing SSE registers
 					   in 32,64,128,256 and 512-bit */
-  5, 5,					/* SSE->integer and integer->SSE moves */
-  2, 2,				/* mask->integer and integer->mask moves */
+  5, 5,				/* SSE->integer and integer->SSE moves */
+  5, 5,				/* mask->integer and integer->mask moves */
   {3, 4, 3},				/* cost of loading mask register
 					   in QImode, HImode, SImode.  */
   {3, 4, 3},				/* cost if storing mask register
@@ -1165,8 +1165,8 @@ struct processor_costs amdfam10_cost = {
 					   in 32,64,128,256 and 512-bit */
   {4, 4, 5, 10, 20},			/* cost of storing SSE registers
 					   in 32,64,128,256 and 512-bit */
-  3, 3,					/* SSE->integer and integer->SSE moves */
-  2, 2,				/* mask->integer and integer->mask moves */
+  3, 3,				/* SSE->integer and integer->SSE moves */
+  3, 3,				/* mask->integer and integer->mask moves */
   {3, 4, 3},				/* cost of loading mask register
 					   in QImode, HImode, SImode.  */
   {3, 4, 3},				/* cost if storing mask register
@@ -1295,7 +1295,7 @@ const struct processor_costs bdver_cost = {
   {10, 10, 10, 40, 60},			/* cost of storing SSE registers
 					   in 32,64,128,256 and 512-bit */
   16, 20,				/* SSE->integer and integer->SSE moves */
-  2, 2,				/* mask->integer and integer->mask moves */
+  16, 20,				/* mask->integer and integer->mask moves */
   {8, 8, 8},				/* cost of loading mask register
 					   in QImode, HImode, SImode.  */
   {8, 8, 8},				/* cost if storing mask register
@@ -1431,8 +1431,8 @@ struct processor_costs znver1_cost = {
 					   in 32,64,128,256 and 512-bit.  */
   {8, 8, 8, 16, 32},			/* cost of storing SSE registers
 					   in 32,64,128,256 and 512-bit.  */
-  6, 6,					/* SSE->integer and integer->SSE moves.  */
-  2, 2,				/* mask->integer and integer->mask moves */
+  6, 6,				/* SSE->integer and integer->SSE moves.  */
+  8, 8,				/* mask->integer and integer->mask moves */
   {6, 6, 6},				/* cost of loading mask register
 					   in QImode, HImode, SImode.  */
   {8, 8, 8},				/* cost if storing mask register
@@ -1587,7 +1587,7 @@ struct processor_costs znver2_cost = {
 					   in 32,64,128,256 and 512-bit.  */
   6, 6,					/* SSE->integer and integer->SSE
 					   moves.  */
-  2, 2,				/* mask->integer and integer->mask moves */
+  8, 8,				/* mask->integer and integer->mask moves */
   {6, 6, 6},				/* cost of loading mask register
 					   in QImode, HImode, SImode.  */
   {8, 8, 8},				/* cost if storing mask register
@@ -1726,11 +1726,11 @@ struct processor_costs skylake_cost = {
 					   in 32,64,128,256 and 512-bit */
   {8, 8, 8, 12, 24},			/* cost of storing SSE registers
 					   in 32,64,128,256 and 512-bit */
-  6, 6,					/* SSE->integer and integer->SSE moves */
-  4, 6,				/* mask->integer and integer->mask moves */
-  {6, 6, 6},				/* cost of loading mask register
+  6, 6,				/* SSE->integer and integer->SSE moves */
+  5, 5,				/* mask->integer and integer->mask moves */
+  {8, 8, 8},				/* cost of loading mask register
 					   in QImode, HImode, SImode.  */
-  {8, 8, 8},				/* cost if storing mask register
+  {6, 6, 6},				/* cost if storing mask register
 					   in QImode, HImode, SImode.  */
   3,					/* cost of moving mask register.  */
   /* End of register allocator costs.  */
@@ -1841,7 +1841,7 @@ const struct processor_costs btver1_cost = {
   {10, 10, 12, 48, 96},			/* cost of storing SSE registers
 					   in 32,64,128,256 and 512-bit */
   14, 14,				/* SSE->integer and integer->SSE moves */
-  2, 2,				/* mask->integer and integer->mask moves */
+  14, 14,				/* mask->integer and integer->mask moves */
   {6, 8, 6},				/* cost of loading mask register
 					   in QImode, HImode, SImode.  */
   {6, 8, 6},				/* cost if storing mask register
@@ -1951,7 +1951,7 @@ const struct processor_costs btver2_cost = {
   {10, 10, 12, 48, 96},			/* cost of storing SSE registers
 					   in 32,64,128,256 and 512-bit */
   14, 14,				/* SSE->integer and integer->SSE moves */
-  2, 2,				/* mask->integer and integer->mask moves */
+  14, 14,				/* mask->integer and integer->mask moves */
   {8, 8, 6},				/* cost of loading mask register
 					   in QImode, HImode, SImode.  */
   {8, 8, 6},				/* cost if storing mask register
@@ -2060,7 +2060,7 @@ struct processor_costs pentium4_cost = {
   {16, 16, 16, 32, 64},			/* cost of storing SSE registers
 					   in 32,64,128,256 and 512-bit */
   20, 12,				/* SSE->integer and integer->SSE moves */
-  2, 2,				/* mask->integer and integer->mask moves */
+  20, 12,				/* mask->integer and integer->mask moves */
   {4, 5, 4},				/* cost of loading mask register
 					   in QImode, HImode, SImode.  */
   {2, 3, 2},				/* cost if storing mask register
@@ -2172,7 +2172,7 @@ struct processor_costs nocona_cost = {
   {12, 12, 12, 24, 48},			/* cost of storing SSE registers
 					   in 32,64,128,256 and 512-bit */
   20, 12,				/* SSE->integer and integer->SSE moves */
-  2, 2,				/* mask->integer and integer->mask moves */
+  20, 12,				/* mask->integer and integer->mask moves */
   {4, 4, 4},				/* cost of loading mask register
 					   in QImode, HImode, SImode.  */
   {4, 4, 4},				/* cost if storing mask register
@@ -2281,8 +2281,8 @@ struct processor_costs atom_cost = {
 					   in 32,64,128,256 and 512-bit */
   {8, 8, 8, 16, 32},			/* cost of storing SSE registers
 					   in 32,64,128,256 and 512-bit */
-  8, 6,					/* SSE->integer and integer->SSE moves */
-  2, 2,				/* mask->integer and integer->mask moves */
+  8, 6,				/* SSE->integer and integer->SSE moves */
+  8, 6,				/* mask->integer and integer->mask moves */
   {6, 6, 6},				/* cost of loading mask register
 					   in QImode, HImode, SImode.  */
   {6, 6, 6},			/* cost if storing mask register
@@ -2391,8 +2391,8 @@ struct processor_costs slm_cost = {
 					   in 32,64,128,256 and 512-bit */
   {8, 8, 8, 16, 32},			/* cost of storing SSE registers
 					   in 32,64,128,256 and 512-bit */
-  8, 6,					/* SSE->integer and integer->SSE moves */
-  2, 2,				/* mask->integer and integer->mask moves */
+  8, 6,				/* SSE->integer and integer->SSE moves */
+  8, 6,				/* mask->integer and integer->mask moves */
   {8, 8, 8},			/* cost of loading mask register
 					   in QImode, HImode, SImode.  */
   {6, 6, 6},			/* cost if storing mask register
@@ -2501,8 +2501,8 @@ struct processor_costs intel_cost = {
 					   in 32,64,128,256 and 512-bit */
   {6, 6, 6, 6, 6},			/* cost of storing SSE registers
 					   in 32,64,128,256 and 512-bit */
-  4, 4,					/* SSE->integer and integer->SSE moves */
-  2, 2,				/* mask->integer and integer->mask moves */
+  4, 4,				/* SSE->integer and integer->SSE moves */
+  4, 4,				/* mask->integer and integer->mask moves */
   {4, 4, 4},				/* cost of loading mask register
 					   in QImode, HImode, SImode.  */
   {6, 6, 6},				/* cost if storing mask register
@@ -2615,8 +2615,8 @@ struct processor_costs generic_cost = {
 					   in 32,64,128,256 and 512-bit */
   {6, 6, 6, 10, 15},			/* cost of storing SSE registers
 					   in 32,64,128,256 and 512-bit */
-  6, 6,					/* SSE->integer and integer->SSE moves */
-  2, 2,				/* mask->integer and integer->mask moves */
+  6, 6,				/* SSE->integer and integer->SSE moves */
+  6, 6,				/* mask->integer and integer->mask moves */
   {6, 6, 6},				/* cost of loading mask register
 					   in QImode, HImode, SImode.  */
   {6, 6, 6},			/* cost if storing mask register
@@ -2734,8 +2734,8 @@ struct processor_costs core_cost = {
 					   in 32,64,128,256 and 512-bit */
   {6, 6, 6, 6, 12},			/* cost of storing SSE registers
 					   in 32,64,128,256 and 512-bit */
-  6, 6,					/* SSE->integer and integer->SSE moves */
-  2, 2,				/* mask->integer and integer->mask moves */
+  6, 6,				/* SSE->integer and integer->SSE moves */
+  6, 6,				/* mask->integer and integer->mask moves */
   {4, 4, 4},				/* cost of loading mask register
 					   in QImode, HImode, SImode.  */
   {6, 6, 6},				/* cost if storing mask register
-- 
2.18.1



* Re: [PATCH] Retune mask <-> integer cost for non-AVX512 micro-architecture.
  2020-09-15  3:00 [PATCH] Retune mask <-> integer cost for non-AVX512 micro-architecture Hongtao Liu
@ 2020-09-15  8:17 ` Uros Bizjak
From: Uros Bizjak @ 2020-09-15  8:17 UTC
  To: Hongtao Liu; +Cc: GCC Patches, H. J. Lu

On Tue, Sep 15, 2020 at 4:59 AM Hongtao Liu <crazylht@gmail.com> wrote:
>
> Hi:
>   This patch avoids spilling GPRs to mask registers on non-AVX512
> micro-architectures and fixes the regression in PR96744.
>
>   Bootstrapped and regression-tested on the i386/x86-64 backend.
>   No significant performance impact on SPEC2017.
>
> gcc/ChangeLog:
>
>         PR target/96744
>         * config/i386/x86-tune-costs.h (struct processor_costs):
>         Increase mask <-> integer cost for non-AVX512 targets to avoid
>         spilling GPRs to mask registers.  Also retune mask <-> integer
>         and mask_load/store for skylake_cost.

LGTM.

Thanks,
Uros.

>
> --
> BR,
> Hongtao
