public inbox for gcc-patches@gcc.gnu.org
* Disable partial reg dependencies for haswell+
From: Jan Hubicka @ 2017-10-27 10:46 UTC (permalink / raw)
  To: gcc-patches, ubizjak, kirill.yukhin, hjl.tools

Hi,
while looking for x86 tuning issues I noticed PR81614 about partial register
stalls on Core.  We currently support two schemes for out-of-order CPUs:
partial register dependencies, where registers are always renamed as a whole
and thus it is important to always write the complete register at the beginning
of a dependency chain, and partial register stalls, where registers are renamed
by parts and it is important not to read the full size after a partial store.
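
To make the distinction concrete, here is a minimal C sketch that models only
the architectural effect of the two kinds of write.  The helper names are
hypothetical and are not part of GCC; the sketch says nothing about timing on
any particular CPU.  A movb-style write keeps the upper bits, so its result
depends on the old register value, while a movzbl-style write replaces the
whole register and starts a fresh dependency chain.

```c
#include <stdint.h>

/* Hypothetical model of a partial (8-bit) register write, as done by
   "movb": the upper 24 bits of the old value survive, so the result
   depends on OLD.  */
uint32_t
write_low8 (uint32_t old, uint8_t b)
{
  return (old & ~UINT32_C (0xff)) | b;
}

/* Hypothetical model of a zero-extending write, as done by "movzbl":
   the whole register is overwritten, so OLD is ignored and no
   dependency on the previous value remains.  */
uint32_t
write_zext8 (uint32_t old, uint8_t b)
{
  (void) old;  /* Deliberately unused: the full write discards it.  */
  return b;
}
```

On a CPU that renames whole registers, the first form has to wait for whatever
produced OLD; on a CPU that renames by parts, the cost appears when the full
register is read after such a partial write.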

Core renames partial registers, like PentiumPro+, but it is currently set
to partial reg dependency.

The PR log also claims that there was a change in Haswell that avoids the
partial register stalls completely (how?).  This is per Agner Fog's manual, and
I have verified that dropping partial register dependency on Haswell produces
no regressions and slightly reduces code size.

I plan to experiment with switching pre-Haswell cores to partial register
stalls, but I need to find a setup for benchmarking (Vladimir is still running
the regular tester on Conroe).  I plan to do that incrementally.

Because AMD chips are all partial reg dependency, we will probably need to
find a way to avoid both on the code sequences mentioned in the PRs. This
is another incremental step.

Bootstrapped/regtested x86_64-linux and benchmarked on Haswell on Spec2k,
spec2k6, C++ benchmarks, polyhedron and my own microbenchmarks, which I
developed for partial register stalls/dependencies back in the PPro/K7 days.

I have also noticed:
#define m_CORE_ALL (m_CORE2 | m_NEHALEM  | m_SANDYBRIDGE | m_HASWELL)
#define m_SKYLAKE_AVX512 (1U<<PROCESSOR_SKYLAKE_AVX512)

Notice that skylake512 is thus not included in core tuning, which seems wrong.
However because of
{"skylake-avx512", PROCESSOR_HASWELL, CPU_HASWELL, PTA_SKYLAKE_AVX512},
I think PROCESSOR_SKYLAKE_AVX512 is never set.  It is used though:
    case PROCESSOR_SKYLAKE_AVX512:
      def_or_undef (parse_in, "__skylake_avx512");
      def_or_undef (parse_in, "__skylake_avx512__");
      break;

How is this supposed to work?
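
For readers unfamiliar with the masks, here is a rough self-contained sketch of
the bit test involved.  The enum below is a hypothetical five-entry subset with
made-up ordering, and tune_enabled is an illustrative helper rather than a GCC
function; only the m_* macro shapes follow the definitions quoted above.

```c
#include <stdbool.h>

/* Hypothetical subset of the processor enum; the real one has many
   more entries and a different order.  */
enum processor_type
{
  PROCESSOR_CORE2,
  PROCESSOR_NEHALEM,
  PROCESSOR_SANDYBRIDGE,
  PROCESSOR_HASWELL,
  PROCESSOR_SKYLAKE_AVX512
};

#define m_CORE2 (1U << PROCESSOR_CORE2)
#define m_NEHALEM (1U << PROCESSOR_NEHALEM)
#define m_SANDYBRIDGE (1U << PROCESSOR_SANDYBRIDGE)
#define m_HASWELL (1U << PROCESSOR_HASWELL)
#define m_SKYLAKE_AVX512 (1U << PROCESSOR_SKYLAKE_AVX512)
#define m_CORE_ALL (m_CORE2 | m_NEHALEM | m_SANDYBRIDGE | m_HASWELL)

/* Illustrative name: the Core part of the mask once Haswell is dropped.  */
#define m_CORE_PRE_HASWELL (m_CORE2 | m_NEHALEM | m_SANDYBRIDGE)

/* Return true if the tuning flag whose selector is MASK applies when
   tuning for processor TUNE.  */
bool
tune_enabled (unsigned int mask, enum processor_type tune)
{
  return (mask & (1U << tune)) != 0;
}
```

With these definitions, tune_enabled (m_CORE_ALL, PROCESSOR_SKYLAKE_AVX512) is
false, which is the omission noted above, and replacing m_CORE_ALL with the
three older Core masks makes the test false for PROCESSOR_HASWELL as well.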

I will commit the patch tonight if there are no complaints.

Honza

	* x86-tune.def (X86_TUNE_PARTIAL_REG_DEPENDENCY, X86_TUNE_MOVX):
	disable for Haswell and newer revisions of core.
Index: x86-tune.def
===================================================================
--- x86-tune.def	(revision 254073)
+++ x86-tune.def	(working copy)
@@ -48,7 +48,8 @@ DEF_TUNE (X86_TUNE_SCHEDULE, "schedule",
    over partial stores.  For example preffer MOVZBL or MOVQ to load 8bit
    value over movb.  */
 DEF_TUNE (X86_TUNE_PARTIAL_REG_DEPENDENCY, "partial_reg_dependency",
-          m_P4_NOCONA | m_CORE_ALL | m_BONNELL | m_SILVERMONT | m_INTEL
+          m_P4_NOCONA | m_CORE2 | m_NEHALEM  | m_SANDYBRIDGE
+	  | m_BONNELL | m_SILVERMONT | m_INTEL
 	  | m_KNL | m_KNM | m_AMD_MULTIPLE | m_GENERIC)
 
 /* X86_TUNE_SSE_PARTIAL_REG_DEPENDENCY: This knob promotes all store
@@ -84,8 +85,9 @@ DEF_TUNE (X86_TUNE_PARTIAL_FLAG_REG_STAL
 /* X86_TUNE_MOVX: Enable to zero extend integer registers to avoid
    partial dependencies.  */
 DEF_TUNE (X86_TUNE_MOVX, "movx",
-          m_PPRO | m_P4_NOCONA | m_CORE_ALL | m_BONNELL | m_SILVERMONT
-	  | m_KNL | m_KNM | m_INTEL | m_GEODE | m_AMD_MULTIPLE  | m_GENERIC)
+          m_PPRO | m_P4_NOCONA | m_CORE2 | m_NEHALEM  | m_SANDYBRIDGE
+	  | m_BONNELL | m_SILVERMONT | m_KNL | m_KNM | m_INTEL
+	  | m_GEODE | m_AMD_MULTIPLE  | m_GENERIC)
 
 /* X86_TUNE_MEMORY_MISMATCH_STALL: Avoid partial stores that are followed by
    full sized loads.  */
