public inbox for gcc-patches@gcc.gnu.org
* [Patch AArch64 0/4] Add "-moverride" option for overriding tuning parameters
@ 2015-06-23  8:49 James Greenhalgh
  2015-06-23  8:49 ` [Patch AArch64 2/4] Control the FMA steering pass in tuning structures rather than as core property James Greenhalgh
                   ` (3 more replies)
  0 siblings, 4 replies; 9+ messages in thread
From: James Greenhalgh @ 2015-06-23  8:49 UTC (permalink / raw)
  To: gcc-patches; +Cc: marcus.shawcroft, richard.earnshaw

[-- Attachment #1: Type: text/plain, Size: 5637 bytes --]

Hi,

This patch set adds support for a new command-line option, "-moverride".
The purpose of this option is to give expert-level users of the
compiler, and those comfortable with experimenting with it,
*unsupported* full access to the tuning structures used in the AArch64
back-end.

For now, we only give command-line access to the set of fusion pairs
to enable, and to whether or not to use the Cortex-A57 FMA register
renaming pass, though in future we can expand this further.

With this patch, you might write something like:

  -moverride=fuse=adrp+add.cmp+branch:tune=rename_fma_regs

to enable fusion of adrp+add and cmp+branch, and to enable the
cortex-a57-fma-steering pass.

The registration of a new sub-option is table driven: to
aarch64_tuning_override_functions you add an option name and a function
which parses the string it is given and mutates the tuning parameters.

Expanding this for some of the other options (or groups of options) is
therefore fairly easy, but I haven't done it yet.
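
As a rough illustration of the table-driven scheme described above (a
minimal sketch only, not the code from patch 4/4): the option string is
split on ':' into name=value tokens, and each name is looked up in a
table of handler functions. Every sketch_* name below is hypothetical,
the tuning structure is cut down to two fields, and the sketch assumes
that '.' separates the values and that named flags are ORed into the
existing state.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, cut-down stand-in for struct tune_params.  */
struct sketch_tune_params
{
  unsigned int fusible_ops;
  unsigned int extra_tuning_flags;
};

/* Maps a friendly name to a bit in a flag mask.  */
struct sketch_flag_desc
{
  const char *name;
  unsigned int flag;
};

static const struct sketch_flag_desc sketch_fusible_pairs[] =
{
  { "mov+movk", 1u << 0 }, { "adrp+add", 1u << 1 },
  { "movk+movk", 1u << 2 }, { "adrp+ldr", 1u << 3 },
  { "cmp+branch", 1u << 4 }, { NULL, 0 }
};

static const struct sketch_flag_desc sketch_tuning_flags[] =
{
  { "rename_fma_regs", 1u << 0 }, { NULL, 0 }
};

/* OR into *FLAGS the bit for each '.'-separated name in the LEN bytes
   at STR.  Return 0 on success; return -1 and leave *FLAGS untouched
   if any name is not in TABLE.  */
static int
sketch_parse_flags (const char *str, size_t len,
                    const struct sketch_flag_desc *table,
                    unsigned int *flags)
{
  const char *end = str + len;
  unsigned int found = 0;
  while (str < end)
    {
      const char *dot = memchr (str, '.', (size_t) (end - str));
      size_t n = (size_t) ((dot ? dot : end) - str);
      const struct sketch_flag_desc *d;
      for (d = table; d->name; d++)
        if (strlen (d->name) == n && strncmp (d->name, str, n) == 0)
          break;
      if (!d->name)
        return -1;
      found |= d->flag;
      str += n + (dot ? 1 : 0);
    }
  *flags |= found;
  return 0;
}

static int
sketch_parse_fuse (const char *s, size_t n, struct sketch_tune_params *t)
{
  return sketch_parse_flags (s, n, sketch_fusible_pairs, &t->fusible_ops);
}

static int
sketch_parse_tune (const char *s, size_t n, struct sketch_tune_params *t)
{
  return sketch_parse_flags (s, n, sketch_tuning_flags,
                             &t->extra_tuning_flags);
}

/* The table of sub-options: a name and the function that handles it.  */
static const struct
{
  const char *name;
  int (*fn) (const char *, size_t, struct sketch_tune_params *);
} sketch_override_functions[] =
{
  { "fuse", sketch_parse_fuse },
  { "tune", sketch_parse_tune },
  { NULL, NULL }
};

/* Apply every ':'-separated "name=value" token in STR to TUNE.
   Return 0 on success, -1 on a malformed or unknown token.  */
int
sketch_parse_override_string (const char *str,
                              struct sketch_tune_params *tune)
{
  assert (str != NULL && tune != NULL);
  while (*str)
    {
      const char *colon = strchr (str, ':');
      size_t len = colon ? (size_t) (colon - str) : strlen (str);
      const char *eq = memchr (str, '=', len);
      size_t name_len, i;
      if (!eq)
        return -1;
      name_len = (size_t) (eq - str);
      for (i = 0; sketch_override_functions[i].name; i++)
        if (strlen (sketch_override_functions[i].name) == name_len
            && !strncmp (sketch_override_functions[i].name, str, name_len))
          break;
      if (!sketch_override_functions[i].name
          || sketch_override_functions[i].fn (eq + 1, len - name_len - 1,
                                              tune))
        return -1;
      str += len + (colon ? 1 : 0);
    }
  return 0;
}
```

Under these assumptions, supporting another sub-option is one more row
in sketch_override_functions plus its handler.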

The patch set first refactors the fusion and pass tuning structures
to drive them through definitions in tables
(config/aarch64/aarch64-fusion-pairs.def and
config/aarch64/aarch64-tuning-flags.def). We then de-constify the
tune_params structure, so that it can be modified. Finally we wire up
the new option, and add the parsing code to give the desired behaviour.

I've bootstrapped and tested the patch set on aarch64-none-linux-gnu
with BOOT_CFLAGS set to the example string above, and again in the
standard configuration with no issues.

OK for trunk?

Thanks,
James

---
[Patch AArch64 1/4] Define candidates for instruction fusion in a .def file

gcc/

2015-06-23  James Greenhalgh  <james.greenhalgh@arm.com>

	* config/aarch64/aarch64-fusion-pairs.def: New.
	* config/aarch64/aarch64-protos.h (aarch64_fusion_pairs): New.
	* config/aarch64/aarch64.c (AARCH64_FUSE_NOTHING): Move to
	aarch64_fusion_pairs.
	(AARCH64_FUSE_MOV_MOVK): Likewise.
	(AARCH64_FUSE_ADRP_ADD): Likewise.
	(AARCH64_FUSE_MOVK_MOVK): Likewise.
	(AARCH64_FUSE_ADRP_LDR): Likewise.
	(AARCH64_FUSE_CMP_BRANCH): Likewise.

---
[Patch AArch64 2/4] Control the FMA steering pass in tuning
 structures rather than as core property

gcc/

2015-06-23  James Greenhalgh  <james.greenhalgh@arm.com>

	* config/aarch64/aarch64.h (AARCH64_FL_USE_FMA_STEERING_PASS): Delete.
	(aarch64_tune_flags): Likewise.
	(AARCH64_TUNE_FMA_STEERING): Likewise.
	* config/aarch64/aarch64-cores.def (cortex-a57): Remove reference
	to AARCH64_FL_USE_FMA_STEERING_PASS.
	(cortex-a57.cortex-a53): Likewise.
	(cortex-a72): Use cortexa72_tunings.
	(cortex-a72.cortex-a53): Likewise.
	(exynos-m1): Likewise.
	* config/aarch64/aarch64-protos.h (tune_params): Add
	a field: extra_tuning_flags.
	* config/aarch64/aarch64-tuning-flags.def: New.
	* config/aarch64/aarch64-protos.h (AARCH64_EXTRA_TUNING_OPTION): New.
	(aarch64_extra_tuning_flags): Likewise.
	(aarch64_tune_params): Declare here.
	* config/aarch64/aarch64.c (generic_tunings): Set extra_tuning_flags.
	(cortexa53_tunings): Likewise.
	(cortexa57_tunings): Likewise.
	(thunderx_tunings): Likewise.
	(xgene1_tunings): Likewise.
	(cortexa72_tunings): New.
	* config/aarch64/cortex-a57-fma-steering.c: Include aarch64-protos.h.
	 (gate): Check against aarch64_tune_params.
	* config/aarch64/t-aarch64 (cortex-a57-fma-steering.o): Depend on
	aarch64-protos.h.

---
[Patch AArch64 3/4] De-const-ify struct tune_params

gcc/

2015-06-23  James Greenhalgh  <james.greenhalgh@arm.com>

	* config/aarch64/aarch64-protos.h (tune_params): Remove
	const from members.
	(aarch64_tune_params): Remove const, change to no longer be
	a pointer.
	* config/aarch64/aarch64.c (aarch64_tune_params): Remove const,
	change to no longer be a pointer, initialize to generic_tunings.
	(aarch64_min_divisions_for_recip_mul): Change dereference of
	aarch64_tune_params to member access.
	(aarch64_reassociation_width): Likewise.
	(aarch64_rtx_mult_cost): Likewise.
	(aarch64_address_cost): Likewise.
	(aarch64_branch_cost): Likewise.
	(aarch64_rtx_costs): Likewise.
	(aarch64_register_move_cost): Likewise.
	(aarch64_memory_move_cost): Likewise.
	(aarch64_sched_issue_rate): Likewise.
	(aarch64_builtin_vectorization_cost): Likewise.
	(aarch64_override_options): Take a copy of the selected tuning
	struct in to aarch64_tune_params, rather than just setting
	a pointer, change dereferences of aarch64_tune_params to member
	accesses.
	(aarch64_override_options_after_change): Change dereferences of
	aarch64_tune_params to member access.
	(aarch64_macro_fusion_p): Likewise.
	(aarch_macro_fusion_pair_p): Likewise.
	* config/aarch64/cortex-a57-fma-steering.c (gate): Likewise.

---

[Patch AArch64 4/4] Add -moverride tuning command, and wire it up for
 control of fusion and fma-steering

gcc/

2015-06-23  James Greenhalgh  <james.greenhalgh@arm.com>

	* config/aarch64/aarch64.opt (override): New.
	* doc/invoke.texi (override): Document.
	* config/aarch64/aarch64.c (aarch64_flag_desc): New.
	(aarch64_fusible_pairs): Likewise.
	(aarch64_tuning_flags): Likewise.
	(aarch64_tuning_override_function): Likewise.
	(aarch64_tuning_override_functions): Likewise.
	(aarch64_parse_one_option_token): Likewise.
	(aarch64_parse_boolean_options): Likewise.
	(aarch64_parse_fuse_string): Likewise.
	(aarch64_parse_tune_string): Likewise.
	(aarch64_parse_one_override_token): Likewise.
	(aarch64_parse_override_string): Likewise.
	(aarch64_override_options): Parse the -moverride string if it
	is present.


* [Patch AArch64 2/4] Control the FMA steering pass in tuning structures rather than as core property
  2015-06-23  8:49 [Patch AArch64 0/4] Add "-moverride" option for overriding tuning parameters James Greenhalgh
@ 2015-06-23  8:49 ` James Greenhalgh
  2015-06-26 12:41   ` Marcus Shawcroft
  2015-06-23  8:50 ` [Patch AArch64 3/4] De-const-ify struct tune_params James Greenhalgh
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 9+ messages in thread
From: James Greenhalgh @ 2015-06-23  8:49 UTC (permalink / raw)
  To: gcc-patches; +Cc: marcus.shawcroft, richard.earnshaw

[-- Attachment #1: Type: text/plain, Size: 1827 bytes --]


Hi,

The FMA steering pass should be enabled through the tuning structures
rather than be an intrinsic property of the core.  This patch moves
the control of the pass to the tuning structures - turning it off for
everything other than a Cortex-A57 system (i.e. -mcpu=cortex-a57
or -mcpu=cortex-a57.cortex-a53).

Some CPUs share the cortexa57 tuning structs, but do not use this
steering pass. For those I've taken a copy of the cortexa57 tuning
structures and called it cortexa72.
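
To make the intent concrete, here is a minimal sketch of a pass gate
driven by a flag in the tuning structure rather than by a per-core
feature bit. It is illustrative only: the sketch_* names are
hypothetical, the structure is cut down to two fields, and the real
gate lives in cortex-a57-fma-steering.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum sketch_extra_tuning_flags
{
  SKETCH_EXTRA_TUNE_NONE = 0,
  SKETCH_EXTRA_TUNE_RENAME_FMA_REGS = 1 << 0
};

/* Hypothetical, cut-down stand-in for struct tune_params.  */
struct sketch_tune_params
{
  int issue_rate;
  unsigned int extra_tuning_flags;
};

/* Cortex-A57 requests the steering pass through its tuning flags.  */
const struct sketch_tune_params sketch_cortexa57_tunings =
{
  3, SKETCH_EXTRA_TUNE_RENAME_FMA_REGS
};

/* The copy used by cores that share the other Cortex-A57 tunings but
   do not want the pass: identical apart from the flags field.  */
const struct sketch_tune_params sketch_cortexa72_tunings =
{
  3, SKETCH_EXTRA_TUNE_NONE
};

/* Run the pass only if the tuning flags ask for it and we are
   optimizing at -O2 or above.  */
bool
sketch_fma_steering_gate (const struct sketch_tune_params *tune,
                          int optimize)
{
  assert (tune != NULL);
  return (tune->extra_tuning_flags & SKETCH_EXTRA_TUNE_RENAME_FMA_REGS) != 0
         && optimize >= 2;
}
```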

Tested with a compiler build and all known values of -mcpu to make sure
the pass runs in the expected configurations.

OK?

Thanks,
James

---
2015-06-23  James Greenhalgh  <james.greenhalgh@arm.com>

	* config/aarch64/aarch64.h (AARCH64_FL_USE_FMA_STEERING_PASS): Delete.
	(aarch64_tune_flags): Likewise.
	(AARCH64_TUNE_FMA_STEERING): Likewise.
	* config/aarch64/aarch64-cores.def (cortex-a57): Remove reference
	to AARCH64_FL_USE_FMA_STEERING_PASS.
	(cortex-a57.cortex-a53): Likewise.
	(cortex-a72): Use cortexa72_tunings.
	(cortex-a72.cortex-a53): Likewise.
	(exynos-m1): Likewise.
	* config/aarch64/aarch64-protos.h (tune_params): Add
	a field: extra_tuning_flags.
	* config/aarch64/aarch64-tuning-flags.def: New.
	* config/aarch64/aarch64-protos.h (AARCH64_EXTRA_TUNING_OPTION): New.
	(aarch64_extra_tuning_flags): Likewise.
	(aarch64_tune_params): Declare here.
	* config/aarch64/aarch64.c (generic_tunings): Set extra_tuning_flags.
	(cortexa53_tunings): Likewise.
	(cortexa57_tunings): Likewise.
	(thunderx_tunings): Likewise.
	(xgene1_tunings): Likewise.
	(cortexa72_tunings): New.
	* config/aarch64/cortex-a57-fma-steering.c: Include aarch64-protos.h.
	 (gate): Check against aarch64_tune_params.
	* config/aarch64/t-aarch64 (cortex-a57-fma-steering.o): Depend on
	aarch64-protos.h.


[-- Attachment #2: 0002-Patch-AArch64-2-4-Control-the-FMA-steering-pass-in-t.patch --]
[-- Type: text/x-patch;  name=0002-Patch-AArch64-2-4-Control-the-FMA-steering-pass-in-t.patch, Size: 10941 bytes --]

diff --git a/gcc/config/aarch64/aarch64-cores.def b/gcc/config/aarch64/aarch64-cores.def
index dfc9cc8..c4e22fe 100644
--- a/gcc/config/aarch64/aarch64-cores.def
+++ b/gcc/config/aarch64/aarch64-cores.def
@@ -40,13 +40,13 @@
 /* V8 Architecture Processors.  */
 
 AARCH64_CORE("cortex-a53",  cortexa53, cortexa53, 8,  AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa53, "0x41", "0xd03")
-AARCH64_CORE("cortex-a57",  cortexa57, cortexa57, 8,  AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_USE_FMA_STEERING_PASS, cortexa57, "0x41", "0xd07")
-AARCH64_CORE("cortex-a72",  cortexa72, cortexa57, 8,  AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa57, "0x41", "0xd08")
-AARCH64_CORE("exynos-m1",   exynosm1,  cortexa57, 8,  AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, cortexa57, "0x53", "0x001")
+AARCH64_CORE("cortex-a57",  cortexa57, cortexa57, 8,  AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa57, "0x41", "0xd07")
+AARCH64_CORE("cortex-a72",  cortexa72, cortexa57, 8,  AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa72, "0x41", "0xd08")
+AARCH64_CORE("exynos-m1",   exynosm1,  cortexa57, 8,  AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, cortexa72, "0x53", "0x001")
 AARCH64_CORE("thunderx",    thunderx,  thunderx,  8,  AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, thunderx,  "0x43", "0x0a1")
 AARCH64_CORE("xgene1",      xgene1,    xgene1,    8,  AARCH64_FL_FOR_ARCH8, xgene1, "0x50", "0x000")
 
 /* V8 big.LITTLE implementations.  */
 
-AARCH64_CORE("cortex-a57.cortex-a53",  cortexa57cortexa53, cortexa53, 8,  AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_USE_FMA_STEERING_PASS, cortexa57, "0x41", "0xd07.0xd03")
-AARCH64_CORE("cortex-a72.cortex-a53",  cortexa72cortexa53, cortexa53, 8,  AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa57, "0x41", "0xd08.0xd03")
+AARCH64_CORE("cortex-a57.cortex-a53",  cortexa57cortexa53, cortexa53, 8,  AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa57, "0x41", "0xd07.0xd03")
+AARCH64_CORE("cortex-a72.cortex-a53",  cortexa72cortexa53, cortexa53, 8,  AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa72, "0x41", "0xd08.0xd03")
diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index 4bdcc46..7ece346 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -187,6 +187,7 @@ struct tune_params
   const int vec_reassoc_width;
   const int min_div_recip_mul_sf;
   const int min_div_recip_mul_df;
+  const unsigned int extra_tuning_flags;
 };
 
 #define AARCH64_FUSION_PAIR(x, name, index) \
@@ -209,6 +210,26 @@ enum aarch64_fusion_pairs
 };
 #undef AARCH64_FUSION_PAIR
 
+#define AARCH64_EXTRA_TUNING_OPTION(x, name, index) \
+  AARCH64_EXTRA_TUNE_##name = (1 << index),
+/* Supported tuning flags.  */
+enum aarch64_extra_tuning_flags
+{
+  AARCH64_EXTRA_TUNE_NONE = 0,
+#include "aarch64-tuning-flags.def"
+
+/* Hacky macro to build the "all" flag mask.
+   Expands to 0 | AARCH64_TUNE_index0 | AARCH64_TUNE_index1 , etc.  */
+#undef AARCH64_EXTRA_TUNING_OPTION
+#define AARCH64_EXTRA_TUNING_OPTION(x, name, y) \
+  | AARCH64_EXTRA_TUNE_##name
+  AARCH64_EXTRA_TUNE_ALL = 0
+#include "aarch64-tuning-flags.def"
+};
+#undef AARCH64_EXTRA_TUNING_OPTION
+
+extern const struct tune_params *aarch64_tune_params;
+
 HOST_WIDE_INT aarch64_initial_elimination_offset (unsigned, unsigned);
 int aarch64_get_condition_code (rtx);
 bool aarch64_bitmask_imm (HOST_WIDE_INT val, machine_mode);
diff --git a/gcc/config/aarch64/aarch64-tuning-flags.def b/gcc/config/aarch64/aarch64-tuning-flags.def
new file mode 100644
index 0000000..01aaca8
--- /dev/null
+++ b/gcc/config/aarch64/aarch64-tuning-flags.def
@@ -0,0 +1,34 @@
+/* Copyright (C) 2015 Free Software Foundation, Inc.
+   Contributed by ARM Ltd.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published
+   by the Free Software Foundation; either version 3, or (at your
+   option) any later version.
+
+   GCC is distributed in the hope that it will be useful, but WITHOUT
+   ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+   or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
+   License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   <http://www.gnu.org/licenses/>.  */
+
+/* Additional control over certain tuning parameters.  Before including
+   this file, define a macro:
+
+     AARCH64_EXTRA_TUNING_OPTION (name, internal_name, index_bit)
+
+   Where:
+
+     NAME is a string giving a friendly name for the tuning flag.
+     INTERNAL_NAME gives the internal name suitable for appending to
+     AARCH64_EXTRA_TUNE_ to give an enum name.
+     INDEX_BIT is the bit to set in the bitmask of supported tuning
+     flags.  */
+
+AARCH64_EXTRA_TUNING_OPTION ("rename_fma_regs", RENAME_FMA_REGS, 0)
+
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 5fe487b..96327a2 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -343,7 +343,8 @@ static const struct tune_params generic_tunings =
   4,	/* fp_reassoc_width.  */
   1,	/* vec_reassoc_width.  */
   2,	/* min_div_recip_mul_sf.  */
-  2	/* min_div_recip_mul_df.  */
+  2,	/* min_div_recip_mul_df.  */
+  (AARCH64_EXTRA_TUNE_NONE)	/* tune_flags.  */
 };
 
 static const struct tune_params cortexa53_tunings =
@@ -364,7 +365,8 @@ static const struct tune_params cortexa53_tunings =
   4,	/* fp_reassoc_width.  */
   1,	/* vec_reassoc_width.  */
   2,	/* min_div_recip_mul_sf.  */
-  2	/* min_div_recip_mul_df.  */
+  2,	/* min_div_recip_mul_df.  */
+  (AARCH64_EXTRA_TUNE_NONE)	/* tune_flags.  */
 };
 
 static const struct tune_params cortexa57_tunings =
@@ -385,7 +387,30 @@ static const struct tune_params cortexa57_tunings =
   4,	/* fp_reassoc_width.  */
   1,	/* vec_reassoc_width.  */
   2,	/* min_div_recip_mul_sf.  */
-  2	/* min_div_recip_mul_df.  */
+  2,	/* min_div_recip_mul_df.  */
+  (AARCH64_EXTRA_TUNE_RENAME_FMA_REGS)	/* tune_flags.  */
+};
+
+static const struct tune_params cortexa72_tunings =
+{
+  &cortexa57_extra_costs,
+  &cortexa57_addrcost_table,
+  &cortexa57_regmove_cost,
+  &cortexa57_vector_cost,
+  &generic_branch_cost,
+  4, /* memmov_cost  */
+  3, /* issue_rate  */
+  (AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD
+   | AARCH64_FUSE_MOVK_MOVK), /* fusible_ops  */
+  16,	/* function_align.  */
+  8,	/* jump_align.  */
+  4,	/* loop_align.  */
+  2,	/* int_reassoc_width.  */
+  4,	/* fp_reassoc_width.  */
+  1,	/* vec_reassoc_width.  */
+  2,	/* min_div_recip_mul_sf.  */
+  2,	/* min_div_recip_mul_df.  */
+  (AARCH64_EXTRA_TUNE_NONE)	/* tune_flags.  */
 };
 
 static const struct tune_params thunderx_tunings =
@@ -405,7 +430,8 @@ static const struct tune_params thunderx_tunings =
   4,	/* fp_reassoc_width.  */
   1,	/* vec_reassoc_width.  */
   2,	/* min_div_recip_mul_sf.  */
-  2	/* min_div_recip_mul_df.  */
+  2,	/* min_div_recip_mul_df.  */
+  (AARCH64_EXTRA_TUNE_NONE)	/* tune_flags.  */
 };
 
 static const struct tune_params xgene1_tunings =
@@ -425,7 +451,8 @@ static const struct tune_params xgene1_tunings =
   4,	/* fp_reassoc_width.  */
   1,	/* vec_reassoc_width.  */
   2,	/* min_div_recip_mul_sf.  */
-  2	/* min_div_recip_mul_df.  */
+  2,	/* min_div_recip_mul_df.  */
+  (AARCH64_EXTRA_TUNE_NONE)	/* tune_flags.  */
 };
 
 /* A processor implementing AArch64.  */
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index a22c6e4..a99beaf 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -199,13 +199,11 @@ extern unsigned aarch64_architecture_version;
 #define AARCH64_FL_FP         (1 << 1)	/* Has FP.  */
 #define AARCH64_FL_CRYPTO     (1 << 2)	/* Has crypto.  */
 #define AARCH64_FL_CRC        (1 << 3)	/* Has CRC.  */
-/* Has static dispatch of FMA.  */
-#define AARCH64_FL_USE_FMA_STEERING_PASS (1 << 4)
 /* ARMv8.1 architecture extensions.  */
-#define AARCH64_FL_LSE	      (1 << 5)  /* Has Large System Extensions.  */
-#define AARCH64_FL_PAN	      (1 << 6)  /* Has Privileged Access Never.  */
-#define AARCH64_FL_LOR	      (1 << 7)  /* Has Limited Ordering regions.  */
-#define AARCH64_FL_RDMA	      (1 << 8)  /* Has ARMv8.1 Adv.SIMD.  */
+#define AARCH64_FL_LSE	      (1 << 4)  /* Has Large System Extensions.  */
+#define AARCH64_FL_PAN	      (1 << 5)  /* Has Privileged Access Never.  */
+#define AARCH64_FL_LOR	      (1 << 6)  /* Has Limited Ordering regions.  */
+#define AARCH64_FL_RDMA	      (1 << 7)  /* Has ARMv8.1 Adv.SIMD.  */
 
 /* Has FP and SIMD.  */
 #define AARCH64_FL_FPSIMD     (AARCH64_FL_FP | AARCH64_FL_SIMD)
@@ -226,11 +224,6 @@ extern unsigned long aarch64_isa_flags;
 #define AARCH64_ISA_FP             (aarch64_isa_flags & AARCH64_FL_FP)
 #define AARCH64_ISA_SIMD           (aarch64_isa_flags & AARCH64_FL_SIMD)
 
-/* Macros to test tuning flags.  */
-extern unsigned long aarch64_tune_flags;
-#define AARCH64_TUNE_FMA_STEERING \
-  (aarch64_tune_flags & AARCH64_FL_USE_FMA_STEERING_PASS)
-
 /* Crypto is an optional extension to AdvSIMD.  */
 #define TARGET_CRYPTO (TARGET_SIMD && AARCH64_ISA_CRYPTO)
 
diff --git a/gcc/config/aarch64/cortex-a57-fma-steering.c b/gcc/config/aarch64/cortex-a57-fma-steering.c
index 648a88c..07bf8de 100644
--- a/gcc/config/aarch64/cortex-a57-fma-steering.c
+++ b/gcc/config/aarch64/cortex-a57-fma-steering.c
@@ -43,6 +43,7 @@
 #include "tree-pass.h"
 #include "regrename.h"
 #include "cortex-a57-fma-steering.h"
+#include "aarch64-protos.h"
 
 #include <list>
 
@@ -1051,7 +1052,9 @@ public:
   /* opt_pass methods: */
   virtual bool gate (function *)
     {
-      return AARCH64_TUNE_FMA_STEERING && optimize >= 2;
+      return (aarch64_tune_params->extra_tuning_flags
+	      & AARCH64_EXTRA_TUNE_RENAME_FMA_REGS)
+	      && optimize >= 2;
     }
 
   virtual unsigned int execute (function *)
diff --git a/gcc/config/aarch64/t-aarch64 b/gcc/config/aarch64/t-aarch64
index 0371203..af154f4 100644
--- a/gcc/config/aarch64/t-aarch64
+++ b/gcc/config/aarch64/t-aarch64
@@ -53,7 +53,8 @@ cortex-a57-fma-steering.o: $(srcdir)/config/aarch64/cortex-a57-fma-steering.c \
     dominance.h cfg.h cfganal.h $(BASIC_BLOCK_H) $(INSN_ATTR_H) $(RECOG_H) \
     output.h hash-map.h $(DF_H) $(OBSTACK_H) $(TARGET_H) $(RTL_H) \
     $(CONTEXT_H) $(TREE_PASS_H) regrename.h \
-    $(srcdir)/config/aarch64/cortex-a57-fma-steering.h
+    $(srcdir)/config/aarch64/cortex-a57-fma-steering.h \
+    $(srcdir)/config/aarch64/aarch64-protos.h
 	$(COMPILER) -c $(ALL_COMPILERFLAGS) $(ALL_CPPFLAGS) $(INCLUDES) \
 		$(srcdir)/config/aarch64/cortex-a57-fma-steering.c
 


* [Patch AArch64 1/4] Define candidates for instruction fusion in a .def file
  2015-06-23  8:49 [Patch AArch64 0/4] Add "-moverride" option for overriding tuning parameters James Greenhalgh
  2015-06-23  8:49 ` [Patch AArch64 2/4] Control the FMA steering pass in tuning structures rather than as core property James Greenhalgh
  2015-06-23  8:50 ` [Patch AArch64 3/4] De-const-ify struct tune_params James Greenhalgh
@ 2015-06-23  8:50 ` James Greenhalgh
  2015-06-26 12:41   ` Marcus Shawcroft
  2015-06-23  8:52 ` [Patch AArch64 4/4] Add -moverride tuning command, and wire it up for control of fusion and fma-steering James Greenhalgh
  3 siblings, 1 reply; 9+ messages in thread
From: James Greenhalgh @ 2015-06-23  8:50 UTC (permalink / raw)
  To: gcc-patches; +Cc: marcus.shawcroft, richard.earnshaw

[-- Attachment #1: Type: text/plain, Size: 695 bytes --]


Hi,

This patch moves the instruction fusion pairs from a set of #defines
to an enum which we can generate from a .def file.

We'll use that .def file again shortly, along with the friendly names
it introduces.
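
For anyone unfamiliar with the technique, the following is a minimal
sketch of generating an enum, an "all" mask and a name table from a
single list of entries. It is not the patch itself: the sketch_* and
SKETCH_* names are hypothetical, and a list macro stands in for the
.def file so that the example fits in one translation unit.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for the contents of a .def file.  A real .def file would be
   pulled in with #include after defining the entry macro; a list macro
   is used here only to keep the sketch in a single file.  */
#define SKETCH_FUSION_PAIRS(X)      \
  X ("mov+movk", MOV_MOVK, 0)       \
  X ("adrp+add", ADRP_ADD, 1)       \
  X ("movk+movk", MOVK_MOVK, 2)     \
  X ("adrp+ldr", ADRP_LDR, 3)       \
  X ("cmp+branch", CMP_BRANCH, 4)

enum sketch_fusion_pairs
{
  SKETCH_FUSE_NOTHING = 0,
  /* First expansion: one enumerator per entry, each a single bit.  */
#define SKETCH_ENUM_ENTRY(str, name, index) SKETCH_FUSE_##name = (1 << index),
  SKETCH_FUSION_PAIRS (SKETCH_ENUM_ENTRY)
#undef SKETCH_ENUM_ENTRY
  /* Second expansion: "0 | A | B | ..." builds the all-bits mask.  */
#define SKETCH_OR_ENTRY(str, name, index) | SKETCH_FUSE_##name
  SKETCH_FUSE_ALL = 0 SKETCH_FUSION_PAIRS (SKETCH_OR_ENTRY)
#undef SKETCH_OR_ENTRY
};

/* Third expansion: a table from friendly name to flag bit.  */
static const struct
{
  const char *name;
  unsigned int flag;
} sketch_fusion_names[] =
{
#define SKETCH_NAME_ENTRY(str, name, index) { str, SKETCH_FUSE_##name },
  SKETCH_FUSION_PAIRS (SKETCH_NAME_ENTRY)
#undef SKETCH_NAME_ENTRY
  { NULL, 0 }
};

/* Return the flag bit for friendly name NAME, or SKETCH_FUSE_NOTHING
   if NAME is not in the table.  */
unsigned int
sketch_fusion_flag (const char *name)
{
  size_t i;
  assert (name != NULL);
  for (i = 0; sketch_fusion_names[i].name; i++)
    if (strcmp (sketch_fusion_names[i].name, name) == 0)
      return sketch_fusion_names[i].flag;
  return SKETCH_FUSE_NOTHING;
}
```

The appeal of this layout is that adding a pair means touching one
line, and the enum, the mask and the name table cannot drift apart.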

OK?

Thanks,
James

---
2015-06-23  James Greenhalgh  <james.greenhalgh@arm.com>

	* config/aarch64/aarch64-fusion-pairs.def: New.
	* config/aarch64/aarch64-protos.h (aarch64_fusion_pairs): New.
	* config/aarch64/aarch64.c (AARCH64_FUSE_NOTHING): Move to
	aarch64_fusion_pairs.
	(AARCH64_FUSE_MOV_MOVK): Likewise.
	(AARCH64_FUSE_ADRP_ADD): Likewise.
	(AARCH64_FUSE_MOVK_MOVK): Likewise.
	(AARCH64_FUSE_ADRP_LDR): Likewise.
	(AARCH64_FUSE_CMP_BRANCH): Likewise.


[-- Attachment #2: 0001-Patch-AArch64-1-4-Define-candidates-for-instruction-.patch --]
[-- Type: text/x-patch;  name=0001-Patch-AArch64-1-4-Define-candidates-for-instruction-.patch, Size: 3443 bytes --]

diff --git a/gcc/config/aarch64/aarch64-fusion-pairs.def b/gcc/config/aarch64/aarch64-fusion-pairs.def
new file mode 100644
index 0000000..a7b00f6
--- /dev/null
+++ b/gcc/config/aarch64/aarch64-fusion-pairs.def
@@ -0,0 +1,38 @@
+/* Copyright (C) 2015 Free Software Foundation, Inc.
+   Contributed by ARM Ltd.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published
+   by the Free Software Foundation; either version 3, or (at your
+   option) any later version.
+
+   GCC is distributed in the hope that it will be useful, but WITHOUT
+   ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+   or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
+   License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   <http://www.gnu.org/licenses/>.  */
+
+/* Pairs of instructions which can be fused.  Before including this file,
+   define a macro:
+
+     AARCH64_FUSION_PAIR (name, internal_name, index_bit)
+
+   Where:
+
+     NAME is a string giving a friendly name for the instructions to fuse.
+     INTERNAL_NAME gives the internal name suitable for appending to
+     AARCH64_FUSE_ to give an enum name.
+     INDEX_BIT is the bit to set in the bitmask of supported fusion
+     operations.  */
+
+AARCH64_FUSION_PAIR ("mov+movk", MOV_MOVK, 0)
+AARCH64_FUSION_PAIR ("adrp+add", ADRP_ADD, 1)
+AARCH64_FUSION_PAIR ("movk+movk", MOVK_MOVK, 2)
+AARCH64_FUSION_PAIR ("adrp+ldr", ADRP_LDR, 3)
+AARCH64_FUSION_PAIR ("cmp+branch", CMP_BRANCH, 4)
+
diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index 965a11b..4bdcc46 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -189,6 +189,26 @@ struct tune_params
   const int min_div_recip_mul_df;
 };
 
+#define AARCH64_FUSION_PAIR(x, name, index) \
+  AARCH64_FUSE_##name = (1 << index),
+/* Supported fusion operations.  */
+enum aarch64_fusion_pairs
+{
+  AARCH64_FUSE_NOTHING = 0,
+#include "aarch64-fusion-pairs.def"
+
+/* Hacky macro to build AARCH64_FUSE_ALL.  The sequence below expands
+   to:
+   AARCH64_FUSE_ALL = 0 | AARCH64_FUSE_index1 | AARCH64_FUSE_index2 ...  */
+#undef AARCH64_FUSION_PAIR
+#define AARCH64_FUSION_PAIR(x, name, y) \
+  | AARCH64_FUSE_##name
+
+  AARCH64_FUSE_ALL = 0
+#include "aarch64-fusion-pairs.def"
+};
+#undef AARCH64_FUSION_PAIR
+
 HOST_WIDE_INT aarch64_initial_elimination_offset (unsigned, unsigned);
 int aarch64_get_condition_code (rtx);
 bool aarch64_bitmask_imm (HOST_WIDE_INT val, machine_mode);
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 17bae08..5fe487b 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -319,13 +319,6 @@ static const struct cpu_vector_cost xgene1_vector_cost =
   1 /* cond_not_taken_branch_cost  */
 };
 
-#define AARCH64_FUSE_NOTHING	(0)
-#define AARCH64_FUSE_MOV_MOVK	(1 << 0)
-#define AARCH64_FUSE_ADRP_ADD	(1 << 1)
-#define AARCH64_FUSE_MOVK_MOVK	(1 << 2)
-#define AARCH64_FUSE_ADRP_LDR	(1 << 3)
-#define AARCH64_FUSE_CMP_BRANCH	(1 << 4)
-
 /* Generic costs for branch instructions.  */
 static const struct cpu_branch_cost generic_branch_cost =
 {


* [Patch AArch64 3/4] De-const-ify struct tune_params
  2015-06-23  8:49 [Patch AArch64 0/4] Add "-moverride" option for overriding tuning parameters James Greenhalgh
  2015-06-23  8:49 ` [Patch AArch64 2/4] Control the FMA steering pass in tuning structures rather than as core property James Greenhalgh
@ 2015-06-23  8:50 ` James Greenhalgh
  2015-06-26 12:44   ` Marcus Shawcroft
  2015-06-23  8:50 ` [Patch AArch64 1/4] Define candidates for instruction fusion in a .def file James Greenhalgh
  2015-06-23  8:52 ` [Patch AArch64 4/4] Add -moverride tuning command, and wire it up for control of fusion and fma-steering James Greenhalgh
  3 siblings, 1 reply; 9+ messages in thread
From: James Greenhalgh @ 2015-06-23  8:50 UTC (permalink / raw)
  To: gcc-patches; +Cc: marcus.shawcroft, richard.earnshaw

[-- Attachment #1: Type: text/plain, Size: 1906 bytes --]


Hi,

If we want to overwrite parts of this structure, we're going to need it
to be more malleable than it is presently.

Run through and remove const from each of the members, and make
aarch64_tune_params a non-const tuning structure we can modify, rather
than a pointer to a const one. Change the -mtune parsing code to take a
copy of the tuning structure in use rather than just taking the
reference from within the processor struct. Update all the current
users of aarch64_tune_params, which no longer need to dereference a
pointer.
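
A minimal sketch of the copy-then-modify arrangement, for illustration
only: the sketch_* names are hypothetical, the structure is cut down to
two fields, and the generic values are placeholders rather than the
real tunings.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, cut-down stand-in for struct tune_params, with the
   const qualifiers removed from the members.  */
struct sketch_tune_params
{
  int issue_rate;
  unsigned int fusible_ops;
};

/* The per-core tables stay const; they are only ever copied from.
   The generic values are illustrative placeholders.  */
const struct sketch_tune_params sketch_generic_tunings = { 2, 0 };
const struct sketch_tune_params sketch_cortexa57_tunings =
{
  3, (1u << 0) | (1u << 1) | (1u << 2)
};

/* The current tuning set: a mutable object, not a pointer.  C cannot
   initialize it from another object, so the generic values are
   repeated here.  */
struct sketch_tune_params sketch_current_tune = { 2, 0 };

/* Take a copy of SELECTED, then apply an override to the copy.  The
   const table that SELECTED points at is left untouched.  */
void
sketch_select_tuning (const struct sketch_tune_params *selected,
                      unsigned int extra_fusible_ops)
{
  assert (selected != NULL);
  sketch_current_tune = *selected;
  sketch_current_tune.fusible_ops |= extra_fusible_ops;
}

/* Users now read a member directly instead of through a pointer.  */
int
sketch_sched_issue_rate (void)
{
  return sketch_current_tune.issue_rate;
}
```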

Checked on aarch64-none-linux-gnueabi with no issues.

OK?

Thanks,
James

---
2015-06-23  James Greenhalgh  <james.greenhalgh@arm.com>

	* config/aarch64/aarch64-protos.h (tune_params): Remove
	const from members.
	(aarch64_tune_params): Remove const, change to no longer be
	a pointer.
	* config/aarch64/aarch64.c (aarch64_tune_params): Remove const,
	change to no longer be a pointer, initialize to generic_tunings.
	(aarch64_min_divisions_for_recip_mul): Change dereference of
	aarch64_tune_params to member access.
	(aarch64_reassociation_width): Likewise.
	(aarch64_rtx_mult_cost): Likewise.
	(aarch64_address_cost): Likewise.
	(aarch64_branch_cost): Likewise.
	(aarch64_rtx_costs): Likewise.
	(aarch64_register_move_cost): Likewise.
	(aarch64_memory_move_cost): Likewise.
	(aarch64_sched_issue_rate): Likewise.
	(aarch64_builtin_vectorization_cost): Likewise.
	(aarch64_override_options): Take a copy of the selected tuning
	struct in to aarch64_tune_params, rather than just setting
	a pointer, change dereferences of aarch64_tune_params to member
	accesses.
	(aarch64_override_options_after_change): Change dereferences of
	aarch64_tune_params to member access.
	(aarch64_macro_fusion_p): Likewise.
	(aarch_macro_fusion_pair_p): Likewise.
	* config/aarch64/cortex-a57-fma-steering.c (gate): Likewise.


[-- Attachment #2: 0003-Patch-AArch64-3-4-De-const-ify-struct-tune_params.patch --]
[-- Type: text/x-patch;  name=0003-Patch-AArch64-3-4-De-const-ify-struct-tune_params.patch, Size: 11866 bytes --]

diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index 7ece346..09e3077 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -171,23 +171,23 @@ struct cpu_branch_cost
 
 struct tune_params
 {
-  const struct cpu_cost_table *const insn_extra_cost;
-  const struct cpu_addrcost_table *const addr_cost;
-  const struct cpu_regmove_cost *const regmove_cost;
-  const struct cpu_vector_cost *const vec_costs;
-  const struct cpu_branch_cost *const branch_costs;
-  const int memmov_cost;
-  const int issue_rate;
-  const unsigned int fusible_ops;
-  const int function_align;
-  const int jump_align;
-  const int loop_align;
-  const int int_reassoc_width;
-  const int fp_reassoc_width;
-  const int vec_reassoc_width;
-  const int min_div_recip_mul_sf;
-  const int min_div_recip_mul_df;
-  const unsigned int extra_tuning_flags;
+  const struct cpu_cost_table *insn_extra_cost;
+  const struct cpu_addrcost_table *addr_cost;
+  const struct cpu_regmove_cost *regmove_cost;
+  const struct cpu_vector_cost *vec_costs;
+  const struct cpu_branch_cost *branch_costs;
+  int memmov_cost;
+  int issue_rate;
+  unsigned int fusible_ops;
+  int function_align;
+  int jump_align;
+  int loop_align;
+  int int_reassoc_width;
+  int fp_reassoc_width;
+  int vec_reassoc_width;
+  int min_div_recip_mul_sf;
+  int min_div_recip_mul_df;
+  unsigned int extra_tuning_flags;
 };
 
 #define AARCH64_FUSION_PAIR(x, name, index) \
@@ -228,7 +228,7 @@ enum aarch64_extra_tuning_flags
 };
 #undef AARCH64_EXTRA_TUNING_OPTION
 
-extern const struct tune_params *aarch64_tune_params;
+extern struct tune_params aarch64_tune_params;
 
 HOST_WIDE_INT aarch64_initial_elimination_offset (unsigned, unsigned);
 int aarch64_get_condition_code (rtx);
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 96327a2..aa457db 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -164,9 +164,6 @@ unsigned aarch64_architecture_version;
 /* The processor for which instructions should be scheduled.  */
 enum aarch64_processor aarch64_tune = cortexa53;
 
-/* The current tuning set.  */
-const struct tune_params *aarch64_tune_params;
-
 /* Mask to specify which instructions we are allowed to generate.  */
 unsigned long aarch64_isa_flags = 0;
 
@@ -493,6 +490,9 @@ static const struct processor *selected_arch;
 static const struct processor *selected_cpu;
 static const struct processor *selected_tune;
 
+/* The current tuning set.  */
+struct tune_params aarch64_tune_params = generic_tunings;
+
 #define AARCH64_CPU_DEFAULT_FLAGS ((selected_cpu) ? selected_cpu->flags : 0)
 
 /* An ISA extension in the co-processor and main instruction set space.  */
@@ -544,8 +544,8 @@ static unsigned int
 aarch64_min_divisions_for_recip_mul (enum machine_mode mode)
 {
   if (GET_MODE_UNIT_SIZE (mode) == 4)
-    return aarch64_tune_params->min_div_recip_mul_sf;
-  return aarch64_tune_params->min_div_recip_mul_df;
+    return aarch64_tune_params.min_div_recip_mul_sf;
+  return aarch64_tune_params.min_div_recip_mul_df;
 }
 
 static int
@@ -553,11 +553,11 @@ aarch64_reassociation_width (unsigned opc ATTRIBUTE_UNUSED,
 			     enum machine_mode mode)
 {
   if (VECTOR_MODE_P (mode))
-    return aarch64_tune_params->vec_reassoc_width;
+    return aarch64_tune_params.vec_reassoc_width;
   if (INTEGRAL_MODE_P (mode))
-    return aarch64_tune_params->int_reassoc_width;
+    return aarch64_tune_params.int_reassoc_width;
   if (FLOAT_MODE_P (mode))
-    return aarch64_tune_params->fp_reassoc_width;
+    return aarch64_tune_params.fp_reassoc_width;
   return 1;
 }
 
@@ -5204,7 +5204,7 @@ aarch64_rtx_mult_cost (rtx x, int code, int outer, bool speed)
 {
   rtx op0, op1;
   const struct cpu_cost_table *extra_cost
-    = aarch64_tune_params->insn_extra_cost;
+    = aarch64_tune_params.insn_extra_cost;
   int cost = 0;
   bool compound_p = (outer == PLUS || outer == MINUS);
   machine_mode mode = GET_MODE (x);
@@ -5336,7 +5336,7 @@ aarch64_address_cost (rtx x,
 		      bool speed)
 {
   enum rtx_code c = GET_CODE (x);
-  const struct cpu_addrcost_table *addr_cost = aarch64_tune_params->addr_cost;
+  const struct cpu_addrcost_table *addr_cost = aarch64_tune_params.addr_cost;
   struct aarch64_address_info info;
   int cost = 0;
   info.shift = 0;
@@ -5433,7 +5433,7 @@ aarch64_branch_cost (bool speed_p, bool predictable_p)
 {
   /* When optimizing for speed, use the cost of unpredictable branches.  */
   const struct cpu_branch_cost *branch_costs =
-    aarch64_tune_params->branch_costs;
+    aarch64_tune_params.branch_costs;
 
   if (!speed_p || predictable_p)
     return branch_costs->predictable;
@@ -5616,7 +5616,7 @@ aarch64_rtx_costs (rtx x, int code, int outer ATTRIBUTE_UNUSED,
 {
   rtx op0, op1, op2;
   const struct cpu_cost_table *extra_cost
-    = aarch64_tune_params->insn_extra_cost;
+    = aarch64_tune_params.insn_extra_cost;
   machine_mode mode = GET_MODE (x);
 
   /* By default, assume that everything has equivalent cost to the
@@ -6776,7 +6776,7 @@ aarch64_register_move_cost (machine_mode mode,
   enum reg_class from = (enum reg_class) from_i;
   enum reg_class to = (enum reg_class) to_i;
   const struct cpu_regmove_cost *regmove_cost
-    = aarch64_tune_params->regmove_cost;
+    = aarch64_tune_params.regmove_cost;
 
   /* Caller save and pointer regs are equivalent to GENERAL_REGS.  */
   if (to == CALLER_SAVE_REGS || to == POINTER_REGS)
@@ -6831,14 +6831,14 @@ aarch64_memory_move_cost (machine_mode mode ATTRIBUTE_UNUSED,
 			  reg_class_t rclass ATTRIBUTE_UNUSED,
 			  bool in ATTRIBUTE_UNUSED)
 {
-  return aarch64_tune_params->memmov_cost;
+  return aarch64_tune_params.memmov_cost;
 }
 
 /* Return the number of instructions that can be issued per cycle.  */
 static int
 aarch64_sched_issue_rate (void)
 {
-  return aarch64_tune_params->issue_rate;
+  return aarch64_tune_params.issue_rate;
 }
 
 static int
@@ -6862,44 +6862,44 @@ aarch64_builtin_vectorization_cost (enum vect_cost_for_stmt type_of_cost,
   switch (type_of_cost)
     {
       case scalar_stmt:
-	return aarch64_tune_params->vec_costs->scalar_stmt_cost;
+	return aarch64_tune_params.vec_costs->scalar_stmt_cost;
 
       case scalar_load:
-	return aarch64_tune_params->vec_costs->scalar_load_cost;
+	return aarch64_tune_params.vec_costs->scalar_load_cost;
 
       case scalar_store:
-	return aarch64_tune_params->vec_costs->scalar_store_cost;
+	return aarch64_tune_params.vec_costs->scalar_store_cost;
 
       case vector_stmt:
-	return aarch64_tune_params->vec_costs->vec_stmt_cost;
+	return aarch64_tune_params.vec_costs->vec_stmt_cost;
 
       case vector_load:
-	return aarch64_tune_params->vec_costs->vec_align_load_cost;
+	return aarch64_tune_params.vec_costs->vec_align_load_cost;
 
       case vector_store:
-	return aarch64_tune_params->vec_costs->vec_store_cost;
+	return aarch64_tune_params.vec_costs->vec_store_cost;
 
       case vec_to_scalar:
-	return aarch64_tune_params->vec_costs->vec_to_scalar_cost;
+	return aarch64_tune_params.vec_costs->vec_to_scalar_cost;
 
       case scalar_to_vec:
-	return aarch64_tune_params->vec_costs->scalar_to_vec_cost;
+	return aarch64_tune_params.vec_costs->scalar_to_vec_cost;
 
       case unaligned_load:
-	return aarch64_tune_params->vec_costs->vec_unalign_load_cost;
+	return aarch64_tune_params.vec_costs->vec_unalign_load_cost;
 
       case unaligned_store:
-	return aarch64_tune_params->vec_costs->vec_unalign_store_cost;
+	return aarch64_tune_params.vec_costs->vec_unalign_store_cost;
 
       case cond_branch_taken:
-	return aarch64_tune_params->vec_costs->cond_taken_branch_cost;
+	return aarch64_tune_params.vec_costs->cond_taken_branch_cost;
 
       case cond_branch_not_taken:
-	return aarch64_tune_params->vec_costs->cond_not_taken_branch_cost;
+	return aarch64_tune_params.vec_costs->cond_not_taken_branch_cost;
 
       case vec_perm:
       case vec_promote_demote:
-	return aarch64_tune_params->vec_costs->vec_stmt_cost;
+	return aarch64_tune_params.vec_costs->vec_stmt_cost;
 
       case vec_construct:
         elements = TYPE_VECTOR_SUBPARTS (vectype);
@@ -7201,7 +7201,9 @@ aarch64_override_options (void)
 
   aarch64_tune_flags = selected_tune->flags;
   aarch64_tune = selected_tune->core;
-  aarch64_tune_params = selected_tune->tune;
+  /* Make a copy of the tuning parameters attached to the core, which
+     we may later overwrite.  */
+  aarch64_tune_params = *(selected_tune->tune);
   aarch64_architecture_version = selected_cpu->architecture_version;
 
   if (aarch64_fix_a53_err835769 == 2)
@@ -7233,11 +7235,11 @@ aarch64_override_options_after_change (void)
   if (!optimize_size)
     {
       if (align_loops <= 0)
-	align_loops = aarch64_tune_params->loop_align;
+	align_loops = aarch64_tune_params.loop_align;
       if (align_jumps <= 0)
-	align_jumps = aarch64_tune_params->jump_align;
+	align_jumps = aarch64_tune_params.jump_align;
       if (align_functions <= 0)
-	align_functions = aarch64_tune_params->function_align;
+	align_functions = aarch64_tune_params.function_align;
     }
 }
 
@@ -10937,7 +10939,7 @@ aarch64_gen_ccmp_next (rtx *prep_seq, rtx *gen_seq, rtx prev, int cmp_code,
 static bool
 aarch64_macro_fusion_p (void)
 {
-  return aarch64_tune_params->fusible_ops != AARCH64_FUSE_NOTHING;
+  return aarch64_tune_params.fusible_ops != AARCH64_FUSE_NOTHING;
 }
 
 
@@ -10957,7 +10959,7 @@ aarch_macro_fusion_pair_p (rtx_insn *prev, rtx_insn *curr)
     return false;
 
   if (simple_sets_p
-      && (aarch64_tune_params->fusible_ops & AARCH64_FUSE_MOV_MOVK))
+      && (aarch64_tune_params.fusible_ops & AARCH64_FUSE_MOV_MOVK))
     {
       /* We are trying to match:
          prev (mov)  == (set (reg r0) (const_int imm16))
@@ -10982,7 +10984,7 @@ aarch_macro_fusion_pair_p (rtx_insn *prev, rtx_insn *curr)
     }
 
   if (simple_sets_p
-      && (aarch64_tune_params->fusible_ops & AARCH64_FUSE_ADRP_ADD))
+      && (aarch64_tune_params.fusible_ops & AARCH64_FUSE_ADRP_ADD))
     {
 
       /*  We're trying to match:
@@ -11008,7 +11010,7 @@ aarch_macro_fusion_pair_p (rtx_insn *prev, rtx_insn *curr)
     }
 
   if (simple_sets_p
-      && (aarch64_tune_params->fusible_ops & AARCH64_FUSE_MOVK_MOVK))
+      && (aarch64_tune_params.fusible_ops & AARCH64_FUSE_MOVK_MOVK))
     {
 
       /* We're trying to match:
@@ -11037,7 +11039,7 @@ aarch_macro_fusion_pair_p (rtx_insn *prev, rtx_insn *curr)
 
     }
   if (simple_sets_p
-      && (aarch64_tune_params->fusible_ops & AARCH64_FUSE_ADRP_LDR))
+      && (aarch64_tune_params.fusible_ops & AARCH64_FUSE_ADRP_LDR))
     {
       /* We're trying to match:
           prev (adrp) == (set (reg r0)
@@ -11068,7 +11070,7 @@ aarch_macro_fusion_pair_p (rtx_insn *prev, rtx_insn *curr)
         }
     }
 
-  if ((aarch64_tune_params->fusible_ops & AARCH64_FUSE_CMP_BRANCH)
+  if ((aarch64_tune_params.fusible_ops & AARCH64_FUSE_CMP_BRANCH)
       && any_condjump_p (curr))
     {
       enum attr_type prev_type = get_attr_type (prev);
diff --git a/gcc/config/aarch64/cortex-a57-fma-steering.c b/gcc/config/aarch64/cortex-a57-fma-steering.c
index 07bf8de..a0b2969 100644
--- a/gcc/config/aarch64/cortex-a57-fma-steering.c
+++ b/gcc/config/aarch64/cortex-a57-fma-steering.c
@@ -1052,7 +1052,7 @@ public:
   /* opt_pass methods: */
   virtual bool gate (function *)
     {
-      return (aarch64_tune_params->extra_tuning_flags
+      return (aarch64_tune_params.extra_tuning_flags
 	      & AARCH64_EXTRA_TUNE_RENAME_FMA_REGS)
 	      && optimize >= 2;
     }

* [Patch AArch64 4/4] Add -moverride tuning command, and wire it up for control of fusion and fma-steering
  2015-06-23  8:49 [Patch AArch64 0/4] Add "-moverride" option for overriding tuning parameters James Greenhalgh
                   ` (2 preceding siblings ...)
  2015-06-23  8:50 ` [Patch AArch64 1/4] Define candidates for instruction fusion in a .def file James Greenhalgh
@ 2015-06-23  8:52 ` James Greenhalgh
  2015-06-26 12:52   ` Marcus Shawcroft
  3 siblings, 1 reply; 9+ messages in thread
From: James Greenhalgh @ 2015-06-23  8:52 UTC (permalink / raw)
  To: gcc-patches; +Cc: marcus.shawcroft, richard.earnshaw

[-- Attachment #1: Type: text/plain, Size: 1634 bytes --]


Hi,

This final patch adds support for the new command-line option
"-moverride". The purpose of this option is to give expert-level users
of the compiler, and those comfortable with experimenting with it,
*unsupported* full access to the tuning structures used in the AArch64
back-end.

For now, the option only controls which fusion pairs are enabled and
whether the Cortex-A57 FMA register renaming pass runs, though we can
expand this further in future.

With this patch, you might write something like:

  -moverride=fuse=adrp+add.cmp+branch:tune=rename_fma_regs

To enable fusion of adrp+add and cmp+branch and to enable the
fma-rename pass.
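
As a rough illustration of the top-level syntax in the example string
(options separated by ':', each of the form name=value), here is a
minimal standalone sketch. It is not the code from the patch:
split_override_string and struct override_pair are hypothetical names
invented for this example, and the sketch only splits the string; it does
not interpret the values.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical type for this sketch: one name=value piece.  */
struct override_pair
{
  const char *name;
  const char *value;
};

/* Split INPUT on ':' and then on the first '=' of each piece, storing
   at most MAX pairs in PAIRS.  INPUT is modified in place.  Returns the
   number of pairs stored, or -1 if a piece has no '='.  Pieces beyond
   MAX are silently ignored.  */
static int
split_override_string (char *input, struct override_pair *pairs, int max)
{
  int count = 0;
  char *piece = input;

  while (piece != NULL && count < max)
    {
      /* Cut the current piece off at the next ':' (if any).  */
      char *next = strchr (piece, ':');
      if (next != NULL)
        *next++ = '\0';

      /* Each piece must look like name=value.  */
      char *eq = strchr (piece, '=');
      if (eq == NULL)
        return -1;
      *eq = '\0';

      pairs[count].name = piece;
      pairs[count].value = eq + 1;
      count++;
      piece = next;
    }
  return count;
}
```

Given a writable copy of the example string above, this should yield two
pairs, "fuse" with "adrp+add.cmp+branch" and "tune" with
"rename_fma_regs".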

I've bootstrapped and tested the patch set on aarch64-none-linux-gnu
with BOOT_CFLAGS set to the example string above, and again in the
standard configuration with no issues.

OK?

Thanks,
James

---
2015-06-23  James Greenhalgh  <james.greenhalgh@arm.com>

	* config/aarch64/aarch64.opt (override): New.
	* doc/invoke.texi (override): Document.
	* config/aarch64/aarch64.c (aarch64_flag_desc): New.
	(aarch64_fusible_pairs): Likewise.
	(aarch64_tuning_flags): Likewise.
	(aarch64_tuning_override_function): Likewise.
	(aarch64_tuning_override_functions): Likewise.
	(aarch64_parse_one_option_token): Likewise.
	(aarch64_parse_boolean_options): Likewise.
	(aarch64_parse_fuse_string): Likewise.
	(aarch64_parse_tune_string): Likewise.
	(aarch64_parse_one_override_token): Likewise.
	(aarch64_parse_override_string): Likewise.
	(aarch64_override_options): Parse the -moverride string if it
	is present.


[-- Attachment #2: 0004-Patch-AArch64-4-4-Add-moverride-tuning-command-and-w.patch --]
[-- Type: text/x-patch;  name=0004-Patch-AArch64-4-4-Add-moverride-tuning-command-and-w.patch, Size: 9135 bytes --]

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index aa457db..207c18b 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -170,6 +170,36 @@ unsigned long aarch64_isa_flags = 0;
 /* Mask to specify which instruction scheduling options should be used.  */
 unsigned long aarch64_tune_flags = 0;
 
+/* Support for command line parsing of boolean flags in the tuning
+   structures.  */
+struct aarch64_flag_desc
+{
+  const char* name;
+  unsigned int flag;
+};
+
+#define AARCH64_FUSION_PAIR(name, internal_name, y) \
+  { name, AARCH64_FUSE_##internal_name },
+static const struct aarch64_flag_desc aarch64_fusible_pairs[] =
+{
+  { "none", AARCH64_FUSE_NOTHING },
+#include "aarch64-fusion-pairs.def"
+  { "all", AARCH64_FUSE_ALL },
+  { NULL, AARCH64_FUSE_NOTHING }
+};
+#undef AARCH64_FUSION_PAIR
+
+#define AARCH64_EXTRA_TUNING_OPTION(name, internal_name, y) \
+  { name, AARCH64_EXTRA_TUNE_##internal_name },
+static const struct aarch64_flag_desc aarch64_tuning_flags[] =
+{
+  { "none", AARCH64_EXTRA_TUNE_NONE },
+#include "aarch64-tuning-flags.def"
+  { "all", AARCH64_EXTRA_TUNE_ALL },
+  { NULL, AARCH64_EXTRA_TUNE_NONE }
+};
+#undef AARCH64_EXTRA_TUNING_OPTION
+
 /* Tuning parameters.  */
 
 static const struct cpu_addrcost_table generic_addrcost_table =
@@ -452,6 +482,24 @@ static const struct tune_params xgene1_tunings =
   (AARCH64_EXTRA_TUNE_NONE)	/* tune_flags.  */
 };
 
+/* Support for fine-grained override of the tuning structures.  */
+struct aarch64_tuning_override_function
+{
+  const char* name;
+  void (*parse_override)(const char*, struct tune_params*);
+};
+
+static void aarch64_parse_fuse_string (const char*, struct tune_params*);
+static void aarch64_parse_tune_string (const char*, struct tune_params*);
+
+static const struct aarch64_tuning_override_function
+  aarch64_tuning_override_functions[] =
+{
+  { "fuse", aarch64_parse_fuse_string },
+  { "tune", aarch64_parse_tune_string },
+  { NULL, NULL }
+};
+
 /* A processor implementing AArch64.  */
 struct processor
 {
@@ -7142,6 +7190,178 @@ aarch64_parse_tune (void)
   return;
 }
 
+/* Parse TOKEN, which has length LENGTH, to see if it is an option
+   described in FLAG.  If it is, return the flag bit for that option.
+   If not, error (printing OPTION_NAME) and return zero.  */
+
+static unsigned int
+aarch64_parse_one_option_token (const char *token,
+				size_t length,
+				const struct aarch64_flag_desc *flag,
+				const char *option_name)
+{
+  for (; flag->name != NULL; flag++)
+    {
+      if (length == strlen (flag->name)
+	  && !strncmp (flag->name, token, length))
+	return flag->flag;
+    }
+
+  error ("unknown flag passed in -moverride=%s (%s)", option_name, token);
+  return 0;
+}
+
+/* Parse OPTION which is a period-separated list of flags to enable.
+   FLAGS gives the list of flags we understand, INITIAL_STATE gives any
+   default state we inherit from the CPU tuning structures.  OPTION_NAME
+   gives the top-level option we are parsing in the -moverride string,
+   for use in error messages.  */
+
+static unsigned int
+aarch64_parse_boolean_options (const char *option,
+			       const struct aarch64_flag_desc *flags,
+			       unsigned int initial_state,
+			       const char *option_name)
+{
+  const char separator = '.';
+  const char* specs = option;
+  const char* ntoken = option;
+  unsigned int found_flags = initial_state;
+
+  while ((ntoken = strchr (specs, separator)))
+    {
+      size_t token_length = ntoken - specs;
+      unsigned token_ops = aarch64_parse_one_option_token (specs,
+							   token_length,
+							   flags,
+							   option_name);
+      /* If we find "none" (or, for simplicity's sake, an error) anywhere
+	 in the token stream, reset the supported operations.  So:
+
+	   adrp+add.cmp+branch.none.adrp+add
+
+	   would have the result of turning on only adrp+add fusion.  */
+      if (!token_ops)
+	found_flags = 0;
+
+      found_flags |= token_ops;
+      specs = ++ntoken;
+    }
+
+  /* We ended with a separator, print something.  */
+  if (!(*specs))
+    {
+      error ("%s string ill-formed", option_name);
+      return 0;
+    }
+
+  /* We still have one more token to parse.  */
+  size_t token_length = strlen (specs);
+  unsigned token_ops = aarch64_parse_one_option_token (specs,
+						       token_length,
+						       flags,
+						       option_name);
+  if (!token_ops)
+    found_flags = 0;
+
+  found_flags |= token_ops;
+  return found_flags;
+}
+
+/* Support for overriding instruction fusion.  */
+
+static void
+aarch64_parse_fuse_string (const char *fuse_string,
+			    struct tune_params *tune)
+{
+  tune->fusible_ops = aarch64_parse_boolean_options (fuse_string,
+						     aarch64_fusible_pairs,
+						     tune->fusible_ops,
+						     "fuse=");
+}
+
+/* Support for overriding other tuning flags.  */
+
+static void
+aarch64_parse_tune_string (const char *tune_string,
+			    struct tune_params *tune)
+{
+  tune->extra_tuning_flags
+    = aarch64_parse_boolean_options (tune_string,
+				     aarch64_tuning_flags,
+				     tune->extra_tuning_flags,
+				     "tune=");
+}
+
+/* Parse TOKEN, which has length LENGTH, to see if it is a tuning option
+   we understand.  If it is, extract the option string and hand off to
+   the appropriate function.  */
+
+void
+aarch64_parse_one_override_token (const char* token,
+				  size_t length,
+				  struct tune_params *tune)
+{
+  const struct aarch64_tuning_override_function *fn
+    = aarch64_tuning_override_functions;
+
+  const char *option_part = strchr (token, '=');
+  if (!option_part)
+    {
+      error ("tuning string missing in option (%s)", token);
+      return;
+    }
+
+  /* Get the length of the option name.  */
+  length = option_part - token;
+  /* Skip the '=' to get to the option string.  */
+  option_part++;
+
+  for (; fn->name != NULL; fn++)
+    {
+      if (strlen (fn->name) == length && !strncmp (fn->name, token, length))
+	{
+	  fn->parse_override (option_part, tune);
+	  return;
+	}
+    }
+
+  error ("unknown tuning option (%s)", token);
+  return;
+}
+
+/* Parse STRING looking for options in the format:
+     string	:: option:string
+     option	:: name=substring
+     name	:: {a-z}
+     substring	:: defined by option.  */
+
+static void
+aarch64_parse_override_string (const char* input_string,
+			       struct tune_params* tune)
+{
+  const char separator = ':';
+  size_t string_length = strlen (input_string) + 1;
+  char *string_root = (char *) xmalloc (sizeof (*string_root) * string_length);
+  char *string = string_root;
+  strncpy (string, input_string, string_length);
+  string[string_length - 1] = '\0';
+
+  char* ntoken = string;
+
+  while ((ntoken = strchr (string, separator)))
+    {
+      size_t token_length = ntoken - string;
+      /* Make this substring look like a string.  */
+      *ntoken = '\0';
+      aarch64_parse_one_override_token (string, token_length, tune);
+      string = ++ntoken;
+    }
+
+  /* One last option to parse.  */
+  aarch64_parse_one_override_token (string, strlen (string), tune);
+  free (string_root);
+}
 
 /* Implement TARGET_OPTION_OVERRIDE.  */
 
@@ -7206,6 +7426,10 @@ aarch64_override_options (void)
   aarch64_tune_params = *(selected_tune->tune);
   aarch64_architecture_version = selected_cpu->architecture_version;
 
+  if (aarch64_override_tune_string)
+    aarch64_parse_override_string (aarch64_override_tune_string,
+				   &aarch64_tune_params);
+
   if (aarch64_fix_a53_err835769 == 2)
     {
 #ifdef TARGET_FIX_ERR_A53_835769_DEFAULT
diff --git a/gcc/config/aarch64/aarch64.opt b/gcc/config/aarch64/aarch64.opt
index 6d72ac2..98ef9f6 100644
--- a/gcc/config/aarch64/aarch64.opt
+++ b/gcc/config/aarch64/aarch64.opt
@@ -111,6 +111,10 @@ mabi=
 Target RejectNegative Joined Enum(aarch64_abi) Var(aarch64_abi) Init(AARCH64_ABI_DEFAULT)
 -mabi=ABI	Generate code that conforms to the specified ABI
 
+moverride=
+Target RejectNegative ToLower Joined Var(aarch64_override_tune_string)
+-moverride=STRING	Power users only! Override CPU optimization parameters
+
 Enum
 Name(aarch64_abi) Type(int)
 Known AArch64 ABIs (for use with the -mabi= option):
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index b99ab1c..3e77036 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -12520,6 +12520,15 @@ Enable Privileged Access Never support.
 Enable Limited Ordering Regions support.
 @item rdma
 Enable ARMv8.1 Advanced SIMD instructions.
+
+@item -moverride=@var{string}
+@opindex moverride
+Override tuning decisions made by the back-end in response to a
+@option{-mtune=} switch.  The syntax, semantics, and accepted values
+for @var{string} in this option are not guaranteed to be consistent
+across releases.
+
+This option is only intended to be useful when developing GCC.
 @end table
 
 @node Adapteva Epiphany Options

* Re: [Patch AArch64 2/4] Control the FMA steering pass in tuning structures rather than as core property
  2015-06-23  8:49 ` [Patch AArch64 2/4] Control the FMA steering pass in tuning structures rather than as core property James Greenhalgh
@ 2015-06-26 12:41   ` Marcus Shawcroft
  0 siblings, 0 replies; 9+ messages in thread
From: Marcus Shawcroft @ 2015-06-26 12:41 UTC (permalink / raw)
  To: James Greenhalgh; +Cc: gcc-patches, Marcus Shawcroft, Richard Earnshaw

On 23 June 2015 at 09:49, James Greenhalgh <james.greenhalgh@arm.com> wrote:
>
> Hi,
>
> The FMA steering pass should be enabled through the tuning structures
> rather than be an intrinsic property of the core.  This patch moves
> the control of the pass to the tuning structures - turning it off for
> everything other than a Cortex-A57 system (i.e. -mcpu=cortex-a57
> or -mcpu=cortex-a57.cortex-a53).
>
> Some CPUs share the cortexa57 tuning structs, but do not use this
> steering pass. For those I've taken a copy of the cortexa57 tuning
> structures and called it cortexa72.
>
> Tested with a compiler build and all known values of -mcpu to make sure
> the pass runs in the expected configurations.
>
> OK?
>
> Thanks,
> James
>
> ---
> 2015-06-23  James Greenhalgh  <james.greenhalgh@arm.com>
>
>         * config/aarch64/aarch64.h (AARCH64_FL_USE_FMA_STEERING_PASS): Delete.
>         (aarch64_tune_flags): Likewise.
>         (AARCH64_TUNE_FMA_STEERING): Likewise.
>         * config/aarch64/aarch64-cores.def (cortex-a57): Remove reference
>         to AARCH64_FL_USE_FMA_STEERING_PASS.
>         (cortex-a57.cortex-a53): Likewise.
>         (cortex-a72): Use cortexa72_tunings.
>         (cortex-a72.cortex-a53): Likewise.
>         (exynos-m1): Likewise.
>         * config/aarch64/aarch64-protos.h (tune_params): Add
>         a field: extra_tuning_flags.
>         * config/aarch64/aarch64-tuning-flags.def: New.
>         * config/aarch64/aarch64-protos.h (AARCH64_EXTRA_TUNING_OPTION): New.
>         (aarch64_extra_tuning_flags): Likewise.
>         (aarch64_tune_params): Declare here.
>         * config/aarch64/aarch64.c (generic_tunings): Set extra_tuning_flags.
>         (cortexa53_tunings): Likewise.
>         (cortexa57_tunings): Likewise.
>         (thunderx_tunings): Likewise.
>         (xgene1_tunings): Likewise.
>         (cortexa72_tunings): New.
>         * config/aarch64/cortex-a57-fma-steering.c: Include aarch64-protos.h.
>          (gate): Check against aarch64_tune_params.
>         * config/aarch64/t-aarch64 (cortex-a57-fma-steering.o): Depend on
>         aarch64-protos.h.
>

OK /Marcus
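
For readers following along, the gating described in the quoted patch
(run the pass only when the active tuning requests it and the
optimization level is at least 2) can be sketched in isolation roughly as
below. The type and constant names here are illustrative stand-ins, not
the back-end's real definitions.

```c
#include <stdbool.h>

/* Illustrative stand-ins; the real flag definitions live in the AArch64
   back-end and are not reproduced here.  */
enum extra_tune_flags_sketch
{
  EXTRA_TUNE_NONE = 0,
  EXTRA_TUNE_RENAME_FMA_REGS = 1 << 0
};

struct tune_flags_sketch
{
  unsigned int extra_tuning_flags;
};

/* Return true if the pass should run: the active tuning must set the
   rename bit and OPTIMIZE_LEVEL must be 2 or higher.  */
static bool
fma_steering_gate (const struct tune_flags_sketch *tune, int optimize_level)
{
  return (tune->extra_tuning_flags & EXTRA_TUNE_RENAME_FMA_REGS) != 0
         && optimize_level >= 2;
}
```

With this shape, a core that shares most tuning values but should not run
the pass only needs a tuning structure with the bit cleared.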

* Re: [Patch AArch64 1/4] Define candidates for instruction fusion in a .def file
  2015-06-23  8:50 ` [Patch AArch64 1/4] Define candidates for instruction fusion in a .def file James Greenhalgh
@ 2015-06-26 12:41   ` Marcus Shawcroft
  0 siblings, 0 replies; 9+ messages in thread
From: Marcus Shawcroft @ 2015-06-26 12:41 UTC (permalink / raw)
  To: James Greenhalgh; +Cc: gcc-patches

On 23 June 2015 at 09:49, James Greenhalgh <james.greenhalgh@arm.com> wrote:
>
> Hi,
>
> This patch moves the instruction fusion pairs from a set of #defines
> to an enum which we can generate from a .def file.
>
> We'll use that .def file again, and the friendly names it introduces,
> shortly.
>
> OK?
>
> Thanks,
> James
>
> ---
> 2015-06-23  James Greenhalgh  <james.greenhalgh@arm.com>
>
>         * config/aarch64/aarch64-fusion-pairs.def: New.
>         * config/aarch64/aarch64-protos.h (aarch64_fusion_pairs): New.
>         * config/aarch64/aarch64.c (AARCH64_FUSE_NOTHING): Move to
>         aarch64_fusion_pairs.
>         (AARCH64_FUSE_MOV_MOVK): Likewise.
>         (AARCH64_FUSE_ADRP_ADD): Likewise.
>         (AARCH64_FUSE_MOVK_MOVK): Likewise.
>         (AARCH64_FUSE_ADRP_LDR): Likewise.
>         (AARCH64_FUSE_CMP_BRANCH): Likewise.
>

OK /Marcus
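
The .def-file technique in the quoted description (one list of entries,
expanded once into an enum and again into a table of friendly names) can
be sketched in a single file as follows. This is only an approximation:
the FUSION_PAIRS macro stands in for including a .def file, and the
entries, bit numbers and identifiers are assumptions made for the example
rather than copies of the patch.

```c
#include <stddef.h>

/* Stand-in for a .def file: one X (name, internal_name, bit) per pair.
   Entries and bit numbers are illustrative only.  */
#define FUSION_PAIRS(X)            \
  X ("mov+movk", MOV_MOVK, 0)      \
  X ("adrp+add", ADRP_ADD, 1)      \
  X ("cmp+branch", CMP_BRANCH, 2)

/* First expansion: an enum with one bit per fusion pair.  */
#define AS_ENUM(name, internal_name, bit) \
  FUSE_##internal_name = 1u << (bit),
enum fusion_pairs_sketch
{
  FUSE_NOTHING = 0,
  FUSION_PAIRS (AS_ENUM)
  FUSE_ALL = (1u << 3) - 1
};
#undef AS_ENUM

/* Second expansion: a NULL-terminated table mapping names to bits.  */
struct flag_desc_sketch
{
  const char *name;
  unsigned int flag;
};

#define AS_TABLE(name, internal_name, bit) \
  { name, FUSE_##internal_name },
static const struct flag_desc_sketch fusion_names[] =
{
  FUSION_PAIRS (AS_TABLE)
  { NULL, FUSE_NOTHING }
};
#undef AS_TABLE
```

Adding a new pair then means adding one line to the list; the enum and
the name table stay in sync automatically.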

* Re: [Patch AArch64 3/4] De-const-ify struct tune_params
  2015-06-23  8:50 ` [Patch AArch64 3/4] De-const-ify struct tune_params James Greenhalgh
@ 2015-06-26 12:44   ` Marcus Shawcroft
  0 siblings, 0 replies; 9+ messages in thread
From: Marcus Shawcroft @ 2015-06-26 12:44 UTC (permalink / raw)
  To: James Greenhalgh; +Cc: gcc-patches

On 23 June 2015 at 09:49, James Greenhalgh <james.greenhalgh@arm.com> wrote:
>
> Hi,
>
> If we want to overwrite parts of this structure, we're going to need it
> to be more malleable than it is presently.
>
> Run through and remove const from each of the members, create a non-const
> tuning structure we can modify, and make aarch64_tune_params that
> structure rather than a pointer. Change the -mtune parsing code to take a
> copy of the tuning structure in use rather than just taking the
> reference from within the processor struct. Change all the current
> users of aarch64_tune_params, which no longer need to dereference a
> pointer.
>
> Checked on aarch64-none-linux-gnu with no issues.
>
> OK?
>
> Thanks,
> James
>
> ---
> 2015-06-23  James Greenhalgh  <james.greenhalgh@arm.com>
>
>         * config/aarch64/aarch64-protos.h (tune_params): Remove
>         const from members.
>         (aarch64_tune_params): Remove const, change to no longer be
>         a pointer.
>         * config/aarch64/aarch64.c (aarch64_tune_params): Remove const,
>         change to no longer be a pointer, initialize to generic_tunings.
>         (aarch64_min_divisions_for_recip_mul): Change dereference of
>         aarch64_tune_params to member access.
>         (aarch64_reassociation_width): Likewise.
>         (aarch64_rtx_mult_cost): Likewise.
>         (aarch64_address_cost): Likewise.
>         (aarch64_branch_cost): Likewise.
>         (aarch64_rtx_costs): Likewise.
>         (aarch64_register_move_cost): Likewise.
>         (aarch64_memory_move_cost): Likewise.
>         (aarch64_sched_issue_rate): Likewise.
>         (aarch64_builtin_vectorization_cost): Likewise.
>         (aarch64_override_options): Take a copy of the selected tuning
>         struct in to aarch64_tune_params, rather than just setting
>         a pointer, change dereferences of aarch64_tune_params to member
>         accesses.
>         (aarch64_override_options_after_change): Change dereferences of
>         aarch64_tune_params to member access.
>         (aarch64_macro_fusion_p): Likewise.
>         (aarch_macro_fusion_pair_p): Likewise.
>         * config/aarch64/cortex-a57-fma-steering.c (gate): Likewise.
>

OK
/Marcus
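
The copy-then-modify idea from the quoted description can be sketched
as below: the per-core tables stay const, and the active tuning set is a
separate object that receives a copy and may then be edited. All names
and values here are made up for illustration and do not match the real
tune_params layout.

```c
/* Illustrative structure; the real tune_params has many more fields.  */
struct tuning_sketch
{
  int issue_rate;
  unsigned int fusible_ops;
};

/* Per-core tables remain read-only.  */
static const struct tuning_sketch generic_tunings_sketch = { 2, 0x0 };
static const struct tuning_sketch wide_core_tunings_sketch = { 3, 0x3 };

/* The active tuning set is an object, not a pointer, so later overrides
   can change it without touching the per-core tables.  It starts out
   with the same values as the generic table.  */
static struct tuning_sketch active_tune = { 2, 0x0 };

/* Copy the selected core's tuning into the active set.  */
static void
select_tuning (const struct tuning_sketch *selected)
{
  active_tune = *selected;
}
```

After select_tuning (&wide_core_tunings_sketch), setting further bits in
active_tune.fusible_ops changes only the copy; the const table keeps its
original value.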

* Re: [Patch AArch64 4/4] Add -moverride tuning command, and wire it up for control of fusion and fma-steering
  2015-06-23  8:52 ` [Patch AArch64 4/4] Add -moverride tuning command, and wire it up for control of fusion and fma-steering James Greenhalgh
@ 2015-06-26 12:52   ` Marcus Shawcroft
  0 siblings, 0 replies; 9+ messages in thread
From: Marcus Shawcroft @ 2015-06-26 12:52 UTC (permalink / raw)
  To: James Greenhalgh; +Cc: gcc-patches

On 23 June 2015 at 09:49, James Greenhalgh <james.greenhalgh@arm.com> wrote:
>
> Hi,
>
> This final patch adds support for the new command-line option
> "-moverride". The purpose of this option is to give expert-level users
> of the compiler, and those comfortable with experimenting with it,
> *unsupported* full access to the tuning structures used in the AArch64
> back-end.
>
> For now, the option only controls which fusion pairs are enabled and
> whether the Cortex-A57 FMA register renaming pass runs, though we can
> expand this further in future.
>
> With this patch, you might write something like:
>
>   -moverride=fuse=adrp+add.cmp+branch:tune=rename_fma_regs
>
> To enable fusion of adrp+add and cmp+branch and to enable the
> fma-rename pass.
>
> I've bootstrapped and tested the patch set on aarch64-none-linux-gnu
> with BOOT_CFLAGS set to the example string above, and again in the
> standard configuration with no issues.
>
> OK?
>
> Thanks,
> James
>
> ---
> 2015-06-23  James Greenhalgh  <james.greenhalgh@arm.com>
>
>         * config/aarch64/aarch64.opt (override): New.
>         * doc/invoke.texi (override): Document.
>         * config/aarch64/aarch64.c (aarch64_flag_desc): New.
>         (aarch64_fusible_pairs): Likewise.
>         (aarch64_tuning_flags): Likewise.
>         (aarch64_tuning_override_function): Likewise.
>         (aarch64_tuning_override_functions): Likewise.
>         (aarch64_parse_one_option_token): Likewise.
>         (aarch64_parse_boolean_options): Likewise.
>         (aarch64_parse_fuse_string): Likewise.
>         (aarch64_parse_tune_string): Likewise.
>         (aarch64_parse_one_override_token): Likewise.
>         (aarch64_parse_override_string): Likewise.
>         (aarch64_override_options): Parse the -moverride string if it
>         is present.
>

+static const struct aarch64_tuning_override_function
+  aarch64_tuning_override_functions[] =

The indentation looks odd here, but otherwise OK /Marcus
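
For reference, the flag-list semantics used by the fuse= and tune=
sub-options in the quoted patch ('.'-separated names, with "none"
clearing whatever has been accumulated so far) can be sketched on their
own roughly as follows. This is a simplified standalone approximation
with invented names and bit values; unlike the patch it reports no
errors, so an unknown or empty token simply resets the set.

```c
#include <stddef.h>
#include <string.h>

struct flag_entry_sketch
{
  const char *name;
  unsigned int flag;
};

/* Illustrative table; names and bit values are assumptions.  */
static const struct flag_entry_sketch sketch_flags[] =
{
  { "none", 0 },
  { "adrp+add", 1u << 0 },
  { "cmp+branch", 1u << 1 },
  { NULL, 0 }
};

/* Look up one token of LENGTH characters.  Return its bit, or 0 if the
   token is "none" or is not in the table.  */
static unsigned int
lookup_flag (const char *token, size_t length)
{
  for (const struct flag_entry_sketch *f = sketch_flags; f->name != NULL; f++)
    if (strlen (f->name) == length && strncmp (f->name, token, length) == 0)
      return f->flag;
  return 0;
}

/* Parse the '.'-separated LIST, starting from the bits in INITIAL.
   A token that maps to 0 clears everything accumulated so far.  */
static unsigned int
parse_flag_list (const char *list, unsigned int initial)
{
  unsigned int found = initial;
  const char *token = list;

  for (;;)
    {
      const char *end = strchr (token, '.');
      size_t length = end ? (size_t) (end - token) : strlen (token);
      unsigned int bit = lookup_flag (token, length);

      if (bit == 0)
        found = 0;
      found |= bit;

      if (end == NULL)
        return found;
      token = end + 1;
    }
}
```

With this table, "adrp+add.cmp+branch.none.adrp+add" ends up with only
the adrp+add bit set, matching the behaviour described in the patch's
comment.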

Thread overview: 9+ messages
2015-06-23  8:49 [Patch AArch64 0/4] Add "-moverride" option for overriding tuning parameters James Greenhalgh
2015-06-23  8:49 ` [Patch AArch64 2/4] Control the FMA steering pass in tuning structures rather than as core property James Greenhalgh
2015-06-26 12:41   ` Marcus Shawcroft
2015-06-23  8:50 ` [Patch AArch64 3/4] De-const-ify struct tune_params James Greenhalgh
2015-06-26 12:44   ` Marcus Shawcroft
2015-06-23  8:50 ` [Patch AArch64 1/4] Define candidates for instruction fusion in a .def file James Greenhalgh
2015-06-26 12:41   ` Marcus Shawcroft
2015-06-23  8:52 ` [Patch AArch64 4/4] Add -moverride tuning command, and wire it up for control of fusion and fma-steering James Greenhalgh
2015-06-26 12:52   ` Marcus Shawcroft
