public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [AArch64 00/14] Pipeline-independent changes for XGene-1
@ 2014-02-18 21:10 Philipp Tomsich
  2014-02-18 21:10 ` [AArch64 04/14] Correct the maximum shift amount for shifted operands Philipp Tomsich
                   ` (15 more replies)
  0 siblings, 16 replies; 32+ messages in thread
From: Philipp Tomsich @ 2014-02-18 21:10 UTC (permalink / raw)
  To: gcc-patches; +Cc: philipp.tomsich

The following patch-set contains the pipeline-independent changes to gcc
to support the APM XGene-1 and contains various enhancements derived from
real-world applications and benchmarks running on XGene-1.

As the pipeline model has not been fully adapted to the new instruction
typing shared between the ARM backend and the AArch64 backend, it is not
yet contained in these patches.

The most controversial part of these patches will likely consist in the
new cost-model, which has intentionally been provided as a "hook" that
intercepts the current cost-model when compiling for XGene-1. Given that
the matching/structure of this cost-model is different from the existing
implementation, we've chosen to keep this in a separate function for the
time being.


Philipp Tomsich (14):
  Use "generic" target, if no other default.
  Add "xgene1" core identifier.
  Retrieve BRANCH_COST from tuning structure.
  Correct the maximum shift amount for shifted operands.
  Add AArch64 'prefetch'-pattern.
  Extend '*tb<optab><mode>1'.
  Define additional patterns for adds/subs.
  Define a variant of cmp for the CC_NZ case.
  Add special cases of zero-extend w/ compare operations.
  Add mov<mode>cc definition for GPF case.
  Optimize and(s) patterns for HI/QI operands.
  Generate 'bics', when only interested in CC_NZ.
  Initial tuning description for XGene-1 core.
  Add cost-model for XGene-1.

 gcc/config/aarch64/aarch64-cores.def |   1 +
 gcc/config/aarch64/aarch64-protos.h  |   2 +
 gcc/config/aarch64/aarch64-tune.md   |   2 +-
 gcc/config/aarch64/aarch64.c         | 922 ++++++++++++++++++++++++++++++++++-
 gcc/config/aarch64/aarch64.h         |  10 +-
 gcc/config/aarch64/aarch64.md        | 246 +++++++++-
 gcc/config/aarch64/iterators.md      |   2 +
 gcc/config/arm/types.md              |   2 +
 8 files changed, 1172 insertions(+), 15 deletions(-)

-- 
1.9.0

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [AArch64 06/14] Extend '*tb<optab><mode>1'.
  2014-02-18 21:10 [AArch64 00/14] Pipeline-independent changes for XGene-1 Philipp Tomsich
                   ` (2 preceding siblings ...)
  2014-02-18 21:10 ` [AArch64 02/14] Add "xgene1" core identifier Philipp Tomsich
@ 2014-02-18 21:10 ` Philipp Tomsich
  2014-02-18 21:19   ` Andrew Pinski
  2014-02-18 21:10 ` [AArch64 07/14] Define additional patterns for adds/subs Philipp Tomsich
                   ` (11 subsequent siblings)
  15 siblings, 1 reply; 32+ messages in thread
From: Philipp Tomsich @ 2014-02-18 21:10 UTC (permalink / raw)
  To: gcc-patches; +Cc: philipp.tomsich

The '*tb<optab><mode>1' can safely be extended to match operands of
any size, as long as the immediate operand (i.e. the bits tested)
match the size of the register operand.

This removes unnecessary zero-extension operations from the
generated instruction stream.
---
 gcc/config/aarch64/aarch64.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index b972a1b..90f1ee9 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -382,14 +382,14 @@
 
 (define_insn "*tb<optab><mode>1"
   [(set (pc) (if_then_else
-	      (EQL (zero_extract:DI (match_operand:GPI 0 "register_operand" "r")
+	      (EQL (zero_extract:DI (match_operand:ALLI 0 "register_operand" "r")
 				    (const_int 1)
 				    (match_operand 1 "const_int_operand" "n"))
 		   (const_int 0))
 	     (label_ref (match_operand 2 "" ""))
 	     (pc)))
    (clobber (match_scratch:DI 3 "=r"))]
-  ""
+  "(UINTVAL(operands[1]) < GET_MODE_BITSIZE(<MODE>mode))"
   "*
   if (get_attr_length (insn) == 8)
     return \"ubfx\\t%<w>3, %<w>0, %1, #1\;<cbz>\\t%<w>3, %l2\";
-- 
1.9.0

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [AArch64 03/14] Retrieve BRANCH_COST from tuning structure.
  2014-02-18 21:10 [AArch64 00/14] Pipeline-independent changes for XGene-1 Philipp Tomsich
  2014-02-18 21:10 ` [AArch64 04/14] Correct the maximum shift amount for shifted operands Philipp Tomsich
@ 2014-02-18 21:10 ` Philipp Tomsich
  2014-02-18 21:10 ` [AArch64 02/14] Add "xgene1" core identifier Philipp Tomsich
                   ` (13 subsequent siblings)
  15 siblings, 0 replies; 32+ messages in thread
From: Philipp Tomsich @ 2014-02-18 21:10 UTC (permalink / raw)
  To: gcc-patches; +Cc: philipp.tomsich

The BRANCH_COST affects whether conditional instructions (e.g.
conditional moves) will be used in transforms in the middle-end.
This change makes the branch_cost configurable from within the
target tuning structure.
---
 gcc/config/aarch64/aarch64-protos.h |  2 ++
 gcc/config/aarch64/aarch64.c        | 13 +++++++++++--
 gcc/config/aarch64/aarch64.h        |  6 ++++--
 3 files changed, 17 insertions(+), 4 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index 5542f02..185bc64 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -157,6 +157,7 @@ struct tune_params
   const struct cpu_vector_cost *const vec_costs;
   const int memmov_cost;
   const int issue_rate;
+  const int branch_cost;
 };
 
 HOST_WIDE_INT aarch64_initial_elimination_offset (unsigned, unsigned);
@@ -227,6 +228,7 @@ void aarch64_init_cumulative_args (CUMULATIVE_ARGS *, const_tree, rtx,
 void aarch64_init_expanders (void);
 void aarch64_print_operand (FILE *, rtx, char);
 void aarch64_print_operand_address (FILE *, rtx);
+int aarch64_branch_cost (int, int);
 
 /* Initialize builtins for SIMD intrinsics.  */
 void init_aarch64_simd_builtins (void);
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 70dda00..43e4612 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -222,7 +222,8 @@ static const struct tune_params generic_tunings =
   &generic_regmove_cost,
   &generic_vector_cost,
   NAMED_PARAM (memmov_cost, 4),
-  NAMED_PARAM (issue_rate, 2)
+  NAMED_PARAM (issue_rate, 2),
+  NAMED_PARAM (branch_cost, 2)
 };
 
 static const struct tune_params cortexa53_tunings =
@@ -232,7 +233,8 @@ static const struct tune_params cortexa53_tunings =
   &generic_regmove_cost,
   &generic_vector_cost,
   NAMED_PARAM (memmov_cost, 4),
-  NAMED_PARAM (issue_rate, 2)
+  NAMED_PARAM (issue_rate, 2),
+  NAMED_PARAM (branch_cost, 2)
 };
 
 /* A processor implementing AArch64.  */
@@ -4891,6 +4893,13 @@ aarch64_register_move_cost (enum machine_mode mode ATTRIBUTE_UNUSED,
   return regmove_cost->FP2FP;
 }
 
+int
+aarch64_branch_cost(int speed_p, int predictable_p)
+{
+  return (!(speed_p) ? 2 : (predictable_p) ? 0 : aarch64_tune_params->branch_cost);
+}
+
+
 static int
 aarch64_memory_move_cost (enum machine_mode mode ATTRIBUTE_UNUSED,
 			  reg_class_t rclass ATTRIBUTE_UNUSED,
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index b66a6b4..fbdf745 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -765,8 +765,10 @@ do {									     \
 #define MEMORY_MOVE_COST(M, CLASS, IN) \
   (GET_MODE_SIZE (M) < 8 ? 8 : GET_MODE_SIZE (M))
 
-/* To start with.  */
-#define BRANCH_COST(SPEED_P, PREDICTABLE_P) 2
+/* A C expression for the cost of a branch instruction.  A value of 1
+   is the default; other values are interpreted relative to that.  */
+#define BRANCH_COST(speed_p, predictable_p) \
+  (aarch64_branch_cost(speed_p, predictable_p))
 \f
 
 /* Assembly output.  */
-- 
1.9.0

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [AArch64 01/14] Use "generic" target, if no other default.
  2014-02-18 21:10 [AArch64 00/14] Pipeline-independent changes for XGene-1 Philipp Tomsich
                   ` (4 preceding siblings ...)
  2014-02-18 21:10 ` [AArch64 07/14] Define additional patterns for adds/subs Philipp Tomsich
@ 2014-02-18 21:10 ` Philipp Tomsich
  2014-02-21 14:02   ` Kyrill Tkachov
  2014-02-18 21:10 ` [AArch64 05/14] Add AArch64 'prefetch'-pattern Philipp Tomsich
                   ` (9 subsequent siblings)
  15 siblings, 1 reply; 32+ messages in thread
From: Philipp Tomsich @ 2014-02-18 21:10 UTC (permalink / raw)
  To: gcc-patches; +Cc: philipp.tomsich

The default target should be "generic", as Cortex-A53 includes
optional ISA features (CRC and CRYPTO) that are not required for
architectural compliance. The key difference between generic (which
already uses the cortexa53 pipeline model for scheduling) is the
absence of any optional ISA features in the "generic" target.
---
 gcc/config/aarch64/aarch64.c | 2 +-
 gcc/config/aarch64/aarch64.h | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 784bfa3..70dda00 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -5244,7 +5244,7 @@ aarch64_override_options (void)
 
   /* If the user did not specify a processor, choose the default
      one for them.  This will be the CPU set during configuration using
-     --with-cpu, otherwise it is "cortex-a53".  */
+     --with-cpu, otherwise it is "generic".  */
   if (!selected_cpu)
     {
       selected_cpu = &all_cores[TARGET_CPU_DEFAULT & 0x3f];
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index 13c424c..b66a6b4 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -472,10 +472,10 @@ enum target_cpus
   TARGET_CPU_generic
 };
 
-/* If there is no CPU defined at configure, use "cortex-a53" as default.  */
+/* If there is no CPU defined at configure, use "generic" as default.  */
 #ifndef TARGET_CPU_DEFAULT
 #define TARGET_CPU_DEFAULT \
-  (TARGET_CPU_cortexa53 | (AARCH64_CPU_DEFAULT_FLAGS << 6))
+  (TARGET_CPU_generic | (AARCH64_CPU_DEFAULT_FLAGS << 6))
 #endif
 
 /* The processor for which instructions should be scheduled.  */
-- 
1.9.0

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [AArch64 02/14] Add "xgene1" core identifier.
  2014-02-18 21:10 [AArch64 00/14] Pipeline-independent changes for XGene-1 Philipp Tomsich
  2014-02-18 21:10 ` [AArch64 04/14] Correct the maximum shift amount for shifted operands Philipp Tomsich
  2014-02-18 21:10 ` [AArch64 03/14] Retrieve BRANCH_COST from tuning structure Philipp Tomsich
@ 2014-02-18 21:10 ` Philipp Tomsich
  2014-02-18 21:10 ` [AArch64 06/14] Extend '*tb<optab><mode>1' Philipp Tomsich
                   ` (12 subsequent siblings)
  15 siblings, 0 replies; 32+ messages in thread
From: Philipp Tomsich @ 2014-02-18 21:10 UTC (permalink / raw)
  To: gcc-patches; +Cc: philipp.tomsich

* aarch64/aarch64-cores.def: Add "xgene1".
---
 gcc/config/aarch64/aarch64-cores.def | 1 +
 gcc/config/aarch64/aarch64-tune.md   | 2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/config/aarch64/aarch64-cores.def b/gcc/config/aarch64/aarch64-cores.def
index 1039660..b4f6c16 100644
--- a/gcc/config/aarch64/aarch64-cores.def
+++ b/gcc/config/aarch64/aarch64-cores.def
@@ -36,6 +36,7 @@
 
 AARCH64_CORE("cortex-a53",  cortexa53, cortexa53, 8,  AARCH64_FL_FPSIMD | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, cortexa53)
 AARCH64_CORE("cortex-a57",  cortexa15, cortexa15, 8,  AARCH64_FL_FPSIMD | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, generic)
+AARCH64_CORE("xgene1",      xgene1,    xgene1,    8,  AARCH64_FL_FPSIMD, generic)
 
 /* V8 big.LITTLE implementations.  */
 
diff --git a/gcc/config/aarch64/aarch64-tune.md b/gcc/config/aarch64/aarch64-tune.md
index b7e40e0..a79d403 100644
--- a/gcc/config/aarch64/aarch64-tune.md
+++ b/gcc/config/aarch64/aarch64-tune.md
@@ -1,5 +1,5 @@
 ;; -*- buffer-read-only: t -*-
 ;; Generated automatically by gentune.sh from aarch64-cores.def
 (define_attr "tune"
-	"cortexa53,cortexa15,cortexa57cortexa53"
+	"cortexa53,cortexa15,xgene1,cortexa57cortexa53"
 	(const (symbol_ref "((enum attr_tune) aarch64_tune)")))
-- 
1.9.0

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [AArch64 07/14] Define additional patterns for adds/subs.
  2014-02-18 21:10 [AArch64 00/14] Pipeline-independent changes for XGene-1 Philipp Tomsich
                   ` (3 preceding siblings ...)
  2014-02-18 21:10 ` [AArch64 06/14] Extend '*tb<optab><mode>1' Philipp Tomsich
@ 2014-02-18 21:10 ` Philipp Tomsich
  2014-02-18 21:19   ` Andrew Pinski
  2014-02-18 21:10 ` [AArch64 01/14] Use "generic" target, if no other default Philipp Tomsich
                   ` (10 subsequent siblings)
  15 siblings, 1 reply; 32+ messages in thread
From: Philipp Tomsich @ 2014-02-18 21:10 UTC (permalink / raw)
  To: gcc-patches; +Cc: philipp.tomsich

---
 gcc/config/aarch64/aarch64.md | 49 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 49 insertions(+)

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 90f1ee9..13a75d3 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -1138,6 +1138,22 @@
   [(set_attr "type" "alus_reg,alus_imm,alus_imm")]
 )
 
+;; alternative using the condition output only
+(define_insn "*add<mode>3_compare0_internal2"
+  [(set (reg:CC_NZ CC_REGNUM)
+        (compare:CC_NZ
+         (plus:GPI (match_operand:GPI 1 "register_operand" "%r,r,r")
+                   (match_operand:GPI 2 "aarch64_plus_operand" "r,I,J"))
+	 (const_int 0)))
+   (clobber (match_scratch:GPI 0 "=r,r,r"))]
+  ""
+  "@
+  adds\\t%<w>0, %<w>1, %<w>2
+  adds\\t%<w>0, %<w>1, %<w>2
+  subs\\t%<w>0, %<w>1, #%n2"
+  [(set_attr "type" "alus_reg,alus_imm,alus_imm")]
+)
+
 ;; zero_extend version of above
 (define_insn "*addsi3_compare0_uxtw"
   [(set (reg:CC_NZ CC_REGNUM)
@@ -1155,6 +1171,39 @@
   [(set_attr "type" "alus_reg,alus_imm,alus_imm")]
 )
 
+;; variant of the above using a swapped condition/comparator
+(define_insn "*addsi3_compare0_uxtw_zeswp"
+  [(set (reg:CC_ZESWP CC_REGNUM)
+        (compare:CC_ZESWP
+         (plus:SI (match_operand:SI 1 "register_operand" "%r,r,r")
+                  (match_operand:SI 2 "aarch64_plus_operand" "r,I,J"))
+	 (const_int 0)))
+   (set (match_operand:DI 0 "register_operand" "=r,r,r")
+        (zero_extend:DI (plus:SI (match_dup 1) (match_dup 2))))]
+  ""
+  "@
+  adds\\t%w0, %w1, %w2
+  adds\\t%w0, %w1, %w2
+  subs\\t%w0, %w1, #%n2"
+  [(set_attr "type" "alus_reg,alus_imm,alus_imm")]
+)
+
+;; alternative using the condition output only
+(define_insn "*addsi3_compare0_uxtw_zeswp_internal2"
+  [(set (reg:CC_ZESWP CC_REGNUM)
+         (compare:CC_ZESWP
+	 (plus:SI (match_operand:SI 1 "register_operand" "%r,r,r")
+                  (match_operand:SI 2 "aarch64_plus_operand" "r,I,J"))
+	 (const_int 0)))
+   (clobber (match_scratch:DI 0 "=r,r,r"))]
+  ""
+  "@
+  adds\\t%w0, %w1, %w2
+  adds\\t%w0, %w1, %w2
+  subs\\t%w0, %w1, #%n2"
+  [(set_attr "type" "alus_reg,alus_imm,alus_imm")]
+)
+
 (define_insn "*adds_mul_imm_<mode>"
   [(set (reg:CC_NZ CC_REGNUM)
 	(compare:CC_NZ
-- 
1.9.0

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [AArch64 05/14] Add AArch64 'prefetch'-pattern.
  2014-02-18 21:10 [AArch64 00/14] Pipeline-independent changes for XGene-1 Philipp Tomsich
                   ` (5 preceding siblings ...)
  2014-02-18 21:10 ` [AArch64 01/14] Use "generic" target, if no other default Philipp Tomsich
@ 2014-02-18 21:10 ` Philipp Tomsich
  2014-02-18 21:18   ` Andrew Pinski
                     ` (2 more replies)
  2014-02-18 21:26 ` [AArch64 10/14] Add mov<mode>cc definition for GPF case Philipp Tomsich
                   ` (8 subsequent siblings)
  15 siblings, 3 replies; 32+ messages in thread
From: Philipp Tomsich @ 2014-02-18 21:10 UTC (permalink / raw)
  To: gcc-patches; +Cc: philipp.tomsich

---
 gcc/config/aarch64/aarch64.md | 17 +++++++++++++++++
 gcc/config/arm/types.md       |  2 ++
 2 files changed, 19 insertions(+)

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 99a6ac8..b972a1b 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -293,6 +293,23 @@
   [(set_attr "type" "no_insn")]
 )
 
+(define_insn "prefetch"
+  [(prefetch (match_operand:DI 0 "register_operand" "r")
+	     (match_operand:QI 1 "const_int_operand" "n")
+	     (match_operand:QI 2 "const_int_operand" "n"))]
+  ""
+  "*
+{
+  if (INTVAL(operands[2]) == 0)
+     /* no temporal locality */
+     return (INTVAL(operands[1])) ? \"prfm\\tPSTL1STRM, [%0, #0]\" : \"prfm\\tPLDL1STRM, [%0, #0]\";
+
+  /* temporal locality */
+  return (INTVAL(operands[1])) ? \"prfm\\tPSTL1KEEP, [%0, #0]\" : \"prfm\\tPLDL1KEEP, [%0, #0]\";
+}"
+  [(set_attr "type" "prefetch")]
+)
+
 (define_insn "trap"
   [(trap_if (const_int 1) (const_int 8))]
   ""
diff --git a/gcc/config/arm/types.md b/gcc/config/arm/types.md
index cc39cd1..1d1280d 100644
--- a/gcc/config/arm/types.md
+++ b/gcc/config/arm/types.md
@@ -117,6 +117,7 @@
 ; mvn_shift_reg      inverting move instruction, shifted operand by a register.
 ; no_insn            an insn which does not represent an instruction in the
 ;                    final output, thus having no impact on scheduling.
+; prefetch	     a prefetch instruction
 ; rbit               reverse bits.
 ; rev                reverse bytes.
 ; sdiv               signed division.
@@ -553,6 +554,7 @@
   call,\
   clz,\
   no_insn,\
+  prefetch,\
   csel,\
   crc,\
   extend,\
-- 
1.9.0

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [AArch64 04/14] Correct the maximum shift amount for shifted operands.
  2014-02-18 21:10 [AArch64 00/14] Pipeline-independent changes for XGene-1 Philipp Tomsich
@ 2014-02-18 21:10 ` Philipp Tomsich
  2014-02-18 21:20   ` Andrew Pinski
  2014-02-18 21:10 ` [AArch64 03/14] Retrieve BRANCH_COST from tuning structure Philipp Tomsich
                   ` (14 subsequent siblings)
  15 siblings, 1 reply; 32+ messages in thread
From: Philipp Tomsich @ 2014-02-18 21:10 UTC (permalink / raw)
  To: gcc-patches; +Cc: philipp.tomsich

---
 gcc/config/aarch64/aarch64.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 43e4612..4327eb3 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -4409,7 +4409,7 @@ aarch64_output_casesi (rtx *operands)
 int
 aarch64_uxt_size (int shift, HOST_WIDE_INT mask)
 {
-  if (shift >= 0 && shift <= 3)
+  if (shift >= 0 && shift <= 4)
     {
       int size;
       for (size = 8; size <= 32; size *= 2)
-- 
1.9.0

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [AArch64 05/14] Add AArch64 'prefetch'-pattern.
  2014-02-18 21:10 ` [AArch64 05/14] Add AArch64 'prefetch'-pattern Philipp Tomsich
@ 2014-02-18 21:18   ` Andrew Pinski
  2014-02-28  8:58   ` Gopalasubramanian, Ganesh
  2014-02-28  9:14   ` Gopalasubramanian, Ganesh
  2 siblings, 0 replies; 32+ messages in thread
From: Andrew Pinski @ 2014-02-18 21:18 UTC (permalink / raw)
  To: Philipp Tomsich; +Cc: GCC Patches

On Tue, Feb 18, 2014 at 1:09 PM, Philipp Tomsich
<philipp.tomsich@theobroma-systems.com> wrote:

Can you add a testcase or two for this?

Thanks,
Andrew


> ---
>  gcc/config/aarch64/aarch64.md | 17 +++++++++++++++++
>  gcc/config/arm/types.md       |  2 ++
>  2 files changed, 19 insertions(+)
>
> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index 99a6ac8..b972a1b 100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -293,6 +293,23 @@
>    [(set_attr "type" "no_insn")]
>  )
>
> +(define_insn "prefetch"
> +  [(prefetch (match_operand:DI 0 "register_operand" "r")
> +            (match_operand:QI 1 "const_int_operand" "n")
> +            (match_operand:QI 2 "const_int_operand" "n"))]
> +  ""
> +  "*
> +{
> +  if (INTVAL(operands[2]) == 0)
> +     /* no temporal locality */
> +     return (INTVAL(operands[1])) ? \"prfm\\tPSTL1STRM, [%0, #0]\" : \"prfm\\tPLDL1STRM, [%0, #0]\";
> +
> +  /* temporal locality */
> +  return (INTVAL(operands[1])) ? \"prfm\\tPSTL1KEEP, [%0, #0]\" : \"prfm\\tPLDL1KEEP, [%0, #0]\";
> +}"
> +  [(set_attr "type" "prefetch")]
> +)
> +
>  (define_insn "trap"
>    [(trap_if (const_int 1) (const_int 8))]
>    ""
> diff --git a/gcc/config/arm/types.md b/gcc/config/arm/types.md
> index cc39cd1..1d1280d 100644
> --- a/gcc/config/arm/types.md
> +++ b/gcc/config/arm/types.md
> @@ -117,6 +117,7 @@
>  ; mvn_shift_reg      inverting move instruction, shifted operand by a register.
>  ; no_insn            an insn which does not represent an instruction in the
>  ;                    final output, thus having no impact on scheduling.
> +; prefetch          a prefetch instruction
>  ; rbit               reverse bits.
>  ; rev                reverse bytes.
>  ; sdiv               signed division.
> @@ -553,6 +554,7 @@
>    call,\
>    clz,\
>    no_insn,\
> +  prefetch,\
>    csel,\
>    crc,\
>    extend,\
> --
> 1.9.0
>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [AArch64 07/14] Define additional patterns for adds/subs.
  2014-02-18 21:10 ` [AArch64 07/14] Define additional patterns for adds/subs Philipp Tomsich
@ 2014-02-18 21:19   ` Andrew Pinski
  0 siblings, 0 replies; 32+ messages in thread
From: Andrew Pinski @ 2014-02-18 21:19 UTC (permalink / raw)
  To: Philipp Tomsich; +Cc: GCC Patches

On Tue, Feb 18, 2014 at 1:09 PM, Philipp Tomsich
<philipp.tomsich@theobroma-systems.com> wrote:

Can you add a testcase or two for this?  This should show why they are
not matching before hand.

Thanks,
Andrew


> ---
>  gcc/config/aarch64/aarch64.md | 49 +++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 49 insertions(+)
>
> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index 90f1ee9..13a75d3 100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -1138,6 +1138,22 @@
>    [(set_attr "type" "alus_reg,alus_imm,alus_imm")]
>  )
>
> +;; alternative using the condition output only
> +(define_insn "*add<mode>3_compare0_internal2"
> +  [(set (reg:CC_NZ CC_REGNUM)
> +        (compare:CC_NZ
> +         (plus:GPI (match_operand:GPI 1 "register_operand" "%r,r,r")
> +                   (match_operand:GPI 2 "aarch64_plus_operand" "r,I,J"))
> +        (const_int 0)))
> +   (clobber (match_scratch:GPI 0 "=r,r,r"))]
> +  ""
> +  "@
> +  adds\\t%<w>0, %<w>1, %<w>2
> +  adds\\t%<w>0, %<w>1, %<w>2
> +  subs\\t%<w>0, %<w>1, #%n2"
> +  [(set_attr "type" "alus_reg,alus_imm,alus_imm")]
> +)
> +
>  ;; zero_extend version of above
>  (define_insn "*addsi3_compare0_uxtw"
>    [(set (reg:CC_NZ CC_REGNUM)
> @@ -1155,6 +1171,39 @@
>    [(set_attr "type" "alus_reg,alus_imm,alus_imm")]
>  )
>
> +;; variant of the above using a swapped condition/comparator
> +(define_insn "*addsi3_compare0_uxtw_zeswp"
> +  [(set (reg:CC_ZESWP CC_REGNUM)
> +        (compare:CC_ZESWP
> +         (plus:SI (match_operand:SI 1 "register_operand" "%r,r,r")
> +                  (match_operand:SI 2 "aarch64_plus_operand" "r,I,J"))
> +        (const_int 0)))
> +   (set (match_operand:DI 0 "register_operand" "=r,r,r")
> +        (zero_extend:DI (plus:SI (match_dup 1) (match_dup 2))))]
> +  ""
> +  "@
> +  adds\\t%w0, %w1, %w2
> +  adds\\t%w0, %w1, %w2
> +  subs\\t%w0, %w1, #%n2"
> +  [(set_attr "type" "alus_reg,alus_imm,alus_imm")]
> +)
> +
> +;; alternative using the condition output only
> +(define_insn "*addsi3_compare0_uxtw_zeswp_internal2"
> +  [(set (reg:CC_ZESWP CC_REGNUM)
> +         (compare:CC_ZESWP
> +        (plus:SI (match_operand:SI 1 "register_operand" "%r,r,r")
> +                  (match_operand:SI 2 "aarch64_plus_operand" "r,I,J"))
> +        (const_int 0)))
> +   (clobber (match_scratch:DI 0 "=r,r,r"))]
> +  ""
> +  "@
> +  adds\\t%w0, %w1, %w2
> +  adds\\t%w0, %w1, %w2
> +  subs\\t%w0, %w1, #%n2"
> +  [(set_attr "type" "alus_reg,alus_imm,alus_imm")]
> +)
> +
>  (define_insn "*adds_mul_imm_<mode>"
>    [(set (reg:CC_NZ CC_REGNUM)
>         (compare:CC_NZ
> --
> 1.9.0
>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [AArch64 06/14] Extend '*tb<optab><mode>1'.
  2014-02-18 21:10 ` [AArch64 06/14] Extend '*tb<optab><mode>1' Philipp Tomsich
@ 2014-02-18 21:19   ` Andrew Pinski
  0 siblings, 0 replies; 32+ messages in thread
From: Andrew Pinski @ 2014-02-18 21:19 UTC (permalink / raw)
  To: Philipp Tomsich; +Cc: GCC Patches

On Tue, Feb 18, 2014 at 1:09 PM, Philipp Tomsich
<philipp.tomsich@theobroma-systems.com> wrote:
> The '*tb<optab><mode>1' can safely be extended to match operands of
> any size, as long as the immediate operand (i.e. the bits tested)
> match the size of the register operand.
>
> This removes unnecessary zero-extension operations from the
> generated instruction stream.

Can you add a testcase or two for this?

Thanks,
Andrew

> ---
>  gcc/config/aarch64/aarch64.md | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index b972a1b..90f1ee9 100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -382,14 +382,14 @@
>
>  (define_insn "*tb<optab><mode>1"
>    [(set (pc) (if_then_else
> -             (EQL (zero_extract:DI (match_operand:GPI 0 "register_operand" "r")
> +             (EQL (zero_extract:DI (match_operand:ALLI 0 "register_operand" "r")
>                                     (const_int 1)
>                                     (match_operand 1 "const_int_operand" "n"))
>                    (const_int 0))
>              (label_ref (match_operand 2 "" ""))
>              (pc)))
>     (clobber (match_scratch:DI 3 "=r"))]
> -  ""
> +  "(UINTVAL(operands[1]) < GET_MODE_BITSIZE(<MODE>mode))"
>    "*
>    if (get_attr_length (insn) == 8)
>      return \"ubfx\\t%<w>3, %<w>0, %1, #1\;<cbz>\\t%<w>3, %l2\";
> --
> 1.9.0
>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [AArch64 04/14] Correct the maximum shift amount for shifted operands.
  2014-02-18 21:10 ` [AArch64 04/14] Correct the maximum shift amount for shifted operands Philipp Tomsich
@ 2014-02-18 21:20   ` Andrew Pinski
  0 siblings, 0 replies; 32+ messages in thread
From: Andrew Pinski @ 2014-02-18 21:20 UTC (permalink / raw)
  To: Philipp Tomsich; +Cc: GCC Patches

On Tue, Feb 18, 2014 at 1:09 PM, Philipp Tomsich
<philipp.tomsich@theobroma-systems.com> wrote:

Can you add a testcase or two for this?

Thanks,
Andrew


> ---
>  gcc/config/aarch64/aarch64.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 43e4612..4327eb3 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -4409,7 +4409,7 @@ aarch64_output_casesi (rtx *operands)
>  int
>  aarch64_uxt_size (int shift, HOST_WIDE_INT mask)
>  {
> -  if (shift >= 0 && shift <= 3)
> +  if (shift >= 0 && shift <= 4)
>      {
>        int size;
>        for (size = 8; size <= 32; size *= 2)
> --
> 1.9.0
>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [AArch64 10/14] Add mov<mode>cc definition for GPF case.
  2014-02-18 21:10 [AArch64 00/14] Pipeline-independent changes for XGene-1 Philipp Tomsich
                   ` (6 preceding siblings ...)
  2014-02-18 21:10 ` [AArch64 05/14] Add AArch64 'prefetch'-pattern Philipp Tomsich
@ 2014-02-18 21:26 ` Philipp Tomsich
  2014-02-18 21:40   ` Andrew Pinski
  2014-02-18 21:27 ` [AArch64 11/14] Optimize and(s) patterns for HI/QI operands Philipp Tomsich
                   ` (7 subsequent siblings)
  15 siblings, 1 reply; 32+ messages in thread
From: Philipp Tomsich @ 2014-02-18 21:26 UTC (permalink / raw)
  To: gcc-patches; +Cc: philipp.tomsich

---
 gcc/config/aarch64/aarch64.md | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index c72d123..b6453b6 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -2460,6 +2460,25 @@
   }
 )
 
+(define_expand "mov<mode>cc"
+  [(set (match_operand:GPF 0 "register_operand" "")
+  (if_then_else:GPF (match_operand 1 "aarch64_comparison_operator" "")
+                    (match_operand:GPF 2 "register_operand" "")
+                    (match_operand:GPF 3 "register_operand" "")))]
+  ""
+  {
+    rtx ccreg;
+    enum rtx_code code = GET_CODE (operands[1]);
+
+    if (code == UNEQ || code == LTGT)
+      FAIL;
+
+    ccreg = aarch64_gen_compare_reg (code, XEXP (operands[1], 0),
+    	      			        XEXP (operands[1], 1));
+    operands[1] = gen_rtx_fmt_ee (code, VOIDmode, ccreg, const0_rtx);
+  }
+)
+
 (define_expand "mov<GPF:mode><GPI:mode>cc"
   [(set (match_operand:GPI 0 "register_operand" "")
 	(if_then_else:GPI (match_operand 1 "aarch64_comparison_operator" "")
-- 
1.9.0

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [AArch64 11/14] Optimize and(s) patterns for HI/QI operands.
  2014-02-18 21:10 [AArch64 00/14] Pipeline-independent changes for XGene-1 Philipp Tomsich
                   ` (7 preceding siblings ...)
  2014-02-18 21:26 ` [AArch64 10/14] Add mov<mode>cc definition for GPF case Philipp Tomsich
@ 2014-02-18 21:27 ` Philipp Tomsich
  2014-02-18 21:41   ` Andrew Pinski
  2014-02-18 21:28 ` [AArch64 14/14] Add cost-model for XGene-1 Philipp Tomsich
                   ` (6 subsequent siblings)
  15 siblings, 1 reply; 32+ messages in thread
From: Philipp Tomsich @ 2014-02-18 21:27 UTC (permalink / raw)
  To: gcc-patches; +Cc: philipp.tomsich

HImode and QImode operands can be handled in a more optimal way for
logical AND than for logical OR operations. An AND will never set
bits that are not already set in its operands, so the resulting
mode/precision depends on the least precision of its operands with
an implicit zero-extension to any larger precision.

These patterns help to avoid unnecessary zero-extension operations
on benchmarks, including some SPEC workloads.
---
 gcc/config/aarch64/aarch64.md   | 62 ++++++++++++++++++++++++++++++++++++++---
 gcc/config/aarch64/iterators.md |  2 ++
 2 files changed, 60 insertions(+), 4 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index b6453b6..6feedd3 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -2551,8 +2551,8 @@
 
 (define_insn "<optab><mode>3"
   [(set (match_operand:GPI 0 "register_operand" "=r,rk")
-	(LOGICAL:GPI (match_operand:GPI 1 "register_operand" "%r,r")
-		     (match_operand:GPI 2 "aarch64_logical_operand" "r,<lconst>")))]
+	(OR:GPI (match_operand:GPI 1 "register_operand" "%r,r")
+		(match_operand:GPI 2 "aarch64_logical_operand" "r,<lconst>")))]
   ""
   "<logical>\\t%<w>0, %<w>1, %<w>2"
   [(set_attr "type" "logic_reg,logic_imm")]
@@ -2569,6 +2569,27 @@
   [(set_attr "type" "logic_reg,logic_imm")]
 )
 
+;; specialized form of AND for HI and QI
+(define_insn "and<mode>3"
+  [(set (match_operand:ALLI 0 "register_operand" "=r,rk")
+        (and:ALLI (match_operand:ALLI 1 "register_operand" "%r,r")
+                  (match_operand:ALLI 2 "aarch64_logical_operand" "r,<andconst>")))]
+  ""
+  "and\\t%<w>0, %<w>1, %<w>2"
+  [(set_attr "type" "logic_reg,logic_imm")]
+)
+
+;; zero_extend version of above
+(define_insn "*and<mode>3_zeroextend"
+  [(set (match_operand:GPI 0 "register_operand" "=r")
+        (zero_extend:GPI
+          (and:ALLX (match_operand:ALLX 1 "register_operand" "r")
+                    (match_operand:ALLX 2 "const_int_operand" "<andconst>"))))]
+  ""
+  "and\\t%w0, %w1, %w2"
+  [(set_attr "type" "logic_imm")]
+)
+
 (define_insn "*and<mode>3_compare0"
   [(set (reg:CC_NZ CC_REGNUM)
 	(compare:CC_NZ
@@ -2582,12 +2603,28 @@
   [(set_attr "type" "logics_reg,logics_imm")]
 )
 
+;; special variant for HI and QI operators (implicitly zero-extending)
+(define_insn "*and<mode>3_compare0"
+  [(set (reg:CC_NZ CC_REGNUM)
+        (compare:CC_NZ
+                (and:GPI (match_operand:SHORT 1 "register_operand" "%r,r")
+                         (match_operand:SHORT 2 "aarch64_logical_operand" "r,<andconst>"))
+                (const_int 0)))
+   (set (match_operand:GPI 0 "register_operand" "=r,r")
+        (and:GPI (match_dup 1) (match_dup 2)))]
+  ""
+  "@
+   ands\\t%<w>0, %<w>1, %<w>2
+   ands\\t%<w>0, %<w>1, %2"
+  [(set_attr "type" "logic_reg,logic_imm")]
+)
+
 ;; zero_extend version of above
 (define_insn "*andsi3_compare0_uxtw"
   [(set (reg:CC_NZ CC_REGNUM)
 	(compare:CC_NZ
-	 (and:SI (match_operand:SI 1 "register_operand" "%r,r")
-		 (match_operand:SI 2 "aarch64_logical_operand" "r,K"))
+	 (and:SI (match_operand:ALLX 1 "register_operand" "%r,r")
+		 (match_operand:ALLX 2 "aarch64_logical_operand" "r,K"))
 	 (const_int 0)))
    (set (match_operand:DI 0 "register_operand" "=r,r")
 	(zero_extend:DI (and:SI (match_dup 1) (match_dup 2))))]
@@ -2628,6 +2665,23 @@
   [(set_attr "type" "logics_shift_imm")]
 )
 
+;; specialized form for bitfield tests
+(define_insn "*ands<mode>3_zeroextract_internal2"
+  [(set (reg:CC_NZ CC_REGNUM)
+        (compare:CC_NZ
+         (zero_extract:GPI (match_operand:GPI 0 "register_operand" "r")
+                           (match_operand 1 "const_int_operand" "n")
+                           (match_operand 2 "const_int_operand" "n"))
+         (const_int 0)))]
+  "aarch64_bitmask_imm((((HOST_WIDE_INT)1 << (UINTVAL(operands[1]))) - 1) << UINTVAL(operands[2]), <MODE>mode)"
+  "*
+  {
+    operands[3] = GEN_INT((((HOST_WIDE_INT)1 << (UINTVAL(operands[1]))) - 1) << UINTVAL(operands[2]));
+    return \"ands\\t<w>zr, %<w>0, %<w>3\";
+  }"
+  [(set_attr "type" "logics_reg")]
+)
+
 (define_insn "*<LOGICAL:optab>_<SHIFT:optab><mode>3"
   [(set (match_operand:GPI 0 "register_operand" "=r")
 	(LOGICAL:GPI (SHIFT:GPI
diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
index f1339b8..edba829 100644
--- a/gcc/config/aarch64/iterators.md
+++ b/gcc/config/aarch64/iterators.md
@@ -341,6 +341,7 @@
 
 ;; Attribute to describe constants acceptable in logical operations
 (define_mode_attr lconst [(SI "K") (DI "L")])
+(define_mode_attr andconst [(QI "K") (HI "K") (SI "K") (DI "L")])
 
 ;; Map a mode to a specific constraint character.
 (define_mode_attr cmode [(QI "q") (HI "h") (SI "s") (DI "d")])
@@ -627,6 +628,7 @@
 
 ;; Code iterator for logical operations
 (define_code_iterator LOGICAL [and ior xor])
+(define_code_iterator OR [ior xor])
 
 ;; Code iterator for sign/zero extension
 (define_code_iterator ANY_EXTEND [sign_extend zero_extend])
-- 
1.9.0

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [AArch64 14/14] Add cost-model for XGene-1.
  2014-02-18 21:10 [AArch64 00/14] Pipeline-independent changes for XGene-1 Philipp Tomsich
                   ` (8 preceding siblings ...)
  2014-02-18 21:27 ` [AArch64 11/14] Optimize and(s) patterns for HI/QI operands Philipp Tomsich
@ 2014-02-18 21:28 ` Philipp Tomsich
  2014-02-18 21:28 ` [AArch64 13/14] Initial tuning description for XGene-1 core Philipp Tomsich
                   ` (5 subsequent siblings)
  15 siblings, 0 replies; 32+ messages in thread
From: Philipp Tomsich @ 2014-02-18 21:28 UTC (permalink / raw)
  To: gcc-patches; +Cc: philipp.tomsich

This completely rewritten cost-model provides a like-for-like benefit
of approx. 3% on CoreMark.
---
 gcc/config/aarch64/aarch64.c | 885 ++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 881 insertions(+), 4 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 4c06f9b..d5bdc9e 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -142,6 +142,8 @@ static bool aarch64_const_vec_all_same_int_p (rtx,
 static bool aarch64_vectorize_vec_perm_const_ok (enum machine_mode vmode,
 						 const unsigned char *sel);
 
+static bool xgene1_rtx_costs (rtx, int, int, int, int*, bool);
+
 /* The processor for which instructions should be scheduled.  */
 enum aarch64_processor aarch64_tune = cortexa53;
 
@@ -198,8 +200,8 @@ static const struct cpu_regmove_cost xgene1_regmove_cost =
   /* We want all GP2FP and FP2GP moves to be handled by a reload.
      A direct move instruction will have similar microarchitectural
      cost to a store/load combination.  */
-  NAMED_PARAM (GP2FP, 4),
-  NAMED_PARAM (FP2GP, 4),
+  NAMED_PARAM (GP2FP, 8),
+  NAMED_PARAM (FP2GP, 8),
   /* We currently do not provide direct support for TFmode Q->Q move.
      Therefore we need to raise the cost above 2 in order to have
      reload handle the situation.  */
@@ -252,8 +254,7 @@ static const struct tune_params cortexa53_tunings =
 };
 
 /* We can't model the microarchitectural costs on XGene using  the default
-   cost model for AArch64.  So we leave the extra cost structure pointing
-   to the default cost model for the time being.  */
+   cost model for AArch64.  */
 static const struct tune_params xgene1_tunings =
 {
   &cortexa53_extra_costs,
@@ -4546,6 +4547,9 @@ aarch64_rtx_costs (rtx x, int code, int outer ATTRIBUTE_UNUSED,
   const struct cpu_cost_table *extra_cost
     = aarch64_tune_params->insn_extra_cost;
 
+  if (selected_cpu->core == xgene1)
+    return xgene1_rtx_costs(x, code, outer, param, cost, speed);
+
   switch (code)
     {
     case SET:
@@ -8331,6 +8335,879 @@ aarch64_cannot_change_mode_class (enum machine_mode from,
   return true;
 }
 
+/* This function aids the processing of an add/sub instruction that
+   may use the "extended register" or "shifted register" form.  For
+   many such cases, we can simply process the extend/shift as if it
+   were a separate isntruction, since the op cost is the same.
+   However, certain cases must be handled separately when the ops are
+   integrated into a single instruction.
+
+   Returns the inner operand if successful, or the original expression
+   on failure.  Also updates the cost if successful.  */
+static rtx
+xgene1_strip_extended_register (rtx op, int *cost, bool speed ATTRIBUTE_UNUSED, bool separate)
+{
+  /* If the operand is zero-extended from 32-bits, it is free. */
+  if (!separate
+      && GET_CODE (op) == ZERO_EXTEND
+      && GET_MODE (XEXP (op, 0)) == SImode)
+    return XEXP (op, 0);
+
+  /*  A stand-alone multiply costs 4 or 5, so GCC will choose a cheaper
+      shift if it can.  But GCC will not transform a multiply embedded
+      inside another operation such as (plus (mult X const)).  Instead,
+      aarch64.md recognizes it as an operation with an embedded shift,
+      and we charge a cost accordingly. */
+  if (GET_CODE (op) == MULT)
+    {
+      rtx op0 = XEXP (op, 0);
+      rtx op1 = XEXP (op, 1);
+
+      if (CONST_INT_P (op1)
+          && exact_log2 (INTVAL (op1)) > 0)
+        {
+          if (exact_log2 (INTVAL (op1)) <= 4)
+            {
+              *cost += COSTS_N_INSNS(1);
+
+              /* The extended register form can include a zero-
+                 or sign-extend for free. */
+              if (GET_CODE (op0) == ZERO_EXTEND
+                  || GET_CODE (op1) == SIGN_EXTEND)
+                return XEXP (op0, 0);
+              else
+                return op0;
+            }
+          else
+            {
+              /* The shifted register form can support a larger
+                 left shift, but cannot include a free extend. */
+              *cost += COSTS_N_INSNS(2);
+              return op0;
+            }
+        }
+    }
+
+  /* No candidates found.  Return op unchanged. */
+  return op;
+}
+
+/* Calculate the cost of calculating X, storing it in *COST.  Result
+   is true if the total cost of the operation has now been calculated.  */
+static bool
+xgene1_rtx_costs (rtx x, int code, int outer ATTRIBUTE_UNUSED,
+                  int param ATTRIBUTE_UNUSED, int *cost, bool speed)
+{
+  rtx op0, op1, op2, addr;
+  int n_minus_1;
+  enum machine_mode mode;
+
+  /* Throw away the default cost and start over.  */
+  /* A size N times larger than UNITS_PER_WORD (rounded up) probably
+     needs N times as many ops, so it executes in N-1 extra
+     cycles.  */
+  mode = GET_MODE (x);
+  n_minus_1 = (GET_MODE_SIZE (mode) - 1) / UNITS_PER_WORD;
+  /* If the mode size is less than UNITS_PER_WORD, then n_minus_1 is
+     0, and the starting cost is 0.  This the default.  Instructions
+     then add cost above and beyond that value.  */
+  *cost = COSTS_N_INSNS(n_minus_1);
+
+  switch (code)
+    {
+    case REG:
+      /* Warning: rtx_cost won't actually ask for the cost of a
+         register.  It just assumes that the cost is 0.  So this code
+         may be useless.  */
+      /* a register has zero cost when used as part of an expression,
+         but it has a cost when copied to another register.  */
+      if (outer != SET)
+        *cost = 0;
+      else if (FLOAT_MODE_P (mode))
+        *cost += COSTS_N_INSNS(3); /* base cost */
+      else if (VECTOR_MODE_P (mode))
+        *cost += COSTS_N_INSNS(2); /* base cost */
+      else
+        *cost += COSTS_N_INSNS(1); /* base cost */
+      return true;
+
+    case CONST_INT:
+      /* If an instruction can incorporate a constant within the
+         instruction, the instruction's expression avoids calling
+         rtx_cost() on the constant.  If rtx_cost() is called on a
+         constant, then it's usually because the constant must be
+         moved into a register by one or more instructions.
+
+         The exception is constant 0, which usually can be expressed
+         as XZR/WZR with zero cost.  const0 occasionally has positive
+         cost, but we can't tell that here.  In particular, setting a
+         register to const0 costs an instruction, but that case
+         doesn't call this function anyway.  One compelling reason to
+         pretend that setting a register to 0 costs nothing is to get
+         the desired results in synth_mult() in expmed.c.  */
+      if (x == const0_rtx)
+        *cost = 0;
+      else
+        *cost += COSTS_N_INSNS(1); /* base cost */
+      return true;
+
+    case CONST_DOUBLE:
+      if (aarch64_float_const_representable_p(x))
+        *cost += COSTS_N_INSNS(3); /* MOVI when used by FP */
+      else
+        *cost += COSTS_N_INSNS(5); /* GCC loads the constant from
+				      memory.  */
+      return true;
+
+    case SET:
+      op0 = SET_DEST (x);
+      op1 = SET_SRC (x);
+
+      switch (GET_CODE (op0))
+	{
+	case MEM:
+          /* If the store data is not already in a register, get the
+             cost to prepare it.  */
+          *cost += rtx_cost (op1, SET, 1, speed);
+
+          /* Add the cost of complex addressing modes.  */
+          addr = XEXP(op0, 0);
+          *cost += aarch64_address_cost(addr, word_mode, 0, speed);
+	  return true;
+
+	case SUBREG:
+	  if (! REG_P (SUBREG_REG (op0)))
+	    *cost += rtx_cost (SUBREG_REG (op0), SET, 0, speed);
+	  /* Fall through. */
+
+	case REG:
+          if (GET_CODE (op1) == REG)
+            {
+              /* The cost is 1 per register copied.  */
+              /* Note that SET does not itself have a mode, so the
+                 previously calculated value of n_minus_1 is not
+                 useful.  */
+              n_minus_1 = (GET_MODE_SIZE (GET_MODE (SET_DEST (x))) - 1) / UNITS_PER_WORD;
+              *cost = COSTS_N_INSNS(n_minus_1 + 16);
+              return true;
+            }
+          else
+            {
+              /* Cost is just the cost of the RHS of the set (min 1).  */
+              *cost = rtx_cost (op1, SET, 0, speed);
+              return true;
+            }
+
+	case ZERO_EXTRACT: 
+	  /* Bit-field insertion.  */
+	  /* Strip any redundant widening of the RHS to meet the width
+	     of the target.  */
+	  if (GET_CODE (op1) == SUBREG)
+	    op1 = SUBREG_REG (op1);
+	  if ((GET_CODE (op1) == ZERO_EXTEND
+	       || GET_CODE (op1) == SIGN_EXTEND)
+	      && GET_CODE (XEXP (op0, 1)) == CONST_INT
+	      && (GET_MODE_BITSIZE (GET_MODE (XEXP (op1, 0)))
+		  >= INTVAL (XEXP (op0, 1))))
+	    op1 = XEXP (op1, 0);
+
+          if (CONST_INT_P (op1))
+            {
+              /* It must be a MOVK.  */
+              *cost += COSTS_N_INSNS(1);
+              return true;
+            }
+          else
+            {
+              /* It must be a BFM.  */
+              *cost += COSTS_N_INSNS(2);
+              *cost += rtx_cost (op1, ZERO_EXTRACT, 1, speed);
+              return true;
+            }
+
+	default:
+          *cost += COSTS_N_INSNS(1); /* default cost */
+          return false;
+	}
+
+    case MEM:
+      /* The base cost is the load latency.  */
+      if (GET_MODE_CLASS(GET_MODE(x)) == MODE_INT)
+        {
+          *cost += COSTS_N_INSNS(5);
+        }
+      else if (GET_MODE_CLASS(GET_MODE(x)) == MODE_FLOAT)
+        {
+          *cost += COSTS_N_INSNS(10);
+        }
+      else
+        {
+          *cost += COSTS_N_INSNS(8); /* default cost */
+        }
+
+      /* Add the cost of complex addressing modes.  */
+      addr = XEXP(x, 0);
+      *cost += aarch64_address_cost(addr, word_mode, 0, speed);
+      return true;
+
+    case COMPARE:
+      op0 = XEXP (x, 0);
+      op1 = XEXP (x, 1);
+
+      /* We only get here if the compare is being used to set the CC
+         flags.  Compares within other instructions (e.g. cbz) are
+         subexpressions of if_then_else and are handled there.  */
+
+      if (GET_MODE_CLASS (GET_MODE (op0)) == MODE_INT)
+        {
+          /* A write to the CC flags costs extra.  */
+          *cost += COSTS_N_INSNS(2); /* base cost */
+
+          /* CC_ZESWPmode supports zero extend for free.  */
+          if (GET_MODE (x) == CC_ZESWPmode && GET_CODE (op0) == ZERO_EXTEND)
+            op0 = XEXP (op0, 0);
+
+          /* Support for ANDS.  */
+          if (GET_CODE (op0) == AND)
+            {
+              x = op0;
+              goto cost_logic;
+            }
+
+          /* Support for TST that looks like zero extract.  */
+          if (GET_CODE (op0) == ZERO_EXTRACT)
+            {
+              *cost += rtx_cost (XEXP (op0, 0), ZERO_EXTRACT, 1, speed);
+              return true;
+            }
+
+          /* Support for ADDS (and CMN alias).  */
+          if (GET_CODE (op0) == PLUS)
+            {
+              x = op0;
+              goto cost_plus;
+            }
+
+          /* Support for SUBS.  */
+          if (GET_CODE (op0) == MINUS)
+            {
+              x = op0;
+              goto cost_minus;
+            }
+
+          /* Support for CMN.  */
+          if (GET_CODE (op1) == NEG)
+            {
+              *cost += rtx_cost (op0, COMPARE, 0, speed);
+              *cost += rtx_cost (XEXP (op1, 0), ZERO_EXTRACT, 1, speed);
+              return true;
+            }
+
+          /* Support for CMP (integer) */
+          /* Compare can freely swap the order of operands, and
+             canonicalization puts the more complex operation first.
+             But the integer MINUS logic expects the shift/extend
+             operation in op1.  */
+          if (! (REG_P (op0)
+                 || (GET_CODE (op0) == SUBREG && REG_P (SUBREG_REG (op0)))))
+          {
+            op0 = XEXP (x, 1);
+            op1 = XEXP (x, 0);
+          }
+          goto cost_minus_int;
+        }
+      
+      /* Support for CMP (FP) */
+      if (GET_MODE_CLASS (GET_MODE (op0)) == MODE_FLOAT)
+        {
+          *cost += COSTS_N_INSNS(11);
+          if (aarch64_float_const_zero_rtx_p (op1))
+            {
+              /* fcmp supports constant 0.0 for no extra cost. */
+              return true;
+            }
+          return false;
+        }
+
+      *cost += COSTS_N_INSNS(2); /* default cost */
+      return false;
+
+    case NEG:
+      op0 = XEXP (x, 0);
+
+      if (GET_MODE_CLASS (GET_MODE (x)) == MODE_INT)
+	{
+          *cost += COSTS_N_INSNS(1); /* base cost */
+
+          if (GET_RTX_CLASS (GET_CODE (op0)) == RTX_COMPARE
+              || GET_RTX_CLASS (GET_CODE (op0)) == RTX_COMM_COMPARE)
+            {
+              /* This looks like CSETM. */
+              *cost += rtx_cost (XEXP (op0, 0), NEG, 0, speed);
+              return true;
+            }
+
+          op0 = CONST0_RTX (GET_MODE (x));
+          op1 = XEXP (x, 0);
+          goto cost_minus_int;
+        }
+
+      if (GET_MODE_CLASS (GET_MODE (x)) == MODE_FLOAT)
+        {
+          /* Support (neg(fma...)) as a single instruction only if
+             sign of zeros is unimportant.  This matches the decision
+             making in aarch64.md.  */
+          if (GET_CODE (op0) == FMA && !HONOR_SIGNED_ZEROS (GET_MODE (op0)))
+            {
+              *cost += rtx_cost (op0, NEG, 0, speed);
+              return true;
+            }
+
+          *cost += COSTS_N_INSNS(3); /* FNEG when used by FP */
+          return false;
+        }
+
+      *cost += COSTS_N_INSNS(1); /* default cost */
+      return false;
+
+    case MINUS:
+      if (GET_MODE_CLASS (GET_MODE (x)) == MODE_INT)
+	{
+          *cost += COSTS_N_INSNS(1); /* base cost */
+
+        cost_minus: /* the base cost must be set before entry here */
+          op0 = XEXP (x, 0);
+          op1 = XEXP (x, 1);
+
+        cost_minus_int: /* the base cost must be set before entry here */
+	  if (CONST_INT_P (op1) && aarch64_uimm12_shift (INTVAL (op1)))
+	    {
+              /* A SUB instruction cannot combine a shift/extend
+                 operation with an immediate, so we assume that the
+                 shift/extend is a separate instruction.  */
+              *cost += rtx_cost (op0, MINUS, 1, speed);
+              return true;
+	    }
+
+          /* Unlike ADD, we normally expect MINUS to have the
+             shift/extend operand in op1.  */
+          op1 = xgene1_strip_extended_register (op1, cost, speed, false);
+
+          /* However, expmed.c performs some cost tests of shifted
+             register minus register.  Since this will require the
+             shift to take place in a separate instruction, we'd
+             normally evaluate the cost of the shift subexpression
+             independently.  However, expmed codes the shift as a
+             multiply, and we don't want to change the cost of an
+             indepedent multiply.  So instead we treat it as an
+             integrated subexpression, with the caveat that zero
+             extend is not free.  */
+          op0 = xgene1_strip_extended_register (op0, cost, speed, true);
+
+          *cost += rtx_cost (op0, PLUS, 0, speed);
+          *cost += rtx_cost (op1, PLUS, 1, speed);
+          return true;
+	}
+
+      if (GET_MODE_CLASS (GET_MODE (x)) == MODE_FLOAT)
+	{
+          *cost += COSTS_N_INSNS(5); /* base cost */
+	  return false;
+	}
+
+      *cost += COSTS_N_INSNS(1); /* default cost */
+      return false;
+
+    case PLUS:
+      if (FLOAT_MODE_P (mode))
+        {
+          *cost += COSTS_N_INSNS(5); /* base cost */
+          return false;
+        }
+      else if (VECTOR_MODE_P (mode))
+        {
+          *cost += COSTS_N_INSNS(3); /* base cost */
+          return false;
+        }
+      if (SCALAR_INT_MODE_P(mode))
+	{
+          *cost += COSTS_N_INSNS(1); /* base cost */
+
+        cost_plus: /* the base cost must be set before entry here */
+          op0 = XEXP (x, 0);
+          op1 = XEXP (x, 1);
+
+          if (GET_RTX_CLASS (GET_CODE (op0)) == RTX_COMPARE
+              || GET_RTX_CLASS (GET_CODE (op0)) == RTX_COMM_COMPARE)
+            {
+              /* This looks like CINC.  */
+              *cost += rtx_cost (XEXP (op0, 0), PLUS, 0, speed);
+              *cost += rtx_cost (op1, PLUS, 1, speed);
+              return true;
+            }
+
+	  if (CONST_INT_P (op1) && aarch64_uimm12_shift (INTVAL (op1)))
+	    {
+              /* An ADD instruction cannot combine a shift/extend
+                 operation with an immediate, so we assume that the
+                 shift/extend is a separate instruction.  */
+              *cost += rtx_cost (op0, PLUS, 0, speed);
+              return true;
+	    }
+
+          /* We could handle multiply-add here, but the cost is the
+             same as handling them separately.  (At least, it is for
+             integers.)  */
+
+          op0 = xgene1_strip_extended_register (op0, cost, speed, false);
+
+          *cost += rtx_cost (op0, PLUS, 0, speed);
+          *cost += rtx_cost (op1, PLUS, 1, speed);
+          return true;
+        }
+
+      *cost += COSTS_N_INSNS(1); /* default cost */
+      return false;
+
+    case XOR:
+    case AND:
+      *cost += COSTS_N_INSNS(1); /* base cost */
+
+    cost_logic: /* the base cost must be set before entry here */
+      op0 = XEXP (x, 0);
+      op1 = XEXP (x, 1);
+
+      /* Depending on the immediates, (and (mult X mult_imm) and_imm)
+         may be translated to UBFM/SBFM, so we set the cost
+         accordingly.  */
+      if (code == AND
+          && GET_CODE (op0) == MULT
+          && CONST_INT_P (XEXP (op0, 1))
+          && CONST_INT_P (op1)
+          && aarch64_uxt_size (exact_log2 (INTVAL (XEXP (op0, 1))),
+                               INTVAL (op1)) != 0)
+        {
+          /* This UBFM/SBFM form can be implemented with a
+	     single-cycle op.  */
+          *cost += rtx_cost (XEXP (op0, 0), ZERO_EXTRACT, 0, speed);
+          return true;
+        }
+
+      if (CONST_INT_P (op1)
+          && aarch64_bitmask_imm (INTVAL (op1), GET_MODE (x)))
+        {
+          /* A logical instruction cannot combine a NOT operation with
+             an immediate, so we assume that the NOT operation is a
+             separate instruction.  */
+          *cost += rtx_cost (op0, AND, 0, speed);
+          return true;
+        }
+
+      /* Handle ORN, EON, or BIC.  */
+      if (GET_CODE (op0) == NOT)
+        op0 = XEXP (op0, 0);
+
+      /* The logical instruction could have the shifted register form,
+         but the cost is the same if the shift is processed as a
+         separate instruction, so we don't bother with it here.  */
+
+      *cost += rtx_cost (op0, AND, 0, speed);
+      *cost += rtx_cost (op1, AND, 1, speed);
+      return true;
+
+    case NOT:
+      *cost += COSTS_N_INSNS(1); /* default cost */
+
+      /* The logical instruction could have the shifted register form,
+         but the cost is the same if the shift is processed as a separate
+         instruction, so we don't bother with it here.  */
+      return false;
+
+    case ZERO_EXTEND:
+      if (GET_MODE (x) == DImode
+          && GET_MODE (XEXP (x, 0)) == SImode
+          && outer == SET)
+	{
+          /* All ops that produce a 32-bit result can zero extend to 64-bits for free
+             when writing to a register.  */
+	  *cost = rtx_cost (XEXP (x, 0), SET, param, speed);
+
+          /* If we're simply zero extending a register,
+             that still costs a minimum of one instruction.  */
+          if (*cost == 0) *cost = COSTS_N_INSNS(1);
+	  return true;
+	}
+      else if (GET_CODE (XEXP (x, 0)) == MEM)
+	{
+          /* All loads can zero extend to any size for free.  */
+	  *cost = rtx_cost (XEXP (x, 0), SET, param, speed);
+	  return true;
+	}
+      else
+        {
+          *cost += COSTS_N_INSNS(1); /* base cost */
+          return false;
+        }
+
+    case SIGN_EXTEND:
+      /* If sign extension isn't under a shift operation and thus
+         handled specially, then the sign extension always requires a
+         separate 1-cycle op.  */
+      *cost += COSTS_N_INSNS(1); /* base cost */
+      return false;
+
+    case ASHIFT:
+      op0 = XEXP (x, 0);
+      op1 = XEXP (x, 1);
+
+      /* (ashift (extend X) shift_imm)
+         may be translated to UBFM/SBFM which has additional powers.  */
+      if (CONST_INT_P (op1))
+        {
+          if (INTVAL (op1) <= 4)
+            *cost += COSTS_N_INSNS(1); /* base cost */
+          else
+            *cost += COSTS_N_INSNS(2); /* base cost */
+
+          /* UBFM/SBFM can incorporate zero/sign extend for free.  */
+          if (GET_CODE (op0) == ZERO_EXTEND
+              || GET_CODE (op1) == SIGN_EXTEND)
+            op0 = XEXP (op0, 0);
+
+          *cost += rtx_cost (op0, ASHIFT, 0, speed);
+          return true;
+        }
+      else
+        {
+          *cost += COSTS_N_INSNS(2); /* base cost */
+          return false;
+        }
+
+    case ROTATE:
+    case ROTATERT:
+    case LSHIFTRT:
+    case ASHIFTRT:
+      op0 = XEXP (x, 0);
+      op1 = XEXP (x, 1);
+
+      *cost += COSTS_N_INSNS(2); /* base cost */
+
+      if (CONST_INT_P (op1))
+        {
+          *cost += rtx_cost (op0, ASHIFT, 0, speed);
+          return true;
+        }
+      else
+        {
+          return false;
+        }
+
+    case HIGH:
+      *cost += COSTS_N_INSNS(1); /* default cost */
+      if (!CONSTANT_P (XEXP (x, 0)))
+	*cost += rtx_cost (XEXP (x, 0), HIGH, 0, speed);
+      return true;
+
+    case LO_SUM:
+      *cost += COSTS_N_INSNS(1); /* default cost */
+      if (!CONSTANT_P (XEXP (x, 1)))
+	*cost += rtx_cost (XEXP (x, 1), LO_SUM, 1, speed);
+      *cost += rtx_cost (XEXP (x, 0), LO_SUM, 0, speed);
+      return true;
+
+    case ZERO_EXTRACT:
+    case SIGN_EXTRACT:
+      /* (extract (mult X mult_imm) extract_imm (const_int 0))
+         may be translated to UBFM/SBFM depending on the respective immediates.  */
+      /* For whatever reason, I never see this stand-alone, and I never see it
+         with zero_extract.  But "(sign_extract (mult ..." sometimes shows up
+         as part of a larger expression, e.g. under "(plus ...".  This includes
+         using it as part of memory addressing.  */
+      op0 = XEXP (x, 0);
+      op1 = XEXP (x, 1);
+      op2 = XEXP (x, 2);
+      if (GET_CODE (op0) == MULT
+          && CONST_INT_P (op1)
+          && op2 == const0_rtx)
+        {
+          rtx mult_reg = XEXP (op0, 0);
+          rtx mult_imm = XEXP (op0, 1);
+          if (CONST_INT_P (mult_imm)
+              && aarch64_is_extend_from_extract (GET_MODE (x),
+                                                 mult_imm,
+                                                 op1))
+            {
+              /* This UBFM/SBFM form can be implemented with a single-cycle op.  */
+              *cost += COSTS_N_INSNS(1); /* base cost */
+              *cost += rtx_cost (mult_reg, ZERO_EXTRACT, 0, speed);
+              return true;
+            }
+        }
+
+      if (CONST_INT_P (op1)
+          && CONST_INT_P (op2))
+        {
+          /* This can be implemented with a UBFM/SBFM.  If it was a simple
+             zero- or sign-extend, then it would use code ZERO_EXTEND or
+             SIGN_EXTEND.  Since it doesn't, it must be something more
+             complex, so it requires 2-cycle latency.  */
+          *cost += COSTS_N_INSNS(2); /* base cost */
+          *cost += rtx_cost (XEXP (x, 0), ZERO_EXTRACT, 0, speed);
+          return true;
+        }
+      else
+        {
+          *cost += COSTS_N_INSNS(2); /* default cost */
+          return false;
+        }
+
+    case MULT:
+      op0 = XEXP (x, 0);
+      op1 = XEXP (x, 1);
+
+      if (GET_MODE_CLASS (GET_MODE (x)) == MODE_FLOAT)
+        {
+          /* FP multiply */
+          *cost += COSTS_N_INSNS(5); /* base cost */
+
+          /* FNMUL is free.  */
+          if (GET_CODE (op0) == NEG)
+            op0 = XEXP (op0, 0);
+
+          *cost += rtx_cost (op0, MULT, 0, speed);
+          *cost += rtx_cost (op1, MULT, 1, speed);
+          return true;
+        }
+      else if (GET_MODE (x) == DImode)
+        {
+          if (((GET_CODE (op0) == ZERO_EXTEND
+                && GET_CODE (op1) == ZERO_EXTEND)
+               || (GET_CODE (op0) == SIGN_EXTEND
+                   && GET_CODE (op1) == SIGN_EXTEND))
+              && GET_MODE (XEXP (op0, 0)) == SImode
+              && GET_MODE (XEXP (op1, 0)) == SImode)
+            {
+              /* 32-bit integer multiply with 64-bit result */
+              *cost += COSTS_N_INSNS(4);
+              *cost += rtx_cost (XEXP (op0, 0), MULT, 0, speed);
+              *cost += rtx_cost (XEXP (op1, 0), MULT, 1, speed);
+              return true;
+            }
+
+          if (GET_CODE (op0) == NEG
+              && ((GET_CODE (XEXP (op0, 0)) == ZERO_EXTEND
+                   && GET_CODE (op1) == ZERO_EXTEND)
+                  || (GET_CODE (XEXP (op0, 0)) == SIGN_EXTEND
+                      && GET_CODE (op1) == SIGN_EXTEND))
+              && GET_MODE (XEXP (XEXP (op0, 0), 0)) == SImode
+              && GET_MODE (XEXP (op1, 0)) == SImode)
+            {
+              /* 32-bit integer multiply with negated 64-bit result */
+              *cost += COSTS_N_INSNS(5);
+              *cost += rtx_cost (XEXP (XEXP (op0, 0), 0), MULT, 0, speed);
+              *cost += rtx_cost (XEXP (op1, 0), MULT, 1, speed);
+              return true;
+            }
+
+          /* 64-bit integer multiply */
+          *cost += COSTS_N_INSNS(5); /* base cost */
+        }
+      else if (GET_MODE (x) == SImode)
+        {
+          /* 32-bit integer multiply */
+          *cost += COSTS_N_INSNS(4); /* base cost */
+        }
+      else
+        {
+          *cost += COSTS_N_INSNS(5); /* default cost */
+        }
+      return false; /* All arguments need to be in registers.  */
+
+    case MOD:
+    case UMOD:
+      if (GET_MODE_CLASS (GET_MODE (x)) == MODE_INT)
+        {
+          /* integer mod = divide + mult + sub */
+          /* See DIV for notes on variable-latency divide.  */
+          if (GET_MODE (x) == SImode)
+            *cost += COSTS_N_INSNS (16 + 4 + 1);
+          else
+            *cost += COSTS_N_INSNS (16 + 5 + 1);
+        }
+      else if (GET_MODE_CLASS (GET_MODE (x)) == MODE_FLOAT)
+        {
+          /* FP mod = divide + round + mult-sub */
+          if (GET_MODE (x) == SFmode)
+            *cost += COSTS_N_INSNS (22+1 + 5+1 + 5);
+          else
+            *cost += COSTS_N_INSNS (28+1 + 5+1 + 5);
+        }
+      else
+        {
+          *cost += COSTS_N_INSNS(16 + 5 + 1); /* default cost */
+        }
+      return false; /* All arguments need to be in registers.  */
+
+    case DIV:
+    case UDIV:
+    case SQRT:
+      if (GET_MODE_CLASS (GET_MODE (x)) == MODE_INT)
+        {
+          /* There is no integer SQRT, so only DIV and UDIV can get
+	     here.  */
+          /* Integer divide of a register is variable latency.  
+             Without data, I assume an average of 16 cycles.  */
+          /* Integer divide of a constant has known latency
+             depending on the constant.  However, GCC can't
+             won't pick a different instruction based on the
+             cost, so whatever.  */
+          *cost += COSTS_N_INSNS (16);
+        }
+      else if (GET_MODE_CLASS (GET_MODE (x)) == MODE_FLOAT)
+        {
+          if (GET_MODE (x) == SFmode)
+            *cost += COSTS_N_INSNS (22+1);
+          else
+            *cost += COSTS_N_INSNS (28+1);
+        }
+      else
+        {
+          *cost += COSTS_N_INSNS(16); /* default cost */
+        }
+      return false; /* All arguments need to be in registers.  */
+
+    case IF_THEN_ELSE:
+      op0 = XEXP (x, 0);
+      op1 = XEXP (x, 1);
+      op2 = XEXP (x, 2);
+
+      if (GET_CODE (op1) == PC || GET_CODE (op2) == PC)
+        {
+          /* conditional branch */
+          if (GET_MODE_CLASS (GET_MODE (XEXP (op0, 0))) == MODE_CC)
+            {
+              /* Regular conditional branch.  */
+              *cost += COSTS_N_INSNS (1); /* base cost */
+              return true;
+            }
+          else
+            {
+              /* The branch is not based on the condition codes, so it must be
+                 a compare and branch (cbz/cbnz or tbz/tbnz).  */
+              *cost += COSTS_N_INSNS (3); /* base cost */
+              return true;
+            }
+        }
+      else if (GET_MODE_CLASS (GET_MODE (XEXP (op0, 0))) == MODE_CC)
+        {
+          /* It's a conditional operation based on the status flags,
+             so it must be some flavor of CSEL.  */
+          *cost += COSTS_N_INSNS (1); /* base cost */
+
+          /* CSNEG, CSINV, and CSINC are handled for free as part of CSEL.  */
+          if (GET_CODE (op1) == NEG
+              || GET_CODE (op1) == NOT
+              || (GET_CODE (op1) == PLUS && XEXP (op1, 1) == const1_rtx))
+            op1 = XEXP (op1, 0);
+
+          /* If the remaining parameters are not registers,
+             get the cost to put them into registers.  */
+          *cost += rtx_cost (op1, IF_THEN_ELSE, 1, speed);
+          *cost += rtx_cost (op2, IF_THEN_ELSE, 2, speed);
+          return true;
+        }
+      else
+        {
+          *cost += COSTS_N_INSNS (1); /* default cost */
+          return true;
+        }
+
+    case EQ:
+    case NE:
+    case GT:
+    case GTU:
+    case LT:
+    case LTU:
+    case GE:
+    case GEU:
+    case LE:
+    case LEU:
+      /* This looks like a CSET. */
+      if (GET_MODE_CLASS (GET_MODE (x)) == MODE_INT)
+	{
+          *cost += COSTS_N_INSNS(1); /* base cost */
+          return false; /* All arguments need to be in registers.  */
+        }
+
+      *cost += COSTS_N_INSNS(1); /* default cost */
+      return false;
+
+    case FMA:
+      op0 = XEXP (x, 0);
+      op1 = XEXP (x, 1);
+      op2 = XEXP (x, 2);
+
+      *cost += COSTS_N_INSNS(5); /* base cost */
+
+      /* FMSUB, FNMADD, and FNMSUB are free.  */
+      if (GET_CODE (op0) == NEG)
+        op0 = XEXP (op0, 0);
+
+      if (GET_CODE (op2) == NEG)
+        op2 = XEXP (op2, 0);
+
+      /* If the remaining parameters are not registers,
+         get the cost to put them into registers.  */
+      *cost += rtx_cost (op0, FMA, 0, speed);
+      *cost += rtx_cost (op1, FMA, 1, speed);
+      *cost += rtx_cost (op2, FMA, 2, speed);
+      return true;
+
+    case FLOAT_EXTEND:
+    case FLOAT_TRUNCATE:
+      *cost += COSTS_N_INSNS (6); /* base cost */
+      return false;
+
+    case ABS:
+      *cost += COSTS_N_INSNS(3); /* FABS when used by FP */
+      return false;
+
+    case SMAX:
+    case SMIN:
+      *cost += COSTS_N_INSNS(3); /* base cost */
+      return false;
+
+    case TRUNCATE:
+      if (mode == DImode
+          && GET_MODE (XEXP (x, 0)) == TImode
+          && GET_CODE (XEXP (x, 0)) == LSHIFTRT
+          && CONST_INT_P (XEXP (XEXP (x, 0), 1))
+          && UINTVAL (XEXP (XEXP (x, 0), 1)) == 64
+          && GET_CODE (XEXP (XEXP (x, 0), 0)) == MULT
+          && ((GET_CODE (XEXP (XEXP (XEXP (x, 0), 0), 0)) == ZERO_EXTEND
+               && GET_CODE (XEXP (XEXP (XEXP (x, 0), 0), 1)) == ZERO_EXTEND)
+              || (GET_CODE (XEXP (XEXP (XEXP (x, 0), 0), 0)) == SIGN_EXTEND
+                  && GET_CODE (XEXP (XEXP (XEXP (x, 0), 0), 1)) == SIGN_EXTEND))
+          && GET_MODE (XEXP (XEXP (XEXP (XEXP (x, 0), 0), 0), 0)) == DImode
+          && GET_MODE (XEXP (XEXP (XEXP (XEXP (x, 0), 0), 1), 0)) == DImode)
+        {
+          /* umulh/smulh */
+          *cost += COSTS_N_INSNS(5);
+          *cost += rtx_cost (XEXP (XEXP (XEXP (XEXP (x, 0), 0), 0), 0), MULT, 0, speed);
+          *cost += rtx_cost (XEXP (XEXP (XEXP (XEXP (x, 0), 0), 1), 0), MULT, 1, speed);
+          return true;
+        }
+
+      *cost += COSTS_N_INSNS(1);   /* default */
+      return false;
+
+    default:
+      *cost += COSTS_N_INSNS (1);  /* default cost */
+      return false;
+    }
+}
+
+
 #undef TARGET_ADDRESS_COST
 #define TARGET_ADDRESS_COST aarch64_address_cost
 
-- 
1.9.0

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [AArch64 13/14] Initial tuning description for XGene-1 core.
  2014-02-18 21:10 [AArch64 00/14] Pipeline-independent changes for XGene-1 Philipp Tomsich
                   ` (9 preceding siblings ...)
  2014-02-18 21:28 ` [AArch64 14/14] Add cost-model for XGene-1 Philipp Tomsich
@ 2014-02-18 21:28 ` Philipp Tomsich
  2014-02-18 21:29 ` [AArch64 08/14] Define a variant of cmp for the CC_NZ case Philipp Tomsich
                   ` (4 subsequent siblings)
  15 siblings, 0 replies; 32+ messages in thread
From: Philipp Tomsich @ 2014-02-18 21:28 UTC (permalink / raw)
  To: gcc-patches; +Cc: philipp.tomsich

The generic cost model for AArch64 can not be used to capture the
microarchitectural cost of XGene-1 in full detail.  For this reason,
we use the basic tuning model of the Cortex-A53 for now.
---
 gcc/config/aarch64/aarch64-cores.def |  2 +-
 gcc/config/aarch64/aarch64.c         | 28 ++++++++++++++++++++++++++++
 2 files changed, 29 insertions(+), 1 deletion(-)

diff --git a/gcc/config/aarch64/aarch64-cores.def b/gcc/config/aarch64/aarch64-cores.def
index b4f6c16..abbfea9 100644
--- a/gcc/config/aarch64/aarch64-cores.def
+++ b/gcc/config/aarch64/aarch64-cores.def
@@ -36,7 +36,7 @@
 
 AARCH64_CORE("cortex-a53",  cortexa53, cortexa53, 8,  AARCH64_FL_FPSIMD | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, cortexa53)
 AARCH64_CORE("cortex-a57",  cortexa15, cortexa15, 8,  AARCH64_FL_FPSIMD | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, generic)
-AARCH64_CORE("xgene1",      xgene1,    xgene1,    8,  AARCH64_FL_FPSIMD, generic)
+AARCH64_CORE("xgene1",      xgene1,    xgene1,    8,  AARCH64_FL_FPSIMD, xgene1)
 
 /* V8 big.LITTLE implementations.  */
 
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 4327eb3..4c06f9b 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -192,6 +192,20 @@ static const struct cpu_regmove_cost generic_regmove_cost =
   NAMED_PARAM (FP2FP, 4)
 };
 
+static const struct cpu_regmove_cost xgene1_regmove_cost =
+{
+  NAMED_PARAM (GP2GP, 1),
+  /* We want all GP2FP and FP2GP moves to be handled by a reload.
+     A direct move instruction will have similar microarchitectural
+     cost to a store/load combination.  */
+  NAMED_PARAM (GP2FP, 4),
+  NAMED_PARAM (FP2GP, 4),
+  /* We currently do not provide direct support for TFmode Q->Q move.
+     Therefore we need to raise the cost above 2 in order to have
+     reload handle the situation.  */
+  NAMED_PARAM (FP2FP, 4)
+};
+
 /* Generic costs for vector insn classes.  */
 #if HAVE_DESIGNATED_INITIALIZERS && GCC_VERSION >= 2007
 __extension__
@@ -237,6 +251,20 @@ static const struct tune_params cortexa53_tunings =
   NAMED_PARAM (branch_cost, 2)
 };
 
+/* We can't model the microarchitectural costs on XGene using  the default
+   cost model for AArch64.  So we leave the extra cost structure pointing
+   to the default cost model for the time being.  */
+static const struct tune_params xgene1_tunings =
+{
+  &cortexa53_extra_costs,
+  &generic_addrcost_table,
+  &xgene1_regmove_cost,
+  &generic_vector_cost,
+  NAMED_PARAM (memmov_cost, 4),
+  NAMED_PARAM (issue_rate, 4),
+  NAMED_PARAM (branch_cost, 2)
+};
+
 /* A processor implementing AArch64.  */
 struct processor
 {
-- 
1.9.0

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [AArch64 08/14] Define a variant of cmp for the CC_NZ case.
  2014-02-18 21:10 [AArch64 00/14] Pipeline-independent changes for XGene-1 Philipp Tomsich
                   ` (10 preceding siblings ...)
  2014-02-18 21:28 ` [AArch64 13/14] Initial tuning description for XGene-1 core Philipp Tomsich
@ 2014-02-18 21:29 ` Philipp Tomsich
  2014-02-18 21:42   ` Andrew Pinski
  2014-02-18 21:29 ` [AArch64 09/14] Add special cases of zero-extend w/ compare operations Philipp Tomsich
                   ` (3 subsequent siblings)
  15 siblings, 1 reply; 32+ messages in thread
From: Philipp Tomsich @ 2014-02-18 21:29 UTC (permalink / raw)
  To: gcc-patches; +Cc: philipp.tomsich

This pattern is not strictly necessary and a similar effect could
be achieved through the use of a suitable compatibility relation
for CC modes; in the meantime, this helps on some benchmarks.
---
 gcc/config/aarch64/aarch64.md | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 13a75d3..60e42af 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -2159,6 +2159,19 @@
   [(set_attr "type" "alus_reg,alus_imm,alus_imm")]
 )
 
+;; variant that generates a CC_NZ output mode
+(define_insn "*cmp<mode>_nz"
+  [(set (reg:CC_NZ CC_REGNUM)
+	(compare:CC_NZ (match_operand:GPI 0 "register_operand" "r,r,r")
+		       (match_operand:GPI 1 "aarch64_plus_operand" "r,I,J")))]
+  ""
+  "@
+   cmp\\t%<w>0, %<w>1
+   cmp\\t%<w>0, %<w>1
+   cmn\\t%<w>0, #%n1"
+  [(set_attr "type" "alus_reg,alus_imm,alus_imm")]
+)
+
 (define_insn "*cmp<mode>"
   [(set (reg:CCFP CC_REGNUM)
         (compare:CCFP (match_operand:GPF 0 "register_operand" "w,w")
-- 
1.9.0

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [AArch64 09/14] Add special cases of zero-extend w/ compare operations.
  2014-02-18 21:10 [AArch64 00/14] Pipeline-independent changes for XGene-1 Philipp Tomsich
                   ` (11 preceding siblings ...)
  2014-02-18 21:29 ` [AArch64 08/14] Define a variant of cmp for the CC_NZ case Philipp Tomsich
@ 2014-02-18 21:29 ` Philipp Tomsich
  2014-02-18 21:42   ` Andrew Pinski
  2014-02-18 21:30 ` [AArch64 12/14] Generate 'bics', when only interested in CC_NZ Philipp Tomsich
                   ` (2 subsequent siblings)
  15 siblings, 1 reply; 32+ messages in thread
From: Philipp Tomsich @ 2014-02-18 21:29 UTC (permalink / raw)
  To: gcc-patches; +Cc: philipp.tomsich

---
 gcc/config/aarch64/aarch64.md | 56 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 56 insertions(+)

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 60e42af..c72d123 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -2227,6 +2227,62 @@
   [(set_attr "type" "alus_ext")]
 )
 
+(define_insn "*zext<mode>qi3_compare0"
+  [(set (reg:CC_ZESWP CC_REGNUM)
+    (compare:CC_ZESWP
+     (zero_extend:GPI (match_operand:QI 1 "register_operand" "r"))
+     (const_int 0)))
+   (set (match_operand:GPI 0 "register_operand" "=r")
+    (zero_extend:GPI (match_dup 1)))]
+  ""
+  "ands\\t%<w>0, %<w>1, 0xFF"
+  [(set_attr "type" "logics_imm")]
+)
+
+(define_insn "*zext<mode>hi3_compare0"
+  [(set (reg:CC_ZESWP CC_REGNUM)
+    (compare:CC_ZESWP
+     (zero_extend:GPI (match_operand:HI 1 "register_operand" "r"))
+     (const_int 0)))
+   (set (match_operand:GPI 0 "register_operand" "=r")
+    (zero_extend:GPI (match_dup 1)))]
+  ""
+  "ands\\t%<w>0, %<w>1, 0xFFFF"
+  [(set_attr "type" "logics_imm")]
+)
+
+(define_insn "*zextdisi3_compare0"
+  [(set (reg:CC_ZESWP CC_REGNUM)
+    (compare:CC_ZESWP
+     (zero_extend:DI (match_operand:SI 1 "register_operand" "r"))
+     (const_int 0)))
+   (set (match_operand:DI 0 "register_operand" "=r")
+    (zero_extend:DI (match_dup 1)))]
+  ""
+  "ands\\t%x0, %x1, 0xFFFFFFFF"
+  [(set_attr "type" "logics_imm")]
+)
+
+(define_insn "*zextqi3nr_compare0"
+  [(set (reg:CC_ZESWP CC_REGNUM)
+    (compare:CC_ZESWP
+     (match_operand:QI 0 "register_operand" "r")
+     (const_int 0)))]
+  ""
+  "tst\\t%w0, 0xFF"
+  [(set_attr "type" "logics_imm")]
+)
+
+(define_insn "*zexthi3nr_compare0"
+  [(set (reg:CC_ZESWP CC_REGNUM)
+    (compare:CC_ZESWP
+     (match_operand:HI 0 "register_operand" "r")
+     (const_int 0)))]
+  ""
+  "tst\\t%w0, 0xFFFF"
+  [(set_attr "type" "logics_imm")]
+)
+
 ;; -------------------------------------------------------------------
 ;; Store-flag and conditional select insns
 ;; -------------------------------------------------------------------
-- 
1.9.0

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [AArch64 12/14] Generate 'bics', when only interested in CC_NZ.
  2014-02-18 21:10 [AArch64 00/14] Pipeline-independent changes for XGene-1 Philipp Tomsich
                   ` (12 preceding siblings ...)
  2014-02-18 21:29 ` [AArch64 09/14] Add special cases of zero-extend w/ compare operations Philipp Tomsich
@ 2014-02-18 21:30 ` Philipp Tomsich
  2014-02-18 21:43   ` Andrew Pinski
  2014-02-19 14:02 ` [AArch64 00/14] Pipeline-independent changes for XGene-1 Richard Earnshaw
  2014-02-19 14:41 ` Ramana Radhakrishnan
  15 siblings, 1 reply; 32+ messages in thread
From: Philipp Tomsich @ 2014-02-18 21:30 UTC (permalink / raw)
  To: gcc-patches; +Cc: philipp.tomsich

A specialized variant of '*and_one_cmpl<mode>3_compare0' is needed
to match some cases (during the combine stage) that could be folded
into a bics, when the output result is not used (i.e. when only the
condition code is of interest).

This is useful both for CoreMark and SPEC workloads.
---
 gcc/config/aarch64/aarch64.md | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 6feedd3..5c62868 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -2747,6 +2747,18 @@
   [(set_attr "type" "logics_reg")]
 )
 
+;; variant of the above, that only uses the condition code
+(define_insn "*and_one_cmpl<mode>3_compare0_internal2"
+  [(set (reg:CC_NZ CC_REGNUM)
+        (compare:CC_NZ
+         (and:GPI (not:GPI (match_operand:GPI 0 "register_operand" "r"))
+                  (match_operand:GPI 1 "register_operand" "r"))
+	 (const_int 0)))]
+  ""
+  "bics\\t<w>zr, %<w>1, %<w>0"
+  [(set_attr "type" "logics_reg")]
+)
+
 ;; zero_extend version of above
 (define_insn "*and_one_cmplsi3_compare0_uxtw"
   [(set (reg:CC_NZ CC_REGNUM)
@@ -2792,6 +2804,20 @@
   [(set_attr "type" "logics_shift_imm")]
 )
 
+(define_insn "*and_one_cmpl_<SHIFT:optab><mode>3_compare0_internal2"
+  [(set (reg:CC_NZ CC_REGNUM)
+	(compare:CC_NZ
+	 (and:GPI (not:GPI
+		   (SHIFT:GPI
+		    (match_operand:GPI 0 "register_operand" "r")
+		    (match_operand:QI 1 "aarch64_shift_imm_<mode>" "n")))
+		  (match_operand:GPI 2 "register_operand" "r"))
+	 (const_int 0)))]
+  ""
+  "bics\\t<w>zr, %<w>2, %<w>0, <SHIFT:shift> %1"
+  [(set_attr "type" "logics_shift_imm")]
+)
+
 ;; zero_extend version of above
 (define_insn "*and_one_cmpl_<SHIFT:optab>si3_compare0_uxtw"
   [(set (reg:CC_NZ CC_REGNUM)
-- 
1.9.0

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [AArch64 10/14] Add mov<mode>cc definition for GPF case.
  2014-02-18 21:26 ` [AArch64 10/14] Add mov<mode>cc definition for GPF case Philipp Tomsich
@ 2014-02-18 21:40   ` Andrew Pinski
  0 siblings, 0 replies; 32+ messages in thread
From: Andrew Pinski @ 2014-02-18 21:40 UTC (permalink / raw)
  To: Philipp Tomsich; +Cc: GCC Patches

On Tue, Feb 18, 2014 at 1:09 PM, Philipp Tomsich
<philipp.tomsich@theobroma-systems.com> wrote:

  Can you add a testcase or two for this patch?

Thanks,
Andrew Pinski

> ---
>  gcc/config/aarch64/aarch64.md | 19 +++++++++++++++++++
>  1 file changed, 19 insertions(+)
>
> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index c72d123..b6453b6 100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -2460,6 +2460,25 @@
>    }
>  )
>
> +(define_expand "mov<mode>cc"
> +  [(set (match_operand:GPF 0 "register_operand" "")
> +  (if_then_else:GPF (match_operand 1 "aarch64_comparison_operator" "")
> +                    (match_operand:GPF 2 "register_operand" "")
> +                    (match_operand:GPF 3 "register_operand" "")))]
> +  ""
> +  {
> +    rtx ccreg;
> +    enum rtx_code code = GET_CODE (operands[1]);
> +
> +    if (code == UNEQ || code == LTGT)
> +      FAIL;
> +
> +    ccreg = aarch64_gen_compare_reg (code, XEXP (operands[1], 0),
> +                                       XEXP (operands[1], 1));
> +    operands[1] = gen_rtx_fmt_ee (code, VOIDmode, ccreg, const0_rtx);
> +  }
> +)
> +
>  (define_expand "mov<GPF:mode><GPI:mode>cc"
>    [(set (match_operand:GPI 0 "register_operand" "")
>         (if_then_else:GPI (match_operand 1 "aarch64_comparison_operator" "")
> --
> 1.9.0
>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [AArch64 11/14] Optimize and(s) patterns for HI/QI operands.
  2014-02-18 21:27 ` [AArch64 11/14] Optimize and(s) patterns for HI/QI operands Philipp Tomsich
@ 2014-02-18 21:41   ` Andrew Pinski
  0 siblings, 0 replies; 32+ messages in thread
From: Andrew Pinski @ 2014-02-18 21:41 UTC (permalink / raw)
  To: Philipp Tomsich; +Cc: GCC Patches

On Tue, Feb 18, 2014 at 1:09 PM, Philipp Tomsich
<philipp.tomsich@theobroma-systems.com> wrote:
> HImode and QImode operands can be handled in a more optimal way for
> logical AND than for logical OR operations. An AND will never set
> bits that are not already set in its operands, so the resulting
> mode/precision depends on the least precision of its operands with
> an implicit zero-extension to any larger precision.
>
> These patterns help to avoid unnecessary zero-extension operations
> on benchmarks, including some SPEC workloads.

Can you add a testcase or two for this patch?  Having an example will
help people in the future understand why these patterns are added.

Thanks,
Andrew Pinski


> ---
>  gcc/config/aarch64/aarch64.md   | 62 ++++++++++++++++++++++++++++++++++++++---
>  gcc/config/aarch64/iterators.md |  2 ++
>  2 files changed, 60 insertions(+), 4 deletions(-)
>
> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index b6453b6..6feedd3 100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -2551,8 +2551,8 @@
>
>  (define_insn "<optab><mode>3"
>    [(set (match_operand:GPI 0 "register_operand" "=r,rk")
> -       (LOGICAL:GPI (match_operand:GPI 1 "register_operand" "%r,r")
> -                    (match_operand:GPI 2 "aarch64_logical_operand" "r,<lconst>")))]
> +       (OR:GPI (match_operand:GPI 1 "register_operand" "%r,r")
> +               (match_operand:GPI 2 "aarch64_logical_operand" "r,<lconst>")))]
>    ""
>    "<logical>\\t%<w>0, %<w>1, %<w>2"
>    [(set_attr "type" "logic_reg,logic_imm")]
> @@ -2569,6 +2569,27 @@
>    [(set_attr "type" "logic_reg,logic_imm")]
>  )
>
> +;; specialized form of AND for HI and QI
> +(define_insn "and<mode>3"
> +  [(set (match_operand:ALLI 0 "register_operand" "=r,rk")
> +        (and:ALLI (match_operand:ALLI 1 "register_operand" "%r,r")
> +                  (match_operand:ALLI 2 "aarch64_logical_operand" "r,<andconst>")))]
> +  ""
> +  "and\\t%<w>0, %<w>1, %<w>2"
> +  [(set_attr "type" "logic_reg,logic_imm")]
> +)
> +
> +;; zero_extend version of above
> +(define_insn "*and<mode>3_zeroextend"
> +  [(set (match_operand:GPI 0 "register_operand" "=r")
> +        (zero_extend:GPI
> +          (and:ALLX (match_operand:ALLX 1 "register_operand" "r")
> +                    (match_operand:ALLX 2 "const_int_operand" "<andconst>"))))]
> +  ""
> +  "and\\t%w0, %w1, %w2"
> +  [(set_attr "type" "logic_imm")]
> +)
> +
>  (define_insn "*and<mode>3_compare0"
>    [(set (reg:CC_NZ CC_REGNUM)
>         (compare:CC_NZ
> @@ -2582,12 +2603,28 @@
>    [(set_attr "type" "logics_reg,logics_imm")]
>  )
>
> +;; special variant for HI and QI operators (implicitly zero-extending)
> +(define_insn "*and<mode>3_compare0"
> +  [(set (reg:CC_NZ CC_REGNUM)
> +        (compare:CC_NZ
> +                (and:GPI (match_operand:SHORT 1 "register_operand" "%r,r")
> +                         (match_operand:SHORT 2 "aarch64_logical_operand" "r,<andconst>"))
> +                (const_int 0)))
> +   (set (match_operand:GPI 0 "register_operand" "=r,r")
> +        (and:GPI (match_dup 1) (match_dup 2)))]
> +  ""
> +  "@
> +   ands\\t%<w>0, %<w>1, %<w>2
> +   ands\\t%<w>0, %<w>1, %2"
> +  [(set_attr "type" "logic_reg,logic_imm")]
> +)
> +
>  ;; zero_extend version of above
>  (define_insn "*andsi3_compare0_uxtw"
>    [(set (reg:CC_NZ CC_REGNUM)
>         (compare:CC_NZ
> -        (and:SI (match_operand:SI 1 "register_operand" "%r,r")
> -                (match_operand:SI 2 "aarch64_logical_operand" "r,K"))
> +        (and:SI (match_operand:ALLX 1 "register_operand" "%r,r")
> +                (match_operand:ALLX 2 "aarch64_logical_operand" "r,K"))
>          (const_int 0)))
>     (set (match_operand:DI 0 "register_operand" "=r,r")
>         (zero_extend:DI (and:SI (match_dup 1) (match_dup 2))))]
> @@ -2628,6 +2665,23 @@
>    [(set_attr "type" "logics_shift_imm")]
>  )
>
> +;; specialized form for bitfield tests
> +(define_insn "*ands<mode>3_zeroextract_internal2"
> +  [(set (reg:CC_NZ CC_REGNUM)
> +        (compare:CC_NZ
> +         (zero_extract:GPI (match_operand:GPI 0 "register_operand" "r")
> +                           (match_operand 1 "const_int_operand" "n")
> +                           (match_operand 2 "const_int_operand" "n"))
> +         (const_int 0)))]
> +  "aarch64_bitmask_imm((((HOST_WIDE_INT)1 << (UINTVAL(operands[1]))) - 1) << UINTVAL(operands[2]), <MODE>mode)"
> +  "*
> +  {
> +    operands[3] = GEN_INT((((HOST_WIDE_INT)1 << (UINTVAL(operands[1]))) - 1) << UINTVAL(operands[2]));
> +    return \"ands\\t<w>zr, %<w>0, %<w>3\";
> +  }"
> +  [(set_attr "type" "logics_reg")]
> +)
> +
>  (define_insn "*<LOGICAL:optab>_<SHIFT:optab><mode>3"
>    [(set (match_operand:GPI 0 "register_operand" "=r")
>         (LOGICAL:GPI (SHIFT:GPI
> diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
> index f1339b8..edba829 100644
> --- a/gcc/config/aarch64/iterators.md
> +++ b/gcc/config/aarch64/iterators.md
> @@ -341,6 +341,7 @@
>
>  ;; Attribute to describe constants acceptable in logical operations
>  (define_mode_attr lconst [(SI "K") (DI "L")])
> +(define_mode_attr andconst [(QI "K") (HI "K") (SI "K") (DI "L")])
>
>  ;; Map a mode to a specific constraint character.
>  (define_mode_attr cmode [(QI "q") (HI "h") (SI "s") (DI "d")])
> @@ -627,6 +628,7 @@
>
>  ;; Code iterator for logical operations
>  (define_code_iterator LOGICAL [and ior xor])
> +(define_code_iterator OR [ior xor])
>
>  ;; Code iterator for sign/zero extension
>  (define_code_iterator ANY_EXTEND [sign_extend zero_extend])
> --
> 1.9.0
>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [AArch64 08/14] Define a variant of cmp for the CC_NZ case.
  2014-02-18 21:29 ` [AArch64 08/14] Define a variant of cmp for the CC_NZ case Philipp Tomsich
@ 2014-02-18 21:42   ` Andrew Pinski
  0 siblings, 0 replies; 32+ messages in thread
From: Andrew Pinski @ 2014-02-18 21:42 UTC (permalink / raw)
  To: Philipp Tomsich; +Cc: GCC Patches

On Tue, Feb 18, 2014 at 1:09 PM, Philipp Tomsich
<philipp.tomsich@theobroma-systems.com> wrote:
> This pattern is not strictly necessary and a similar effect could
> be achieved through the use of a suitable compatibility relation
> for CC modes; in the meantime, this helps on some benchmarks.


Can you add a testcase or two for this patch?  Having an example will
help people in the future understand why these patterns are added.

Thanks,
Andrew Pinski

> ---
>  gcc/config/aarch64/aarch64.md | 13 +++++++++++++
>  1 file changed, 13 insertions(+)
>
> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index 13a75d3..60e42af 100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -2159,6 +2159,19 @@
>    [(set_attr "type" "alus_reg,alus_imm,alus_imm")]
>  )
>
> +;; variant that generates a CC_NZ output mode
> +(define_insn "*cmp<mode>_nz"
> +  [(set (reg:CC_NZ CC_REGNUM)
> +       (compare:CC_NZ (match_operand:GPI 0 "register_operand" "r,r,r")
> +                      (match_operand:GPI 1 "aarch64_plus_operand" "r,I,J")))]
> +  ""
> +  "@
> +   cmp\\t%<w>0, %<w>1
> +   cmp\\t%<w>0, %<w>1
> +   cmn\\t%<w>0, #%n1"
> +  [(set_attr "type" "alus_reg,alus_imm,alus_imm")]
> +)
> +
>  (define_insn "*cmp<mode>"
>    [(set (reg:CCFP CC_REGNUM)
>          (compare:CCFP (match_operand:GPF 0 "register_operand" "w,w")
> --
> 1.9.0
>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [AArch64 09/14] Add special cases of zero-extend w/ compare operations.
  2014-02-18 21:29 ` [AArch64 09/14] Add special cases of zero-extend w/ compare operations Philipp Tomsich
@ 2014-02-18 21:42   ` Andrew Pinski
  0 siblings, 0 replies; 32+ messages in thread
From: Andrew Pinski @ 2014-02-18 21:42 UTC (permalink / raw)
  To: Philipp Tomsich; +Cc: GCC Patches

On Tue, Feb 18, 2014 at 1:09 PM, Philipp Tomsich
<philipp.tomsich@theobroma-systems.com> wrote:

Can you add a testcase or two for this patch?  Having an example will
help people in the future understand why these patterns are added.

Thanks,
Andrew Pinski


> ---
>  gcc/config/aarch64/aarch64.md | 56 +++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 56 insertions(+)
>
> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index 60e42af..c72d123 100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -2227,6 +2227,62 @@
>    [(set_attr "type" "alus_ext")]
>  )
>
> +(define_insn "*zext<mode>qi3_compare0"
> +  [(set (reg:CC_ZESWP CC_REGNUM)
> +    (compare:CC_ZESWP
> +     (zero_extend:GPI (match_operand:QI 1 "register_operand" "r"))
> +     (const_int 0)))
> +   (set (match_operand:GPI 0 "register_operand" "=r")
> +    (zero_extend:GPI (match_dup 1)))]
> +  ""
> +  "ands\\t%<w>0, %<w>1, 0xFF"
> +  [(set_attr "type" "logics_imm")]
> +)
> +
> +(define_insn "*zext<mode>hi3_compare0"
> +  [(set (reg:CC_ZESWP CC_REGNUM)
> +    (compare:CC_ZESWP
> +     (zero_extend:GPI (match_operand:HI 1 "register_operand" "r"))
> +     (const_int 0)))
> +   (set (match_operand:GPI 0 "register_operand" "=r")
> +    (zero_extend:GPI (match_dup 1)))]
> +  ""
> +  "ands\\t%<w>0, %<w>1, 0xFFFF"
> +  [(set_attr "type" "logics_imm")]
> +)
> +
> +(define_insn "*zextdisi3_compare0"
> +  [(set (reg:CC_ZESWP CC_REGNUM)
> +    (compare:CC_ZESWP
> +     (zero_extend:DI (match_operand:SI 1 "register_operand" "r"))
> +     (const_int 0)))
> +   (set (match_operand:DI 0 "register_operand" "=r")
> +    (zero_extend:DI (match_dup 1)))]
> +  ""
> +  "ands\\t%x0, %x1, 0xFFFFFFFF"
> +  [(set_attr "type" "logics_imm")]
> +)
> +
> +(define_insn "*zextqi3nr_compare0"
> +  [(set (reg:CC_ZESWP CC_REGNUM)
> +    (compare:CC_ZESWP
> +     (match_operand:QI 0 "register_operand" "r")
> +     (const_int 0)))]
> +  ""
> +  "tst\\t%w0, 0xFF"
> +  [(set_attr "type" "logics_imm")]
> +)
> +
> +(define_insn "*zexthi3nr_compare0"
> +  [(set (reg:CC_ZESWP CC_REGNUM)
> +    (compare:CC_ZESWP
> +     (match_operand:HI 0 "register_operand" "r")
> +     (const_int 0)))]
> +  ""
> +  "tst\\t%w0, 0xFFFF"
> +  [(set_attr "type" "logics_imm")]
> +)
> +
>  ;; -------------------------------------------------------------------
>  ;; Store-flag and conditional select insns
>  ;; -------------------------------------------------------------------
> --
> 1.9.0
>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [AArch64 12/14] Generate 'bics', when only interested in CC_NZ.
  2014-02-18 21:30 ` [AArch64 12/14] Generate 'bics', when only interested in CC_NZ Philipp Tomsich
@ 2014-02-18 21:43   ` Andrew Pinski
  0 siblings, 0 replies; 32+ messages in thread
From: Andrew Pinski @ 2014-02-18 21:43 UTC (permalink / raw)
  To: Philipp Tomsich; +Cc: GCC Patches

On Tue, Feb 18, 2014 at 1:09 PM, Philipp Tomsich
<philipp.tomsich@theobroma-systems.com> wrote:
> A specialized variant of '*and_one_cmpl<mode>3_compare0' is needed
> to match some cases (during the combine stage) that could be folded
> into a bics, when the output result is not used (i.e. when only the
> condition code is of interest).
>
> This is useful both for CoreMark and SPEC workloads.

Can you add a testcase or two for this patch?  Having an example will
help people in the future understand why these patterns are added.

Thanks,
Andrew Pinski

> ---
>  gcc/config/aarch64/aarch64.md | 26 ++++++++++++++++++++++++++
>  1 file changed, 26 insertions(+)
>
> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index 6feedd3..5c62868 100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -2747,6 +2747,18 @@
>    [(set_attr "type" "logics_reg")]
>  )
>
> +;; variant of the above, that only uses the condition code
> +(define_insn "*and_one_cmpl<mode>3_compare0_internal2"
> +  [(set (reg:CC_NZ CC_REGNUM)
> +        (compare:CC_NZ
> +         (and:GPI (not:GPI (match_operand:GPI 0 "register_operand" "r"))
> +                  (match_operand:GPI 1 "register_operand" "r"))
> +        (const_int 0)))]
> +  ""
> +  "bics\\t<w>zr, %<w>1, %<w>0"
> +  [(set_attr "type" "logics_reg")]
> +)
> +
>  ;; zero_extend version of above
>  (define_insn "*and_one_cmplsi3_compare0_uxtw"
>    [(set (reg:CC_NZ CC_REGNUM)
> @@ -2792,6 +2804,20 @@
>    [(set_attr "type" "logics_shift_imm")]
>  )
>
> +(define_insn "*and_one_cmpl_<SHIFT:optab><mode>3_compare0_internal2"
> +  [(set (reg:CC_NZ CC_REGNUM)
> +       (compare:CC_NZ
> +        (and:GPI (not:GPI
> +                  (SHIFT:GPI
> +                   (match_operand:GPI 0 "register_operand" "r")
> +                   (match_operand:QI 1 "aarch64_shift_imm_<mode>" "n")))
> +                 (match_operand:GPI 2 "register_operand" "r"))
> +        (const_int 0)))]
> +  ""
> +  "bics\\t<w>zr, %<w>2, %<w>0, <SHIFT:shift> %1"
> +  [(set_attr "type" "logics_shift_imm")]
> +)
> +
>  ;; zero_extend version of above
>  (define_insn "*and_one_cmpl_<SHIFT:optab>si3_compare0_uxtw"
>    [(set (reg:CC_NZ CC_REGNUM)
> --
> 1.9.0
>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [AArch64 00/14] Pipeline-independent changes for XGene-1
  2014-02-18 21:10 [AArch64 00/14] Pipeline-independent changes for XGene-1 Philipp Tomsich
                   ` (13 preceding siblings ...)
  2014-02-18 21:30 ` [AArch64 12/14] Generate 'bics', when only interested in CC_NZ Philipp Tomsich
@ 2014-02-19 14:02 ` Richard Earnshaw
  2014-02-19 14:41 ` Ramana Radhakrishnan
  15 siblings, 0 replies; 32+ messages in thread
From: Richard Earnshaw @ 2014-02-19 14:02 UTC (permalink / raw)
  To: Philipp Tomsich; +Cc: gcc-patches

On 18/02/14 21:09, Philipp Tomsich wrote:
> The following patch-set contains the pipeline-independent changes to gcc
> to support the APM XGene-1 and contains various enhancements derived from
> real-world applications and benchmarks running on XGene-1.
> 
> As the pipeline model has not been fully adapted to the new instruction
> typing shared between the ARM backend and the AArch64 backend, it is not
> yet contained in these patches.
> 
> The most controversial part of these patches will likely consist in the
> new cost-model, which has intentionally been provided as a "hook" that
> intercepts the current cost-model when compiling for XGene-1. Given that
> the matching/structure of this cost-model is different from the existing
> implementation, we've chosen to keep this in a separate function for the
> time being.
> 

This patch series is too late for 4.9 and for stage 1 I'd like to see
this fixed before the code goes in.  Code like this rapidly becomes
unmaintainable and makes it difficult to add support for future
variants; it tends to proliferate once started and then it becomes
necessary to analyse every part of the machine description each time a
new device is added to find out whether it needs adjusting.

It should be possible to plug the XGene timings into the current
infrastructure, though it might be necessary to add some new data values
when doing so.

The end goal is that nothing in the back-end, apart from instruction
scheduling, should be testing for a specific CPU; the backend should
make all its code generation decisions from the architecture and tuning
tables.

R.

> 
> Philipp Tomsich (14):
>   Use "generic" target, if no other default.
>   Add "xgene1" core identifier.
>   Retrieve BRANCH_COST from tuning structure.
>   Correct the maximum shift amount for shifted operands.
>   Add AArch64 'prefetch'-pattern.
>   Extend '*tb<optab><mode>1'.
>   Define additional patterns for adds/subs.
>   Define a variant of cmp for the CC_NZ case.
>   Add special cases of zero-extend w/ compare operations.
>   Add mov<mode>cc definition for GPF case.
>   Optimize and(s) patterns for HI/QI operands.
>   Generate 'bics', when only interested in CC_NZ.
>   Initial tuning description for XGene-1 core.
>   Add cost-model for XGene-1.
> 
>  gcc/config/aarch64/aarch64-cores.def |   1 +
>  gcc/config/aarch64/aarch64-protos.h  |   2 +
>  gcc/config/aarch64/aarch64-tune.md   |   2 +-
>  gcc/config/aarch64/aarch64.c         | 922 ++++++++++++++++++++++++++++++++++-
>  gcc/config/aarch64/aarch64.h         |  10 +-
>  gcc/config/aarch64/aarch64.md        | 246 +++++++++-
>  gcc/config/aarch64/iterators.md      |   2 +
>  gcc/config/arm/types.md              |   2 +
>  8 files changed, 1172 insertions(+), 15 deletions(-)
> 


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [AArch64 00/14] Pipeline-independent changes for XGene-1
  2014-02-18 21:10 [AArch64 00/14] Pipeline-independent changes for XGene-1 Philipp Tomsich
                   ` (14 preceding siblings ...)
  2014-02-19 14:02 ` [AArch64 00/14] Pipeline-independent changes for XGene-1 Richard Earnshaw
@ 2014-02-19 14:41 ` Ramana Radhakrishnan
  15 siblings, 0 replies; 32+ messages in thread
From: Ramana Radhakrishnan @ 2014-02-19 14:41 UTC (permalink / raw)
  To: Philipp Tomsich; +Cc: gcc-patches

On Tue, Feb 18, 2014 at 9:09 PM, Philipp Tomsich
<philipp.tomsich@theobroma-systems.com> wrote:
> The following patch-set contains the pipeline-independent changes to gcc
> to support the APM XGene-1 and contains various enhancements derived from
> real-world applications and benchmarks running on XGene-1.
>
> As the pipeline model has not been fully adapted to the new instruction
> typing shared between the ARM backend and the AArch64 backend, it is not
> yet contained in these patches.
>
> The most controversial part of these patches will likely consist in the
> new cost-model, which has intentionally been provided as a "hook" that
> intercepts the current cost-model when compiling for XGene-1. Given that
> the matching/structure of this cost-model is different from the existing
> implementation, we've chosen to keep this in a separate function for the
> time being.

And please produce Changelog entries for each of the changes. Can I
also ask you to confirm that you have a copyright assignment with the
FSF on file for contributing these changes ?


Ramana

>
>
> Philipp Tomsich (14):
>   Use "generic" target, if no other default.
>   Add "xgene1" core identifier.
>   Retrieve BRANCH_COST from tuning structure.
>   Correct the maximum shift amount for shifted operands.
>   Add AArch64 'prefetch'-pattern.
>   Extend '*tb<optab><mode>1'.
>   Define additional patterns for adds/subs.
>   Define a variant of cmp for the CC_NZ case.
>   Add special cases of zero-extend w/ compare operations.
>   Add mov<mode>cc definition for GPF case.
>   Optimize and(s) patterns for HI/QI operands.
>   Generate 'bics', when only interested in CC_NZ.
>   Initial tuning description for XGene-1 core.
>   Add cost-model for XGene-1.
>
>  gcc/config/aarch64/aarch64-cores.def |   1 +
>  gcc/config/aarch64/aarch64-protos.h  |   2 +
>  gcc/config/aarch64/aarch64-tune.md   |   2 +-
>  gcc/config/aarch64/aarch64.c         | 922 ++++++++++++++++++++++++++++++++++-
>  gcc/config/aarch64/aarch64.h         |  10 +-
>  gcc/config/aarch64/aarch64.md        | 246 +++++++++-
>  gcc/config/aarch64/iterators.md      |   2 +
>  gcc/config/arm/types.md              |   2 +
>  8 files changed, 1172 insertions(+), 15 deletions(-)
>
> --
> 1.9.0
>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [AArch64 01/14] Use "generic" target, if no other default.
  2014-02-18 21:10 ` [AArch64 01/14] Use "generic" target, if no other default Philipp Tomsich
@ 2014-02-21 14:02   ` Kyrill Tkachov
  0 siblings, 0 replies; 32+ messages in thread
From: Kyrill Tkachov @ 2014-02-21 14:02 UTC (permalink / raw)
  To: Philipp Tomsich; +Cc: gcc-patches

Hi Philipp,

On 18/02/14 21:09, Philipp Tomsich wrote:
> The default target should be "generic", as Cortex-A53 includes
> optional ISA features (CRC and CRYPTO) that are not required for
> architectural compliance. The key difference between generic (which
> already uses the cortexa53 pipeline model for scheduling) is the
> absence of any optional ISA features in the "generic" target.
> ---
>   gcc/config/aarch64/aarch64.c | 2 +-
>   gcc/config/aarch64/aarch64.h | 4 ++--
>   2 files changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 784bfa3..70dda00 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -5244,7 +5244,7 @@ aarch64_override_options (void)
>   
>     /* If the user did not specify a processor, choose the default
>        one for them.  This will be the CPU set during configuration using
> -     --with-cpu, otherwise it is "cortex-a53".  */
> +     --with-cpu, otherwise it is "generic".  */
>     if (!selected_cpu)
>       {
>         selected_cpu = &all_cores[TARGET_CPU_DEFAULT & 0x3f];
> diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
> index 13c424c..b66a6b4 100644
> --- a/gcc/config/aarch64/aarch64.h
> +++ b/gcc/config/aarch64/aarch64.h
> @@ -472,10 +472,10 @@ enum target_cpus
>     TARGET_CPU_generic
>   };
>   
> -/* If there is no CPU defined at configure, use "cortex-a53" as default.  */
> +/* If there is no CPU defined at configure, use "generic" as default.  */
>   #ifndef TARGET_CPU_DEFAULT
>   #define TARGET_CPU_DEFAULT \
> -  (TARGET_CPU_cortexa53 | (AARCH64_CPU_DEFAULT_FLAGS << 6))
> +  (TARGET_CPU_generic | (AARCH64_CPU_DEFAULT_FLAGS << 6))
>   #endif
>   
>   /* The processor for which instructions should be scheduled.  */

I don't think this approach will work. The bug we have here is that in 
config.gcc when processing a --with-arch directive it will use the CPU flags of 
the sample cpu given for the architecture in aarch64-arches.def. This will cause 
it to use cortex-a53+fp+simd+crypto+crc when asked to configure for 
--with-arch=armv8-a. Instead it should be using the 4th field of the 
AARCH64_ARCH which specifies the ISA flags implied by the architecture. Then we 
would get cortex-a53+fp+simd.

Also, if no --with-arch or --with-cpu is specified, config.gcc will still 
specify TARGET_CPU_DEFAULT as TARGET_CPU_generic but without encoding the ISA 
flags (AARCH64_FL_FOR_ARCH8 in this case) for it in the upper bits of 
TARGET_CPU_DEFAULT, leading to an always defined TARGET_CPU_DEFAULT which will 
cause the last hunk in this patch to never be used and configuring.

I'm working on a fix for these issues.

HTH,
Kyrill

^ permalink raw reply	[flat|nested] 32+ messages in thread

* RE: [AArch64 05/14] Add AArch64 'prefetch'-pattern.
  2014-02-18 21:10 ` [AArch64 05/14] Add AArch64 'prefetch'-pattern Philipp Tomsich
  2014-02-18 21:18   ` Andrew Pinski
@ 2014-02-28  8:58   ` Gopalasubramanian, Ganesh
  2014-02-28  9:14   ` Gopalasubramanian, Ganesh
  2 siblings, 0 replies; 32+ messages in thread
From: Gopalasubramanian, Ganesh @ 2014-02-28  8:58 UTC (permalink / raw)
  To: Philipp Tomsich; +Cc: gcc-patches, pinskia

[-- Attachment #1: Type: text/plain, Size: 3264 bytes --]

With the locality value received in the instruction pattern, I think it would be safe to handle them in prefetch instruction.
This helps especially AArch64 has prefetch instructions that can handle this locality.

+(define_insn "prefetch"
+  [(prefetch (match_operand:DI 0 "address_operand" "r")
+            (match_operand:QI 1 "const_int_operand" "n")
+            (match_operand:QI 2 "const_int_operand" "n"))]
+  ""
+  "*
+{
+  int locality = INTVAL (operands[2]);
+
+  gcc_assert (IN_RANGE (locality, 0, 3));
+
+  if (locality == 0)
+     /* non temporal locality */
+     return (INTVAL(operands[1])) ? \"prfm\\tPSTL1STRM, [%0, #0]\" : \"prfm\\tPLDL1STRM, [%0, #0]\";
+
+  /* temporal locality */
+  return (INTVAL(operands[1])) ? \"prfm\\tPSTL%2KEEP, [%0, #0]\" : \"prfm\\tPLDL%2KEEP, [%0, #0]\";
+}"
+  [(set_attr "type" "prefetch")]
+)
+

I also have attached a patch that implements
*	Prefetch with immediate offset in the range 0 to 32760 (multiple of 8). Added a predicate for this.
*	Prefetch with immediate offset - in the range -256 to 255 (Gets generated only when we have a negative offset. Generates prfum instruction). Added a predicate for this.
*	Prefetch with register offset. (modified for printing the locality)

Regards
Ganesh

-----Original Message-----
From: Philipp Tomsich [mailto:philipp.tomsich@theobroma-systems.com] 
Sent: Wednesday, February 19, 2014 2:40 AM
To: gcc-patches@gcc.gnu.org
Cc: philipp.tomsich@theobroma-systems.com
Subject: [AArch64 05/14] Add AArch64 'prefetch'-pattern.

---
 gcc/config/aarch64/aarch64.md | 17 +++++++++++++++++
 gcc/config/arm/types.md       |  2 ++
 2 files changed, 19 insertions(+)

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md index 99a6ac8..b972a1b 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -293,6 +293,23 @@
   [(set_attr "type" "no_insn")]
 )
 
+(define_insn "prefetch"
+  [(prefetch (match_operand:DI 0 "register_operand" "r")
+	     (match_operand:QI 1 "const_int_operand" "n")
+	     (match_operand:QI 2 "const_int_operand" "n"))]
+  ""
+  "*
+{
+  if (INTVAL(operands[2]) == 0)
+     /* no temporal locality */
+     return (INTVAL(operands[1])) ? \"prfm\\tPSTL1STRM, [%0, #0]\" : 
+\"prfm\\tPLDL1STRM, [%0, #0]\";
+
+  /* temporal locality */
+  return (INTVAL(operands[1])) ? \"prfm\\tPSTL1KEEP, [%0, #0]\" : 
+\"prfm\\tPLDL1KEEP, [%0, #0]\"; }"
+  [(set_attr "type" "prefetch")]
+)
+
 (define_insn "trap"
   [(trap_if (const_int 1) (const_int 8))]
   ""
diff --git a/gcc/config/arm/types.md b/gcc/config/arm/types.md index cc39cd1..1d1280d 100644
--- a/gcc/config/arm/types.md
+++ b/gcc/config/arm/types.md
@@ -117,6 +117,7 @@
 ; mvn_shift_reg      inverting move instruction, shifted operand by a register.
 ; no_insn            an insn which does not represent an instruction in the
 ;                    final output, thus having no impact on scheduling.
+; prefetch	     a prefetch instruction
 ; rbit               reverse bits.
 ; rev                reverse bytes.
 ; sdiv               signed division.
@@ -553,6 +554,7 @@
   call,\
   clz,\
   no_insn,\
+  prefetch,\
   csel,\
   crc,\
   extend,\
--
1.9.0


[-- Attachment #2: prefetchdiff.log --]
[-- Type: application/octet-stream, Size: 3919 bytes --]

Index: gcc/config/aarch64/aarch64.md
===================================================================
--- gcc/config/aarch64/aarch64.md	(revision 208107)
+++ gcc/config/aarch64/aarch64.md	(working copy)
@@ -293,6 +293,73 @@
   [(set_attr "type" "no_insn")]
 )
 
+(define_insn "*prefetch"
+  [(prefetch (plus:DI (match_operand:DI 0 "register_operand" "r")
+                      (match_operand:DI 1 "aarch64_prefetch_pimm" "")
+             )
+            (match_operand:QI 2 "const_int_operand" "n")
+            (match_operand:QI 3 "const_int_operand" "n"))]
+  ""
+  "*
+{
+  int locality = INTVAL (operands[3]);
+
+  gcc_assert (IN_RANGE (locality, 0, 3));
+
+  if (locality == 0)
+     /* non temporal locality */
+     return (INTVAL(operands[2])) ? \"prfm\\tPSTL1STRM, [%0, %1]\" : \"prfm\\tPLDL1STRM, [%0, %1]\";
+
+  /* temporal locality */
+  return (INTVAL(operands[2])) ? \"prfm\\tPSTL%3KEEP, [%0, %1]\" : \"prfm\\tPLDL%3KEEP, [%0, %1]\";
+}"
+  [(set_attr "type" "prefetch")]
+)
+
+(define_insn "*prefetch"
+  [(prefetch (plus:DI (match_operand:DI 0 "register_operand" "r")
+                      (match_operand:DI 1 "aarch64_prefetch_unscaled" "")
+             )
+            (match_operand:QI 2 "const_int_operand" "n")
+            (match_operand:QI 3 "const_int_operand" "n"))]
+  ""
+  "*
+{
+  int locality = INTVAL (operands[3]);
+
+  gcc_assert (IN_RANGE (locality, 0, 3));
+
+  if (locality == 0)
+     /* non temporal locality */
+     return (INTVAL(operands[2])) ? \"prfum\\tPSTL1STRM, [%0, %1]\" : \"prfm\\tPLDL1STRM, [%0, %1]\";
+
+  /* temporal locality */
+  return (INTVAL(operands[2])) ? \"prfum\\tPSTL%3KEEP, [%0, %1]\" : \"prfm\\tPLDL%3KEEP, [%0, %1]\";
+}"
+  [(set_attr "type" "prefetch")]
+)
+
+(define_insn "prefetch"
+  [(prefetch (match_operand:DI 0 "address_operand" "r")
+            (match_operand:QI 1 "const_int_operand" "n")
+            (match_operand:QI 2 "const_int_operand" "n"))]
+  ""
+  "*
+{
+  int locality = INTVAL (operands[2]);
+
+  gcc_assert (IN_RANGE (locality, 0, 3));
+
+  if (locality == 0)
+     /* non temporal locality */
+     return (INTVAL(operands[1])) ? \"prfm\\tPSTL1STRM, [%0, #0]\" : \"prfm\\tPLDL1STRM, [%0, #0]\";
+
+  /* temporal locality */
+  return (INTVAL(operands[1])) ? \"prfm\\tPSTL%2KEEP, [%0, #0]\" : \"prfm\\tPLDL%2KEEP, [%0, #0]\";
+}"
+  [(set_attr "type" "prefetch")]
+)
+
 (define_insn "trap"
   [(trap_if (const_int 1) (const_int 8))]
   ""
Index: gcc/config/aarch64/predicates.md
===================================================================
--- gcc/config/aarch64/predicates.md	(revision 208107)
+++ gcc/config/aarch64/predicates.md	(working copy)
@@ -66,6 +66,14 @@
   (and (match_code "const_int")
        (match_test "(INTVAL (op) < 0xffffff && INTVAL (op) > -0xffffff)")))
 
+(define_predicate "aarch64_prefetch_pimm"
+  (and (match_code "const_int")
+       (match_test "(INTVAL (op) < 0x7ff8 && (0 == INTVAL (op) % 8))")))
+
+(define_predicate "aarch64_prefetch_unscaled"
+  (and (match_code "const_int")
+       (match_test "(INTVAL (op) < 255 && INTVAL (op) > -256)")))
+
 (define_predicate "aarch64_pluslong_operand"
   (ior (match_operand 0 "register_operand")
        (match_operand 0 "aarch64_pluslong_immediate")))
Index: gcc/config/arm/types.md
===================================================================
--- gcc/config/arm/types.md	(revision 208107)
+++ gcc/config/arm/types.md	(working copy)
@@ -117,6 +117,7 @@
 ; mvn_shift_reg      inverting move instruction, shifted operand by a register.
 ; no_insn            an insn which does not represent an instruction in the
 ;                    final output, thus having no impact on scheduling.
+; prefetch	     a prefetch instruction
 ; rbit               reverse bits.
 ; rev                reverse bytes.
 ; sdiv               signed division.
@@ -553,6 +554,7 @@
   call,\
   clz,\
   no_insn,\
+  prefetch,\
   csel,\
   crc,\
   extend,\

^ permalink raw reply	[flat|nested] 32+ messages in thread

* RE: [AArch64 05/14] Add AArch64 'prefetch'-pattern.
  2014-02-18 21:10 ` [AArch64 05/14] Add AArch64 'prefetch'-pattern Philipp Tomsich
  2014-02-18 21:18   ` Andrew Pinski
  2014-02-28  8:58   ` Gopalasubramanian, Ganesh
@ 2014-02-28  9:14   ` Gopalasubramanian, Ganesh
  2014-02-28  9:28     ` Dr. Philipp Tomsich
  2 siblings, 1 reply; 32+ messages in thread
From: Gopalasubramanian, Ganesh @ 2014-02-28  9:14 UTC (permalink / raw)
  To: Philipp Tomsich; +Cc: gcc-patches, pinskia

[-- Attachment #1: Type: text/plain, Size: 1516 bytes --]

Avoided top-posting and resending.

+  /* temporal locality */
+  return (INTVAL(operands[1])) ? \"prfm\\tPSTL1KEEP, [%0, #0]\" : 
+\"prfm\\tPLDL1KEEP, [%0, #0]\"; }"
+  [(set_attr "type" "prefetch")]
+)
+

With the locality value received in the instruction pattern, I think it would be safe to handle them in prefetch instruction.
This helps especially AArch64 has prefetch instructions that can handle this locality.

+(define_insn "prefetch"
+  [(prefetch (match_operand:DI 0 "address_operand" "r")
+            (match_operand:QI 1 "const_int_operand" "n")
+            (match_operand:QI 2 "const_int_operand" "n"))]
+  ""
+  "*
+{
+  int locality = INTVAL (operands[2]);
+
+  gcc_assert (IN_RANGE (locality, 0, 3));
+
+  if (locality == 0)
+     /* non temporal locality */
+     return (INTVAL(operands[1])) ? \"prfm\\tPSTL1STRM, [%0, #0]\" : \"prfm\\tPLDL1STRM, [%0, #0]\";
+
+  /* temporal locality */
+  return (INTVAL(operands[1])) ? \"prfm\\tPSTL%2KEEP, [%0, #0]\" : \"prfm\\tPLDL%2KEEP, [%0, #0]\";
+}"
+  [(set_attr "type" "prefetch")]
+)
+

I also have attached a patch that implements the following. 
*	Prefetch with immediate offset in the range 0 to 32760 (multiple of 8). Added a predicate for this.
*	Prefetch with immediate offset - in the range -256 to 255 (Gets generated only when we have a negative offset. Generates prfum instruction). Added a predicate for this.
*	Prefetch with register offset. (modified for printing the locality)

Regards
Ganesh

[-- Attachment #2: prefetchdiff.log --]
[-- Type: application/octet-stream, Size: 3919 bytes --]

Index: gcc/config/aarch64/aarch64.md
===================================================================
--- gcc/config/aarch64/aarch64.md	(revision 208107)
+++ gcc/config/aarch64/aarch64.md	(working copy)
@@ -293,6 +293,73 @@
   [(set_attr "type" "no_insn")]
 )
 
+(define_insn "*prefetch"
+  [(prefetch (plus:DI (match_operand:DI 0 "register_operand" "r")
+                      (match_operand:DI 1 "aarch64_prefetch_pimm" "")
+             )
+            (match_operand:QI 2 "const_int_operand" "n")
+            (match_operand:QI 3 "const_int_operand" "n"))]
+  ""
+  "*
+{
+  int locality = INTVAL (operands[3]);
+
+  gcc_assert (IN_RANGE (locality, 0, 3));
+
+  if (locality == 0)
+     /* non temporal locality */
+     return (INTVAL(operands[2])) ? \"prfm\\tPSTL1STRM, [%0, %1]\" : \"prfm\\tPLDL1STRM, [%0, %1]\";
+
+  /* temporal locality */
+  return (INTVAL(operands[2])) ? \"prfm\\tPSTL%3KEEP, [%0, %1]\" : \"prfm\\tPLDL%3KEEP, [%0, %1]\";
+}"
+  [(set_attr "type" "prefetch")]
+)
+
+(define_insn "*prefetch"
+  [(prefetch (plus:DI (match_operand:DI 0 "register_operand" "r")
+                      (match_operand:DI 1 "aarch64_prefetch_unscaled" "")
+             )
+            (match_operand:QI 2 "const_int_operand" "n")
+            (match_operand:QI 3 "const_int_operand" "n"))]
+  ""
+  "*
+{
+  int locality = INTVAL (operands[3]);
+
+  gcc_assert (IN_RANGE (locality, 0, 3));
+
+  if (locality == 0)
+     /* non temporal locality */
+     return (INTVAL(operands[2])) ? \"prfum\\tPSTL1STRM, [%0, %1]\" : \"prfm\\tPLDL1STRM, [%0, %1]\";
+
+  /* temporal locality */
+  return (INTVAL(operands[2])) ? \"prfum\\tPSTL%3KEEP, [%0, %1]\" : \"prfm\\tPLDL%3KEEP, [%0, %1]\";
+}"
+  [(set_attr "type" "prefetch")]
+)
+
+(define_insn "prefetch"
+  [(prefetch (match_operand:DI 0 "address_operand" "r")
+            (match_operand:QI 1 "const_int_operand" "n")
+            (match_operand:QI 2 "const_int_operand" "n"))]
+  ""
+  "*
+{
+  int locality = INTVAL (operands[2]);
+
+  gcc_assert (IN_RANGE (locality, 0, 3));
+
+  if (locality == 0)
+     /* non temporal locality */
+     return (INTVAL(operands[1])) ? \"prfm\\tPSTL1STRM, [%0, #0]\" : \"prfm\\tPLDL1STRM, [%0, #0]\";
+
+  /* temporal locality */
+  return (INTVAL(operands[1])) ? \"prfm\\tPSTL%2KEEP, [%0, #0]\" : \"prfm\\tPLDL%2KEEP, [%0, #0]\";
+}"
+  [(set_attr "type" "prefetch")]
+)
+
 (define_insn "trap"
   [(trap_if (const_int 1) (const_int 8))]
   ""
Index: gcc/config/aarch64/predicates.md
===================================================================
--- gcc/config/aarch64/predicates.md	(revision 208107)
+++ gcc/config/aarch64/predicates.md	(working copy)
@@ -66,6 +66,14 @@
   (and (match_code "const_int")
        (match_test "(INTVAL (op) < 0xffffff && INTVAL (op) > -0xffffff)")))
 
+(define_predicate "aarch64_prefetch_pimm"
+  (and (match_code "const_int")
+       (match_test "(INTVAL (op) < 0x7ff8 && (0 == INTVAL (op) % 8))")))
+
+(define_predicate "aarch64_prefetch_unscaled"
+  (and (match_code "const_int")
+       (match_test "(INTVAL (op) < 255 && INTVAL (op) > -256)")))
+
 (define_predicate "aarch64_pluslong_operand"
   (ior (match_operand 0 "register_operand")
        (match_operand 0 "aarch64_pluslong_immediate")))
Index: gcc/config/arm/types.md
===================================================================
--- gcc/config/arm/types.md	(revision 208107)
+++ gcc/config/arm/types.md	(working copy)
@@ -117,6 +117,7 @@
 ; mvn_shift_reg      inverting move instruction, shifted operand by a register.
 ; no_insn            an insn which does not represent an instruction in the
 ;                    final output, thus having no impact on scheduling.
+; prefetch	     a prefetch instruction
 ; rbit               reverse bits.
 ; rev                reverse bytes.
 ; sdiv               signed division.
@@ -553,6 +554,7 @@
   call,\
   clz,\
   no_insn,\
+  prefetch,\
   csel,\
   crc,\
   extend,\

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [AArch64 05/14] Add AArch64 'prefetch'-pattern.
  2014-02-28  9:14   ` Gopalasubramanian, Ganesh
@ 2014-02-28  9:28     ` Dr. Philipp Tomsich
  2014-05-28 14:25       ` Gopalasubramanian, Ganesh
  0 siblings, 1 reply; 32+ messages in thread
From: Dr. Philipp Tomsich @ 2014-02-28  9:28 UTC (permalink / raw)
  To: Gopalasubramanian, Ganesh; +Cc: gcc-patches, pinskia

Ganesh,

On 28 Feb 2014, at 10:13 , Gopalasubramanian, Ganesh <Ganesh.Gopalasubramanian@amd.com> wrote:

> I also have attached a patch that implements the following. 
> *	Prefetch with immediate offset in the range 0 to 32760 (multiple of 8). Added a predicate for this.
> *	Prefetch with immediate offset - in the range -256 to 255 (Gets generated only when we have a negative offset. Generates prfum instruction). Added a predicate for this.
> *	Prefetch with register offset. (modified for printing the locality)

These changes look good to me.
We’ll try them out on the benchmarks that caused us to add prefetching in the first place.

Best,
Philipp.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* RE: [AArch64 05/14] Add AArch64 'prefetch'-pattern.
  2014-02-28  9:28     ` Dr. Philipp Tomsich
@ 2014-05-28 14:25       ` Gopalasubramanian, Ganesh
  2014-05-28 14:41         ` Dr. Philipp Tomsich
  0 siblings, 1 reply; 32+ messages in thread
From: Gopalasubramanian, Ganesh @ 2014-05-28 14:25 UTC (permalink / raw)
  To: Dr. Philipp Tomsich; +Cc: gcc-patches, pinskia

Hi Philipp,

> These changes look good to me.
> We'll try them out on the benchmarks that caused us to add prefetching in the first place.

If you are OK, I would like to get these changes upstreamed.

-Ganesh

-----Original Message-----
From: Dr. Philipp Tomsich [mailto:philipp.tomsich@theobroma-systems.com] 
Sent: Friday, February 28, 2014 2:58 PM
To: Gopalasubramanian, Ganesh
Cc: gcc-patches@gcc.gnu.org; pinskia@gmail.com
Subject: Re: [AArch64 05/14] Add AArch64 'prefetch'-pattern.

Ganesh,

On 28 Feb 2014, at 10:13 , Gopalasubramanian, Ganesh <Ganesh.Gopalasubramanian@amd.com> wrote:

> I also have attached a patch that implements the following. 
> *	Prefetch with immediate offset in the range 0 to 32760 (multiple of 8). Added a predicate for this.
> *	Prefetch with immediate offset - in the range -256 to 255 (Gets generated only when we have a negative offset. Generates prfum instruction). Added a predicate for this.
> *	Prefetch with register offset. (modified for printing the locality)

These changes look good to me.
We'll try them out on the benchmarks that caused us to add prefetching in the first place.

Best,
Philipp.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [AArch64 05/14] Add AArch64 'prefetch'-pattern.
  2014-05-28 14:25       ` Gopalasubramanian, Ganesh
@ 2014-05-28 14:41         ` Dr. Philipp Tomsich
  0 siblings, 0 replies; 32+ messages in thread
From: Dr. Philipp Tomsich @ 2014-05-28 14:41 UTC (permalink / raw)
  To: Gopalasubramanian, Ganesh; +Cc: gcc-patches, pinskia


On 28 May 2014, at 16:25 , Gopalasubramanian, Ganesh <Ganesh.Gopalasubramanian@amd.com> wrote:

> Hi Philipp,
> 
>> These changes look good to me.
>> We'll try them out on the benchmarks that caused us to add prefetching in the first place.
> 
> If you are OK, I would like to get these changes upstreamed.

Sorry for the delay, I was pre-occupied with other issues in the runtime system.
Yes, please go ahead with applying these changes.

Best,
Philipp.

^ permalink raw reply	[flat|nested] 32+ messages in thread

end of thread, other threads:[~2014-05-28 14:41 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-02-18 21:10 [AArch64 00/14] Pipeline-independent changes for XGene-1 Philipp Tomsich
2014-02-18 21:10 ` [AArch64 04/14] Correct the maximum shift amount for shifted operands Philipp Tomsich
2014-02-18 21:20   ` Andrew Pinski
2014-02-18 21:10 ` [AArch64 03/14] Retrieve BRANCH_COST from tuning structure Philipp Tomsich
2014-02-18 21:10 ` [AArch64 02/14] Add "xgene1" core identifier Philipp Tomsich
2014-02-18 21:10 ` [AArch64 06/14] Extend '*tb<optab><mode>1' Philipp Tomsich
2014-02-18 21:19   ` Andrew Pinski
2014-02-18 21:10 ` [AArch64 07/14] Define additional patterns for adds/subs Philipp Tomsich
2014-02-18 21:19   ` Andrew Pinski
2014-02-18 21:10 ` [AArch64 01/14] Use "generic" target, if no other default Philipp Tomsich
2014-02-21 14:02   ` Kyrill Tkachov
2014-02-18 21:10 ` [AArch64 05/14] Add AArch64 'prefetch'-pattern Philipp Tomsich
2014-02-18 21:18   ` Andrew Pinski
2014-02-28  8:58   ` Gopalasubramanian, Ganesh
2014-02-28  9:14   ` Gopalasubramanian, Ganesh
2014-02-28  9:28     ` Dr. Philipp Tomsich
2014-05-28 14:25       ` Gopalasubramanian, Ganesh
2014-05-28 14:41         ` Dr. Philipp Tomsich
2014-02-18 21:26 ` [AArch64 10/14] Add mov<mode>cc definition for GPF case Philipp Tomsich
2014-02-18 21:40   ` Andrew Pinski
2014-02-18 21:27 ` [AArch64 11/14] Optimize and(s) patterns for HI/QI operands Philipp Tomsich
2014-02-18 21:41   ` Andrew Pinski
2014-02-18 21:28 ` [AArch64 14/14] Add cost-model for XGene-1 Philipp Tomsich
2014-02-18 21:28 ` [AArch64 13/14] Initial tuning description for XGene-1 core Philipp Tomsich
2014-02-18 21:29 ` [AArch64 08/14] Define a variant of cmp for the CC_NZ case Philipp Tomsich
2014-02-18 21:42   ` Andrew Pinski
2014-02-18 21:29 ` [AArch64 09/14] Add special cases of zero-extend w/ compare operations Philipp Tomsich
2014-02-18 21:42   ` Andrew Pinski
2014-02-18 21:30 ` [AArch64 12/14] Generate 'bics', when only interested in CC_NZ Philipp Tomsich
2014-02-18 21:43   ` Andrew Pinski
2014-02-19 14:02 ` [AArch64 00/14] Pipeline-independent changes for XGene-1 Richard Earnshaw
2014-02-19 14:41 ` Ramana Radhakrishnan

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).