[PATCH 0/3] Improve ThunderX support

public inbox for gcc-patches@gcc.gnu.org

* [PATCH 0/3] Improve ThunderX support
@ 2014-11-14  0:56 Andrew Pinski
  2014-11-14  1:02 ` [PATCH 1/3] [AARCH64] Add macro fusion support for cmp/b.X for ThunderX Andrew Pinski
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Andrew Pinski @ 2014-11-14  0:56 UTC (permalink / raw)
  To: gcc-patches; +Cc: Andrew Pinski

Hi,
  This set of patches improves support for the ThunderX processor from Cavium.
The first patch adds support for the macro fusion that the ThunderX processor performs.
The next patch adds the scheduler description, which was missing from the original addition.
The last patch adds a tuning field that lets each processor set the alignment
of functions, loops and jumps.

Thanks,
Andrew Pinski

Andrew Pinski (3):
  [AARCH64]  Add macro fusion support for cmp/b.X for ThunderX
  [AARCH64] Add scheduler for ThunderX
  [AARCH64] Add aligning of functions/loops/jumps

 gcc/config/aarch64/aarch64-cores.def |    2 +-
 gcc/config/aarch64/aarch64-protos.h  |    1 +
 gcc/config/aarch64/aarch64.c         |   37 +++++-
 gcc/config/aarch64/aarch64.md        |    3 +-
 gcc/config/aarch64/thunderx.md       |  260 ++++++++++++++++++++++++++++++++++
 5 files changed, 297 insertions(+), 6 deletions(-)
 create mode 100644 gcc/config/aarch64/thunderx.md

-- 
1.7.2.5


* [PATCH 1/3] [AARCH64]  Add macro fusion support for cmp/b.X for ThunderX
  2014-11-14  0:56 [PATCH 0/3] Improve ThunderX support Andrew Pinski
@ 2014-11-14  1:02 ` Andrew Pinski
  2014-11-14  9:28   ` Kyrill Tkachov
  2014-11-14  1:06 ` [PATCH 3/3] [AARCH64] Add aligning of functions/loops/jumps Andrew Pinski
  2014-11-14  1:10 ` [PATCH 2/3] [AARCH64] Add scheduler for ThunderX Andrew Pinski
  2 siblings, 1 reply; 9+ messages in thread
From: Andrew Pinski @ 2014-11-14  1:02 UTC (permalink / raw)
  To: gcc-patches; +Cc: Andrew Pinski

In ThunderX, any 1-cycle arithmetic instruction that sets the flags
register will be fused with a following conditional branch.  This patch depends on
https://gcc.gnu.org/ml/gcc-patches/2014-11/msg01508.html.
Note that I know bit 1 is already going to be used, which is why I
proposed bit 2 for this.
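
As a rough illustration of the check described above, here is a minimal
standalone sketch in C.  It is not the GCC code: the toy_* names and the
enum are hypothetical stand-ins for the insn "type" attribute, and only
the bit layout (bit 0 for mov/movk, bit 2 for cmp/branch) is taken from
the patch.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for GCC's insn "type" attribute values.  */
enum toy_insn_type
{
  TOY_ALUS_SREG,
  TOY_ALUS_IMM,
  TOY_LOGICS_REG,
  TOY_LOGICS_IMM,
  TOY_ALU_SREG,   /* arithmetic that does not set the flags */
  TOY_LOAD1
};

/* Same bit layout as the patch: one bit per fusible pair.  */
#define TOY_FUSE_NOTHING     (0)
#define TOY_FUSE_MOV_MOVK    (1 << 0)
#define TOY_FUSE_CMP_BRANCH  (1 << 2)

/* Return true when the tuning mask enables cmp/branch fusion, the
   current insn is a conditional jump, and PREV is one of the simple
   flag-setting types.  */
static bool
toy_fusion_pair_p (unsigned int fuseable_ops, enum toy_insn_type prev,
                   bool curr_is_condjump)
{
  if (!(fuseable_ops & TOY_FUSE_CMP_BRANCH) || !curr_is_condjump)
    return false;

  switch (prev)
    {
    case TOY_ALUS_SREG:
    case TOY_ALUS_IMM:
    case TOY_LOGICS_REG:
    case TOY_LOGICS_IMM:
      return true;
    default:
      return false;
    }
}
```

Keeping the fusible pairs as independent bits means a tuning struct can
enable several kinds of fusion at once by OR-ing them together.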

Built and tested for aarch64-elf with no regressions.

ChangeLog:
* config/aarch64/aarch64.c (AARCH64_FUSE_CMP_BRANCH): New define.
(thunderx_tunings): Add AARCH64_FUSE_CMP_BRANCH to fuseable_ops.
(aarch_macro_fusion_pair_p): Handle AARCH64_FUSE_CMP_BRANCH.
---
 gcc/config/aarch64/aarch64.c |   15 ++++++++++++++-
 1 files changed, 14 insertions(+), 1 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index a258f40..5216ac0 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -304,6 +304,7 @@ static const struct cpu_vector_cost cortexa57_vector_cost =
 
 #define AARCH64_FUSE_NOTHING	(0)
 #define AARCH64_FUSE_MOV_MOVK	(1 << 0)
+#define AARCH64_FUSE_CMP_BRANCH	(1 << 2)
 
 #if HAVE_DESIGNATED_INITIALIZERS && GCC_VERSION >= 2007
 __extension__
@@ -349,7 +350,7 @@ static const struct tune_params thunderx_tunings =
   &generic_vector_cost,
   NAMED_PARAM (memmov_cost, 6),
   NAMED_PARAM (issue_rate, 2),
-  NAMED_PARAM (fuseable_ops, AARCH64_FUSE_NOTHING)
+  NAMED_PARAM (fuseable_ops, AARCH64_FUSE_CMP_BRANCH)
 };
 
 /* A processor implementing AArch64.  */
@@ -10036,6 +10037,18 @@ aarch_macro_fusion_pair_p (rtx_insn *prev, rtx_insn *curr)
         }
     }
 
+  if ((aarch64_tune_params->fuseable_ops & AARCH64_FUSE_CMP_BRANCH)
+      && any_condjump_p (curr))
+    {
+      /* FIXME: this misses some which is considered simple arthematic
+         instructions for ThunderX.  Simple shifts are missed here.  */
+      if (get_attr_type (prev) == TYPE_ALUS_SREG
+          || get_attr_type (prev) == TYPE_ALUS_IMM
+          || get_attr_type (prev) == TYPE_LOGICS_REG
+          || get_attr_type (prev) == TYPE_LOGICS_IMM)
+	return true;
+    }
+
   return false;
 }
 
-- 
1.7.2.5


* [PATCH 3/3] [AARCH64] Add aligning of functions/loops/jumps
  2014-11-14  0:56 [PATCH 0/3] Improve ThunderX support Andrew Pinski
  2014-11-14  1:02 ` [PATCH 1/3] [AARCH64] Add macro fusion support for cmp/b.X for ThunderX Andrew Pinski
@ 2014-11-14  1:06 ` Andrew Pinski
  2014-11-14  1:10 ` [PATCH 2/3] [AARCH64] Add scheduler for ThunderX Andrew Pinski
  2 siblings, 0 replies; 9+ messages in thread
From: Andrew Pinski @ 2014-11-14  1:06 UTC (permalink / raw)
  To: gcc-patches; +Cc: Andrew Pinski

On ThunderX, I found that aligning functions/loops/jumps to an 8-byte
boundary gives slightly better performance because the hardware issue
and dispatch then match the schedule GCC has created.

I also set generic, cortex-a53 and cortex-a57 to 8-byte alignment.
Someone might want to change the cortex-a57 number to be more
correct for that processor.  Given that cortex-a53 is dual issue,
it made sense to set 8-byte alignment there, but I don't know if it really
makes sense.
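
The defaulting rule described above can be sketched as a small standalone
C function.  This is only an illustration under assumptions: the
toy_align_opts struct and the function name are hypothetical, and a
value <= 0 is taken to mean "the user did not request an alignment".

```c
/* Hypothetical mirror of the three alignment options; a value <= 0
   means the user did not request a specific alignment.  */
struct toy_align_opts
{
  int align_loops;
  int align_jumps;
  int align_functions;
};

/* Fill in the tuning default for every alignment the user left unset,
   unless we are optimizing for size.  */
static void
toy_apply_default_align (struct toy_align_opts *opts, int optimize_size,
                         int tune_align)
{
  if (optimize_size)
    return;

  if (opts->align_loops <= 0)
    opts->align_loops = tune_align;
  if (opts->align_jumps <= 0)
    opts->align_jumps = tune_align;
  if (opts->align_functions <= 0)
    opts->align_functions = tune_align;
}
```

An explicit user value always wins in this sketch; the tuning value is
only a fallback.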

Built and tested for aarch64-elf with no regressions.

ChangeLog:
* config/aarch64/aarch64-protos.h (tune_params): Add align field.
* config/aarch64/aarch64.c (generic_tunings): Specify align.
(cortexa53_tunings): Likewise.
(cortexa57_tunings): Likewise.
(thunderx_tunings): Likewise.
(aarch64_override_options): Set align_loops, align_jumps,
align_functions based on what the tuning struct specifies.
---
 gcc/config/aarch64/aarch64-protos.h |    1 +
 gcc/config/aarch64/aarch64.c        |   24 ++++++++++++++++++++----
 2 files changed, 21 insertions(+), 4 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index 9e0ff8c..3e70495 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -171,6 +171,7 @@ struct tune_params
   const int memmov_cost;
   const int issue_rate;
   const unsigned int fuseable_ops;
+  const unsigned int align;
 };
 
 HOST_WIDE_INT aarch64_initial_elimination_offset (unsigned, unsigned);
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 5216ac0..9214332 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -317,7 +317,8 @@ static const struct tune_params generic_tunings =
   &generic_vector_cost,
   NAMED_PARAM (memmov_cost, 4),
   NAMED_PARAM (issue_rate, 2),
-  NAMED_PARAM (fuseable_ops, AARCH64_FUSE_NOTHING)
+  NAMED_PARAM (fuseable_ops, AARCH64_FUSE_NOTHING),
+  NAMED_PARAM (align, 8)
 };
 
 static const struct tune_params cortexa53_tunings =
@@ -328,7 +329,8 @@ static const struct tune_params cortexa53_tunings =
   &generic_vector_cost,
   NAMED_PARAM (memmov_cost, 4),
   NAMED_PARAM (issue_rate, 2),
-  NAMED_PARAM (fuseable_ops, AARCH64_FUSE_MOV_MOVK)
+  NAMED_PARAM (fuseable_ops, AARCH64_FUSE_MOV_MOVK),
+  NAMED_PARAM (align, 8)
 };
 
 static const struct tune_params cortexa57_tunings =
@@ -339,7 +341,8 @@ static const struct tune_params cortexa57_tunings =
   &cortexa57_vector_cost,
   NAMED_PARAM (memmov_cost, 4),
   NAMED_PARAM (issue_rate, 3),
-  NAMED_PARAM (fuseable_ops, AARCH64_FUSE_MOV_MOVK)
+  NAMED_PARAM (fuseable_ops, AARCH64_FUSE_MOV_MOVK),
+  NAMED_PARAM (align, 8)
 };
 
 static const struct tune_params thunderx_tunings =
@@ -350,7 +353,8 @@ static const struct tune_params thunderx_tunings =
   &generic_vector_cost,
   NAMED_PARAM (memmov_cost, 6),
   NAMED_PARAM (issue_rate, 2),
-  NAMED_PARAM (fuseable_ops, AARCH64_FUSE_CMP_BRANCH)
+  NAMED_PARAM (fuseable_ops, AARCH64_FUSE_CMP_BRANCH),
+  NAMED_PARAM (align, 8)
 };
 
 /* A processor implementing AArch64.  */
@@ -6501,6 +6505,18 @@ aarch64_override_options (void)
 #endif
     }
 
+  /* If not optimizing for size, set the default
+     alignment to what the target wants.  */
+  if (!optimize_size)
+    {
+      if (align_loops <= 0)
+	align_loops = aarch64_tune_params->align;
+      if (align_jumps <= 0)
+	align_jumps = aarch64_tune_params->align;
+      if (align_functions <= 0)
+	align_functions = aarch64_tune_params->align;
+    }
+
   aarch64_override_options_after_change ();
 }
 
-- 
1.7.2.5


* [PATCH 2/3] [AARCH64] Add scheduler for ThunderX
  2014-11-14  0:56 [PATCH 0/3] Improve ThunderX support Andrew Pinski
  2014-11-14  1:02 ` [PATCH 1/3] [AARCH64] Add macro fusion support for cmp/b.X for ThunderX Andrew Pinski
  2014-11-14  1:06 ` [PATCH 3/3] [AARCH64] Add aligning of functions/loops/jumps Andrew Pinski
@ 2014-11-14  1:10 ` Andrew Pinski
  2014-11-14 11:03   ` Marcus Shawcroft
  2014-11-17 20:17   ` Sebastian Pop
  2 siblings, 2 replies; 9+ messages in thread
From: Andrew Pinski @ 2014-11-14  1:10 UTC (permalink / raw)
  To: gcc-patches; +Cc: Andrew Pinski

This adds the scheduling model for ThunderX.  There are a few TODOs, in that
not all of the SIMD instructions are modeled currently.  Also, the idea of a
simple shift/extend is not modeled: every instruction with a shift/extend
is treated as non-simple and takes two cycles, even where the correct
value is one cycle.  The 32-bit divide and the 64-bit divide
have different cycle counts, but there is no way to model that currently.
Likewise, multiply high takes one cycle more than the normal multiply, but
there is no way to model that currently either.
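
To give a feel for what the pipe reservations in the model express, here
is a toy issue-cycle counter in C.  It is a sketch under loud assumptions:
strictly in-order issue, no latencies or data dependencies, and only the
four reservation shapes that appear in the description ("pipe0 | pipe1",
"pipe0", "pipe1", "pipe0 + pipe1").  The toy_* names are hypothetical and
this is not how GCC's automaton is implemented.

```c
/* Which pipes an instruction needs in its issue cycle.  */
enum toy_reservation
{
  TOY_EITHER_PIPE,  /* "thunderx_pipe0 | thunderx_pipe1" */
  TOY_PIPE0_ONLY,   /* "thunderx_pipe0" */
  TOY_PIPE1_ONLY,   /* "thunderx_pipe1" */
  TOY_BOTH_PIPES    /* "thunderx_pipe0 + thunderx_pipe1" */
};

/* Count the cycles needed to issue N instructions strictly in order.
   Within a cycle we track how many insns need pipe0, how many need
   pipe1, and how many can go to either; a group fits on two pipes iff
   need0 <= 1, need1 <= 1 and need0 + need1 + flexible <= 2.  */
static int
toy_issue_cycles (const enum toy_reservation *insns, int n)
{
  int cycles = 0;
  int i = 0;

  while (i < n)
    {
      int need0 = 0, need1 = 0, flexible = 0;

      cycles++;
      /* Keep adding insns to this cycle until the next one does not fit.
         The first insn of a cycle always fits, so I advances.  */
      while (i < n)
        {
          int a = need0, b = need1, f = flexible;

          switch (insns[i])
            {
            case TOY_PIPE0_ONLY:  a++;      break;
            case TOY_PIPE1_ONLY:  b++;      break;
            case TOY_BOTH_PIPES:  a++; b++; break;
            case TOY_EITHER_PIPE: f++;      break;
            }

          if (a > 1 || b > 1 || a + b + f > 2)
            break;

          need0 = a;
          need1 = b;
          flexible = f;
          i++;
        }
    }

  return cycles;
}
```

For example, two simple ALU instructions share a cycle, two loads (pipe 0
only) need two cycles, and a store pair occupies a whole cycle by itself.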

Built and tested for aarch64-elf with no regressions.

ChangeLog:
* config/aarch64/aarch64-cores.def (thunderx): Change the scheduler
over to thunderx.
* config/aarch64/aarch64.md: Include thunderx.md.
(generic_sched): Set to no for thunderx.
* config/aarch64/thunderx.md: New file.
---
 gcc/config/aarch64/aarch64-cores.def |    2 +-
 gcc/config/aarch64/aarch64.md        |    3 +-
 gcc/config/aarch64/thunderx.md       |  260 ++++++++++++++++++++++++++++++++++
 3 files changed, 263 insertions(+), 2 deletions(-)
 create mode 100644 gcc/config/aarch64/thunderx.md

diff --git a/gcc/config/aarch64/aarch64-cores.def b/gcc/config/aarch64/aarch64-cores.def
index b3318c3..471cdd6 100644
--- a/gcc/config/aarch64/aarch64-cores.def
+++ b/gcc/config/aarch64/aarch64-cores.def
@@ -36,7 +36,7 @@
 
 AARCH64_CORE("cortex-a53",  cortexa53, cortexa53, 8,  AARCH64_FL_FPSIMD | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, cortexa53)
 AARCH64_CORE("cortex-a57",  cortexa15, cortexa15, 8,  AARCH64_FL_FPSIMD | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, cortexa57)
-AARCH64_CORE("thunderx",    thunderx,  cortexa53, 8,  AARCH64_FL_FPSIMD | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, thunderx)
+AARCH64_CORE("thunderx",    thunderx,  thunderx, 8,  AARCH64_FL_FPSIMD | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, thunderx)
 
 /* V8 big.LITTLE implementations.  */
 
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 17570ba..80f2db7 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -191,13 +191,14 @@
 
 (define_attr "generic_sched" "yes,no"
   (const (if_then_else
-          (eq_attr "tune" "cortexa53,cortexa15")
+          (eq_attr "tune" "cortexa53,cortexa15,thunderx")
           (const_string "no")
           (const_string "yes"))))
 
 ;; Scheduling
 (include "../arm/cortex-a53.md")
 (include "../arm/cortex-a15.md")
+(include "thunderx.md")
 
 ;; -------------------------------------------------------------------
 ;; Jumps and other miscellaneous insns
diff --git a/gcc/config/aarch64/thunderx.md b/gcc/config/aarch64/thunderx.md
new file mode 100644
index 0000000..30e4395
--- /dev/null
+++ b/gcc/config/aarch64/thunderx.md
@@ -0,0 +1,260 @@
+;; Cavium ThunderX pipeline description
+;; Copyright (C) 2014 Free Software Foundation, Inc.
+;;
+;; Written by Andrew Pinski  <apinski@cavium.com>
+
+;; This file is part of GCC.
+
+;; GCC is free software; you can redistribute it and/or modify
+;; it under the terms of the GNU General Public License as published by
+;; the Free Software Foundation; either version 3, or (at your option)
+;; any later version.
+
+;; GCC is distributed in the hope that it will be useful,
+;; but WITHOUT ANY WARRANTY; without even the implied warranty of
+;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+;; GNU General Public License for more details.
+
+;; You should have received a copy of the GNU General Public License
+;; along with GCC; see the file COPYING3.  If not see
+;; <http://www.gnu.org/licenses/>.
+;;   Copyright (C) 2004, 2005, 2006 Cavium Networks.
+
+
+;; Thunder is a dual-issue processor that can issue all instructions on
+;; pipe0 and a subset on pipe1.
+
+
+(define_automaton "thunderx_main, thunderx_mult, thunderx_divide, thunderx_simd")
+
+(define_cpu_unit "thunderx_pipe0" "thunderx_main")
+(define_cpu_unit "thunderx_pipe1" "thunderx_main")
+(define_cpu_unit "thunderx_mult" "thunderx_mult")
+(define_cpu_unit "thunderx_divide" "thunderx_divide")
+(define_cpu_unit "thunderx_simd" "thunderx_simd")
+
+(define_insn_reservation "thunderx_add" 1
+  (and (eq_attr "tune" "thunderx")
+       (eq_attr "type" "adc_imm,adc_reg,adr,alu_imm,alu_sreg,alus_imm,alus_sreg,extend,logic_imm,logic_reg,logics_imm,logics_reg,mov_imm,mov_reg"))
+  "thunderx_pipe0 | thunderx_pipe1")
+
+(define_insn_reservation "thunderx_shift" 1
+  (and (eq_attr "tune" "thunderx")
+       (eq_attr "type" "bfm,extend,shift_imm,shift_reg"))
+  "thunderx_pipe0 | thunderx_pipe1")
+
+
+;; Arithmetic instructions with an extra shift or extend take two cycles.
+;; FIXME: This needs more attributes on aarch64 than what is currently there;
+;;    this is conservative for now.
+;; Except this is not correct as this is only for !(LSL && shift by 0/1/2/3)
+;; Except this is not correct as this is only for !(zero extend)
+
+(define_insn_reservation "thunderx_arith_shift" 2
+  (and (eq_attr "tune" "thunderx")
+       (eq_attr "type" "alu_ext,alu_shift_imm,alu_shift_reg,alus_ext,logic_shift_imm,logic_shift_reg,logics_shift_imm,logics_shift_reg,alus_shift_imm"))
+  "thunderx_pipe0 | thunderx_pipe1")
+
+(define_insn_reservation "thunderx_csel" 2
+  (and (eq_attr "tune" "thunderx")
+       (eq_attr "type" "csel"))
+  "thunderx_pipe0 | thunderx_pipe1")
+
+;; Multiply, multiply accumulate and count leading zeros can only happen on pipe 1
+
+(define_insn_reservation "thunderx_mul" 4
+  (and (eq_attr "tune" "thunderx")
+       (eq_attr "type" "mul,muls,mla,mlas,clz,smull,umull,smlal,umlal"))
+  "thunderx_pipe1 + thunderx_mult")
+
+;; Multiply high instructions take an extra cycle and cause the multiply unit to
+;; be busy for an extra cycle.
+
+;(define_insn_reservation "thunderx_mul_high" 5
+;  (and (eq_attr "tune" "thunderx")
+;       (eq_attr "type" "smull,umull"))
+;  "thunderx_pipe1 + thunderx_mult")
+
+(define_insn_reservation "thunderx_div32" 22
+  (and (eq_attr "tune" "thunderx")
+       (eq_attr "type" "udiv,sdiv"))
+  "thunderx_pipe1 + thunderx_divide, thunderx_divide * 21")
+
+;(define_insn_reservation "thunderx_div64" 38
+;  (and (eq_attr "tune" "thunderx")
+;       (eq_attr "type" "udiv,sdiv")
+;       (eq_attr "mode" "DI"))
+;  "thunderx_pipe1 + thunderx_divide, thunderx_divide * 34")
+
+;; Stores take one cycle in pipe 0
+(define_insn_reservation "thunderx_store" 1
+  (and (eq_attr "tune" "thunderx")
+       (eq_attr "type" "store1"))
+  "thunderx_pipe0")
+
+;; Store pairs are single issued
+(define_insn_reservation "thunderx_storepair" 1
+  (and (eq_attr "tune" "thunderx")
+       (eq_attr "type" "store2"))
+  "thunderx_pipe0 + thunderx_pipe1")
+
+
+;; loads (and load pairs) from L1 take 3 cycles in pipe 0
+(define_insn_reservation "thunderx_load" 3
+  (and (eq_attr "tune" "thunderx")
+       (eq_attr "type" "load1, load2"))
+  "thunderx_pipe0")
+
+(define_insn_reservation "thunderx_brj" 1
+  (and (eq_attr "tune" "thunderx")
+       (eq_attr "type" "branch,trap,call"))
+  "thunderx_pipe1")
+
+;; FPU
+
+(define_insn_reservation "thunderx_fadd" 4
+  (and (eq_attr "tune" "thunderx")
+       (eq_attr "type" "faddd,fadds"))
+  "thunderx_pipe1")
+
+(define_insn_reservation "thunderx_fconst" 1
+  (and (eq_attr "tune" "thunderx")
+       (eq_attr "type" "fconsts,fconstd"))
+  "thunderx_pipe1")
+
+;; Moves between fp are 2 cycles including min/max/select/abs/neg
+(define_insn_reservation "thunderx_fmov" 2
+  (and (eq_attr "tune" "thunderx")
+       (eq_attr "type" "fmov,f_minmaxs,f_minmaxd,fcsel,ffarithd,ffariths"))
+  "thunderx_pipe1")
+
+(define_insn_reservation "thunderx_fmovgpr" 2
+  (and (eq_attr "tune" "thunderx")
+       (eq_attr "type" "f_mrc, f_mcr"))
+  "thunderx_pipe1")
+
+(define_insn_reservation "thunderx_fmul" 6
+  (and (eq_attr "tune" "thunderx")
+       (eq_attr "type" "fmacs,fmacd,fmuls,fmuld"))
+  "thunderx_pipe1")
+
+(define_insn_reservation "thunderx_fdivs" 12
+  (and (eq_attr "tune" "thunderx")
+       (eq_attr "type" "fdivs"))
+  "thunderx_pipe1 + thunderx_divide, thunderx_divide*8")
+
+(define_insn_reservation "thunderx_fdivd" 22
+  (and (eq_attr "tune" "thunderx")
+       (eq_attr "type" "fdivd"))
+  "thunderx_pipe1 + thunderx_divide, thunderx_divide*18")
+
+(define_insn_reservation "thunderx_fsqrts" 17
+  (and (eq_attr "tune" "thunderx")
+       (eq_attr "type" "fsqrts"))
+  "thunderx_pipe1 + thunderx_divide, thunderx_divide*13")
+
+(define_insn_reservation "thunderx_fsqrtd" 28
+  (and (eq_attr "tune" "thunderx")
+       (eq_attr "type" "fsqrtd"))
+  "thunderx_pipe1 + thunderx_divide, thunderx_divide*31")
+
+;; The rounding conversion inside fp is 4 cycles
+(define_insn_reservation "thunderx_frint" 4
+  (and (eq_attr "tune" "thunderx")
+       (eq_attr "type" "f_rints,f_rintd"))
+  "thunderx_pipe1")
+
+;; Float to integer with a move from int to/from float is 6 cycles
+(define_insn_reservation "thunderx_f_cvt" 6
+  (and (eq_attr "tune" "thunderx")
+       (eq_attr "type" "f_cvt,f_cvtf2i,f_cvti2f"))
+  "thunderx_pipe1")
+
+;; FP/SIMD load/stores happen in pipe 0
+;; 64bit Loads register/pairs are 4 cycles from L1
+(define_insn_reservation "thunderx_64simd_fp_load" 4
+  (and (eq_attr "tune" "thunderx")
+       (eq_attr "type" "f_loadd,f_loads,neon_load1_1reg,\
+			neon_load1_1reg_q,neon_load1_2reg"))
+  "thunderx_pipe0")
+
+;; 128bit load pair is single issued and takes 4 cycles from L1
+(define_insn_reservation "thunderx_128simd_pair_load" 4
+  (and (eq_attr "tune" "thunderx")
+       (eq_attr "type" "neon_load1_2reg_q"))
+  "thunderx_pipe0+thunderx_pipe1")
+
+;; FP/SIMD stores take one cycle in pipe 0
+(define_insn_reservation "thunderx_simd_fp_store" 1
+  (and (eq_attr "tune" "thunderx")
+       (eq_attr "type" "f_stored,f_stores,neon_store1_1reg,neon_store1_1reg_q"))
+  "thunderx_pipe0")
+
+;; 64bit neon store pairs are single issue for one cycle
+(define_insn_reservation "thunderx_64neon_storepair" 1
+  (and (eq_attr "tune" "thunderx")
+       (eq_attr "type" "neon_store1_2reg"))
+  "thunderx_pipe0 + thunderx_pipe1")
+
+;; 128bit neon store pair are single issued for two cycles
+(define_insn_reservation "thunderx_128neon_storepair" 2
+  (and (eq_attr "tune" "thunderx")
+       (eq_attr "type" "neon_store1_2reg_q"))
+  "(thunderx_pipe0 + thunderx_pipe1)*2")
+
+
+;; SIMD/NEON (q forms take an extra cycle)
+
+;; Thunder simd move instruction types - 2/3 cycles
+(define_insn_reservation "thunderx_neon_move" 2
+  (and (eq_attr "tune" "thunderx")
+       (eq_attr "type" "neon_logic, neon_bsl, neon_fp_compare_s, \
+			neon_fp_compare_d, neon_move"))
+  "thunderx_pipe1 + thunderx_simd")
+
+(define_insn_reservation "thunderx_neon_move_q" 3
+  (and (eq_attr "tune" "thunderx")
+       (eq_attr "type" "neon_logic_q, neon_bsl_q, neon_fp_compare_s_q, \
+			neon_fp_compare_d_q, neon_move_q"))
+  "thunderx_pipe1 + thunderx_simd, thunderx_simd")
+
+
+;; Thunder simd simple/add instruction types - 4/5 cycles
+
+(define_insn_reservation "thunderx_neon_add" 4
+  (and (eq_attr "tune" "thunderx")
+       (eq_attr "type" "neon_reduc_add, neon_reduc_minmax, neon_fp_reduc_add_s, \
+			neon_fp_reduc_add_d, neon_fp_to_int_s, neon_fp_to_int_d, \
+			neon_add_halve, neon_sub_halve, neon_qadd, neon_compare, \
+			neon_compare_zero, neon_minmax, neon_abd, neon_add, neon_sub, \
+			neon_fp_minmax_s, neon_fp_minmax_d, neon_reduc_add, neon_cls, \
+			neon_qabs, neon_qneg, neon_fp_addsub_s, neon_fp_addsub_d"))
+  "thunderx_pipe1 + thunderx_simd")
+
+;; BIG NOTE: neon_add_long/neon_sub_long don't have a q form which is incorrect
+
+(define_insn_reservation "thunderx_neon_add_q" 5
+  (and (eq_attr "tune" "thunderx")
+       (eq_attr "type" "neon_reduc_add_q, neon_reduc_minmax_q, neon_fp_reduc_add_s_q, \
+			neon_fp_reduc_add_d_q, neon_fp_to_int_s_q, neon_fp_to_int_d_q, \
+			neon_add_halve_q, neon_sub_halve_q, neon_qadd_q, neon_compare_q, \
+			neon_compare_zero_q, neon_minmax_q, neon_abd_q, neon_add_q, neon_sub_q, \
+			neon_fp_minmax_s_q, neon_fp_minmax_d_q, neon_reduc_add_q, neon_cls_q, \
+			neon_qabs_q, neon_qneg_q, neon_fp_addsub_s_q, neon_fp_addsub_d_q, \
+			neon_add_long, neon_sub_long"))
+  "thunderx_pipe1 + thunderx_simd, thunderx_simd")
+
+
+;; Thunder 128bit SIMD reads the upper half in cycle 2 and writes in the last cycle
+(define_bypass 2 "thunderx_neon_move_q" "thunderx_neon_move_q, thunderx_neon_add_q")
+(define_bypass 4 "thunderx_neon_add_q" "thunderx_neon_move_q, thunderx_neon_add_q")
+
+;; Assume both pipes are needed for unknown and multiple-instruction
+;; patterns.
+
+(define_insn_reservation "thunderx_unknown" 1
+  (and (eq_attr "tune" "thunderx")
+       (eq_attr "type" "untyped,multiple"))
+  "thunderx_pipe0 + thunderx_pipe1")
+
+
-- 
1.7.2.5


* Re: [PATCH 1/3] [AARCH64]  Add macro fusion support for cmp/b.X for ThunderX
  2014-11-14  1:02 ` [PATCH 1/3] [AARCH64] Add macro fusion support for cmp/b.X for ThunderX Andrew Pinski
@ 2014-11-14  9:28   ` Kyrill Tkachov
  2014-11-14 10:08     ` Andrew Pinski
  0 siblings, 1 reply; 9+ messages in thread
From: Kyrill Tkachov @ 2014-11-14  9:28 UTC (permalink / raw)
  To: Andrew Pinski, gcc-patches

Hi Andrew,

On 14/11/14 00:56, Andrew Pinski wrote:
> In ThunderX, any 1 cycle arthemantic instruction that produces the flags
> register, will be fused with a branch.  This patch depends on
> https://gcc.gnu.org/ml/gcc-patches/2014-11/msg01508.html.
> Note I know bit 1 is going is already going to be used and that is why I
> proposed this being bit 2.
>
> Build and tested for aarch64-elf with no regressions.
>
> ChangeLog:
> * config/aarch64/aarch64.c (AARCH64_FUSE_CMP_BRANCH): New define.
> (thunderx_tunings): Add AARCH64_FUSE_CMP_BRANCH to fuseable_ops.
> (aarch_macro_fusion_pair_p): Handle AARCH64_FUSE_CMP_BRANCH.
> ---
>   gcc/config/aarch64/aarch64.c |   15 ++++++++++++++-
>   1 files changed, 14 insertions(+), 1 deletions(-)
>
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index a258f40..5216ac0 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -304,6 +304,7 @@ static const struct cpu_vector_cost cortexa57_vector_cost =
>   
>   #define AARCH64_FUSE_NOTHING	(0)
>   #define AARCH64_FUSE_MOV_MOVK	(1 << 0)
> +#define AARCH64_FUSE_CMP_BRANCH	(1 << 2)
>   
>   #if HAVE_DESIGNATED_INITIALIZERS && GCC_VERSION >= 2007
>   __extension__
> @@ -349,7 +350,7 @@ static const struct tune_params thunderx_tunings =
>     &generic_vector_cost,
>     NAMED_PARAM (memmov_cost, 6),
>     NAMED_PARAM (issue_rate, 2),
> -  NAMED_PARAM (fuseable_ops, AARCH64_FUSE_NOTHING)
> +  NAMED_PARAM (fuseable_ops, AARCH64_FUSE_CMP_BRANCH)
>   };
>   
>   /* A processor implementing AArch64.  */
> @@ -10036,6 +10037,18 @@ aarch_macro_fusion_pair_p (rtx_insn *prev, rtx_insn *curr)
>           }
>       }
>   
> +  if ((aarch64_tune_params->fuseable_ops & AARCH64_FUSE_CMP_BRANCH)
> +      && any_condjump_p (curr))
> +    {
> +      /* FIXME: this misses some which is considered simple arthematic
> +         instructions for ThunderX.  Simple shifts are missed here.  */
s/is/are

> +      if (get_attr_type (prev) == TYPE_ALUS_SREG
> +          || get_attr_type (prev) == TYPE_ALUS_IMM
> +          || get_attr_type (prev) == TYPE_LOGICS_REG
> +          || get_attr_type (prev) == TYPE_LOGICS_IMM)
> +	return true;

IIRC the get_attr_* functions can call recog_memoized on prev, which can
potentially change the recog_data for the insn, sometimes resulting in
corruption. Is this definitely safe to do?

Kyrill

> +    }
> +
>     return false;
>   }
>   



* Re: [PATCH 1/3] [AARCH64] Add macro fusion support for cmp/b.X for ThunderX
  2014-11-14  9:28   ` Kyrill Tkachov
@ 2014-11-14 10:08     ` Andrew Pinski
  0 siblings, 0 replies; 9+ messages in thread
From: Andrew Pinski @ 2014-11-14 10:08 UTC (permalink / raw)
  To: Kyrill Tkachov; +Cc: Andrew Pinski, gcc-patches

On Fri, Nov 14, 2014 at 1:08 AM, Kyrill Tkachov <kyrylo.tkachov@arm.com> wrote:
> Hi Andrew,
>
>
> On 14/11/14 00:56, Andrew Pinski wrote:
>>
>> In ThunderX, any 1 cycle arthemantic instruction that produces the flags
>> register, will be fused with a branch.  This patch depends on
>> https://gcc.gnu.org/ml/gcc-patches/2014-11/msg01508.html.
>> Note I know bit 1 is going is already going to be used and that is why I
>> proposed this being bit 2.
>>
>> Build and tested for aarch64-elf with no regressions.
>>
>> ChangeLog:
>> * config/aarch64/aarch64.c (AARCH64_FUSE_CMP_BRANCH): New define.
>> (thunderx_tunings): Add AARCH64_FUSE_CMP_BRANCH to fuseable_ops.
>> (aarch_macro_fusion_pair_p): Handle AARCH64_FUSE_CMP_BRANCH.
>> ---
>>   gcc/config/aarch64/aarch64.c |   15 ++++++++++++++-
>>   1 files changed, 14 insertions(+), 1 deletions(-)
>>
>> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
>> index a258f40..5216ac0 100644
>> --- a/gcc/config/aarch64/aarch64.c
>> +++ b/gcc/config/aarch64/aarch64.c
>> @@ -304,6 +304,7 @@ static const struct cpu_vector_cost
>> cortexa57_vector_cost =
>>     #define AARCH64_FUSE_NOTHING        (0)
>>   #define AARCH64_FUSE_MOV_MOVK (1 << 0)
>> +#define AARCH64_FUSE_CMP_BRANCH        (1 << 2)
>>     #if HAVE_DESIGNATED_INITIALIZERS && GCC_VERSION >= 2007
>>   __extension__
>> @@ -349,7 +350,7 @@ static const struct tune_params thunderx_tunings =
>>     &generic_vector_cost,
>>     NAMED_PARAM (memmov_cost, 6),
>>     NAMED_PARAM (issue_rate, 2),
>> -  NAMED_PARAM (fuseable_ops, AARCH64_FUSE_NOTHING)
>> +  NAMED_PARAM (fuseable_ops, AARCH64_FUSE_CMP_BRANCH)
>>   };
>>     /* A processor implementing AArch64.  */
>> @@ -10036,6 +10037,18 @@ aarch_macro_fusion_pair_p (rtx_insn *prev,
>> rtx_insn *curr)
>>           }
>>       }
>>   +  if ((aarch64_tune_params->fuseable_ops & AARCH64_FUSE_CMP_BRANCH)
>> +      && any_condjump_p (curr))
>> +    {
>> +      /* FIXME: this misses some which is considered simple arthematic
>> +         instructions for ThunderX.  Simple shifts are missed here.  */
>
> s/is/are
>
>> +      if (get_attr_type (prev) == TYPE_ALUS_SREG
>> +          || get_attr_type (prev) == TYPE_ALUS_IMM
>> +          || get_attr_type (prev) == TYPE_LOGICS_REG
>> +          || get_attr_type (prev) == TYPE_LOGICS_IMM)
>> +       return true;
>
>
> IIRC the get_attr_* functions can call recog_memoized on prev which can
> potentially change
> the recog_data for the insn, sometimes resulting in corruption. Is this
> definitely safe to do?

Safe in this context, yes.  I used a pattern similar to what is done for x86.
In sched-deps.c, before this function is called, we have the following
(if before reload):
      extract_insn (insn);

extract_insn will already have called recog_memoized.
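
The argument above rests on memoization: once an insn has been
recognized, a later lookup reuses the cached insn code instead of running
recognition again.  A minimal sketch of that idea in C follows.  The
toy_* names, the struct layout and the placeholder "pattern matching" are
all hypothetical; this is not GCC's recog.c, only an illustration of the
caching behaviour being relied on.

```c
/* Hypothetical toy insn: CODE < 0 means "not recognized yet".  */
struct toy_insn
{
  unsigned int pattern;
  int code;
};

/* Counts how many times the expensive recognition step has run.  */
static int toy_recog_calls;

/* Placeholder for real pattern matching; the result is arbitrary.  */
static int
toy_recog (const struct toy_insn *insn)
{
  toy_recog_calls++;
  return (int) (insn->pattern % 7u);
}

/* Return the cached code if there is one; otherwise recognize the insn
   once and remember the result.  */
static int
toy_recog_memoized (struct toy_insn *insn)
{
  if (insn->code < 0)
    insn->code = toy_recog (insn);
  return insn->code;
}
```

In this sketch a second call on the same insn touches no shared state,
which is the property the reply relies on when it says the earlier
extract_insn call makes the later attribute lookup safe.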

Thanks,
Andrew

>
> Kyrill
>
>> +    }
>> +
>>     return false;
>>   }
>>
>
>
>


* Re: [PATCH 2/3] [AARCH64] Add scheduler for ThunderX
  2014-11-14  1:10 ` [PATCH 2/3] [AARCH64] Add scheduler for ThunderX Andrew Pinski
@ 2014-11-14 11:03   ` Marcus Shawcroft
  2014-11-17 20:17   ` Sebastian Pop
  1 sibling, 0 replies; 9+ messages in thread
From: Marcus Shawcroft @ 2014-11-14 11:03 UTC (permalink / raw)
  To: Andrew Pinski; +Cc: gcc-patches

On 14 November 2014 00:56, Andrew Pinski <apinski@cavium.com> wrote:
> This adds the schedule model for ThunderX. There are a few TODOs in that
> not all of the SIMD is model currently.  Also the idea of a simple
> shift/extend is not modeled and all cases where there is a shift/extend
> is considered as non simple and take up two cycles rather than correct
> value of one cycle.  Also the 32bit divide and the 64bit divide
> have different cycle counts but there is no way to model that currently.
> Also multiply high takes one cycle more than the normal multiply but
> there is no way to model that currently either.
>
> Build and tested for aarch64-elf with no regressions.
>
> ChangeLog:
> * config/aarch64/aarch64-cores.def (thunderx): Change the scheduler
> over to thunderx.
> * config/aarch64/aarch64.md: Include thunderx.md.
> (generic_sched): Set to no for thunderx.
> * config/aarch64/thunderx.md: New file.

OK /Marcus


* Re: [PATCH 2/3] [AARCH64] Add scheduler for ThunderX
  2014-11-14  1:10 ` [PATCH 2/3] [AARCH64] Add scheduler for ThunderX Andrew Pinski
  2014-11-14 11:03   ` Marcus Shawcroft
@ 2014-11-17 20:17   ` Sebastian Pop
  2014-11-17 23:03     ` Andrew Pinski
  1 sibling, 1 reply; 9+ messages in thread
From: Sebastian Pop @ 2014-11-17 20:17 UTC (permalink / raw)
  To: Andrew Pinski; +Cc: gcc-patches

Andrew Pinski wrote:
> diff --git a/gcc/config/aarch64/thunderx.md b/gcc/config/aarch64/thunderx.md
> new file mode 100644
> index 0000000..30e4395
> --- /dev/null
> +++ b/gcc/config/aarch64/thunderx.md
> @@ -0,0 +1,260 @@
> +;; Cavium ThunderX pipeline description
> +;; Copyright (C) 2014 Free Software Foundation, Inc.
> +;;
> +;; Written by Andrew Pinski  <apinski@cavium.com>
> +
> +;; This file is part of GCC.
> +
> +;; GCC is free software; you can redistribute it and/or modify
> +;; it under the terms of the GNU General Public License as published by
> +;; the Free Software Foundation; either version 3, or (at your option)
> +;; any later version.
> +
> +;; GCC is distributed in the hope that it will be useful,
> +;; but WITHOUT ANY WARRANTY; without even the implied warranty of
> +;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +;; GNU General Public License for more details.
> +
> +;; You should have received a copy of the GNU General Public License
> +;; along with GCC; see the file COPYING3.  If not see
> +;; <http://www.gnu.org/licenses/>.
> +;;   Copyright (C) 2004, 2005, 2006 Cavium Networks.

You should remove this line before commit.


* Re: [PATCH 2/3] [AARCH64] Add scheduler for ThunderX
  2014-11-17 20:17   ` Sebastian Pop
@ 2014-11-17 23:03     ` Andrew Pinski
  0 siblings, 0 replies; 9+ messages in thread
From: Andrew Pinski @ 2014-11-17 23:03 UTC (permalink / raw)
  To: Sebastian Pop; +Cc: Andrew Pinski, GCC Patches

[-- Attachment #1: Type: text/plain, Size: 1557 bytes --]

On Mon, Nov 17, 2014 at 12:04 PM, Sebastian Pop <sebpop@gmail.com> wrote:
> Andrew Pinski wrote:
>> diff --git a/gcc/config/aarch64/thunderx.md b/gcc/config/aarch64/thunderx.md
>> new file mode 100644
>> index 0000000..30e4395
>> --- /dev/null
>> +++ b/gcc/config/aarch64/thunderx.md
>> @@ -0,0 +1,260 @@
>> +;; Cavium ThunderX pipeline description
>> +;; Copyright (C) 2014 Free Software Foundation, Inc.
>> +;;
>> +;; Written by Andrew Pinski  <apinski@cavium.com>
>> +
>> +;; This file is part of GCC.
>> +
>> +;; GCC is free software; you can redistribute it and/or modify
>> +;; it under the terms of the GNU General Public License as published by
>> +;; the Free Software Foundation; either version 3, or (at your option)
>> +;; any later version.
>> +
>> +;; GCC is distributed in the hope that it will be useful,
>> +;; but WITHOUT ANY WARRANTY; without even the implied warranty of
>> +;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> +;; GNU General Public License for more details.
>> +
>> +;; You should have received a copy of the GNU General Public License
>> +;; along with GCC; see the file COPYING3.  If not see
>> +;; <http://www.gnu.org/licenses/>.
>> +;;   Copyright (C) 2004, 2005, 2006 Cavium Networks.
>
> You should remove this line before commit.


Whoops, I missed that when I was writing this code.  It was copied
and pasted from our octeon.md file.
Anyway, committed after a quick build.

Thanks,
Andrew Pinski

ChangeLog:
* config/aarch64/thunderx.md: Remove copyright line that should not have been there.

[-- Attachment #2: removecopyright.diff.txt --]
[-- Type: text/plain, Size: 507 bytes --]

Index: config/aarch64/thunderx.md
===================================================================
--- config/aarch64/thunderx.md	(revision 217675)
+++ config/aarch64/thunderx.md	(working copy)
@@ -18,7 +18,6 @@
 ;; You should have received a copy of the GNU General Public License
 ;; along with GCC; see the file COPYING3.  If not see
 ;; <http://www.gnu.org/licenses/>.
-;;   Copyright (C) 2004, 2005, 2006 Cavium Networks.
 
 
 ;; Thunder is a dual-issue processor that can issue all instructions on

