* [PATCH 1/3] [ARM] Add bus_width_bits to tune_params
2017-09-12 8:34 [PATCH 0/3] [ARM] Addressing mode costs v3 charles.baylis
@ 2017-09-12 8:34 ` charles.baylis
2017-09-13 9:02 ` Kyrill Tkachov
2017-09-12 8:35 ` [PATCH 3/3] [ARM] Add table of costs for AAarch32 addressing modes charles.baylis
2017-09-12 8:35 ` [PATCH 2/3] [ARM] Refactor costs calculation for MEM charles.baylis
2 siblings, 1 reply; 20+ messages in thread
From: charles.baylis @ 2017-09-12 8:34 UTC (permalink / raw)
To: rearnsha, Ramana.Radhakrishnan, pinskia, kyrylo.tkachov; +Cc: gcc-patches
From: Charles Baylis <charles.baylis@linaro.org>
Add bus widths. These use the approximation that v7 and later cores have
64bit data bus width, and earlier cores have 32 bit bus width, with the
exception of v7m.
<date> Charles Baylis <charles.baylis@linaro.org>
* config/arm/arm-protos.h (struct tune_params): New field
bus_width.
* config/arm/arm.c (arm_slowmul_tune): Initialise bus_width field.
(arm_fastmul_tune): Likewise.
(arm_strongarm_tune): Likewise.
(arm_xscale_tune): Likewise.
(arm_9e_tune): Likewise.
(arm_marvell_pj4_tune): Likewise.
(arm_v6t2_tune): Likewise.
(arm_cortex_tune): Likewise.
(arm_cortex_a8_tune): Likewise.
(arm_cortex_a7_tune): Likewise.
(arm_cortex_a15_tune): Likewise.
(arm_cortex_a35_tune): Likewise.
(arm_cortex_a53_tune): Likewise.
(arm_cortex_a57_tune): Likewise.
(arm_exynosm1_tune): Likewise.
(arm_xgene1_tune): Likewise.
(arm_cortex_a5_tune): Likewise.
(arm_cortex_a9_tune): Likewise.
(arm_cortex_a12_tune): Likewise.
(arm_cortex_a73_tune): Likewise.
(arm_v7m_tune): Likewise.
(arm_cortex_m7_tune): Likewise.
(arm_v6m_tune): Likewise.
(arm_fa726te_tune): Likewise.
Change-Id: I613e876db93ffd6f8c1e72ba483be2efc0b56d66
---
gcc/config/arm/arm-protos.h | 2 ++
gcc/config/arm/arm.c | 24 ++++++++++++++++++++++++
2 files changed, 26 insertions(+)
diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index 4538078..47a85cc 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -278,6 +278,8 @@ struct tune_params
int max_insns_inline_memset;
/* Issue rate of the processor. */
unsigned int issue_rate;
+ /* Bus width (bits). */
+ unsigned int bus_width;
/* Explicit prefetch data. */
struct
{
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index bca8a34..32001e5 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -1761,6 +1761,7 @@ const struct tune_params arm_slowmul_tune =
5, /* Max cond insns. */
8, /* Memset max inline. */
1, /* Issue rate. */
+ 32, /* Bus width. */
ARM_PREFETCH_NOT_BENEFICIAL,
tune_params::PREF_CONST_POOL_TRUE,
tune_params::PREF_LDRD_FALSE,
@@ -1783,6 +1784,7 @@ const struct tune_params arm_fastmul_tune =
5, /* Max cond insns. */
8, /* Memset max inline. */
1, /* Issue rate. */
+ 32, /* Bus width. */
ARM_PREFETCH_NOT_BENEFICIAL,
tune_params::PREF_CONST_POOL_TRUE,
tune_params::PREF_LDRD_FALSE,
@@ -1808,6 +1810,7 @@ const struct tune_params arm_strongarm_tune =
3, /* Max cond insns. */
8, /* Memset max inline. */
1, /* Issue rate. */
+ 32, /* Bus width. */
ARM_PREFETCH_NOT_BENEFICIAL,
tune_params::PREF_CONST_POOL_TRUE,
tune_params::PREF_LDRD_FALSE,
@@ -1830,6 +1833,7 @@ const struct tune_params arm_xscale_tune =
3, /* Max cond insns. */
8, /* Memset max inline. */
1, /* Issue rate. */
+ 32, /* Bus width. */
ARM_PREFETCH_NOT_BENEFICIAL,
tune_params::PREF_CONST_POOL_TRUE,
tune_params::PREF_LDRD_FALSE,
@@ -1852,6 +1856,7 @@ const struct tune_params arm_9e_tune =
5, /* Max cond insns. */
8, /* Memset max inline. */
1, /* Issue rate. */
+ 32, /* Bus width. */
ARM_PREFETCH_NOT_BENEFICIAL,
tune_params::PREF_CONST_POOL_TRUE,
tune_params::PREF_LDRD_FALSE,
@@ -1874,6 +1879,7 @@ const struct tune_params arm_marvell_pj4_tune =
5, /* Max cond insns. */
8, /* Memset max inline. */
2, /* Issue rate. */
+ 32, /* Bus width. */
ARM_PREFETCH_NOT_BENEFICIAL,
tune_params::PREF_CONST_POOL_TRUE,
tune_params::PREF_LDRD_FALSE,
@@ -1896,6 +1902,7 @@ const struct tune_params arm_v6t2_tune =
5, /* Max cond insns. */
8, /* Memset max inline. */
1, /* Issue rate. */
+ 32, /* Bus width. */
ARM_PREFETCH_NOT_BENEFICIAL,
tune_params::PREF_CONST_POOL_FALSE,
tune_params::PREF_LDRD_FALSE,
@@ -1920,6 +1927,7 @@ const struct tune_params arm_cortex_tune =
5, /* Max cond insns. */
8, /* Memset max inline. */
2, /* Issue rate. */
+ 64, /* Bus width. */
ARM_PREFETCH_NOT_BENEFICIAL,
tune_params::PREF_CONST_POOL_FALSE,
tune_params::PREF_LDRD_FALSE,
@@ -1942,6 +1950,7 @@ const struct tune_params arm_cortex_a8_tune =
5, /* Max cond insns. */
8, /* Memset max inline. */
2, /* Issue rate. */
+ 64, /* Bus width. */
ARM_PREFETCH_NOT_BENEFICIAL,
tune_params::PREF_CONST_POOL_FALSE,
tune_params::PREF_LDRD_FALSE,
@@ -1964,6 +1973,7 @@ const struct tune_params arm_cortex_a7_tune =
5, /* Max cond insns. */
8, /* Memset max inline. */
2, /* Issue rate. */
+ 64, /* Bus width. */
ARM_PREFETCH_NOT_BENEFICIAL,
tune_params::PREF_CONST_POOL_FALSE,
tune_params::PREF_LDRD_FALSE,
@@ -1986,6 +1996,7 @@ const struct tune_params arm_cortex_a15_tune =
2, /* Max cond insns. */
8, /* Memset max inline. */
3, /* Issue rate. */
+ 64, /* Bus width. */
ARM_PREFETCH_NOT_BENEFICIAL,
tune_params::PREF_CONST_POOL_FALSE,
tune_params::PREF_LDRD_TRUE,
@@ -2008,6 +2019,7 @@ const struct tune_params arm_cortex_a35_tune =
5, /* Max cond insns. */
8, /* Memset max inline. */
1, /* Issue rate. */
+ 64, /* Bus width. */
ARM_PREFETCH_NOT_BENEFICIAL,
tune_params::PREF_CONST_POOL_FALSE,
tune_params::PREF_LDRD_FALSE,
@@ -2030,6 +2042,7 @@ const struct tune_params arm_cortex_a53_tune =
5, /* Max cond insns. */
8, /* Memset max inline. */
2, /* Issue rate. */
+ 64, /* Bus width. */
ARM_PREFETCH_NOT_BENEFICIAL,
tune_params::PREF_CONST_POOL_FALSE,
tune_params::PREF_LDRD_FALSE,
@@ -2052,6 +2065,7 @@ const struct tune_params arm_cortex_a57_tune =
2, /* Max cond insns. */
8, /* Memset max inline. */
3, /* Issue rate. */
+ 64, /* Bus width. */
ARM_PREFETCH_NOT_BENEFICIAL,
tune_params::PREF_CONST_POOL_FALSE,
tune_params::PREF_LDRD_TRUE,
@@ -2074,6 +2088,7 @@ const struct tune_params arm_exynosm1_tune =
2, /* Max cond insns. */
8, /* Memset max inline. */
3, /* Issue rate. */
+ 64, /* Bus width. */
ARM_PREFETCH_NOT_BENEFICIAL,
tune_params::PREF_CONST_POOL_FALSE,
tune_params::PREF_LDRD_TRUE,
@@ -2096,6 +2111,7 @@ const struct tune_params arm_xgene1_tune =
2, /* Max cond insns. */
32, /* Memset max inline. */
4, /* Issue rate. */
+ 64, /* Bus width. */
ARM_PREFETCH_NOT_BENEFICIAL,
tune_params::PREF_CONST_POOL_FALSE,
tune_params::PREF_LDRD_TRUE,
@@ -2121,6 +2137,7 @@ const struct tune_params arm_cortex_a5_tune =
1, /* Max cond insns. */
8, /* Memset max inline. */
2, /* Issue rate. */
+ 64, /* Bus width. */
ARM_PREFETCH_NOT_BENEFICIAL,
tune_params::PREF_CONST_POOL_FALSE,
tune_params::PREF_LDRD_FALSE,
@@ -2143,6 +2160,7 @@ const struct tune_params arm_cortex_a9_tune =
5, /* Max cond insns. */
8, /* Memset max inline. */
2, /* Issue rate. */
+ 64, /* Bus width. */
ARM_PREFETCH_BENEFICIAL(4,32,32),
tune_params::PREF_CONST_POOL_FALSE,
tune_params::PREF_LDRD_FALSE,
@@ -2165,6 +2183,7 @@ const struct tune_params arm_cortex_a12_tune =
2, /* Max cond insns. */
8, /* Memset max inline. */
2, /* Issue rate. */
+ 64, /* Bus width. */
ARM_PREFETCH_NOT_BENEFICIAL,
tune_params::PREF_CONST_POOL_FALSE,
tune_params::PREF_LDRD_TRUE,
@@ -2187,6 +2206,7 @@ const struct tune_params arm_cortex_a73_tune =
2, /* Max cond insns. */
8, /* Memset max inline. */
2, /* Issue rate. */
+ 64, /* Bus width. */
ARM_PREFETCH_NOT_BENEFICIAL,
tune_params::PREF_CONST_POOL_FALSE,
tune_params::PREF_LDRD_TRUE,
@@ -2216,6 +2236,7 @@ const struct tune_params arm_v7m_tune =
2, /* Max cond insns. */
8, /* Memset max inline. */
1, /* Issue rate. */
+ 32, /* Bus width. */
ARM_PREFETCH_NOT_BENEFICIAL,
tune_params::PREF_CONST_POOL_TRUE,
tune_params::PREF_LDRD_FALSE,
@@ -2240,6 +2261,7 @@ const struct tune_params arm_cortex_m7_tune =
1, /* Max cond insns. */
8, /* Memset max inline. */
2, /* Issue rate. */
+ 64, /* Bus width. */
ARM_PREFETCH_NOT_BENEFICIAL,
tune_params::PREF_CONST_POOL_TRUE,
tune_params::PREF_LDRD_FALSE,
@@ -2265,6 +2287,7 @@ const struct tune_params arm_v6m_tune =
5, /* Max cond insns. */
8, /* Memset max inline. */
1, /* Issue rate. */
+ 32, /* Bus width. */
ARM_PREFETCH_NOT_BENEFICIAL,
tune_params::PREF_CONST_POOL_FALSE,
tune_params::PREF_LDRD_FALSE,
@@ -2287,6 +2310,7 @@ const struct tune_params arm_fa726te_tune =
5, /* Max cond insns. */
8, /* Memset max inline. */
2, /* Issue rate. */
+ 32, /* Bus width. */
ARM_PREFETCH_NOT_BENEFICIAL,
tune_params::PREF_CONST_POOL_TRUE,
tune_params::PREF_LDRD_FALSE,
--
2.7.4
^ permalink raw reply [flat|nested] 20+ messages in thread
* [PATCH 0/3] [ARM] Addressing mode costs v3
@ 2017-09-12 8:34 charles.baylis
2017-09-12 8:34 ` [PATCH 1/3] [ARM] Add bus_width_bits to tune_params charles.baylis
` (2 more replies)
0 siblings, 3 replies; 20+ messages in thread
From: charles.baylis @ 2017-09-12 8:34 UTC (permalink / raw)
To: rearnsha, Ramana.Radhakrishnan, pinskia, kyrylo.tkachov; +Cc: gcc-patches
From: Charles Baylis <charles.baylis@linaro.org>
This patch set includes the following updates from v2 [1]:
. addr_mode_costs table moved into struct tune_params from
struct cpu_cost_table (avoids overlap with AArch64 port)
. CPU data bus width now comes from a table entry in struct tune_params.
(Not intended to be 100% accurate, but sufficient for this
patch series)
. test cases for {pre,post}-indexed addressing
[1] https://gcc.gnu.org/ml/gcc-patches/2017-08/msg01518.html and
https://gcc.gnu.org/ml/gcc-patches/2017-08/msg01519.html
Charles Baylis (3):
[ARM] Add bus_width_bits to tune_params
[ARM] Refactor costs calculation for MEM.
[ARM] Add table of costs for AAarch32 addressing modes.
gcc/config/arm/arm-protos.h | 22 +++
gcc/config/arm/arm.c | 172 ++++++++++++++++++++----
gcc/testsuite/gcc.target/arm/addr-modes-float.c | 42 ++++++
gcc/testsuite/gcc.target/arm/addr-modes-int.c | 46 +++++++
gcc/testsuite/gcc.target/arm/addr-modes.h | 53 ++++++++
5 files changed, 310 insertions(+), 25 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/arm/addr-modes-float.c
create mode 100644 gcc/testsuite/gcc.target/arm/addr-modes-int.c
create mode 100644 gcc/testsuite/gcc.target/arm/addr-modes.h
--
2.7.4
^ permalink raw reply [flat|nested] 20+ messages in thread
* [PATCH 3/3] [ARM] Add table of costs for AAarch32 addressing modes.
2017-09-12 8:34 [PATCH 0/3] [ARM] Addressing mode costs v3 charles.baylis
2017-09-12 8:34 ` [PATCH 1/3] [ARM] Add bus_width_bits to tune_params charles.baylis
@ 2017-09-12 8:35 ` charles.baylis
2017-09-13 9:02 ` Kyrill Tkachov
2017-09-12 8:35 ` [PATCH 2/3] [ARM] Refactor costs calculation for MEM charles.baylis
2 siblings, 1 reply; 20+ messages in thread
From: charles.baylis @ 2017-09-12 8:35 UTC (permalink / raw)
To: rearnsha, Ramana.Radhakrishnan, pinskia, kyrylo.tkachov; +Cc: gcc-patches
From: Charles Baylis <charles.baylis@linaro.org>
This patch adds support for modelling the varying costs of
different addressing modes. The generic cost table treats
all addressing modes as having equal cost.
gcc/ChangeLog:
<date> Charles Baylis <charles.baylis@linaro.org>
* config/arm/arm-protos.h (enum arm_addr_mode_op): New.
(struct addr_mode_cost_table): New.
(struct tune_params): Add field addr_mode_costs.
* config/arm/arm.c (generic_addr_mode_costs): New.
(arm_slowmul_tune): Initialise addr_mode_costs field.
(arm_fastmul_tune): Likewise.
(arm_strongarm_tune): Likewise.
(arm_xscale_tune): Likewise.
(arm_9e_tune): Likewise.
(arm_marvell_pj4_tune): Likewise.
(arm_v6t2_tune): Likewise.
(arm_cortex_tune): Likewise.
(arm_cortex_a8_tune): Likewise.
(arm_cortex_a7_tune): Likewise.
(arm_cortex_a15_tune): Likewise.
(arm_cortex_a35_tune): Likewise.
(arm_cortex_a53_tune): Likewise.
(arm_cortex_a57_tune): Likewise.
(arm_exynosm1_tune): Likewise.
(arm_xgene1_tune): Likewise.
(arm_cortex_a5_tune): Likewise.
(arm_cortex_a9_tune): Likewise.
(arm_cortex_a12_tune): Likewise.
(arm_cortex_a73_tune): Likewise.
(arm_v7m_tune): Likewise.
(arm_cortex_m7_tune): Likewise.
(arm_v6m_tune): Likewise.
(arm_fa726te_tune): Likewise.
(arm_mem_costs): Use table lookup to calculate cost of addressing
mode.
Change-Id: If71bd7c4f4bb876c5ed82dc28791130efb8bf89e
---
gcc/config/arm/arm-protos.h | 20 +++++++++++
gcc/config/arm/arm.c | 83 ++++++++++++++++++++++++++++++++++++++++++++-
2 files changed, 102 insertions(+), 1 deletion(-)
diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index 47a85cc..3d6b515 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -261,12 +261,32 @@ struct cpu_vec_costs {
struct cpu_cost_table;
+/* Addressing mode operations. Used to index tables in struct
+ addr_mode_cost_table. */
+enum arm_addr_mode_op
+{
+ AMO_DEFAULT,
+ AMO_NO_WB, /* Offset with no writeback. */
+ AMO_WB, /* Offset with writeback. */
+ AMO_MAX /* For array size. */
+};
+
+/* Table of additional costs when using addressing modes for each
+ access type. */
+struct addr_mode_cost_table
+{
+ const int integer[AMO_MAX];
+ const int fp[AMO_MAX];
+ const int vector[AMO_MAX];
+};
+
/* Dump function ARM_PRINT_TUNE_INFO should be updated whenever this
structure is modified. */
struct tune_params
{
const struct cpu_cost_table *insn_extra_cost;
+ const struct addr_mode_cost_table *addr_mode_costs;
bool (*sched_adjust_cost) (rtx_insn *, int, rtx_insn *, int *);
int (*branch_cost) (bool, bool);
/* Vectorizer costs. */
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index b8dbed6..0d31f5f 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -1751,9 +1751,32 @@ const struct cpu_cost_table v7m_extra_costs =
}
};
+const struct addr_mode_cost_table generic_addr_mode_costs =
+{
+ /* int. */
+ {
+ 0, /* AMO_DEFAULT. */
+ 0, /* AMO_NO_WB. */
+ 0 /* AMO_WB. */
+ },
+ /* float. */
+ {
+ 0, /* AMO_DEFAULT. */
+ 0, /* AMO_NO_WB. */
+ 0 /* AMO_WB. */
+ },
+ /* vector. */
+ {
+ 0, /* AMO_DEFAULT. */
+ 0, /* AMO_NO_WB. */
+ 0 /* AMO_WB. */
+ }
+};
+
const struct tune_params arm_slowmul_tune =
{
&generic_extra_costs, /* Insn extra costs. */
+ &generic_addr_mode_costs, /* Addressing mode costs. */
NULL, /* Sched adj cost. */
arm_default_branch_cost,
&arm_default_vec_cost,
@@ -1777,6 +1800,7 @@ const struct tune_params arm_slowmul_tune =
const struct tune_params arm_fastmul_tune =
{
&generic_extra_costs, /* Insn extra costs. */
+ &generic_addr_mode_costs, /* Addressing mode costs. */
NULL, /* Sched adj cost. */
arm_default_branch_cost,
&arm_default_vec_cost,
@@ -1803,6 +1827,7 @@ const struct tune_params arm_fastmul_tune =
const struct tune_params arm_strongarm_tune =
{
&generic_extra_costs, /* Insn extra costs. */
+ &generic_addr_mode_costs, /* Addressing mode costs. */
NULL, /* Sched adj cost. */
arm_default_branch_cost,
&arm_default_vec_cost,
@@ -1826,6 +1851,7 @@ const struct tune_params arm_strongarm_tune =
const struct tune_params arm_xscale_tune =
{
&generic_extra_costs, /* Insn extra costs. */
+ &generic_addr_mode_costs, /* Addressing mode costs. */
xscale_sched_adjust_cost,
arm_default_branch_cost,
&arm_default_vec_cost,
@@ -1849,6 +1875,7 @@ const struct tune_params arm_xscale_tune =
const struct tune_params arm_9e_tune =
{
&generic_extra_costs, /* Insn extra costs. */
+ &generic_addr_mode_costs, /* Addressing mode costs. */
NULL, /* Sched adj cost. */
arm_default_branch_cost,
&arm_default_vec_cost,
@@ -1872,6 +1899,7 @@ const struct tune_params arm_9e_tune =
const struct tune_params arm_marvell_pj4_tune =
{
&generic_extra_costs, /* Insn extra costs. */
+ &generic_addr_mode_costs, /* Addressing mode costs. */
NULL, /* Sched adj cost. */
arm_default_branch_cost,
&arm_default_vec_cost,
@@ -1895,6 +1923,7 @@ const struct tune_params arm_marvell_pj4_tune =
const struct tune_params arm_v6t2_tune =
{
&generic_extra_costs, /* Insn extra costs. */
+ &generic_addr_mode_costs, /* Addressing mode costs. */
NULL, /* Sched adj cost. */
arm_default_branch_cost,
&arm_default_vec_cost,
@@ -1920,6 +1949,7 @@ const struct tune_params arm_v6t2_tune =
const struct tune_params arm_cortex_tune =
{
&generic_extra_costs,
+ &generic_addr_mode_costs, /* Addressing mode costs. */
NULL, /* Sched adj cost. */
arm_default_branch_cost,
&arm_default_vec_cost,
@@ -1943,6 +1973,7 @@ const struct tune_params arm_cortex_tune =
const struct tune_params arm_cortex_a8_tune =
{
&cortexa8_extra_costs,
+ &generic_addr_mode_costs, /* Addressing mode costs. */
NULL, /* Sched adj cost. */
arm_default_branch_cost,
&arm_default_vec_cost,
@@ -1966,6 +1997,7 @@ const struct tune_params arm_cortex_a8_tune =
const struct tune_params arm_cortex_a7_tune =
{
&cortexa7_extra_costs,
+ &generic_addr_mode_costs, /* Addressing mode costs. */
NULL, /* Sched adj cost. */
arm_default_branch_cost,
&arm_default_vec_cost,
@@ -1989,6 +2021,7 @@ const struct tune_params arm_cortex_a7_tune =
const struct tune_params arm_cortex_a15_tune =
{
&cortexa15_extra_costs,
+ &generic_addr_mode_costs, /* Addressing mode costs. */
NULL, /* Sched adj cost. */
arm_default_branch_cost,
&arm_default_vec_cost,
@@ -2012,6 +2045,7 @@ const struct tune_params arm_cortex_a15_tune =
const struct tune_params arm_cortex_a35_tune =
{
&cortexa53_extra_costs,
+ &generic_addr_mode_costs, /* Addressing mode costs. */
NULL, /* Sched adj cost. */
arm_default_branch_cost,
&arm_default_vec_cost,
@@ -2035,6 +2069,7 @@ const struct tune_params arm_cortex_a35_tune =
const struct tune_params arm_cortex_a53_tune =
{
&cortexa53_extra_costs,
+ &generic_addr_mode_costs, /* Addressing mode costs. */
NULL, /* Sched adj cost. */
arm_default_branch_cost,
&arm_default_vec_cost,
@@ -2058,6 +2093,7 @@ const struct tune_params arm_cortex_a53_tune =
const struct tune_params arm_cortex_a57_tune =
{
&cortexa57_extra_costs,
+ &generic_addr_mode_costs, /* addressing mode costs */
NULL, /* Sched adj cost. */
arm_default_branch_cost,
&arm_default_vec_cost,
@@ -2081,6 +2117,7 @@ const struct tune_params arm_cortex_a57_tune =
const struct tune_params arm_exynosm1_tune =
{
&exynosm1_extra_costs,
+ &generic_addr_mode_costs, /* Addressing mode costs. */
NULL, /* Sched adj cost. */
arm_default_branch_cost,
&arm_default_vec_cost,
@@ -2104,6 +2141,7 @@ const struct tune_params arm_exynosm1_tune =
const struct tune_params arm_xgene1_tune =
{
&xgene1_extra_costs,
+ &generic_addr_mode_costs, /* Addressing mode costs. */
NULL, /* Sched adj cost. */
arm_default_branch_cost,
&arm_default_vec_cost,
@@ -2130,6 +2168,7 @@ const struct tune_params arm_xgene1_tune =
const struct tune_params arm_cortex_a5_tune =
{
&cortexa5_extra_costs,
+ &generic_addr_mode_costs, /* Addressing mode costs. */
NULL, /* Sched adj cost. */
arm_cortex_a5_branch_cost,
&arm_default_vec_cost,
@@ -2153,6 +2192,7 @@ const struct tune_params arm_cortex_a5_tune =
const struct tune_params arm_cortex_a9_tune =
{
&cortexa9_extra_costs,
+ &generic_addr_mode_costs, /* Addressing mode costs. */
cortex_a9_sched_adjust_cost,
arm_default_branch_cost,
&arm_default_vec_cost,
@@ -2176,6 +2216,7 @@ const struct tune_params arm_cortex_a9_tune =
const struct tune_params arm_cortex_a12_tune =
{
&cortexa12_extra_costs,
+ &generic_addr_mode_costs, /* Addressing mode costs. */
NULL, /* Sched adj cost. */
arm_default_branch_cost,
&arm_default_vec_cost, /* Vectorizer costs. */
@@ -2199,6 +2240,7 @@ const struct tune_params arm_cortex_a12_tune =
const struct tune_params arm_cortex_a73_tune =
{
&cortexa57_extra_costs,
+ &generic_addr_mode_costs, /* Addressing mode costs. */
NULL, /* Sched adj cost. */
arm_default_branch_cost,
&arm_default_vec_cost, /* Vectorizer costs. */
@@ -2229,6 +2271,7 @@ const struct tune_params arm_cortex_a73_tune =
const struct tune_params arm_v7m_tune =
{
&v7m_extra_costs,
+ &generic_addr_mode_costs, /* Addressing mode costs. */
NULL, /* Sched adj cost. */
arm_cortex_m_branch_cost,
&arm_default_vec_cost,
@@ -2254,6 +2297,7 @@ const struct tune_params arm_v7m_tune =
const struct tune_params arm_cortex_m7_tune =
{
&v7m_extra_costs,
+ &generic_addr_mode_costs, /* Addressing mode costs. */
NULL, /* Sched adj cost. */
arm_cortex_m7_branch_cost,
&arm_default_vec_cost,
@@ -2280,6 +2324,7 @@ const struct tune_params arm_cortex_m7_tune =
const struct tune_params arm_v6m_tune =
{
&generic_extra_costs, /* Insn extra costs. */
+ &generic_addr_mode_costs, /* Addressing mode costs. */
NULL, /* Sched adj cost. */
arm_default_branch_cost,
&arm_default_vec_cost, /* Vectorizer costs. */
@@ -2303,6 +2348,7 @@ const struct tune_params arm_v6m_tune =
const struct tune_params arm_fa726te_tune =
{
&generic_extra_costs, /* Insn extra costs. */
+ &generic_addr_mode_costs, /* Addressing mode costs. */
fa726te_sched_adjust_cost,
arm_default_branch_cost,
&arm_default_vec_cost,
@@ -9249,7 +9295,42 @@ arm_mem_costs (rtx x, const struct cpu_cost_table *extra_cost,
/* Calculate cost of the addressing mode. */
if (speed_p)
{
- /* TODO: Add table-driven costs for addressing modes. (See patch 2) */
+ arm_addr_mode_op op_type;
+ switch (GET_CODE (XEXP (x, 0)))
+ {
+ default:
+ case REG:
+ op_type = AMO_DEFAULT;
+ break;
+ case MINUS:
+ /* MINUS does not appear in RTL, but the architecture supports it,
+ so handle this case defensively. */
+ /* fall through */
+ case PLUS:
+ op_type = AMO_NO_WB;
+ break;
+ case PRE_INC:
+ case PRE_DEC:
+ case POST_INC:
+ case POST_DEC:
+ case PRE_MODIFY:
+ case POST_MODIFY:
+ op_type = AMO_WB;
+ break;
+ }
+
+ if (VECTOR_MODE_P (mode))
+ {
+ *cost += current_tune->addr_mode_costs->vector[op_type];
+ }
+ else if (FLOAT_MODE_P (mode))
+ {
+ *cost += current_tune->addr_mode_costs->fp[op_type];
+ }
+ else
+ {
+ *cost += current_tune->addr_mode_costs->integer[op_type];
+ }
}
/* Calculate cost of memory access. */
--
2.7.4
^ permalink raw reply [flat|nested] 20+ messages in thread
* [PATCH 2/3] [ARM] Refactor costs calculation for MEM.
2017-09-12 8:34 [PATCH 0/3] [ARM] Addressing mode costs v3 charles.baylis
2017-09-12 8:34 ` [PATCH 1/3] [ARM] Add bus_width_bits to tune_params charles.baylis
2017-09-12 8:35 ` [PATCH 3/3] [ARM] Add table of costs for AAarch32 addressing modes charles.baylis
@ 2017-09-12 8:35 ` charles.baylis
2017-09-13 9:02 ` Kyrill Tkachov
2 siblings, 1 reply; 20+ messages in thread
From: charles.baylis @ 2017-09-12 8:35 UTC (permalink / raw)
To: rearnsha, Ramana.Radhakrishnan, pinskia, kyrylo.tkachov; +Cc: gcc-patches
From: Charles Baylis <charles.baylis@linaro.org>
This patch moves the calculation of costs for MEM into a
separate function, and reforms the calculation into two
parts. Firstly any additional cost of the addressing mode
is calculated, and then the cost of the memory access itself
is added.
In this patch, the calculation of the cost of the addressing
mode is left as a placeholder, to be added in a subsequent
patch.
gcc/ChangeLog:
<date> Charles Baylis <charles.baylis@linaro.org>
* config/arm/arm.c (arm_mem_costs): New function.
(arm_rtx_costs_internal): Use arm_mem_costs.
gcc/testsuite/ChangeLog:
<date> Charles Baylis <charles.baylis@linaro.org>
* gcc.target/arm/addr-modes-float.c: New test.
* gcc.target/arm/addr-modes-int.c: New test.
* gcc.target/arm/addr-modes.h: New header.
Change-Id: I99e93406ea39ee31f71c7bf428ad3e127b7a618e
---
gcc/config/arm/arm.c | 67 ++++++++++++++++---------
gcc/testsuite/gcc.target/arm/addr-modes-float.c | 42 ++++++++++++++++
gcc/testsuite/gcc.target/arm/addr-modes-int.c | 46 +++++++++++++++++
gcc/testsuite/gcc.target/arm/addr-modes.h | 53 +++++++++++++++++++
4 files changed, 183 insertions(+), 25 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/arm/addr-modes-float.c
create mode 100644 gcc/testsuite/gcc.target/arm/addr-modes-int.c
create mode 100644 gcc/testsuite/gcc.target/arm/addr-modes.h
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 32001e5..b8dbed6 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -9228,8 +9228,48 @@ arm_unspec_cost (rtx x, enum rtx_code /* outer_code */, bool speed_p, int *cost)
} \
while (0);
+/* Helper function for arm_rtx_costs_internal. Calculates the cost of a MEM,
+ considering the costs of the addressing mode and memory access
+ separately. */
+static bool
+arm_mem_costs (rtx x, const struct cpu_cost_table *extra_cost,
+ int *cost, bool speed_p)
+{
+ machine_mode mode = GET_MODE (x);
+ if (flag_pic
+ && GET_CODE (XEXP (x, 0)) == PLUS
+ && will_be_in_index_register (XEXP (XEXP (x, 0), 1)))
+ /* This will be split into two instructions. Add the cost of the
+ additional instruction here. The cost of the memory access is computed
+ below. See arm.md:calculate_pic_address. */
+ *cost = COSTS_N_INSNS (1);
+ else
+ *cost = 0;
+
+ /* Calculate cost of the addressing mode. */
+ if (speed_p)
+ {
+ /* TODO: Add table-driven costs for addressing modes. (See patch 2) */
+ }
+
+ /* Calculate cost of memory access. */
+ if (speed_p)
+ {
+ /* data transfer is transfer size divided by bus width. */
+ int bus_width_bytes = current_tune->bus_width / 4;
+ *cost += CEIL (GET_MODE_SIZE (mode), bus_width_bytes);
+ *cost += extra_cost->ldst.load;
+ }
+ else
+ {
+ *cost += COSTS_N_INSNS (1);
+ }
+
+ return true;
+}
+
/* RTX costs. Make an estimate of the cost of executing the operation
- X, which is contained with an operation with code OUTER_CODE.
+ X, which is contained within an operation with code OUTER_CODE.
SPEED_P indicates whether the cost desired is the performance cost,
or the size cost. The estimate is stored in COST and the return
value is TRUE if the cost calculation is final, or FALSE if the
@@ -9308,30 +9348,7 @@ arm_rtx_costs_internal (rtx x, enum rtx_code code, enum rtx_code outer_code,
return false;
case MEM:
- /* A memory access costs 1 insn if the mode is small, or the address is
- a single register, otherwise it costs one insn per word. */
- if (REG_P (XEXP (x, 0)))
- *cost = COSTS_N_INSNS (1);
- else if (flag_pic
- && GET_CODE (XEXP (x, 0)) == PLUS
- && will_be_in_index_register (XEXP (XEXP (x, 0), 1)))
- /* This will be split into two instructions.
- See arm.md:calculate_pic_address. */
- *cost = COSTS_N_INSNS (2);
- else
- *cost = COSTS_N_INSNS (ARM_NUM_REGS (mode));
-
- /* For speed optimizations, add the costs of the address and
- accessing memory. */
- if (speed_p)
-#ifdef NOT_YET
- *cost += (extra_cost->ldst.load
- + arm_address_cost (XEXP (x, 0), mode,
- ADDR_SPACE_GENERIC, speed_p));
-#else
- *cost += extra_cost->ldst.load;
-#endif
- return true;
+ return arm_mem_costs (x, extra_cost, cost, speed_p);
case PARALLEL:
{
diff --git a/gcc/testsuite/gcc.target/arm/addr-modes-float.c b/gcc/testsuite/gcc.target/arm/addr-modes-float.c
new file mode 100644
index 0000000..3b4235c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/addr-modes-float.c
@@ -0,0 +1,42 @@
+/* { dg-options "-O2" } */
+/* { dg-add-options arm_neon } */
+/* { dg-require-effective-target arm_neon_ok } */
+/* { dg-do compile } */
+
+#include <arm_neon.h>
+
+#include "addr-modes.h"
+
+POST_STORE(float)
+/* { dg-final { scan-assembler "vstmia.32" } } */
+POST_STORE(double)
+/* { dg-final { scan-assembler "vstmia.64" } } */
+
+POST_LOAD(float)
+/* { dg-final { scan-assembler "vldmia.32" } } */
+POST_LOAD(double)
+/* { dg-final { scan-assembler "vldmia.64" } } */
+
+POST_STORE_VEC (int8_t, int8x8_t, vst1_s8)
+/* { dg-final { scan-assembler "vst1.8\t\{.*\}, \\\[r\[0-9\]+\\\]!" } } */
+POST_STORE_VEC (int8_t, int8x16_t, vst1q_s8)
+/* { dg-final { scan-assembler "vst1.8\t\{.*\[-,\]d.*\}, \\\[r\[0-9\]+\\\]!" } } */
+
+POST_STORE_VEC (int8_t, int8x8x2_t, vst2_s8)
+/* { dg-final { scan-assembler "vst2.8\t\{.*\}, \\\[r\[0-9\]+\\\]!" } } */
+POST_STORE_VEC (int8_t, int8x16x2_t, vst2q_s8)
+/* { dg-final { scan-assembler "vst2.8\t\{.*-d.*\}, \\\[r\[0-9\]+\\\]!" } } */
+
+POST_STORE_VEC (int8_t, int8x8x3_t, vst3_s8)
+/* { dg-final { scan-assembler "vst3.8\t\{.*\}, \\\[r\[0-9\]+\\\]!" } } */
+POST_STORE_VEC (int8_t, int8x16x3_t, vst3q_s8)
+/* { dg-final { scan-assembler "vst3.8\t\{d\[02468\], d\[02468\], d\[02468\]\}, \\\[r\[0-9\]+\\\]!" } } */
+/* { dg-final { scan-assembler "vst3.8\t\{d\[13579\], d\[13579\], d\[13579\]\}, \\\[r\[0-9\]+\\\]!" { xfail *-*-* } } } */
+
+POST_STORE_VEC (int8_t, int8x8x4_t, vst4_s8)
+/* { dg-final { scan-assembler "vst4.8\t\{.*\}, \\\[r\[0-9\]+\\\]!" } } */
+POST_STORE_VEC (int8_t, int8x16x4_t, vst4q_s8)
+/* { dg-final { scan-assembler "vst4.8\t\{d\[02468\], d\[02468\], d\[02468\], d\[02468\]\}, \\\[r\[0-9\]+\\\]!" } } */
+/* { dg-final { scan-assembler "vst4.8\t\{d\[13579\], d\[13579\], d\[13579\], d\[13579\]\}, \\\[r\[0-9\]+\\\]!" { xfail *-*-* } } } */
+
+/* { dg-final { scan-assembler-not "add" { xfail *-*-* } } } */
diff --git a/gcc/testsuite/gcc.target/arm/addr-modes-int.c b/gcc/testsuite/gcc.target/arm/addr-modes-int.c
new file mode 100644
index 0000000..e3e1e6a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/addr-modes-int.c
@@ -0,0 +1,46 @@
+/* { dg-options "-O2 -march=armv7-a" } */
+/* { dg-add-options arm_neon } */
+/* { dg-require-effective-target arm_neon_ok } */
+/* { dg-do compile } */
+
+#include "addr-modes.h"
+
+typedef long long ll;
+
+PRE_STORE(char)
+/* { dg-final { scan-assembler "strb.*#1]!" } } */
+PRE_STORE(short)
+/* { dg-final { scan-assembler "strh.*#2]!" } } */
+PRE_STORE(int)
+/* { dg-final { scan-assembler "str.*#4]!" } } */
+PRE_STORE(ll)
+/* { dg-final { scan-assembler "strd.*#8]!" } } */
+
+POST_STORE(char)
+/* { dg-final { scan-assembler "strb.*], #1" } } */
+POST_STORE(short)
+/* { dg-final { scan-assembler "strh.*], #2" } } */
+POST_STORE(int)
+/* { dg-final { scan-assembler "str.*], #4" } } */
+POST_STORE(ll)
+/* { dg-final { scan-assembler "strd.*], #8" } } */
+
+PRE_LOAD(char)
+/* { dg-final { scan-assembler "ldrb.*#1]!" } } */
+PRE_LOAD(short)
+/* { dg-final { scan-assembler "ldrsh.*#2]!" } } */
+PRE_LOAD(int)
+/* { dg-final { scan-assembler "ldr.*#4]!" } } */
+PRE_LOAD(ll)
+/* { dg-final { scan-assembler "ldrd.*#8]!" } } */
+
+POST_LOAD(char)
+/* { dg-final { scan-assembler "ldrb.*], #1" } } */
+POST_LOAD(short)
+/* { dg-final { scan-assembler "ldrsh.*], #2" } } */
+POST_LOAD(int)
+/* { dg-final { scan-assembler "ldr.*], #4" } } */
+POST_LOAD(ll)
+/* { dg-final { scan-assembler "ldrd.*], #8" } } */
+
+/* { dg-final { scan-assembler-not "\tadd" } } */
diff --git a/gcc/testsuite/gcc.target/arm/addr-modes.h b/gcc/testsuite/gcc.target/arm/addr-modes.h
new file mode 100644
index 0000000..eac4678
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/addr-modes.h
@@ -0,0 +1,53 @@
+
+#define PRE_STORE(T) \
+ T * \
+ T ## _pre_store (T *p, T v) \
+ { \
+ *++p = v; \
+ return p; \
+ } \
+
+#define POST_STORE(T) \
+ T * \
+ T ## _post_store (T *p, T v) \
+ { \
+ *p++ = v; \
+ return p; \
+ }
+
+#define POST_STORE_VEC(T, VT, OP) \
+ T * \
+ VT ## _post_store (T * p, VT v) \
+ { \
+ OP (p, v); \
+ p += sizeof (VT) / sizeof (T); \
+ return p; \
+ }
+
+#define PRE_LOAD(T) \
+ void \
+ T ## _pre_load (T *p) \
+ { \
+ extern void f ## T (T*,T); \
+ T x = *++p; \
+ f ## T (p, x); \
+ }
+
+#define POST_LOAD(T) \
+ void \
+ T ## _post_load (T *p) \
+ { \
+ extern void f ## T (T*,T); \
+ T x = *p++; \
+ f ## T (p, x); \
+ }
+
+#define POST_LOAD_VEC(T, VT, OP) \
+ void \
+ VT ## _post_load (T * p) \
+ { \
+ extern void f ## T (T*,T); \
+ VT x = OP (p, v); \
+ p += sizeof (VT) / sizeof (T); \
+ f ## T (p, x); \
+ }
--
2.7.4
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 1/3] [ARM] Add bus_width_bits to tune_params
2017-09-12 8:34 ` [PATCH 1/3] [ARM] Add bus_width_bits to tune_params charles.baylis
@ 2017-09-13 9:02 ` Kyrill Tkachov
2017-09-15 15:38 ` Charles Baylis
0 siblings, 1 reply; 20+ messages in thread
From: Kyrill Tkachov @ 2017-09-13 9:02 UTC (permalink / raw)
To: charles.baylis, Richard Earnshaw, Ramana Radhakrishnan, pinskia
Cc: gcc-patches
Hi Charles,
On 12/09/17 09:34, charles.baylis@linaro.org wrote:
> From: Charles Baylis <charles.baylis@linaro.org>
>
> Add bus widths. These use the approximation that v7 and later cores have
> 64bit data bus width, and earlier cores have 32 bit bus width, with the
> exception of v7m.
>
Given the way this field is used in patch 2 does it affect the
addressing mode generation
in the tests you added depending on the -mtune option given?
If so, we'll get testsuite failures when people test with particular
default CPU configurations.
Could you expand on the benefits we get from this extra bus_width
information?
I get that we increase the cost of memory accesses if the size of the
mode we load is larger than the
bus width, but it's not as if there is ever an alternative in this
regard, such as loading less memory,
so what pass can make different decisions thanks to this field?
Thanks,
Kyrill
> <date> Charles Baylis <charles.baylis@linaro.org>
>
> * config/arm/arm-protos.h (struct tune_params): New field
> bus_width.
> * config/arm/arm.c (arm_slowmul_tune): Initialise bus_width field.
> (arm_fastmul_tune): Likewise.
> (arm_strongarm_tune): Likewise.
> (arm_xscale_tune): Likewise.
> (arm_9e_tune): Likewise.
> (arm_marvell_pj4_tune): Likewise.
> (arm_v6t2_tune): Likewise.
> (arm_cortex_tune): Likewise.
> (arm_cortex_a8_tune): Likewise.
> (arm_cortex_a7_tune): Likewise.
> (arm_cortex_a15_tune): Likewise.
> (arm_cortex_a35_tune): Likewise.
> (arm_cortex_a53_tune): Likewise.
> (arm_cortex_a57_tune): Likewise.
> (arm_exynosm1_tune): Likewise.
> (arm_xgene1_tune): Likewise.
> (arm_cortex_a5_tune): Likewise.
> (arm_cortex_a9_tune): Likewise.
> (arm_cortex_a12_tune): Likewise.
> (arm_cortex_a73_tune): Likewise.
> (arm_v7m_tune): Likewise.
> (arm_cortex_m7_tune): Likewise.
> (arm_v6m_tune): Likewise.
> (arm_fa726te_tune): Likewise.
>
> Change-Id: I613e876db93ffd6f8c1e72ba483be2efc0b56d66
> ---
> gcc/config/arm/arm-protos.h | 2 ++
> gcc/config/arm/arm.c | 24 ++++++++++++++++++++++++
> 2 files changed, 26 insertions(+)
>
> diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
> index 4538078..47a85cc 100644
> --- a/gcc/config/arm/arm-protos.h
> +++ b/gcc/config/arm/arm-protos.h
> @@ -278,6 +278,8 @@ struct tune_params
> int max_insns_inline_memset;
> /* Issue rate of the processor. */
> unsigned int issue_rate;
> + /* Bus width (bits). */
> + unsigned int bus_width;
> /* Explicit prefetch data. */
> struct
> {
> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
> index bca8a34..32001e5 100644
> --- a/gcc/config/arm/arm.c
> +++ b/gcc/config/arm/arm.c
> @@ -1761,6 +1761,7 @@ const struct tune_params arm_slowmul_tune =
> 5, /* Max cond insns. */
> 8, /* Memset max inline. */
> 1, /* Issue rate. */
> + 32, /* Bus width. */
> ARM_PREFETCH_NOT_BENEFICIAL,
> tune_params::PREF_CONST_POOL_TRUE,
> tune_params::PREF_LDRD_FALSE,
> @@ -1783,6 +1784,7 @@ const struct tune_params arm_fastmul_tune =
> 5, /* Max cond insns. */
> 8, /* Memset max inline. */
> 1, /* Issue rate. */
> + 32, /* Bus width. */
> ARM_PREFETCH_NOT_BENEFICIAL,
> tune_params::PREF_CONST_POOL_TRUE,
> tune_params::PREF_LDRD_FALSE,
> @@ -1808,6 +1810,7 @@ const struct tune_params arm_strongarm_tune =
> 3, /* Max cond insns. */
> 8, /* Memset max inline. */
> 1, /* Issue rate. */
> + 32, /* Bus width. */
> ARM_PREFETCH_NOT_BENEFICIAL,
> tune_params::PREF_CONST_POOL_TRUE,
> tune_params::PREF_LDRD_FALSE,
> @@ -1830,6 +1833,7 @@ const struct tune_params arm_xscale_tune =
> 3, /* Max cond insns. */
> 8, /* Memset max inline. */
> 1, /* Issue rate. */
> + 32, /* Bus width. */
> ARM_PREFETCH_NOT_BENEFICIAL,
> tune_params::PREF_CONST_POOL_TRUE,
> tune_params::PREF_LDRD_FALSE,
> @@ -1852,6 +1856,7 @@ const struct tune_params arm_9e_tune =
> 5, /* Max cond insns. */
> 8, /* Memset max inline. */
> 1, /* Issue rate. */
> + 32, /* Bus width. */
> ARM_PREFETCH_NOT_BENEFICIAL,
> tune_params::PREF_CONST_POOL_TRUE,
> tune_params::PREF_LDRD_FALSE,
> @@ -1874,6 +1879,7 @@ const struct tune_params arm_marvell_pj4_tune =
> 5, /* Max cond insns. */
> 8, /* Memset max inline. */
> 2, /* Issue rate. */
> + 32, /* Bus width. */
> ARM_PREFETCH_NOT_BENEFICIAL,
> tune_params::PREF_CONST_POOL_TRUE,
> tune_params::PREF_LDRD_FALSE,
> @@ -1896,6 +1902,7 @@ const struct tune_params arm_v6t2_tune =
> 5, /* Max cond insns. */
> 8, /* Memset max inline. */
> 1, /* Issue rate. */
> + 32, /* Bus width. */
> ARM_PREFETCH_NOT_BENEFICIAL,
> tune_params::PREF_CONST_POOL_FALSE,
> tune_params::PREF_LDRD_FALSE,
> @@ -1920,6 +1927,7 @@ const struct tune_params arm_cortex_tune =
> 5, /* Max cond insns. */
> 8, /* Memset max inline. */
> 2, /* Issue rate. */
> + 64, /* Bus width. */
> ARM_PREFETCH_NOT_BENEFICIAL,
> tune_params::PREF_CONST_POOL_FALSE,
> tune_params::PREF_LDRD_FALSE,
> @@ -1942,6 +1950,7 @@ const struct tune_params arm_cortex_a8_tune =
> 5, /* Max cond insns. */
> 8, /* Memset max inline. */
> 2, /* Issue rate. */
> + 64, /* Bus width. */
> ARM_PREFETCH_NOT_BENEFICIAL,
> tune_params::PREF_CONST_POOL_FALSE,
> tune_params::PREF_LDRD_FALSE,
> @@ -1964,6 +1973,7 @@ const struct tune_params arm_cortex_a7_tune =
> 5, /* Max cond insns. */
> 8, /* Memset max inline. */
> 2, /* Issue rate. */
> + 64, /* Bus width. */
> ARM_PREFETCH_NOT_BENEFICIAL,
> tune_params::PREF_CONST_POOL_FALSE,
> tune_params::PREF_LDRD_FALSE,
> @@ -1986,6 +1996,7 @@ const struct tune_params arm_cortex_a15_tune =
> 2, /* Max cond insns. */
> 8, /* Memset max inline. */
> 3, /* Issue rate. */
> + 64, /* Bus width. */
> ARM_PREFETCH_NOT_BENEFICIAL,
> tune_params::PREF_CONST_POOL_FALSE,
> tune_params::PREF_LDRD_TRUE,
> @@ -2008,6 +2019,7 @@ const struct tune_params arm_cortex_a35_tune =
> 5, /* Max cond insns. */
> 8, /* Memset max inline. */
> 1, /* Issue rate. */
> + 64, /* Bus width. */
> ARM_PREFETCH_NOT_BENEFICIAL,
> tune_params::PREF_CONST_POOL_FALSE,
> tune_params::PREF_LDRD_FALSE,
> @@ -2030,6 +2042,7 @@ const struct tune_params arm_cortex_a53_tune =
> 5, /* Max cond insns. */
> 8, /* Memset max inline. */
> 2, /* Issue rate. */
> + 64, /* Bus width. */
> ARM_PREFETCH_NOT_BENEFICIAL,
> tune_params::PREF_CONST_POOL_FALSE,
> tune_params::PREF_LDRD_FALSE,
> @@ -2052,6 +2065,7 @@ const struct tune_params arm_cortex_a57_tune =
> 2, /* Max cond insns. */
> 8, /* Memset max inline. */
> 3, /* Issue rate. */
> + 64, /* Bus width. */
> ARM_PREFETCH_NOT_BENEFICIAL,
> tune_params::PREF_CONST_POOL_FALSE,
> tune_params::PREF_LDRD_TRUE,
> @@ -2074,6 +2088,7 @@ const struct tune_params arm_exynosm1_tune =
> 2, /* Max cond insns. */
> 8, /* Memset max inline. */
> 3, /* Issue rate. */
> + 64, /* Bus width. */
> ARM_PREFETCH_NOT_BENEFICIAL,
> tune_params::PREF_CONST_POOL_FALSE,
> tune_params::PREF_LDRD_TRUE,
> @@ -2096,6 +2111,7 @@ const struct tune_params arm_xgene1_tune =
> 2, /* Max cond insns. */
> 32, /* Memset max inline. */
> 4, /* Issue rate. */
> + 64, /* Bus width. */
> ARM_PREFETCH_NOT_BENEFICIAL,
> tune_params::PREF_CONST_POOL_FALSE,
> tune_params::PREF_LDRD_TRUE,
> @@ -2121,6 +2137,7 @@ const struct tune_params arm_cortex_a5_tune =
> 1, /* Max cond insns. */
> 8, /* Memset max inline. */
> 2, /* Issue rate. */
> + 64, /* Bus width. */
> ARM_PREFETCH_NOT_BENEFICIAL,
> tune_params::PREF_CONST_POOL_FALSE,
> tune_params::PREF_LDRD_FALSE,
> @@ -2143,6 +2160,7 @@ const struct tune_params arm_cortex_a9_tune =
> 5, /* Max cond insns. */
> 8, /* Memset max inline. */
> 2, /* Issue rate. */
> + 64, /* Bus width. */
> ARM_PREFETCH_BENEFICIAL(4,32,32),
> tune_params::PREF_CONST_POOL_FALSE,
> tune_params::PREF_LDRD_FALSE,
> @@ -2165,6 +2183,7 @@ const struct tune_params arm_cortex_a12_tune =
> 2, /* Max cond insns. */
> 8, /* Memset max inline. */
> 2, /* Issue rate. */
> + 64, /* Bus width. */
> ARM_PREFETCH_NOT_BENEFICIAL,
> tune_params::PREF_CONST_POOL_FALSE,
> tune_params::PREF_LDRD_TRUE,
> @@ -2187,6 +2206,7 @@ const struct tune_params arm_cortex_a73_tune =
> 2, /* Max cond insns. */
> 8, /* Memset max inline. */
> 2, /* Issue rate. */
> + 64, /* Bus width. */
> ARM_PREFETCH_NOT_BENEFICIAL,
> tune_params::PREF_CONST_POOL_FALSE,
> tune_params::PREF_LDRD_TRUE,
> @@ -2216,6 +2236,7 @@ const struct tune_params arm_v7m_tune =
> 2, /* Max cond insns. */
> 8, /* Memset max inline. */
> 1, /* Issue rate. */
> + 32, /* Bus width. */
> ARM_PREFETCH_NOT_BENEFICIAL,
> tune_params::PREF_CONST_POOL_TRUE,
> tune_params::PREF_LDRD_FALSE,
> @@ -2240,6 +2261,7 @@ const struct tune_params arm_cortex_m7_tune =
> 1, /* Max cond insns. */
> 8, /* Memset max inline. */
> 2, /* Issue rate. */
> + 64, /* Bus width. */
> ARM_PREFETCH_NOT_BENEFICIAL,
> tune_params::PREF_CONST_POOL_TRUE,
> tune_params::PREF_LDRD_FALSE,
> @@ -2265,6 +2287,7 @@ const struct tune_params arm_v6m_tune =
> 5, /* Max cond insns. */
> 8, /* Memset max inline. */
> 1, /* Issue rate. */
> + 32, /* Bus width. */
> ARM_PREFETCH_NOT_BENEFICIAL,
> tune_params::PREF_CONST_POOL_FALSE,
> tune_params::PREF_LDRD_FALSE,
> @@ -2287,6 +2310,7 @@ const struct tune_params arm_fa726te_tune =
> 5, /* Max cond insns. */
> 8, /* Memset max inline. */
> 2, /* Issue rate. */
> + 32, /* Bus width. */
> ARM_PREFETCH_NOT_BENEFICIAL,
> tune_params::PREF_CONST_POOL_TRUE,
> tune_params::PREF_LDRD_FALSE,
> --
> 2.7.4
>
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 2/3] [ARM] Refactor costs calculation for MEM.
2017-09-12 8:35 ` [PATCH 2/3] [ARM] Refactor costs calculation for MEM charles.baylis
@ 2017-09-13 9:02 ` Kyrill Tkachov
2017-09-15 15:38 ` Charles Baylis
0 siblings, 1 reply; 20+ messages in thread
From: Kyrill Tkachov @ 2017-09-13 9:02 UTC (permalink / raw)
To: charles.baylis, Richard Earnshaw, Ramana Radhakrishnan, pinskia
Cc: gcc-patches
Hi Charles,
On 12/09/17 09:34, charles.baylis@linaro.org wrote:
> From: Charles Baylis <charles.baylis@linaro.org>
>
> This patch moves the calculation of costs for MEM into a
> separate function, and reforms the calculation into two
> parts. Firstly any additional cost of the addressing mode
> is calculated, and then the cost of the memory access itself
> is added.
>
> In this patch, the calculation of the cost of the addressing
> mode is left as a placeholder, to be added in a subsequent
> patch.
>
Can you please mention how has this series been tested?
A bootstrap and test run on arm-none-linux-gnueabihf is required at least.
Also, do you have any benchmarking results for this?
I agree that generating the addressing modes in the new tests is desirable.
So I'm not objecting to the goal of this patch, but a check to make sure
that this doesn't regress SPEC
would be great. Further comments on the patch inline.
> gcc/ChangeLog:
>
> <date> Charles Baylis <charles.baylis@linaro.org>
>
> * config/arm/arm.c (arm_mem_costs): New function.
> (arm_rtx_costs_internal): Use arm_mem_costs.
>
> gcc/testsuite/ChangeLog:
>
> <date> Charles Baylis <charles.baylis@linaro.org>
>
> * gcc.target/arm/addr-modes-float.c: New test.
> * gcc.target/arm/addr-modes-int.c: New test.
> * gcc.target/arm/addr-modes.h: New header.
>
> Change-Id: I99e93406ea39ee31f71c7bf428ad3e127b7a618e
> ---
> gcc/config/arm/arm.c | 67
> ++++++++++++++++---------
> gcc/testsuite/gcc.target/arm/addr-modes-float.c | 42 ++++++++++++++++
> gcc/testsuite/gcc.target/arm/addr-modes-int.c | 46 +++++++++++++++++
> gcc/testsuite/gcc.target/arm/addr-modes.h | 53 +++++++++++++++++++
> 4 files changed, 183 insertions(+), 25 deletions(-)
> create mode 100644 gcc/testsuite/gcc.target/arm/addr-modes-float.c
> create mode 100644 gcc/testsuite/gcc.target/arm/addr-modes-int.c
> create mode 100644 gcc/testsuite/gcc.target/arm/addr-modes.h
>
> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
> index 32001e5..b8dbed6 100644
> --- a/gcc/config/arm/arm.c
> +++ b/gcc/config/arm/arm.c
> @@ -9228,8 +9228,48 @@ arm_unspec_cost (rtx x, enum rtx_code /*
> outer_code */, bool speed_p, int *cost)
> } \
> while (0);
>
> +/* Helper function for arm_rtx_costs_internal. Calculates the cost
> of a MEM,
> + considering the costs of the addressing mode and memory access
> + separately. */
> +static bool
> +arm_mem_costs (rtx x, const struct cpu_cost_table *extra_cost,
> + int *cost, bool speed_p)
> +{
> + machine_mode mode = GET_MODE (x);
> + if (flag_pic
> + && GET_CODE (XEXP (x, 0)) == PLUS
> + && will_be_in_index_register (XEXP (XEXP (x, 0), 1)))
> + /* This will be split into two instructions. Add the cost of the
> + additional instruction here. The cost of the memory access is
> computed
> + below. See arm.md:calculate_pic_address. */
> + *cost = COSTS_N_INSNS (1);
> + else
> + *cost = 0;
For speed_p we want the size cost of the insn (COSTS_N_INSNS (1) for a
each insn)
plus the appropriate field in extra_cost. So you should unconditionally
initialise the cost
to COSTS_N_INSNS (1), conditionally increment it by COSTS_N_INSNS (1)
with the condition above.
> +
> + /* Calculate cost of the addressing mode. */
> + if (speed_p)
> + {
> + /* TODO: Add table-driven costs for addressing modes. (See
> patch 2) */
> + }
You mean "patch 3". I recommend you just remove this conditional from
this patch and add the logic
in patch 3 entirely.
> +
> + /* Calculate cost of memory access. */
> + if (speed_p)
> + {
> + /* data transfer is transfer size divided by bus width. */
> + int bus_width_bytes = current_tune->bus_width / 4;
This should be bus_width / BITS_PER_UNIT to get the size in bytes.
BITS_PER_UNIT is 8 though, so you'll have to double check to make sure the
cost calculation and generated code is still appropriate.
> + *cost += CEIL (GET_MODE_SIZE (mode), bus_width_bytes);
> + *cost += extra_cost->ldst.load;
> + }
> + else
> + {
> + *cost += COSTS_N_INSNS (1);
> + }
Given my first comment above this else would be deleted.
Thanks,
Kyrill
> +
> + return true;
> +}
> +
> /* RTX costs. Make an estimate of the cost of executing the operation
> - X, which is contained with an operation with code OUTER_CODE.
> + X, which is contained within an operation with code OUTER_CODE.
> SPEED_P indicates whether the cost desired is the performance cost,
> or the size cost. The estimate is stored in COST and the return
> value is TRUE if the cost calculation is final, or FALSE if the
> @@ -9308,30 +9348,7 @@ arm_rtx_costs_internal (rtx x, enum rtx_code
> code, enum rtx_code outer_code,
> return false;
>
> case MEM:
> - /* A memory access costs 1 insn if the mode is small, or the
> address is
> - a single register, otherwise it costs one insn per word. */
> - if (REG_P (XEXP (x, 0)))
> - *cost = COSTS_N_INSNS (1);
> - else if (flag_pic
> - && GET_CODE (XEXP (x, 0)) == PLUS
> - && will_be_in_index_register (XEXP (XEXP (x, 0), 1)))
> - /* This will be split into two instructions.
> - See arm.md:calculate_pic_address. */
> - *cost = COSTS_N_INSNS (2);
> - else
> - *cost = COSTS_N_INSNS (ARM_NUM_REGS (mode));
> -
> - /* For speed optimizations, add the costs of the address and
> - accessing memory. */
> - if (speed_p)
> -#ifdef NOT_YET
> - *cost += (extra_cost->ldst.load
> - + arm_address_cost (XEXP (x, 0), mode,
> - ADDR_SPACE_GENERIC, speed_p));
> -#else
> - *cost += extra_cost->ldst.load;
> -#endif
> - return true;
> + return arm_mem_costs (x, extra_cost, cost, speed_p);
>
> case PARALLEL:
> {
> diff --git a/gcc/testsuite/gcc.target/arm/addr-modes-float.c
> b/gcc/testsuite/gcc.target/arm/addr-modes-float.c
> new file mode 100644
> index 0000000..3b4235c
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arm/addr-modes-float.c
> @@ -0,0 +1,42 @@
> +/* { dg-options "-O2" } */
> +/* { dg-add-options arm_neon } */
> +/* { dg-require-effective-target arm_neon_ok } */
> +/* { dg-do compile } */
> +
> +#include <arm_neon.h>
> +
> +#include "addr-modes.h"
> +
> +POST_STORE(float)
> +/* { dg-final { scan-assembler "vstmia.32" } } */
> +POST_STORE(double)
> +/* { dg-final { scan-assembler "vstmia.64" } } */
> +
> +POST_LOAD(float)
> +/* { dg-final { scan-assembler "vldmia.32" } } */
> +POST_LOAD(double)
> +/* { dg-final { scan-assembler "vldmia.64" } } */
> +
> +POST_STORE_VEC (int8_t, int8x8_t, vst1_s8)
> +/* { dg-final { scan-assembler "vst1.8\t\{.*\}, \\\[r\[0-9\]+\\\]!" }
> } */
> +POST_STORE_VEC (int8_t, int8x16_t, vst1q_s8)
> +/* { dg-final { scan-assembler "vst1.8\t\{.*\[-,\]d.*\},
> \\\[r\[0-9\]+\\\]!" } } */
> +
> +POST_STORE_VEC (int8_t, int8x8x2_t, vst2_s8)
> +/* { dg-final { scan-assembler "vst2.8\t\{.*\}, \\\[r\[0-9\]+\\\]!" }
> } */
> +POST_STORE_VEC (int8_t, int8x16x2_t, vst2q_s8)
> +/* { dg-final { scan-assembler "vst2.8\t\{.*-d.*\},
> \\\[r\[0-9\]+\\\]!" } } */
> +
> +POST_STORE_VEC (int8_t, int8x8x3_t, vst3_s8)
> +/* { dg-final { scan-assembler "vst3.8\t\{.*\}, \\\[r\[0-9\]+\\\]!" }
> } */
> +POST_STORE_VEC (int8_t, int8x16x3_t, vst3q_s8)
> +/* { dg-final { scan-assembler "vst3.8\t\{d\[02468\], d\[02468\],
> d\[02468\]\}, \\\[r\[0-9\]+\\\]!" } } */
> +/* { dg-final { scan-assembler "vst3.8\t\{d\[13579\], d\[13579\],
> d\[13579\]\}, \\\[r\[0-9\]+\\\]!" { xfail *-*-* } } } */
> +
> +POST_STORE_VEC (int8_t, int8x8x4_t, vst4_s8)
> +/* { dg-final { scan-assembler "vst4.8\t\{.*\}, \\\[r\[0-9\]+\\\]!" }
> } */
> +POST_STORE_VEC (int8_t, int8x16x4_t, vst4q_s8)
> +/* { dg-final { scan-assembler "vst4.8\t\{d\[02468\], d\[02468\],
> d\[02468\], d\[02468\]\}, \\\[r\[0-9\]+\\\]!" } } */
> +/* { dg-final { scan-assembler "vst4.8\t\{d\[13579\], d\[13579\],
> d\[13579\], d\[13579\]\}, \\\[r\[0-9\]+\\\]!" { xfail *-*-* } } } */
> +
> +/* { dg-final { scan-assembler-not "add" { xfail *-*-* } } } */
> diff --git a/gcc/testsuite/gcc.target/arm/addr-modes-int.c
> b/gcc/testsuite/gcc.target/arm/addr-modes-int.c
> new file mode 100644
> index 0000000..e3e1e6a
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arm/addr-modes-int.c
> @@ -0,0 +1,46 @@
> +/* { dg-options "-O2 -march=armv7-a" } */
> +/* { dg-add-options arm_neon } */
> +/* { dg-require-effective-target arm_neon_ok } */
> +/* { dg-do compile } */
> +
> +#include "addr-modes.h"
> +
> +typedef long long ll;
> +
> +PRE_STORE(char)
> +/* { dg-final { scan-assembler "strb.*#1]!" } } */
> +PRE_STORE(short)
> +/* { dg-final { scan-assembler "strh.*#2]!" } } */
> +PRE_STORE(int)
> +/* { dg-final { scan-assembler "str.*#4]!" } } */
> +PRE_STORE(ll)
> +/* { dg-final { scan-assembler "strd.*#8]!" } } */
> +
> +POST_STORE(char)
> +/* { dg-final { scan-assembler "strb.*], #1" } } */
> +POST_STORE(short)
> +/* { dg-final { scan-assembler "strh.*], #2" } } */
> +POST_STORE(int)
> +/* { dg-final { scan-assembler "str.*], #4" } } */
> +POST_STORE(ll)
> +/* { dg-final { scan-assembler "strd.*], #8" } } */
> +
> +PRE_LOAD(char)
> +/* { dg-final { scan-assembler "ldrb.*#1]!" } } */
> +PRE_LOAD(short)
> +/* { dg-final { scan-assembler "ldrsh.*#2]!" } } */
> +PRE_LOAD(int)
> +/* { dg-final { scan-assembler "ldr.*#4]!" } } */
> +PRE_LOAD(ll)
> +/* { dg-final { scan-assembler "ldrd.*#8]!" } } */
> +
> +POST_LOAD(char)
> +/* { dg-final { scan-assembler "ldrb.*], #1" } } */
> +POST_LOAD(short)
> +/* { dg-final { scan-assembler "ldrsh.*], #2" } } */
> +POST_LOAD(int)
> +/* { dg-final { scan-assembler "ldr.*], #4" } } */
> +POST_LOAD(ll)
> +/* { dg-final { scan-assembler "ldrd.*], #8" } } */
> +
> +/* { dg-final { scan-assembler-not "\tadd" } } */
> diff --git a/gcc/testsuite/gcc.target/arm/addr-modes.h
> b/gcc/testsuite/gcc.target/arm/addr-modes.h
> new file mode 100644
> index 0000000..eac4678
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arm/addr-modes.h
> @@ -0,0 +1,53 @@
> +
> +#define PRE_STORE(T) \
> + T * \
> + T ## _pre_store (T *p, T v) \
> + { \
> + *++p = v; \
> + return p; \
> + } \
> +
> +#define POST_STORE(T) \
> + T * \
> + T ## _post_store (T *p, T v) \
> + { \
> + *p++ = v; \
> + return p; \
> + }
> +
> +#define POST_STORE_VEC(T, VT, OP) \
> + T * \
> + VT ## _post_store (T * p, VT v) \
> + { \
> + OP (p, v); \
> + p += sizeof (VT) / sizeof (T); \
> + return p; \
> + }
> +
> +#define PRE_LOAD(T) \
> + void \
> + T ## _pre_load (T *p) \
> + { \
> + extern void f ## T (T*,T); \
> + T x = *++p; \
> + f ## T (p, x); \
> + }
> +
> +#define POST_LOAD(T) \
> + void \
> + T ## _post_load (T *p) \
> + { \
> + extern void f ## T (T*,T); \
> + T x = *p++; \
> + f ## T (p, x); \
> + }
> +
> +#define POST_LOAD_VEC(T, VT, OP) \
> + void \
> + VT ## _post_load (T * p) \
> + { \
> + extern void f ## T (T*,T); \
> + VT x = OP (p, v); \
> + p += sizeof (VT) / sizeof (T); \
> + f ## T (p, x); \
> + }
> --
> 2.7.4
>
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 3/3] [ARM] Add table of costs for AAarch32 addressing modes.
2017-09-12 8:35 ` [PATCH 3/3] [ARM] Add table of costs for AAarch32 addressing modes charles.baylis
@ 2017-09-13 9:02 ` Kyrill Tkachov
2017-09-15 15:38 ` Charles Baylis
2017-09-15 15:38 ` Charles Baylis
0 siblings, 2 replies; 20+ messages in thread
From: Kyrill Tkachov @ 2017-09-13 9:02 UTC (permalink / raw)
To: charles.baylis, Richard Earnshaw, Ramana Radhakrishnan, pinskia
Cc: gcc-patches
Hi Charles,
On 12/09/17 09:34, charles.baylis@linaro.org wrote:
> From: Charles Baylis <charles.baylis@linaro.org>
>
> This patch adds support for modelling the varying costs of
> different addressing modes. The generic cost table treats
> all addressing modes as having equal cost.
>
> gcc/ChangeLog:
>
> <date> Charles Baylis <charles.baylis@linaro.org>
>
> * config/arm/arm-protos.h (enum arm_addr_mode_op): New.
> (struct addr_mode_cost_table): New.
> (struct tune_params): Add field addr_mode_costs.
> * config/arm/arm.c (generic_addr_mode_costs): New.
> (arm_slowmul_tune): Initialise addr_mode_costs field.
> (arm_fastmul_tune): Likewise.
> (arm_strongarm_tune): Likewise.
> (arm_xscale_tune): Likewise.
> (arm_9e_tune): Likewise.
> (arm_marvell_pj4_tune): Likewise.
> (arm_v6t2_tune): Likewise.
> (arm_cortex_tune): Likewise.
> (arm_cortex_a8_tune): Likewise.
> (arm_cortex_a7_tune): Likewise.
> (arm_cortex_a15_tune): Likewise.
> (arm_cortex_a35_tune): Likewise.
> (arm_cortex_a53_tune): Likewise.
> (arm_cortex_a57_tune): Likewise.
> (arm_exynosm1_tune): Likewise.
> (arm_xgene1_tune): Likewise.
> (arm_cortex_a5_tune): Likewise.
> (arm_cortex_a9_tune): Likewise.
> (arm_cortex_a12_tune): Likewise.
> (arm_cortex_a73_tune): Likewise.
> (arm_v7m_tune): Likewise.
> (arm_cortex_m7_tune): Likewise.
> (arm_v6m_tune): Likewise.
> (arm_fa726te_tune): Likewise.
> (arm_mem_costs): Use table lookup to calculate cost of addressing
> mode.
>
> Change-Id: If71bd7c4f4bb876c5ed82dc28791130efb8bf89e
> ---
> gcc/config/arm/arm-protos.h | 20 +++++++++++
> gcc/config/arm/arm.c | 83
> ++++++++++++++++++++++++++++++++++++++++++++-
> 2 files changed, 102 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
> index 47a85cc..3d6b515 100644
> --- a/gcc/config/arm/arm-protos.h
> +++ b/gcc/config/arm/arm-protos.h
> @@ -261,12 +261,32 @@ struct cpu_vec_costs {
>
> struct cpu_cost_table;
>
> +/* Addressing mode operations. Used to index tables in struct
> + addr_mode_cost_table. */
> +enum arm_addr_mode_op
> +{
> + AMO_DEFAULT,
> + AMO_NO_WB, /* Offset with no writeback. */
> + AMO_WB, /* Offset with writeback. */
> + AMO_MAX /* For array size. */
> +};
> +
> +/* Table of additional costs when using addressing modes for each
> + access type. */
Please add a comment here saying that the units are in COSTS_N_INSNS
so that we can reduce the temptation to use these in inappropriate contexts.
> +struct addr_mode_cost_table
> +{
> + const int integer[AMO_MAX];
> + const int fp[AMO_MAX];
> + const int vector[AMO_MAX];
> +};
> +
> /* Dump function ARM_PRINT_TUNE_INFO should be updated whenever this
> structure is modified. */
>
> struct tune_params
> {
> const struct cpu_cost_table *insn_extra_cost;
> + const struct addr_mode_cost_table *addr_mode_costs;
> bool (*sched_adjust_cost) (rtx_insn *, int, rtx_insn *, int *);
> int (*branch_cost) (bool, bool);
> /* Vectorizer costs. */
> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
> index b8dbed6..0d31f5f 100644
> --- a/gcc/config/arm/arm.c
> +++ b/gcc/config/arm/arm.c
> @@ -1751,9 +1751,32 @@ const struct cpu_cost_table v7m_extra_costs =
> }
> };
>
> +const struct addr_mode_cost_table generic_addr_mode_costs =
> +{
> + /* int. */
> + {
> + 0, /* AMO_DEFAULT. */
> + 0, /* AMO_NO_WB. */
> + 0 /* AMO_WB. */
> + },
> + /* float. */
> + {
> + 0, /* AMO_DEFAULT. */
> + 0, /* AMO_NO_WB. */
> + 0 /* AMO_WB. */
> + },
> + /* vector. */
> + {
> + 0, /* AMO_DEFAULT. */
> + 0, /* AMO_NO_WB. */
> + 0 /* AMO_WB. */
> + }
> +};
> +
> const struct tune_params arm_slowmul_tune =
> {
> &generic_extra_costs, /* Insn extra costs. */
> + &generic_addr_mode_costs, /* Addressing mode costs. */
> NULL, /* Sched adj cost. */
> arm_default_branch_cost,
> &arm_default_vec_cost,
> @@ -1777,6 +1800,7 @@ const struct tune_params arm_slowmul_tune =
> const struct tune_params arm_fastmul_tune =
> {
> &generic_extra_costs, /* Insn extra costs. */
> + &generic_addr_mode_costs, /* Addressing mode costs. */
> NULL, /* Sched adj cost. */
> arm_default_branch_cost,
> &arm_default_vec_cost,
> @@ -1803,6 +1827,7 @@ const struct tune_params arm_fastmul_tune =
> const struct tune_params arm_strongarm_tune =
> {
> &generic_extra_costs, /* Insn extra costs. */
> + &generic_addr_mode_costs, /* Addressing mode costs. */
> NULL, /* Sched adj cost. */
> arm_default_branch_cost,
> &arm_default_vec_cost,
> @@ -1826,6 +1851,7 @@ const struct tune_params arm_strongarm_tune =
> const struct tune_params arm_xscale_tune =
> {
> &generic_extra_costs, /* Insn extra costs. */
> + &generic_addr_mode_costs, /* Addressing mode costs. */
> xscale_sched_adjust_cost,
> arm_default_branch_cost,
> &arm_default_vec_cost,
> @@ -1849,6 +1875,7 @@ const struct tune_params arm_xscale_tune =
> const struct tune_params arm_9e_tune =
> {
> &generic_extra_costs, /* Insn extra costs. */
> + &generic_addr_mode_costs, /* Addressing mode costs. */
> NULL, /* Sched adj cost. */
> arm_default_branch_cost,
> &arm_default_vec_cost,
> @@ -1872,6 +1899,7 @@ const struct tune_params arm_9e_tune =
> const struct tune_params arm_marvell_pj4_tune =
> {
> &generic_extra_costs, /* Insn extra costs. */
> + &generic_addr_mode_costs, /* Addressing mode costs. */
> NULL, /* Sched adj cost. */
> arm_default_branch_cost,
> &arm_default_vec_cost,
> @@ -1895,6 +1923,7 @@ const struct tune_params arm_marvell_pj4_tune =
> const struct tune_params arm_v6t2_tune =
> {
> &generic_extra_costs, /* Insn extra costs. */
> + &generic_addr_mode_costs, /* Addressing mode costs. */
> NULL, /* Sched adj cost. */
> arm_default_branch_cost,
> &arm_default_vec_cost,
> @@ -1920,6 +1949,7 @@ const struct tune_params arm_v6t2_tune =
> const struct tune_params arm_cortex_tune =
> {
> &generic_extra_costs,
> + &generic_addr_mode_costs, /* Addressing mode costs. */
> NULL, /* Sched adj cost. */
> arm_default_branch_cost,
> &arm_default_vec_cost,
> @@ -1943,6 +1973,7 @@ const struct tune_params arm_cortex_tune =
> const struct tune_params arm_cortex_a8_tune =
> {
> &cortexa8_extra_costs,
> + &generic_addr_mode_costs, /* Addressing mode costs. */
> NULL, /* Sched adj cost. */
> arm_default_branch_cost,
> &arm_default_vec_cost,
> @@ -1966,6 +1997,7 @@ const struct tune_params arm_cortex_a8_tune =
> const struct tune_params arm_cortex_a7_tune =
> {
> &cortexa7_extra_costs,
> + &generic_addr_mode_costs, /* Addressing mode costs. */
> NULL, /* Sched adj cost. */
> arm_default_branch_cost,
> &arm_default_vec_cost,
> @@ -1989,6 +2021,7 @@ const struct tune_params arm_cortex_a7_tune =
> const struct tune_params arm_cortex_a15_tune =
> {
> &cortexa15_extra_costs,
> + &generic_addr_mode_costs, /* Addressing mode costs. */
> NULL, /* Sched adj cost. */
> arm_default_branch_cost,
> &arm_default_vec_cost,
> @@ -2012,6 +2045,7 @@ const struct tune_params arm_cortex_a15_tune =
> const struct tune_params arm_cortex_a35_tune =
> {
> &cortexa53_extra_costs,
> + &generic_addr_mode_costs, /* Addressing mode costs. */
> NULL, /* Sched adj cost. */
> arm_default_branch_cost,
> &arm_default_vec_cost,
> @@ -2035,6 +2069,7 @@ const struct tune_params arm_cortex_a35_tune =
> const struct tune_params arm_cortex_a53_tune =
> {
> &cortexa53_extra_costs,
> + &generic_addr_mode_costs, /* Addressing mode costs. */
> NULL, /* Sched adj cost. */
> arm_default_branch_cost,
> &arm_default_vec_cost,
> @@ -2058,6 +2093,7 @@ const struct tune_params arm_cortex_a53_tune =
> const struct tune_params arm_cortex_a57_tune =
> {
> &cortexa57_extra_costs,
> + &generic_addr_mode_costs, /* addressing mode costs */
> NULL, /* Sched adj cost. */
> arm_default_branch_cost,
> &arm_default_vec_cost,
> @@ -2081,6 +2117,7 @@ const struct tune_params arm_cortex_a57_tune =
> const struct tune_params arm_exynosm1_tune =
> {
> &exynosm1_extra_costs,
> + &generic_addr_mode_costs, /* Addressing mode
> costs. */
> NULL, /* Sched adj
> cost. */
> arm_default_branch_cost,
> &arm_default_vec_cost,
> @@ -2104,6 +2141,7 @@ const struct tune_params arm_exynosm1_tune =
> const struct tune_params arm_xgene1_tune =
> {
> &xgene1_extra_costs,
> + &generic_addr_mode_costs, /* Addressing mode costs. */
> NULL, /* Sched adj cost. */
> arm_default_branch_cost,
> &arm_default_vec_cost,
> @@ -2130,6 +2168,7 @@ const struct tune_params arm_xgene1_tune =
> const struct tune_params arm_cortex_a5_tune =
> {
> &cortexa5_extra_costs,
> + &generic_addr_mode_costs, /* Addressing mode costs. */
> NULL, /* Sched adj cost. */
> arm_cortex_a5_branch_cost,
> &arm_default_vec_cost,
> @@ -2153,6 +2192,7 @@ const struct tune_params arm_cortex_a5_tune =
> const struct tune_params arm_cortex_a9_tune =
> {
> &cortexa9_extra_costs,
> + &generic_addr_mode_costs, /* Addressing mode costs. */
> cortex_a9_sched_adjust_cost,
> arm_default_branch_cost,
> &arm_default_vec_cost,
> @@ -2176,6 +2216,7 @@ const struct tune_params arm_cortex_a9_tune =
> const struct tune_params arm_cortex_a12_tune =
> {
> &cortexa12_extra_costs,
> + &generic_addr_mode_costs, /* Addressing mode costs. */
> NULL, /* Sched adj cost. */
> arm_default_branch_cost,
> &arm_default_vec_cost, /* Vectorizer costs. */
> @@ -2199,6 +2240,7 @@ const struct tune_params arm_cortex_a12_tune =
> const struct tune_params arm_cortex_a73_tune =
> {
> &cortexa57_extra_costs,
> + &generic_addr_mode_costs, /* Addressing mode
> costs. */
> NULL, /* Sched adj
> cost. */
> arm_default_branch_cost,
> &arm_default_vec_cost, /* Vectorizer costs. */
> @@ -2229,6 +2271,7 @@ const struct tune_params arm_cortex_a73_tune =
> const struct tune_params arm_v7m_tune =
> {
> &v7m_extra_costs,
> + &generic_addr_mode_costs, /* Addressing mode costs. */
> NULL, /* Sched adj cost. */
> arm_cortex_m_branch_cost,
> &arm_default_vec_cost,
> @@ -2254,6 +2297,7 @@ const struct tune_params arm_v7m_tune =
> const struct tune_params arm_cortex_m7_tune =
> {
> &v7m_extra_costs,
> + &generic_addr_mode_costs, /* Addressing mode costs. */
> NULL, /* Sched adj cost. */
> arm_cortex_m7_branch_cost,
> &arm_default_vec_cost,
> @@ -2280,6 +2324,7 @@ const struct tune_params arm_cortex_m7_tune =
> const struct tune_params arm_v6m_tune =
> {
> &generic_extra_costs, /* Insn extra costs. */
> + &generic_addr_mode_costs, /* Addressing mode costs. */
> NULL, /* Sched adj cost. */
> arm_default_branch_cost,
> &arm_default_vec_cost, /* Vectorizer costs. */
> @@ -2303,6 +2348,7 @@ const struct tune_params arm_v6m_tune =
> const struct tune_params arm_fa726te_tune =
> {
> &generic_extra_costs, /* Insn extra costs. */
> + &generic_addr_mode_costs, /* Addressing mode
> costs. */
> fa726te_sched_adjust_cost,
> arm_default_branch_cost,
> &arm_default_vec_cost,
> @@ -9249,7 +9295,42 @@ arm_mem_costs (rtx x, const struct
> cpu_cost_table *extra_cost,
> /* Calculate cost of the addressing mode. */
> if (speed_p)
> {
> - /* TODO: Add table-driven costs for addressing modes. (See
> patch 2) */
> + arm_addr_mode_op op_type;
> + switch (GET_CODE (XEXP (x, 0)))
> + {
> + default:
> + case REG:
> + op_type = AMO_DEFAULT;
> + break;
> + case MINUS:
> + /* MINUS does not appear in RTL, but the architecture
> supports it,
> + so handle this case defensively. */
> + /* fall through */
> + case PLUS:
> + op_type = AMO_NO_WB;
> + break;
> + case PRE_INC:
> + case PRE_DEC:
> + case POST_INC:
> + case POST_DEC:
> + case PRE_MODIFY:
> + case POST_MODIFY:
> + op_type = AMO_WB;
> + break;
> + }
> +
> + if (VECTOR_MODE_P (mode))
> + {
> + *cost += current_tune->addr_mode_costs->vector[op_type];
> + }
> + else if (FLOAT_MODE_P (mode))
> + {
> + *cost += current_tune->addr_mode_costs->fp[op_type];
> + }
> + else
> + {
> + *cost += current_tune->addr_mode_costs->integer[op_type];
> + }
No need for brackets for single-statement conditionals.
This is okay once the prerequisites are committed.
Kyrill
> }
>
> /* Calculate cost of memory access. */
> --
> 2.7.4
>
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 3/3] [ARM] Add table of costs for AAarch32 addressing modes.
2017-09-13 9:02 ` Kyrill Tkachov
2017-09-15 15:38 ` Charles Baylis
@ 2017-09-15 15:38 ` Charles Baylis
2017-09-15 16:57 ` Kyrill Tkachov
1 sibling, 1 reply; 20+ messages in thread
From: Charles Baylis @ 2017-09-15 15:38 UTC (permalink / raw)
To: Kyrill Tkachov
Cc: Richard Earnshaw, Ramana Radhakrishnan, pinskia, gcc-patches
[-- Attachment #1: Type: text/plain, Size: 683 bytes --]
On 13 September 2017 at 10:02, Kyrill Tkachov
<kyrylo.tkachov@foss.arm.com> wrote:
>
> Please add a comment here saying that the units are in COSTS_N_INSNS
> so that we can reduce the temptation to use these in inappropriate contexts.
>> + if (VECTOR_MODE_P (mode))
>> + {
>> + *cost += current_tune->addr_mode_costs->vector[op_type];
>> + }
>> + else if (FLOAT_MODE_P (mode))
>> + {
>> + *cost += current_tune->addr_mode_costs->fp[op_type];
>> + }
>> + else
>> + {
>> + *cost += current_tune->addr_mode_costs->integer[op_type];
>> + }
>
>
> No need for brackets for single-statement conditionals.
Done.
[-- Attachment #2: 0003-ARM-Add-table-of-costs-for-AAarch32-addressing-modes.patch --]
[-- Type: text/x-patch, Size: 11946 bytes --]
From a35fa59f4dc3be42a52519a90bdd2d47e74db086 Mon Sep 17 00:00:00 2001
From: Charles Baylis <charles.baylis@linaro.org>
Date: Thu, 14 Sep 2017 12:47:41 +0100
Subject: [PATCH 3/3] [ARM] Add table of costs for AAarch32 addressing modes.
This patch adds support for modelling the varying costs of
different addressing modes. The generic cost table treats
all addressing modes as having equal cost.
gcc/ChangeLog:
<date> Charles Baylis <charles.baylis@linaro.org>
* config/arm/arm-protos.h (enum arm_addr_mode_op): New.
(struct addr_mode_cost_table): New.
(struct tune_params): Add field addr_mode_costs.
* config/arm/arm.c (generic_addr_mode_costs): New.
(arm_slowmul_tune): Initialise addr_mode_costs field.
(arm_fastmul_tune): Likewise.
(arm_strongarm_tune): Likewise.
(arm_xscale_tune): Likewise.
(arm_9e_tune): Likewise.
(arm_marvell_pj4_tune): Likewise.
(arm_v6t2_tune): Likewise.
(arm_cortex_tune): Likewise.
(arm_cortex_a8_tune): Likewise.
(arm_cortex_a7_tune): Likewise.
(arm_cortex_a15_tune): Likewise.
(arm_cortex_a35_tune): Likewise.
(arm_cortex_a53_tune): Likewise.
(arm_cortex_a57_tune): Likewise.
(arm_exynosm1_tune): Likewise.
(arm_xgene1_tune): Likewise.
(arm_cortex_a5_tune): Likewise.
(arm_cortex_a9_tune): Likewise.
(arm_cortex_a12_tune): Likewise.
(arm_cortex_a73_tune): Likewise.
(arm_v7m_tune): Likewise.
(arm_cortex_m7_tune): Likewise.
(arm_v6m_tune): Likewise.
(arm_fa726te_tune): Likewise.
(arm_mem_costs): Use table lookup to calculate cost of addressing
mode.
Change-Id: If71bd7c4f4bb876c5ed82dc28791130efb8bf89e
---
gcc/config/arm/arm-protos.h | 20 +++++++++++
gcc/config/arm/arm.c | 81 +++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 101 insertions(+)
diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index 47a85cc..7769726 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -261,12 +261,32 @@ struct cpu_vec_costs {
struct cpu_cost_table;
+/* Addressing mode operations. Used to index tables in struct
+ addr_mode_cost_table. */
+enum arm_addr_mode_op
+{
+ AMO_DEFAULT,
+ AMO_NO_WB, /* Offset with no writeback. */
+ AMO_WB, /* Offset with writeback. */
+ AMO_MAX /* For array size. */
+};
+
+/* Table of additional costs in units of COSTS_N_INSNS() when using
+ addressing modes for each access type. */
+struct addr_mode_cost_table
+{
+ const int integer[AMO_MAX];
+ const int fp[AMO_MAX];
+ const int vector[AMO_MAX];
+};
+
/* Dump function ARM_PRINT_TUNE_INFO should be updated whenever this
structure is modified. */
struct tune_params
{
const struct cpu_cost_table *insn_extra_cost;
+ const struct addr_mode_cost_table *addr_mode_costs;
bool (*sched_adjust_cost) (rtx_insn *, int, rtx_insn *, int *);
int (*branch_cost) (bool, bool);
/* Vectorizer costs. */
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 64230b8..7773ec3 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -1751,9 +1751,32 @@ const struct cpu_cost_table v7m_extra_costs =
}
};
+const struct addr_mode_cost_table generic_addr_mode_costs =
+{
+ /* int. */
+ {
+ COSTS_N_INSNS (0), /* AMO_DEFAULT. */
+ COSTS_N_INSNS (0), /* AMO_NO_WB. */
+ COSTS_N_INSNS (0) /* AMO_WB. */
+ },
+ /* float. */
+ {
+ COSTS_N_INSNS (0), /* AMO_DEFAULT. */
+ COSTS_N_INSNS (0), /* AMO_NO_WB. */
+ COSTS_N_INSNS (0) /* AMO_WB. */
+ },
+ /* vector. */
+ {
+ COSTS_N_INSNS (0), /* AMO_DEFAULT. */
+ COSTS_N_INSNS (0), /* AMO_NO_WB. */
+ COSTS_N_INSNS (0) /* AMO_WB. */
+ }
+};
+
const struct tune_params arm_slowmul_tune =
{
&generic_extra_costs, /* Insn extra costs. */
+ &generic_addr_mode_costs, /* Addressing mode costs. */
NULL, /* Sched adj cost. */
arm_default_branch_cost,
&arm_default_vec_cost,
@@ -1777,6 +1800,7 @@ const struct tune_params arm_slowmul_tune =
const struct tune_params arm_fastmul_tune =
{
&generic_extra_costs, /* Insn extra costs. */
+ &generic_addr_mode_costs, /* Addressing mode costs. */
NULL, /* Sched adj cost. */
arm_default_branch_cost,
&arm_default_vec_cost,
@@ -1803,6 +1827,7 @@ const struct tune_params arm_fastmul_tune =
const struct tune_params arm_strongarm_tune =
{
&generic_extra_costs, /* Insn extra costs. */
+ &generic_addr_mode_costs, /* Addressing mode costs. */
NULL, /* Sched adj cost. */
arm_default_branch_cost,
&arm_default_vec_cost,
@@ -1826,6 +1851,7 @@ const struct tune_params arm_strongarm_tune =
const struct tune_params arm_xscale_tune =
{
&generic_extra_costs, /* Insn extra costs. */
+ &generic_addr_mode_costs, /* Addressing mode costs. */
xscale_sched_adjust_cost,
arm_default_branch_cost,
&arm_default_vec_cost,
@@ -1849,6 +1875,7 @@ const struct tune_params arm_xscale_tune =
const struct tune_params arm_9e_tune =
{
&generic_extra_costs, /* Insn extra costs. */
+ &generic_addr_mode_costs, /* Addressing mode costs. */
NULL, /* Sched adj cost. */
arm_default_branch_cost,
&arm_default_vec_cost,
@@ -1872,6 +1899,7 @@ const struct tune_params arm_9e_tune =
const struct tune_params arm_marvell_pj4_tune =
{
&generic_extra_costs, /* Insn extra costs. */
+ &generic_addr_mode_costs, /* Addressing mode costs. */
NULL, /* Sched adj cost. */
arm_default_branch_cost,
&arm_default_vec_cost,
@@ -1895,6 +1923,7 @@ const struct tune_params arm_marvell_pj4_tune =
const struct tune_params arm_v6t2_tune =
{
&generic_extra_costs, /* Insn extra costs. */
+ &generic_addr_mode_costs, /* Addressing mode costs. */
NULL, /* Sched adj cost. */
arm_default_branch_cost,
&arm_default_vec_cost,
@@ -1920,6 +1949,7 @@ const struct tune_params arm_v6t2_tune =
const struct tune_params arm_cortex_tune =
{
&generic_extra_costs,
+ &generic_addr_mode_costs, /* Addressing mode costs. */
NULL, /* Sched adj cost. */
arm_default_branch_cost,
&arm_default_vec_cost,
@@ -1943,6 +1973,7 @@ const struct tune_params arm_cortex_tune =
const struct tune_params arm_cortex_a8_tune =
{
&cortexa8_extra_costs,
+ &generic_addr_mode_costs, /* Addressing mode costs. */
NULL, /* Sched adj cost. */
arm_default_branch_cost,
&arm_default_vec_cost,
@@ -1966,6 +1997,7 @@ const struct tune_params arm_cortex_a8_tune =
const struct tune_params arm_cortex_a7_tune =
{
&cortexa7_extra_costs,
+ &generic_addr_mode_costs, /* Addressing mode costs. */
NULL, /* Sched adj cost. */
arm_default_branch_cost,
&arm_default_vec_cost,
@@ -1989,6 +2021,7 @@ const struct tune_params arm_cortex_a7_tune =
const struct tune_params arm_cortex_a15_tune =
{
&cortexa15_extra_costs,
+ &generic_addr_mode_costs, /* Addressing mode costs. */
NULL, /* Sched adj cost. */
arm_default_branch_cost,
&arm_default_vec_cost,
@@ -2012,6 +2045,7 @@ const struct tune_params arm_cortex_a15_tune =
const struct tune_params arm_cortex_a35_tune =
{
&cortexa53_extra_costs,
+ &generic_addr_mode_costs, /* Addressing mode costs. */
NULL, /* Sched adj cost. */
arm_default_branch_cost,
&arm_default_vec_cost,
@@ -2035,6 +2069,7 @@ const struct tune_params arm_cortex_a35_tune =
const struct tune_params arm_cortex_a53_tune =
{
&cortexa53_extra_costs,
+ &generic_addr_mode_costs, /* Addressing mode costs. */
NULL, /* Sched adj cost. */
arm_default_branch_cost,
&arm_default_vec_cost,
@@ -2058,6 +2093,7 @@ const struct tune_params arm_cortex_a53_tune =
const struct tune_params arm_cortex_a57_tune =
{
&cortexa57_extra_costs,
+ &generic_addr_mode_costs, /* addressing mode costs */
NULL, /* Sched adj cost. */
arm_default_branch_cost,
&arm_default_vec_cost,
@@ -2081,6 +2117,7 @@ const struct tune_params arm_cortex_a57_tune =
const struct tune_params arm_exynosm1_tune =
{
&exynosm1_extra_costs,
+ &generic_addr_mode_costs, /* Addressing mode costs. */
NULL, /* Sched adj cost. */
arm_default_branch_cost,
&arm_default_vec_cost,
@@ -2104,6 +2141,7 @@ const struct tune_params arm_exynosm1_tune =
const struct tune_params arm_xgene1_tune =
{
&xgene1_extra_costs,
+ &generic_addr_mode_costs, /* Addressing mode costs. */
NULL, /* Sched adj cost. */
arm_default_branch_cost,
&arm_default_vec_cost,
@@ -2130,6 +2168,7 @@ const struct tune_params arm_xgene1_tune =
const struct tune_params arm_cortex_a5_tune =
{
&cortexa5_extra_costs,
+ &generic_addr_mode_costs, /* Addressing mode costs. */
NULL, /* Sched adj cost. */
arm_cortex_a5_branch_cost,
&arm_default_vec_cost,
@@ -2153,6 +2192,7 @@ const struct tune_params arm_cortex_a5_tune =
const struct tune_params arm_cortex_a9_tune =
{
&cortexa9_extra_costs,
+ &generic_addr_mode_costs, /* Addressing mode costs. */
cortex_a9_sched_adjust_cost,
arm_default_branch_cost,
&arm_default_vec_cost,
@@ -2176,6 +2216,7 @@ const struct tune_params arm_cortex_a9_tune =
const struct tune_params arm_cortex_a12_tune =
{
&cortexa12_extra_costs,
+ &generic_addr_mode_costs, /* Addressing mode costs. */
NULL, /* Sched adj cost. */
arm_default_branch_cost,
&arm_default_vec_cost, /* Vectorizer costs. */
@@ -2199,6 +2240,7 @@ const struct tune_params arm_cortex_a12_tune =
const struct tune_params arm_cortex_a73_tune =
{
&cortexa57_extra_costs,
+ &generic_addr_mode_costs, /* Addressing mode costs. */
NULL, /* Sched adj cost. */
arm_default_branch_cost,
&arm_default_vec_cost, /* Vectorizer costs. */
@@ -2229,6 +2271,7 @@ const struct tune_params arm_cortex_a73_tune =
const struct tune_params arm_v7m_tune =
{
&v7m_extra_costs,
+ &generic_addr_mode_costs, /* Addressing mode costs. */
NULL, /* Sched adj cost. */
arm_cortex_m_branch_cost,
&arm_default_vec_cost,
@@ -2254,6 +2297,7 @@ const struct tune_params arm_v7m_tune =
const struct tune_params arm_cortex_m7_tune =
{
&v7m_extra_costs,
+ &generic_addr_mode_costs, /* Addressing mode costs. */
NULL, /* Sched adj cost. */
arm_cortex_m7_branch_cost,
&arm_default_vec_cost,
@@ -2280,6 +2324,7 @@ const struct tune_params arm_cortex_m7_tune =
const struct tune_params arm_v6m_tune =
{
&generic_extra_costs, /* Insn extra costs. */
+ &generic_addr_mode_costs, /* Addressing mode costs. */
NULL, /* Sched adj cost. */
arm_default_branch_cost,
&arm_default_vec_cost, /* Vectorizer costs. */
@@ -2303,6 +2348,7 @@ const struct tune_params arm_v6m_tune =
const struct tune_params arm_fa726te_tune =
{
&generic_extra_costs, /* Insn extra costs. */
+ &generic_addr_mode_costs, /* Addressing mode costs. */
fa726te_sched_adjust_cost,
arm_default_branch_cost,
&arm_default_vec_cost,
@@ -9247,6 +9293,41 @@ arm_mem_costs (rtx x, const struct cpu_cost_table *extra_cost,
below. See arm.md:calculate_pic_address. */
*cost += COSTS_N_INSNS (1);
+ /* Calculate cost of the addressing mode. */
+ if (speed_p)
+ {
+ arm_addr_mode_op op_type;
+ switch (GET_CODE (XEXP (x, 0)))
+ {
+ default:
+ case REG:
+ op_type = AMO_DEFAULT;
+ break;
+ case MINUS:
+ /* MINUS does not appear in RTL, but the architecture supports it,
+ so handle this case defensively. */
+ /* fall through */
+ case PLUS:
+ op_type = AMO_NO_WB;
+ break;
+ case PRE_INC:
+ case PRE_DEC:
+ case POST_INC:
+ case POST_DEC:
+ case PRE_MODIFY:
+ case POST_MODIFY:
+ op_type = AMO_WB;
+ break;
+ }
+
+ if (VECTOR_MODE_P (mode))
+ *cost += current_tune->addr_mode_costs->vector[op_type];
+ else if (FLOAT_MODE_P (mode))
+ *cost += current_tune->addr_mode_costs->fp[op_type];
+ else
+ *cost += current_tune->addr_mode_costs->integer[op_type];
+ }
+
/* Calculate cost of memory access. */
if (speed_p)
{
--
2.7.4
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 1/3] [ARM] Add bus_width_bits to tune_params
2017-09-13 9:02 ` Kyrill Tkachov
@ 2017-09-15 15:38 ` Charles Baylis
2017-09-15 17:01 ` Kyrill Tkachov
0 siblings, 1 reply; 20+ messages in thread
From: Charles Baylis @ 2017-09-15 15:38 UTC (permalink / raw)
To: Kyrill Tkachov
Cc: Richard Earnshaw, Ramana Radhakrishnan, pinskia, gcc-patches
[-- Attachment #1: Type: text/plain, Size: 1617 bytes --]
On 13 September 2017 at 10:02, Kyrill Tkachov
<kyrylo.tkachov@foss.arm.com> wrote:
> Hi Charles,
>
> On 12/09/17 09:34, charles.baylis@linaro.org wrote:
>>
>> From: Charles Baylis <charles.baylis@linaro.org>
>>
>> Add bus widths. These use the approximation that v7 and later cores have
>> 64bit data bus width, and earlier cores have 32 bit bus width, with the
>> exception of v7m.
>>
>
> Given the way this field is used in patch 2 does it affect the addressing
> mode generation
> in the tests you added depending on the -mtune option given?
> If so, we'll get testsuite failures when people test with particular default
> CPU configurations.
No, because the auto_inc_dec phase compares the cost of two different
MEMs which differ only by addressing mode. The part of the calculation
which depends on the bus_width is the same both times, so it is
cancelled out.
> Could you expand on the benefits we get from this extra bus_width
> information?
> I get that we increase the cost of memory accesses if the size of the mode
> we load is larger than the
> bus width, but it's not as if there is ever an alternative in this regard,
> such as loading less memory,
> so what pass can make different decisions thanks to this field?
As far as this patch series is concerned, it doesn't matter. It is
there to encapsulate the notion that a larger transfer results in
rtx_costs() returning a larger cost, but I don't know of any part of
the compiler which is sensitive to that difference. It's done this way
because Ramana and Richard wanted it done that way
(https://gcc.gnu.org/ml/gcc-patches/2017-06/msg00652.html).
[-- Attachment #2: 0001-ARM-Add-bus_width_bits-to-tune_params.patch --]
[-- Type: text/x-patch, Size: 9290 bytes --]
From b7bec2e4f7ca0335e0e5bd84c297215a3a7fb8c7 Mon Sep 17 00:00:00 2001
From: Charles Baylis <charles.baylis@linaro.org>
Date: Fri, 8 Sep 2017 12:53:50 +0100
Subject: [PATCH 1/3] [ARM] Add bus_width_bits to tune_params
Add bus widths. These use the approximation that v7 and later cores have
64bit data bus width, and earlier cores have 32 bit bus width, with the
exception of v7m.
<date> Charles Baylis <charles.baylis@linaro.org>
* config/arm/arm-protos.h (struct tune_params): New field
bus_width.
* config/arm/arm.c (arm_slowmul_tune): Initialise bus_width field.
(arm_fastmul_tune): Likewise.
(arm_strongarm_tune): Likewise.
(arm_xscale_tune): Likewise.
(arm_9e_tune): Likewise.
(arm_marvell_pj4_tune): Likewise.
(arm_v6t2_tune): Likewise.
(arm_cortex_tune): Likewise.
(arm_cortex_a8_tune): Likewise.
(arm_cortex_a7_tune): Likewise.
(arm_cortex_a15_tune): Likewise.
(arm_cortex_a35_tune): Likewise.
(arm_cortex_a53_tune): Likewise.
(arm_cortex_a57_tune): Likewise.
(arm_exynosm1_tune): Likewise.
(arm_xgene1_tune): Likewise.
(arm_cortex_a5_tune): Likewise.
(arm_cortex_a9_tune): Likewise.
(arm_cortex_a12_tune): Likewise.
(arm_cortex_a73_tune): Likewise.
(arm_v7m_tune): Likewise.
(arm_cortex_m7_tune): Likewise.
(arm_v6m_tune): Likewise.
(arm_fa726te_tune): Likewise.
Change-Id: I613e876db93ffd6f8c1e72ba483be2efc0b56d66
---
gcc/config/arm/arm-protos.h | 2 ++
gcc/config/arm/arm.c | 24 ++++++++++++++++++++++++
2 files changed, 26 insertions(+)
diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index 4538078..47a85cc 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -278,6 +278,8 @@ struct tune_params
int max_insns_inline_memset;
/* Issue rate of the processor. */
unsigned int issue_rate;
+ /* Bus width (bits). */
+ unsigned int bus_width;
/* Explicit prefetch data. */
struct
{
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index bca8a34..32001e5 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -1761,6 +1761,7 @@ const struct tune_params arm_slowmul_tune =
5, /* Max cond insns. */
8, /* Memset max inline. */
1, /* Issue rate. */
+ 32, /* Bus width. */
ARM_PREFETCH_NOT_BENEFICIAL,
tune_params::PREF_CONST_POOL_TRUE,
tune_params::PREF_LDRD_FALSE,
@@ -1783,6 +1784,7 @@ const struct tune_params arm_fastmul_tune =
5, /* Max cond insns. */
8, /* Memset max inline. */
1, /* Issue rate. */
+ 32, /* Bus width. */
ARM_PREFETCH_NOT_BENEFICIAL,
tune_params::PREF_CONST_POOL_TRUE,
tune_params::PREF_LDRD_FALSE,
@@ -1808,6 +1810,7 @@ const struct tune_params arm_strongarm_tune =
3, /* Max cond insns. */
8, /* Memset max inline. */
1, /* Issue rate. */
+ 32, /* Bus width. */
ARM_PREFETCH_NOT_BENEFICIAL,
tune_params::PREF_CONST_POOL_TRUE,
tune_params::PREF_LDRD_FALSE,
@@ -1830,6 +1833,7 @@ const struct tune_params arm_xscale_tune =
3, /* Max cond insns. */
8, /* Memset max inline. */
1, /* Issue rate. */
+ 32, /* Bus width. */
ARM_PREFETCH_NOT_BENEFICIAL,
tune_params::PREF_CONST_POOL_TRUE,
tune_params::PREF_LDRD_FALSE,
@@ -1852,6 +1856,7 @@ const struct tune_params arm_9e_tune =
5, /* Max cond insns. */
8, /* Memset max inline. */
1, /* Issue rate. */
+ 32, /* Bus width. */
ARM_PREFETCH_NOT_BENEFICIAL,
tune_params::PREF_CONST_POOL_TRUE,
tune_params::PREF_LDRD_FALSE,
@@ -1874,6 +1879,7 @@ const struct tune_params arm_marvell_pj4_tune =
5, /* Max cond insns. */
8, /* Memset max inline. */
2, /* Issue rate. */
+ 32, /* Bus width. */
ARM_PREFETCH_NOT_BENEFICIAL,
tune_params::PREF_CONST_POOL_TRUE,
tune_params::PREF_LDRD_FALSE,
@@ -1896,6 +1902,7 @@ const struct tune_params arm_v6t2_tune =
5, /* Max cond insns. */
8, /* Memset max inline. */
1, /* Issue rate. */
+ 32, /* Bus width. */
ARM_PREFETCH_NOT_BENEFICIAL,
tune_params::PREF_CONST_POOL_FALSE,
tune_params::PREF_LDRD_FALSE,
@@ -1920,6 +1927,7 @@ const struct tune_params arm_cortex_tune =
5, /* Max cond insns. */
8, /* Memset max inline. */
2, /* Issue rate. */
+ 64, /* Bus width. */
ARM_PREFETCH_NOT_BENEFICIAL,
tune_params::PREF_CONST_POOL_FALSE,
tune_params::PREF_LDRD_FALSE,
@@ -1942,6 +1950,7 @@ const struct tune_params arm_cortex_a8_tune =
5, /* Max cond insns. */
8, /* Memset max inline. */
2, /* Issue rate. */
+ 64, /* Bus width. */
ARM_PREFETCH_NOT_BENEFICIAL,
tune_params::PREF_CONST_POOL_FALSE,
tune_params::PREF_LDRD_FALSE,
@@ -1964,6 +1973,7 @@ const struct tune_params arm_cortex_a7_tune =
5, /* Max cond insns. */
8, /* Memset max inline. */
2, /* Issue rate. */
+ 64, /* Bus width. */
ARM_PREFETCH_NOT_BENEFICIAL,
tune_params::PREF_CONST_POOL_FALSE,
tune_params::PREF_LDRD_FALSE,
@@ -1986,6 +1996,7 @@ const struct tune_params arm_cortex_a15_tune =
2, /* Max cond insns. */
8, /* Memset max inline. */
3, /* Issue rate. */
+ 64, /* Bus width. */
ARM_PREFETCH_NOT_BENEFICIAL,
tune_params::PREF_CONST_POOL_FALSE,
tune_params::PREF_LDRD_TRUE,
@@ -2008,6 +2019,7 @@ const struct tune_params arm_cortex_a35_tune =
5, /* Max cond insns. */
8, /* Memset max inline. */
1, /* Issue rate. */
+ 64, /* Bus width. */
ARM_PREFETCH_NOT_BENEFICIAL,
tune_params::PREF_CONST_POOL_FALSE,
tune_params::PREF_LDRD_FALSE,
@@ -2030,6 +2042,7 @@ const struct tune_params arm_cortex_a53_tune =
5, /* Max cond insns. */
8, /* Memset max inline. */
2, /* Issue rate. */
+ 64, /* Bus width. */
ARM_PREFETCH_NOT_BENEFICIAL,
tune_params::PREF_CONST_POOL_FALSE,
tune_params::PREF_LDRD_FALSE,
@@ -2052,6 +2065,7 @@ const struct tune_params arm_cortex_a57_tune =
2, /* Max cond insns. */
8, /* Memset max inline. */
3, /* Issue rate. */
+ 64, /* Bus width. */
ARM_PREFETCH_NOT_BENEFICIAL,
tune_params::PREF_CONST_POOL_FALSE,
tune_params::PREF_LDRD_TRUE,
@@ -2074,6 +2088,7 @@ const struct tune_params arm_exynosm1_tune =
2, /* Max cond insns. */
8, /* Memset max inline. */
3, /* Issue rate. */
+ 64, /* Bus width. */
ARM_PREFETCH_NOT_BENEFICIAL,
tune_params::PREF_CONST_POOL_FALSE,
tune_params::PREF_LDRD_TRUE,
@@ -2096,6 +2111,7 @@ const struct tune_params arm_xgene1_tune =
2, /* Max cond insns. */
32, /* Memset max inline. */
4, /* Issue rate. */
+ 64, /* Bus width. */
ARM_PREFETCH_NOT_BENEFICIAL,
tune_params::PREF_CONST_POOL_FALSE,
tune_params::PREF_LDRD_TRUE,
@@ -2121,6 +2137,7 @@ const struct tune_params arm_cortex_a5_tune =
1, /* Max cond insns. */
8, /* Memset max inline. */
2, /* Issue rate. */
+ 64, /* Bus width. */
ARM_PREFETCH_NOT_BENEFICIAL,
tune_params::PREF_CONST_POOL_FALSE,
tune_params::PREF_LDRD_FALSE,
@@ -2143,6 +2160,7 @@ const struct tune_params arm_cortex_a9_tune =
5, /* Max cond insns. */
8, /* Memset max inline. */
2, /* Issue rate. */
+ 64, /* Bus width. */
ARM_PREFETCH_BENEFICIAL(4,32,32),
tune_params::PREF_CONST_POOL_FALSE,
tune_params::PREF_LDRD_FALSE,
@@ -2165,6 +2183,7 @@ const struct tune_params arm_cortex_a12_tune =
2, /* Max cond insns. */
8, /* Memset max inline. */
2, /* Issue rate. */
+ 64, /* Bus width. */
ARM_PREFETCH_NOT_BENEFICIAL,
tune_params::PREF_CONST_POOL_FALSE,
tune_params::PREF_LDRD_TRUE,
@@ -2187,6 +2206,7 @@ const struct tune_params arm_cortex_a73_tune =
2, /* Max cond insns. */
8, /* Memset max inline. */
2, /* Issue rate. */
+ 64, /* Bus width. */
ARM_PREFETCH_NOT_BENEFICIAL,
tune_params::PREF_CONST_POOL_FALSE,
tune_params::PREF_LDRD_TRUE,
@@ -2216,6 +2236,7 @@ const struct tune_params arm_v7m_tune =
2, /* Max cond insns. */
8, /* Memset max inline. */
1, /* Issue rate. */
+ 32, /* Bus width. */
ARM_PREFETCH_NOT_BENEFICIAL,
tune_params::PREF_CONST_POOL_TRUE,
tune_params::PREF_LDRD_FALSE,
@@ -2240,6 +2261,7 @@ const struct tune_params arm_cortex_m7_tune =
1, /* Max cond insns. */
8, /* Memset max inline. */
2, /* Issue rate. */
+ 64, /* Bus width. */
ARM_PREFETCH_NOT_BENEFICIAL,
tune_params::PREF_CONST_POOL_TRUE,
tune_params::PREF_LDRD_FALSE,
@@ -2265,6 +2287,7 @@ const struct tune_params arm_v6m_tune =
5, /* Max cond insns. */
8, /* Memset max inline. */
1, /* Issue rate. */
+ 32, /* Bus width. */
ARM_PREFETCH_NOT_BENEFICIAL,
tune_params::PREF_CONST_POOL_FALSE,
tune_params::PREF_LDRD_FALSE,
@@ -2287,6 +2310,7 @@ const struct tune_params arm_fa726te_tune =
5, /* Max cond insns. */
8, /* Memset max inline. */
2, /* Issue rate. */
+ 32, /* Bus width. */
ARM_PREFETCH_NOT_BENEFICIAL,
tune_params::PREF_CONST_POOL_TRUE,
tune_params::PREF_LDRD_FALSE,
--
2.7.4
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 3/3] [ARM] Add table of costs for AAarch32 addressing modes.
2017-09-13 9:02 ` Kyrill Tkachov
@ 2017-09-15 15:38 ` Charles Baylis
2017-09-15 15:38 ` Charles Baylis
1 sibling, 0 replies; 20+ messages in thread
From: Charles Baylis @ 2017-09-15 15:38 UTC (permalink / raw)
To: Kyrill Tkachov
Cc: Richard Earnshaw, Ramana Radhakrishnan, pinskia, gcc-patches
On 13 September 2017 at 10:02, Kyrill Tkachov
<kyrylo.tkachov@foss.arm.com> wrote:
>
> Please add a comment here saying that the units are in COSTS_N_INSNS
> so that we can reduce the temptation to use these in inappropriate contexts.
>> + if (VECTOR_MODE_P (mode))
>> + {
>> + *cost += current_tune->addr_mode_costs->vector[op_type];
>> + }
>> + else if (FLOAT_MODE_P (mode))
>> + {
>> + *cost += current_tune->addr_mode_costs->fp[op_type];
>> + }
>> + else
>> + {
>> + *cost += current_tune->addr_mode_costs->integer[op_type];
>> + }
>
>
> No need for brackets for single-statement conditionals.
Done.
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 2/3] [ARM] Refactor costs calculation for MEM.
2017-09-13 9:02 ` Kyrill Tkachov
@ 2017-09-15 15:38 ` Charles Baylis
2017-09-15 17:01 ` Kyrill Tkachov
0 siblings, 1 reply; 20+ messages in thread
From: Charles Baylis @ 2017-09-15 15:38 UTC (permalink / raw)
To: Kyrill Tkachov
Cc: Richard Earnshaw, Ramana Radhakrishnan, pinskia, gcc-patches
[-- Attachment #1: Type: text/plain, Size: 3501 bytes --]
On 13 September 2017 at 10:02, Kyrill Tkachov
<kyrylo.tkachov@foss.arm.com> wrote:
> Hi Charles,
>
> On 12/09/17 09:34, charles.baylis@linaro.org wrote:
>>
>> From: Charles Baylis <charles.baylis@linaro.org>
>>
>> This patch moves the calculation of costs for MEM into a
>> separate function, and reforms the calculation into two
>> parts. Firstly any additional cost of the addressing mode
>> is calculated, and then the cost of the memory access itself
>> is added.
>>
>> In this patch, the calculation of the cost of the addressing
>> mode is left as a placeholder, to be added in a subsequent
>> patch.
>>
>
> Can you please mention how has this series been tested?
> A bootstrap and test run on arm-none-linux-gnueabihf is required at least.
It has been tested with make check on arm-unknown-linux-gnueabihf with
no regressions. I've successfully bootstrapped the next spin.
> Also, do you have any benchmarking results for this?
> I agree that generating the addressing modes in the new tests is desirable.
> So I'm not objecting to the goal of this patch, but a check to make sure
> that this doesn't regress SPEC
> would be great. Further comments on the patch inline.
SPEC2006 scores are unaffected by this patch on Cortex-A57.
>> +/* Helper function for arm_rtx_costs_internal. Calculates the cost of a
>> MEM,
>> + considering the costs of the addressing mode and memory access
>> + separately. */
>> +static bool
>> +arm_mem_costs (rtx x, const struct cpu_cost_table *extra_cost,
>> + int *cost, bool speed_p)
>> +{
>> + machine_mode mode = GET_MODE (x);
>> + if (flag_pic
>> + && GET_CODE (XEXP (x, 0)) == PLUS
>> + && will_be_in_index_register (XEXP (XEXP (x, 0), 1)))
>> + /* This will be split into two instructions. Add the cost of the
>> + additional instruction here. The cost of the memory access is
>> computed
>> + below. See arm.md:calculate_pic_address. */
>> + *cost = COSTS_N_INSNS (1);
>> + else
>> + *cost = 0;
>
>
> For speed_p we want the size cost of the insn (COSTS_N_INSNS (1) for a each
> insn)
> plus the appropriate field in extra_cost. So you should unconditionally
> initialise the cost
> to COSTS_N_INSNS (1), conditionally increment it by COSTS_N_INSNS (1) with
> the condition above.
OK. I also have to subtract that COSTS_N_INSNS (1) in the if (speed_p)
part because the cost of a single bus transfer is included in that
initial cost.
>> +
>> + /* Calculate cost of the addressing mode. */
>> + if (speed_p)
>> + {
>> + /* TODO: Add table-driven costs for addressing modes. (See patch
>> 2) */
>> + }
>
>
> You mean "patch 3". I recommend you just remove this conditional from this
> patch and add the logic
> in patch 3 entirely.
OK.
>> +
>> + /* Calculate cost of memory access. */
>> + if (speed_p)
>> + {
>> + /* data transfer is transfer size divided by bus width. */
>> + int bus_width_bytes = current_tune->bus_width / 4;
>
>
> This should be bus_width / BITS_PER_UNIT to get the size in bytes.
> BITS_PER_UNIT is 8 though, so you'll have to double check to make sure the
> cost calculation and generated code is still appropriate.
Oops, I changed the units around and messed this up. I'll fix this.
>> + *cost += CEIL (GET_MODE_SIZE (mode), bus_width_bytes);
>> + *cost += extra_cost->ldst.load;
>> + }
>> + else
>> + {
>> + *cost += COSTS_N_INSNS (1);
>> + }
>
> Given my first comment above this else would be deleted.
OK
[-- Attachment #2: 0002-ARM-Refactor-costs-calculation-for-MEM.patch --]
[-- Type: text/x-patch, Size: 9469 bytes --]
From f81e1d3212475a3dc7aaeb8cb3171c6defd40687 Mon Sep 17 00:00:00 2001
From: Charles Baylis <charles.baylis@linaro.org>
Date: Wed, 8 Feb 2017 16:52:10 +0000
Subject: [PATCH 2/3] [ARM] Refactor costs calculation for MEM.
This patch moves the calculation of costs for MEM into a
separate function, and reforms the calculation into two
parts. Firstly any additional cost of the addressing mode
is calculated, and then the cost of the memory access itself
is added.
In this patch, the calculation of the cost of the addressing
mode is left as a placeholder, to be added in a subsequent
patch.
gcc/ChangeLog:
<date> Charles Baylis <charles.baylis@linaro.org>
* config/arm/arm.c (arm_mem_costs): New function.
(arm_rtx_costs_internal): Use arm_mem_costs.
gcc/testsuite/ChangeLog:
<date> Charles Baylis <charles.baylis@linaro.org>
* gcc.target/arm/addr-modes-float.c: New test.
* gcc.target/arm/addr-modes-int.c: New test.
* gcc.target/arm/addr-modes.h: New header.
Change-Id: I99e93406ea39ee31f71c7bf428ad3e127b7a618e
---
gcc/config/arm/arm.c | 60 ++++++++++++++-----------
gcc/testsuite/gcc.target/arm/addr-modes-float.c | 42 +++++++++++++++++
gcc/testsuite/gcc.target/arm/addr-modes-int.c | 46 +++++++++++++++++++
gcc/testsuite/gcc.target/arm/addr-modes.h | 53 ++++++++++++++++++++++
4 files changed, 176 insertions(+), 25 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/arm/addr-modes-float.c
create mode 100644 gcc/testsuite/gcc.target/arm/addr-modes-int.c
create mode 100644 gcc/testsuite/gcc.target/arm/addr-modes.h
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 32001e5..64230b8 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -9228,8 +9228,41 @@ arm_unspec_cost (rtx x, enum rtx_code /* outer_code */, bool speed_p, int *cost)
} \
while (0);
+/* Helper function for arm_rtx_costs_internal. Calculates the cost of a MEM,
+ considering the costs of the addressing mode and memory access
+ separately. */
+static bool
+arm_mem_costs (rtx x, const struct cpu_cost_table *extra_cost,
+ int *cost, bool speed_p)
+{
+ machine_mode mode = GET_MODE (x);
+
+ *cost = COSTS_N_INSNS (1);
+
+ if (flag_pic
+ && GET_CODE (XEXP (x, 0)) == PLUS
+ && will_be_in_index_register (XEXP (XEXP (x, 0), 1)))
+ /* This will be split into two instructions. Add the cost of the
+ additional instruction here. The cost of the memory access is computed
+ below. See arm.md:calculate_pic_address. */
+ *cost += COSTS_N_INSNS (1);
+
+ /* Calculate cost of memory access. */
+ if (speed_p)
+ {
+ /* data transfer is transfer size divided by bus width. */
+ int bus_width_bytes = current_tune->bus_width / BITS_PER_UNIT;
+ int num_transfers = CEIL (GET_MODE_SIZE (mode), bus_width_bytes);
+ /* first transfer included in initial cost */
+ *cost += COSTS_N_INSNS (num_transfers - 1);
+ *cost += extra_cost->ldst.load;
+ }
+
+ return true;
+}
+
/* RTX costs. Make an estimate of the cost of executing the operation
- X, which is contained with an operation with code OUTER_CODE.
+ X, which is contained within an operation with code OUTER_CODE.
SPEED_P indicates whether the cost desired is the performance cost,
or the size cost. The estimate is stored in COST and the return
value is TRUE if the cost calculation is final, or FALSE if the
@@ -9308,30 +9341,7 @@ arm_rtx_costs_internal (rtx x, enum rtx_code code, enum rtx_code outer_code,
return false;
case MEM:
- /* A memory access costs 1 insn if the mode is small, or the address is
- a single register, otherwise it costs one insn per word. */
- if (REG_P (XEXP (x, 0)))
- *cost = COSTS_N_INSNS (1);
- else if (flag_pic
- && GET_CODE (XEXP (x, 0)) == PLUS
- && will_be_in_index_register (XEXP (XEXP (x, 0), 1)))
- /* This will be split into two instructions.
- See arm.md:calculate_pic_address. */
- *cost = COSTS_N_INSNS (2);
- else
- *cost = COSTS_N_INSNS (ARM_NUM_REGS (mode));
-
- /* For speed optimizations, add the costs of the address and
- accessing memory. */
- if (speed_p)
-#ifdef NOT_YET
- *cost += (extra_cost->ldst.load
- + arm_address_cost (XEXP (x, 0), mode,
- ADDR_SPACE_GENERIC, speed_p));
-#else
- *cost += extra_cost->ldst.load;
-#endif
- return true;
+ return arm_mem_costs (x, extra_cost, cost, speed_p);
case PARALLEL:
{
diff --git a/gcc/testsuite/gcc.target/arm/addr-modes-float.c b/gcc/testsuite/gcc.target/arm/addr-modes-float.c
new file mode 100644
index 0000000..3b4235c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/addr-modes-float.c
@@ -0,0 +1,42 @@
+/* { dg-options "-O2" } */
+/* { dg-add-options arm_neon } */
+/* { dg-require-effective-target arm_neon_ok } */
+/* { dg-do compile } */
+
+#include <arm_neon.h>
+
+#include "addr-modes.h"
+
+POST_STORE(float)
+/* { dg-final { scan-assembler "vstmia.32" } } */
+POST_STORE(double)
+/* { dg-final { scan-assembler "vstmia.64" } } */
+
+POST_LOAD(float)
+/* { dg-final { scan-assembler "vldmia.32" } } */
+POST_LOAD(double)
+/* { dg-final { scan-assembler "vldmia.64" } } */
+
+POST_STORE_VEC (int8_t, int8x8_t, vst1_s8)
+/* { dg-final { scan-assembler "vst1.8\t\{.*\}, \\\[r\[0-9\]+\\\]!" } } */
+POST_STORE_VEC (int8_t, int8x16_t, vst1q_s8)
+/* { dg-final { scan-assembler "vst1.8\t\{.*\[-,\]d.*\}, \\\[r\[0-9\]+\\\]!" } } */
+
+POST_STORE_VEC (int8_t, int8x8x2_t, vst2_s8)
+/* { dg-final { scan-assembler "vst2.8\t\{.*\}, \\\[r\[0-9\]+\\\]!" } } */
+POST_STORE_VEC (int8_t, int8x16x2_t, vst2q_s8)
+/* { dg-final { scan-assembler "vst2.8\t\{.*-d.*\}, \\\[r\[0-9\]+\\\]!" } } */
+
+POST_STORE_VEC (int8_t, int8x8x3_t, vst3_s8)
+/* { dg-final { scan-assembler "vst3.8\t\{.*\}, \\\[r\[0-9\]+\\\]!" } } */
+POST_STORE_VEC (int8_t, int8x16x3_t, vst3q_s8)
+/* { dg-final { scan-assembler "vst3.8\t\{d\[02468\], d\[02468\], d\[02468\]\}, \\\[r\[0-9\]+\\\]!" } } */
+/* { dg-final { scan-assembler "vst3.8\t\{d\[13579\], d\[13579\], d\[13579\]\}, \\\[r\[0-9\]+\\\]!" { xfail *-*-* } } } */
+
+POST_STORE_VEC (int8_t, int8x8x4_t, vst4_s8)
+/* { dg-final { scan-assembler "vst4.8\t\{.*\}, \\\[r\[0-9\]+\\\]!" } } */
+POST_STORE_VEC (int8_t, int8x16x4_t, vst4q_s8)
+/* { dg-final { scan-assembler "vst4.8\t\{d\[02468\], d\[02468\], d\[02468\], d\[02468\]\}, \\\[r\[0-9\]+\\\]!" } } */
+/* { dg-final { scan-assembler "vst4.8\t\{d\[13579\], d\[13579\], d\[13579\], d\[13579\]\}, \\\[r\[0-9\]+\\\]!" { xfail *-*-* } } } */
+
+/* { dg-final { scan-assembler-not "add" { xfail *-*-* } } } */
diff --git a/gcc/testsuite/gcc.target/arm/addr-modes-int.c b/gcc/testsuite/gcc.target/arm/addr-modes-int.c
new file mode 100644
index 0000000..e3e1e6a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/addr-modes-int.c
@@ -0,0 +1,46 @@
+/* { dg-options "-O2 -march=armv7-a" } */
+/* { dg-add-options arm_neon } */
+/* { dg-require-effective-target arm_neon_ok } */
+/* { dg-do compile } */
+
+#include "addr-modes.h"
+
+typedef long long ll;
+
+PRE_STORE(char)
+/* { dg-final { scan-assembler "strb.*#1]!" } } */
+PRE_STORE(short)
+/* { dg-final { scan-assembler "strh.*#2]!" } } */
+PRE_STORE(int)
+/* { dg-final { scan-assembler "str.*#4]!" } } */
+PRE_STORE(ll)
+/* { dg-final { scan-assembler "strd.*#8]!" } } */
+
+POST_STORE(char)
+/* { dg-final { scan-assembler "strb.*], #1" } } */
+POST_STORE(short)
+/* { dg-final { scan-assembler "strh.*], #2" } } */
+POST_STORE(int)
+/* { dg-final { scan-assembler "str.*], #4" } } */
+POST_STORE(ll)
+/* { dg-final { scan-assembler "strd.*], #8" } } */
+
+PRE_LOAD(char)
+/* { dg-final { scan-assembler "ldrb.*#1]!" } } */
+PRE_LOAD(short)
+/* { dg-final { scan-assembler "ldrsh.*#2]!" } } */
+PRE_LOAD(int)
+/* { dg-final { scan-assembler "ldr.*#4]!" } } */
+PRE_LOAD(ll)
+/* { dg-final { scan-assembler "ldrd.*#8]!" } } */
+
+POST_LOAD(char)
+/* { dg-final { scan-assembler "ldrb.*], #1" } } */
+POST_LOAD(short)
+/* { dg-final { scan-assembler "ldrsh.*], #2" } } */
+POST_LOAD(int)
+/* { dg-final { scan-assembler "ldr.*], #4" } } */
+POST_LOAD(ll)
+/* { dg-final { scan-assembler "ldrd.*], #8" } } */
+
+/* { dg-final { scan-assembler-not "\tadd" } } */
diff --git a/gcc/testsuite/gcc.target/arm/addr-modes.h b/gcc/testsuite/gcc.target/arm/addr-modes.h
new file mode 100644
index 0000000..eac4678
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/addr-modes.h
@@ -0,0 +1,53 @@
+
+#define PRE_STORE(T) \
+ T * \
+ T ## _pre_store (T *p, T v) \
+ { \
+ *++p = v; \
+ return p; \
+ } \
+
+#define POST_STORE(T) \
+ T * \
+ T ## _post_store (T *p, T v) \
+ { \
+ *p++ = v; \
+ return p; \
+ }
+
+#define POST_STORE_VEC(T, VT, OP) \
+ T * \
+ VT ## _post_store (T * p, VT v) \
+ { \
+ OP (p, v); \
+ p += sizeof (VT) / sizeof (T); \
+ return p; \
+ }
+
+#define PRE_LOAD(T) \
+ void \
+ T ## _pre_load (T *p) \
+ { \
+ extern void f ## T (T*,T); \
+ T x = *++p; \
+ f ## T (p, x); \
+ }
+
+#define POST_LOAD(T) \
+ void \
+ T ## _post_load (T *p) \
+ { \
+ extern void f ## T (T*,T); \
+ T x = *p++; \
+ f ## T (p, x); \
+ }
+
+#define POST_LOAD_VEC(T, VT, OP) \
+ void \
+ VT ## _post_load (T * p) \
+ { \
+ extern void f ## T (T*,T); \
+ VT x = OP (p, v); \
+ p += sizeof (VT) / sizeof (T); \
+ f ## T (p, x); \
+ }
--
2.7.4
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 3/3] [ARM] Add table of costs for AAarch32 addressing modes.
2017-09-15 15:38 ` Charles Baylis
@ 2017-09-15 16:57 ` Kyrill Tkachov
2017-11-23 19:25 ` Charles Baylis
0 siblings, 1 reply; 20+ messages in thread
From: Kyrill Tkachov @ 2017-09-15 16:57 UTC (permalink / raw)
To: Charles Baylis
Cc: Richard Earnshaw, Ramana Radhakrishnan, pinskia, gcc-patches
On 15/09/17 16:38, Charles Baylis wrote:
> On 13 September 2017 at 10:02, Kyrill Tkachov
> <kyrylo.tkachov@foss.arm.com> wrote:
>
>> Please add a comment here saying that the units are in COSTS_N_INSNS
>> so that we can reduce the temptation to use these in inappropriate contexts.
>>> + if (VECTOR_MODE_P (mode))
>>> + {
>>> + *cost += current_tune->addr_mode_costs->vector[op_type];
>>> + }
>>> + else if (FLOAT_MODE_P (mode))
>>> + {
>>> + *cost += current_tune->addr_mode_costs->fp[op_type];
>>> + }
>>> + else
>>> + {
>>> + *cost += current_tune->addr_mode_costs->integer[op_type];
>>> + }
>>
>> No need for brackets for single-statement conditionals.
> Done.
Thanks, this is ok once the prerequisites are sorted.
Kyrill
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 2/3] [ARM] Refactor costs calculation for MEM.
2017-09-15 15:38 ` Charles Baylis
@ 2017-09-15 17:01 ` Kyrill Tkachov
2017-11-20 21:11 ` Charles Baylis
0 siblings, 1 reply; 20+ messages in thread
From: Kyrill Tkachov @ 2017-09-15 17:01 UTC (permalink / raw)
To: Charles Baylis
Cc: Richard Earnshaw, Ramana Radhakrishnan, pinskia, gcc-patches
On 15/09/17 16:38, Charles Baylis wrote:
> On 13 September 2017 at 10:02, Kyrill Tkachov
> <kyrylo.tkachov@foss.arm.com> wrote:
>> Hi Charles,
>>
>> On 12/09/17 09:34, charles.baylis@linaro.org wrote:
>>> From: Charles Baylis <charles.baylis@linaro.org>
>>>
>>> This patch moves the calculation of costs for MEM into a
>>> separate function, and reforms the calculation into two
>>> parts. Firstly any additional cost of the addressing mode
>>> is calculated, and then the cost of the memory access itself
>>> is added.
>>>
>>> In this patch, the calculation of the cost of the addressing
>>> mode is left as a placeholder, to be added in a subsequent
>>> patch.
>>>
>> Can you please mention how has this series been tested?
>> A bootstrap and test run on arm-none-linux-gnueabihf is required at least.
> It has been tested with make check on arm-unknown-linux-gnueabihf with
> no regressions. I've successfully bootstrapped the next spin.
Thanks.
>> Also, do you have any benchmarking results for this?
>> I agree that generating the addressing modes in the new tests is desirable.
>> So I'm not objecting to the goal of this patch, but a check to make sure
>> that this doesn't regress SPEC
>> would be great. Further comments on the patch inline.
> SPEC2006 scores are unaffected by this patch on Cortex-A57.
Good, thanks for checking :)
>>> +/* Helper function for arm_rtx_costs_internal. Calculates the cost of a
>>> MEM,
>>> + considering the costs of the addressing mode and memory access
>>> + separately. */
>>> +static bool
>>> +arm_mem_costs (rtx x, const struct cpu_cost_table *extra_cost,
>>> + int *cost, bool speed_p)
>>> +{
>>> + machine_mode mode = GET_MODE (x);
>>> + if (flag_pic
>>> + && GET_CODE (XEXP (x, 0)) == PLUS
>>> + && will_be_in_index_register (XEXP (XEXP (x, 0), 1)))
>>> + /* This will be split into two instructions. Add the cost of the
>>> + additional instruction here. The cost of the memory access is
>>> computed
>>> + below. See arm.md:calculate_pic_address. */
>>> + *cost = COSTS_N_INSNS (1);
>>> + else
>>> + *cost = 0;
>>
>> For speed_p we want the size cost of the insn (COSTS_N_INSNS (1) for a each
>> insn)
>> plus the appropriate field in extra_cost. So you should unconditionally
>> initialise the cost
>> to COSTS_N_INSNS (1), conditionally increment it by COSTS_N_INSNS (1) with
>> the condition above.
> OK. I also have to subtract that COSTS_N_INSNS (1) in the if (speed_p)
> part because the cost of a single bus transfer is included in that
> initial cost.
>
>>> +
>>> + /* Calculate cost of the addressing mode. */
>>> + if (speed_p)
>>> + {
>>> + /* TODO: Add table-driven costs for addressing modes. (See patch
>>> 2) */
>>> + }
>>
>> You mean "patch 3". I recommend you just remove this conditional from this
>> patch and add the logic
>> in patch 3 entirely.
> OK.
>
>>> +
>>> + /* Calculate cost of memory access. */
>>> + if (speed_p)
>>> + {
>>> + /* data transfer is transfer size divided by bus width. */
>>> + int bus_width_bytes = current_tune->bus_width / 4;
>>
>> This should be bus_width / BITS_PER_UNIT to get the size in bytes.
>> BITS_PER_UNIT is 8 though, so you'll have to double check to make sure the
>> cost calculation and generated code is still appropriate.
> Oops, I changed the units around and messed this up. I'll fix this.
>
>>> + *cost += CEIL (GET_MODE_SIZE (mode), bus_width_bytes);
>>> + *cost += extra_cost->ldst.load;
>>> + }
>>> + else
>>> + {
>>> + *cost += COSTS_N_INSNS (1);
>>> + }
>> Given my first comment above this else would be deleted.
> OK
I have a concern about using the bus_width parameter which
I explain in the thread for patch 1 (I don't think we need it, we should
use the fields in extra_cost->ldst
more carefully).
Kyrill
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 1/3] [ARM] Add bus_width_bits to tune_params
2017-09-15 15:38 ` Charles Baylis
@ 2017-09-15 17:01 ` Kyrill Tkachov
2017-11-20 21:12 ` Charles Baylis
0 siblings, 1 reply; 20+ messages in thread
From: Kyrill Tkachov @ 2017-09-15 17:01 UTC (permalink / raw)
To: Charles Baylis
Cc: Richard Earnshaw, Ramana Radhakrishnan, pinskia, gcc-patches
On 15/09/17 16:38, Charles Baylis wrote:
> On 13 September 2017 at 10:02, Kyrill Tkachov
> <kyrylo.tkachov@foss.arm.com> wrote:
>> Hi Charles,
>>
>> On 12/09/17 09:34, charles.baylis@linaro.org wrote:
>>> From: Charles Baylis <charles.baylis@linaro.org>
>>>
>>> Add bus widths. These use the approximation that v7 and later cores have
>>> 64bit data bus width, and earlier cores have 32 bit bus width, with the
>>> exception of v7m.
>>>
>> Given the way this field is used in patch 2 does it affect the addressing
>> mode generation
>> in the tests you added depending on the -mtune option given?
>> If so, we'll get testsuite failures when people test with particular default
>> CPU configurations.
> No, because the auto_inc_dec phase compares the cost of two different
> MEMs which differ only by addressing mode. The part of the calculation
> which depends on the bus_width is the same both times, so it is
> cancelled out.
>
>> Could you expand on the benefits we get from this extra bus_width
>> information?
>> I get that we increase the cost of memory accesses if the size of the mode
>> we load is larger than the
>> bus width, but it's not as if there is ever an alternative in this regard,
>> such as loading less memory,
>> so what pass can make different decisions thanks to this field?
> As far as this patch series is concerned, it doesn't matter. It is
> there to encapsulate the notion that a larger transfer results in
> rtx_costs() returning a larger cost, but I don't know of any part of
> the compiler which is sensitive to that difference. It's done this way
> because Ramana and Richard wanted it done that way
> (https://gcc.gnu.org/ml/gcc-patches/2017-06/msg00652.html).
From what I can tell Ramana and Richard preferred to encode this
attribute as
a tuning struct property rather than an inline conditional based on
arm_arch7.
I agree that if we want to use that information, it should be encoded
this way.
What I'm not convinced about is whether we do want this parameter in the
first place.
The cost tables already encode information about the costs of different
sized loads/stores.
In patch 2, for example, you add the cost for extra_cost->ldst.load
which is nominally just
the cost of a normal 32-bit ldr. But we also have costs for ldst.ldrd
which is the 64-bit two-register load
which should reflect any extra cost due to a narrower bus in it. We also
have costs for ldst.loadf (for 32-bit
VFP loads) and ldst.loadd (for 64-bit VFP D-register loads). So I think
we should use those cost fields
depending on the mode class and size instead of using ldst.load
unconditionally and adding a new bus_size parameter.
So I think the way forward is to drop this patch and modify patch 2/3 to
use the extra_cost->ldst fields as described above.
Sorry for the back-and-forth. I think this is the best approach because
it uses the existing fields more naturally and
doesn't add new parameters that partly duplicate the information encoded
in the existing fields.
Ramana, Richard: if you prefer the bus_width approach I won't block it,
but could you clarify your preference?
If we do end up adding the bus_width parameter then this patch and patch
2/3 look ok.
Thanks,
Kyrill
P.S. I'm going on a 4-week holiday from today, so I won't be able to do
any further review in that timeframe.
As I said, if we go with the bus_size approach then these patches are
ok. If we go with my suggestion, this would
be dropped and patch 2 would be extended to select the appropriate
extra_cost->ldst field depending on mode.
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 2/3] [ARM] Refactor costs calculation for MEM.
2017-09-15 17:01 ` Kyrill Tkachov
@ 2017-11-20 21:11 ` Charles Baylis
2017-11-21 10:21 ` Charles Baylis
2017-11-23 10:50 ` Kyrill Tkachov
0 siblings, 2 replies; 20+ messages in thread
From: Charles Baylis @ 2017-11-20 21:11 UTC (permalink / raw)
To: Kyrill Tkachov
Cc: Richard Earnshaw, Ramana Radhakrishnan, pinskia, gcc-patches
[-- Attachment #1: Type: text/plain, Size: 4765 bytes --]
On 15 September 2017 at 18:01, Kyrill Tkachov
<kyrylo.tkachov@foss.arm.com> wrote:
>
> On 15/09/17 16:38, Charles Baylis wrote:
>>
>> On 13 September 2017 at 10:02, Kyrill Tkachov
>> <kyrylo.tkachov@foss.arm.com> wrote:
>>>
>>> Hi Charles,
>>>
>>> On 12/09/17 09:34, charles.baylis@linaro.org wrote:
>>>>
>>>> From: Charles Baylis <charles.baylis@linaro.org>
>>>>
>>>> This patch moves the calculation of costs for MEM into a
>>>> separate function, and reforms the calculation into two
>>>> parts. Firstly any additional cost of the addressing mode
>>>> is calculated, and then the cost of the memory access itself
>>>> is added.
>>>>
>>>> In this patch, the calculation of the cost of the addressing
>>>> mode is left as a placeholder, to be added in a subsequent
>>>> patch.
>>>>
>>> Can you please mention how has this series been tested?
>>> A bootstrap and test run on arm-none-linux-gnueabihf is required at
>>> least.
>>
>> It has been tested with make check on arm-unknown-linux-gnueabihf with
>> no regressions. I've successfully bootstrapped the next spin.
>
>
> Thanks.
>
>>> Also, do you have any benchmarking results for this?
>>> I agree that generating the addressing modes in the new tests is
>>> desirable.
>>> So I'm not objecting to the goal of this patch, but a check to make sure
>>> that this doesn't regress SPEC
>>> would be great. Further comments on the patch inline.
>>
>> SPEC2006 scores are unaffected by this patch on Cortex-A57.
>
>
> Good, thanks for checking :)
>
>
>>>> +/* Helper function for arm_rtx_costs_internal. Calculates the cost of
>>>> a
>>>> MEM,
>>>> + considering the costs of the addressing mode and memory access
>>>> + separately. */
>>>> +static bool
>>>> +arm_mem_costs (rtx x, const struct cpu_cost_table *extra_cost,
>>>> + int *cost, bool speed_p)
>>>> +{
>>>> + machine_mode mode = GET_MODE (x);
>>>> + if (flag_pic
>>>> + && GET_CODE (XEXP (x, 0)) == PLUS
>>>> + && will_be_in_index_register (XEXP (XEXP (x, 0), 1)))
>>>> + /* This will be split into two instructions. Add the cost of the
>>>> + additional instruction here. The cost of the memory access is
>>>> computed
>>>> + below. See arm.md:calculate_pic_address. */
>>>> + *cost = COSTS_N_INSNS (1);
>>>> + else
>>>> + *cost = 0;
>>>
>>>
>>> For speed_p we want the size cost of the insn (COSTS_N_INSNS (1) for a
>>> each
>>> insn)
>>> plus the appropriate field in extra_cost. So you should unconditionally
>>> initialise the cost
>>> to COSTS_N_INSNS (1), conditionally increment it by COSTS_N_INSNS (1)
>>> with
>>> the condition above.
>>
>> OK. I also have to subtract that COSTS_N_INSNS (1) in the if (speed_p)
>> part because the cost of a single bus transfer is included in that
>> initial cost.
>>
>>>> +
>>>> + /* Calculate cost of the addressing mode. */
>>>> + if (speed_p)
>>>> + {
>>>> + /* TODO: Add table-driven costs for addressing modes. (See patch
>>>> 2) */
>>>> + }
>>>
>>>
>>> You mean "patch 3". I recommend you just remove this conditional from
>>> this
>>> patch and add the logic
>>> in patch 3 entirely.
>>
>> OK.
>>
>>>> +
>>>> + /* Calculate cost of memory access. */
>>>> + if (speed_p)
>>>> + {
>>>> + /* data transfer is transfer size divided by bus width. */
>>>> + int bus_width_bytes = current_tune->bus_width / 4;
>>>
>>>
>>> This should be bus_width / BITS_PER_UNIT to get the size in bytes.
>>> BITS_PER_UNIT is 8 though, so you'll have to double check to make sure
>>> the
>>> cost calculation and generated code is still appropriate.
>>
>> Oops, I changed the units around and messed this up. I'll fix this.
>>
>>>> + *cost += CEIL (GET_MODE_SIZE (mode), bus_width_bytes);
>>>> + *cost += extra_cost->ldst.load;
>>>> + }
>>>> + else
>>>> + {
>>>> + *cost += COSTS_N_INSNS (1);
>>>> + }
>>>
>>> Given my first comment above this else would be deleted.
>>
>> OK
>
>
> I have a concern about using the bus_width parameter which
> I explain in the thread for patch 1 (I don't think we need it, we should use
> the fields in extra_cost->ldst
> more carefully).
I have modified this patch accordingly. Patch 1 is no longer needed.
Passes "make check" (with patch 3) on arm-linux-gnueabihf with no
regressions. Bootstrap is in progress.
Can I still get this in during stage 3?
gcc/ChangeLog:
<date> Charles Baylis <charles.baylis@linaro.org>
* config/arm/arm.c (arm_mem_costs): New function.
(arm_rtx_costs_internal): Use arm_mem_costs.
gcc/testsuite/ChangeLog:
<date> Charles Baylis <charles.baylis@linaro.org>
* gcc.target/arm/addr-modes-float.c: New test.
* gcc.target/arm/addr-modes-int.c: New test.
* gcc.target/arm/addr-modes.h: New header.
[-- Attachment #2: 0002-ARM-Refactor-costs-calculation-for-MEM.patch --]
[-- Type: text/x-patch, Size: 9491 bytes --]
From 26d9c0839ef7318074d3fd38dca3989bd3e51d54 Mon Sep 17 00:00:00 2001
From: Charles Baylis <charles.baylis@linaro.org>
Date: Wed, 8 Feb 2017 16:52:10 +0000
Subject: [PATCH 1/3] [ARM] Refactor costs calculation for MEM.
This patch moves the calculation of costs for MEM into a
separate function, and reforms the calculation into two
parts. Firstly any additional cost of the addressing mode
is calculated, and then the cost of the memory access itself
is added.
In this patch, the calculation of the cost of the addressing
mode is omitted, to be added in a subsequent patch.
gcc/ChangeLog:
<date> Charles Baylis <charles.baylis@linaro.org>
* config/arm/arm.c (arm_mem_costs): New function.
(arm_rtx_costs_internal): Use arm_mem_costs.
gcc/testsuite/ChangeLog:
<date> Charles Baylis <charles.baylis@linaro.org>
* gcc.target/arm/addr-modes-float.c: New test.
* gcc.target/arm/addr-modes-int.c: New test.
* gcc.target/arm/addr-modes.h: New header.
Change-Id: I99e93406ea39ee31f71c7bf428ad3e127b7a618e
---
gcc/config/arm/arm.c | 71 ++++++++++++++++---------
gcc/testsuite/gcc.target/arm/addr-modes-float.c | 42 +++++++++++++++
gcc/testsuite/gcc.target/arm/addr-modes-int.c | 46 ++++++++++++++++
gcc/testsuite/gcc.target/arm/addr-modes.h | 53 ++++++++++++++++++
4 files changed, 187 insertions(+), 25 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/arm/addr-modes-float.c
create mode 100644 gcc/testsuite/gcc.target/arm/addr-modes-int.c
create mode 100644 gcc/testsuite/gcc.target/arm/addr-modes.h
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 1c2f8fa..ce59d80 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -9242,8 +9242,52 @@ arm_unspec_cost (rtx x, enum rtx_code /* outer_code */, bool speed_p, int *cost)
} \
while (0)
+/* Helper function for arm_rtx_costs_internal. Calculates the cost of a MEM,
+ considering the costs of the addressing mode and memory access
+ separately. */
+static bool
+arm_mem_costs (rtx x, const struct cpu_cost_table *extra_cost,
+ int *cost, bool speed_p)
+{
+ machine_mode mode = GET_MODE (x);
+
+ *cost = COSTS_N_INSNS (1);
+
+ if (flag_pic
+ && GET_CODE (XEXP (x, 0)) == PLUS
+ && will_be_in_index_register (XEXP (XEXP (x, 0), 1)))
+ /* This will be split into two instructions. Add the cost of the
+ additional instruction here. The cost of the memory access is computed
+ below. See arm.md:calculate_pic_address. */
+ *cost += COSTS_N_INSNS (1);
+
+ /* Calculate cost of memory access. */
+ if (speed_p)
+ {
+ if (FLOAT_MODE_P (mode))
+ {
+ if (GET_MODE_SIZE (mode) == 8)
+ *cost += extra_cost->ldst.loadd;
+ else
+ *cost += extra_cost->ldst.loadf;
+ }
+ else if (VECTOR_MODE_P (mode))
+ *cost += extra_cost->ldst.loadv;
+ else
+ {
+ /* Integer modes */
+ if (GET_MODE_SIZE (mode) == 8)
+ *cost += extra_cost->ldst.ldrd;
+ else
+ *cost += extra_cost->ldst.load;
+ }
+ }
+
+ return true;
+}
+
/* RTX costs. Make an estimate of the cost of executing the operation
- X, which is contained with an operation with code OUTER_CODE.
+ X, which is contained within an operation with code OUTER_CODE.
SPEED_P indicates whether the cost desired is the performance cost,
or the size cost. The estimate is stored in COST and the return
value is TRUE if the cost calculation is final, or FALSE if the
@@ -9322,30 +9366,7 @@ arm_rtx_costs_internal (rtx x, enum rtx_code code, enum rtx_code outer_code,
return false;
case MEM:
- /* A memory access costs 1 insn if the mode is small, or the address is
- a single register, otherwise it costs one insn per word. */
- if (REG_P (XEXP (x, 0)))
- *cost = COSTS_N_INSNS (1);
- else if (flag_pic
- && GET_CODE (XEXP (x, 0)) == PLUS
- && will_be_in_index_register (XEXP (XEXP (x, 0), 1)))
- /* This will be split into two instructions.
- See arm.md:calculate_pic_address. */
- *cost = COSTS_N_INSNS (2);
- else
- *cost = COSTS_N_INSNS (ARM_NUM_REGS (mode));
-
- /* For speed optimizations, add the costs of the address and
- accessing memory. */
- if (speed_p)
-#ifdef NOT_YET
- *cost += (extra_cost->ldst.load
- + arm_address_cost (XEXP (x, 0), mode,
- ADDR_SPACE_GENERIC, speed_p));
-#else
- *cost += extra_cost->ldst.load;
-#endif
- return true;
+ return arm_mem_costs (x, extra_cost, cost, speed_p);
case PARALLEL:
{
diff --git a/gcc/testsuite/gcc.target/arm/addr-modes-float.c b/gcc/testsuite/gcc.target/arm/addr-modes-float.c
new file mode 100644
index 0000000..3b4235c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/addr-modes-float.c
@@ -0,0 +1,42 @@
+/* { dg-options "-O2" } */
+/* { dg-add-options arm_neon } */
+/* { dg-require-effective-target arm_neon_ok } */
+/* { dg-do compile } */
+
+#include <arm_neon.h>
+
+#include "addr-modes.h"
+
+POST_STORE(float)
+/* { dg-final { scan-assembler "vstmia.32" } } */
+POST_STORE(double)
+/* { dg-final { scan-assembler "vstmia.64" } } */
+
+POST_LOAD(float)
+/* { dg-final { scan-assembler "vldmia.32" } } */
+POST_LOAD(double)
+/* { dg-final { scan-assembler "vldmia.64" } } */
+
+POST_STORE_VEC (int8_t, int8x8_t, vst1_s8)
+/* { dg-final { scan-assembler "vst1.8\t\{.*\}, \\\[r\[0-9\]+\\\]!" } } */
+POST_STORE_VEC (int8_t, int8x16_t, vst1q_s8)
+/* { dg-final { scan-assembler "vst1.8\t\{.*\[-,\]d.*\}, \\\[r\[0-9\]+\\\]!" } } */
+
+POST_STORE_VEC (int8_t, int8x8x2_t, vst2_s8)
+/* { dg-final { scan-assembler "vst2.8\t\{.*\}, \\\[r\[0-9\]+\\\]!" } } */
+POST_STORE_VEC (int8_t, int8x16x2_t, vst2q_s8)
+/* { dg-final { scan-assembler "vst2.8\t\{.*-d.*\}, \\\[r\[0-9\]+\\\]!" } } */
+
+POST_STORE_VEC (int8_t, int8x8x3_t, vst3_s8)
+/* { dg-final { scan-assembler "vst3.8\t\{.*\}, \\\[r\[0-9\]+\\\]!" } } */
+POST_STORE_VEC (int8_t, int8x16x3_t, vst3q_s8)
+/* { dg-final { scan-assembler "vst3.8\t\{d\[02468\], d\[02468\], d\[02468\]\}, \\\[r\[0-9\]+\\\]!" } } */
+/* { dg-final { scan-assembler "vst3.8\t\{d\[13579\], d\[13579\], d\[13579\]\}, \\\[r\[0-9\]+\\\]!" { xfail *-*-* } } } */
+
+POST_STORE_VEC (int8_t, int8x8x4_t, vst4_s8)
+/* { dg-final { scan-assembler "vst4.8\t\{.*\}, \\\[r\[0-9\]+\\\]!" } } */
+POST_STORE_VEC (int8_t, int8x16x4_t, vst4q_s8)
+/* { dg-final { scan-assembler "vst4.8\t\{d\[02468\], d\[02468\], d\[02468\], d\[02468\]\}, \\\[r\[0-9\]+\\\]!" } } */
+/* { dg-final { scan-assembler "vst4.8\t\{d\[13579\], d\[13579\], d\[13579\], d\[13579\]\}, \\\[r\[0-9\]+\\\]!" { xfail *-*-* } } } */
+
+/* { dg-final { scan-assembler-not "add" { xfail *-*-* } } } */
diff --git a/gcc/testsuite/gcc.target/arm/addr-modes-int.c b/gcc/testsuite/gcc.target/arm/addr-modes-int.c
new file mode 100644
index 0000000..e3e1e6a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/addr-modes-int.c
@@ -0,0 +1,46 @@
+/* { dg-options "-O2 -march=armv7-a" } */
+/* { dg-add-options arm_neon } */
+/* { dg-require-effective-target arm_neon_ok } */
+/* { dg-do compile } */
+
+#include "addr-modes.h"
+
+typedef long long ll;
+
+PRE_STORE(char)
+/* { dg-final { scan-assembler "strb.*#1]!" } } */
+PRE_STORE(short)
+/* { dg-final { scan-assembler "strh.*#2]!" } } */
+PRE_STORE(int)
+/* { dg-final { scan-assembler "str.*#4]!" } } */
+PRE_STORE(ll)
+/* { dg-final { scan-assembler "strd.*#8]!" } } */
+
+POST_STORE(char)
+/* { dg-final { scan-assembler "strb.*], #1" } } */
+POST_STORE(short)
+/* { dg-final { scan-assembler "strh.*], #2" } } */
+POST_STORE(int)
+/* { dg-final { scan-assembler "str.*], #4" } } */
+POST_STORE(ll)
+/* { dg-final { scan-assembler "strd.*], #8" } } */
+
+PRE_LOAD(char)
+/* { dg-final { scan-assembler "ldrb.*#1]!" } } */
+PRE_LOAD(short)
+/* { dg-final { scan-assembler "ldrsh.*#2]!" } } */
+PRE_LOAD(int)
+/* { dg-final { scan-assembler "ldr.*#4]!" } } */
+PRE_LOAD(ll)
+/* { dg-final { scan-assembler "ldrd.*#8]!" } } */
+
+POST_LOAD(char)
+/* { dg-final { scan-assembler "ldrb.*], #1" } } */
+POST_LOAD(short)
+/* { dg-final { scan-assembler "ldrsh.*], #2" } } */
+POST_LOAD(int)
+/* { dg-final { scan-assembler "ldr.*], #4" } } */
+POST_LOAD(ll)
+/* { dg-final { scan-assembler "ldrd.*], #8" } } */
+
+/* { dg-final { scan-assembler-not "\tadd" } } */
diff --git a/gcc/testsuite/gcc.target/arm/addr-modes.h b/gcc/testsuite/gcc.target/arm/addr-modes.h
new file mode 100644
index 0000000..eac4678
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/addr-modes.h
@@ -0,0 +1,53 @@
+
+#define PRE_STORE(T) \
+ T * \
+ T ## _pre_store (T *p, T v) \
+ { \
+ *++p = v; \
+ return p; \
+ } \
+
+#define POST_STORE(T) \
+ T * \
+ T ## _post_store (T *p, T v) \
+ { \
+ *p++ = v; \
+ return p; \
+ }
+
+#define POST_STORE_VEC(T, VT, OP) \
+ T * \
+ VT ## _post_store (T * p, VT v) \
+ { \
+ OP (p, v); \
+ p += sizeof (VT) / sizeof (T); \
+ return p; \
+ }
+
+#define PRE_LOAD(T) \
+ void \
+ T ## _pre_load (T *p) \
+ { \
+ extern void f ## T (T*,T); \
+ T x = *++p; \
+ f ## T (p, x); \
+ }
+
+#define POST_LOAD(T) \
+ void \
+ T ## _post_load (T *p) \
+ { \
+ extern void f ## T (T*,T); \
+ T x = *p++; \
+ f ## T (p, x); \
+ }
+
+#define POST_LOAD_VEC(T, VT, OP) \
+ void \
+ VT ## _post_load (T * p) \
+ { \
+ extern void f ## T (T*,T); \
+ VT x = OP (p, v); \
+ p += sizeof (VT) / sizeof (T); \
+ f ## T (p, x); \
+ }
--
2.7.4
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 1/3] [ARM] Add bus_width_bits to tune_params
2017-09-15 17:01 ` Kyrill Tkachov
@ 2017-11-20 21:12 ` Charles Baylis
0 siblings, 0 replies; 20+ messages in thread
From: Charles Baylis @ 2017-11-20 21:12 UTC (permalink / raw)
To: Kyrill Tkachov
Cc: Richard Earnshaw, Ramana Radhakrishnan, pinskia, gcc-patches
On 15 September 2017 at 18:01, Kyrill Tkachov
<kyrylo.tkachov@foss.arm.com> wrote:
> From what I can tell Ramana and Richard preferred to encode this attribute
> as
> a tuning struct property rather than an inline conditional based on
> arm_arch7.
> I agree that if we want to use that information, it should be encoded this
> way.
> What I'm not convinced about is whether we do want this parameter in the
> first place.
>
> The cost tables already encode information about the costs of different
> sized loads/stores.
> In patch 2, for example, you add the cost for extra_cost->ldst.load which is
> nominally just
> the cost of a normal 32-bit ldr. But we also have costs for ldst.ldrd which
> is the 64-bit two-register load
> which should reflect any extra cost due to a narrower bus in it. We also
> have costs for ldst.loadf (for 32-bit
> VFP loads) and ldst.loadd (for 64-bit VFP D-register loads). So I think we
> should use those cost fields
> depending on the mode class and size instead of using ldst.load
> unconditionally and adding a new bus_size parameter.
>
> So I think the way forward is to drop this patch and modify patch 2/3 to use
> the extra_cost->ldst fields as described above.
>
> Sorry for the back-and-forth. I think this is the best approach because it
> uses the existing fields more naturally and
> doesn't add new parameters that partly duplicate the information encoded in
> the existing fields.
> Ramana, Richard: if you prefer the bus_width approach I won't block it, but
> could you clarify your preference?
> If we do end up adding the bus_width parameter then this patch and patch 2/3
> look ok.
> Thanks,
> Kyrill
>
> P.S. I'm going on a 4-week holiday from today, so I won't be able to do any
> further review in that timeframe.
> As I said, if we go with the bus_size approach then these patches are ok. If
> we go with my suggestion, this would
> be dropped and patch 2 would be extended to select the appropriate
> extra_cost->ldst field depending on mode.
OK, I agree with dropping this patch. I have posted an updated patch 2
which does not require it.
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 2/3] [ARM] Refactor costs calculation for MEM.
2017-11-20 21:11 ` Charles Baylis
@ 2017-11-21 10:21 ` Charles Baylis
2017-11-23 10:50 ` Kyrill Tkachov
1 sibling, 0 replies; 20+ messages in thread
From: Charles Baylis @ 2017-11-21 10:21 UTC (permalink / raw)
To: Kyrill Tkachov
Cc: Richard Earnshaw, Ramana Radhakrishnan, pinskia, gcc-patches
On 20 November 2017 at 21:09, Charles Baylis <charles.baylis@linaro.org> wrote:
> I have modified this patch accordingly. Patch 1 is no longer needed.
>
> Passes "make check" (with patch 3) on arm-linux-gnueabihf with no
> regressions. Bootstrap is in progress.
Bootstrap built successfully using qemu host.
> Can I still get this in during stage 3?
>
> gcc/ChangeLog:
>
> <date> Charles Baylis <charles.baylis@linaro.org>
>
> * config/arm/arm.c (arm_mem_costs): New function.
> (arm_rtx_costs_internal): Use arm_mem_costs.
>
> gcc/testsuite/ChangeLog:
>
> <date> Charles Baylis <charles.baylis@linaro.org>
>
> * gcc.target/arm/addr-modes-float.c: New test.
> * gcc.target/arm/addr-modes-int.c: New test.
> * gcc.target/arm/addr-modes.h: New header.
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 2/3] [ARM] Refactor costs calculation for MEM.
2017-11-20 21:11 ` Charles Baylis
2017-11-21 10:21 ` Charles Baylis
@ 2017-11-23 10:50 ` Kyrill Tkachov
2017-11-23 19:13 ` Charles Baylis
1 sibling, 1 reply; 20+ messages in thread
From: Kyrill Tkachov @ 2017-11-23 10:50 UTC (permalink / raw)
To: Charles Baylis
Cc: Richard Earnshaw, Ramana Radhakrishnan, pinskia, gcc-patches
Hi Charles,
On 20/11/17 21:09, Charles Baylis wrote:
> On 15 September 2017 at 18:01, Kyrill Tkachov
> <kyrylo.tkachov@foss.arm.com> wrote:
> >
> > On 15/09/17 16:38, Charles Baylis wrote:
> >>
> >> On 13 September 2017 at 10:02, Kyrill Tkachov
> >> <kyrylo.tkachov@foss.arm.com> wrote:
> >>>
> >>> Hi Charles,
> >>>
> >>> On 12/09/17 09:34, charles.baylis@linaro.org wrote:
> >>>>
> >>>> From: Charles Baylis <charles.baylis@linaro.org>
> >>>>
> >>>> This patch moves the calculation of costs for MEM into a
> >>>> separate function, and reforms the calculation into two
> >>>> parts. Firstly any additional cost of the addressing mode
> >>>> is calculated, and then the cost of the memory access itself
> >>>> is added.
> >>>>
> >>>> In this patch, the calculation of the cost of the addressing
> >>>> mode is left as a placeholder, to be added in a subsequent
> >>>> patch.
> >>>>
> >>> Can you please mention how has this series been tested?
> >>> A bootstrap and test run on arm-none-linux-gnueabihf is required at
> >>> least.
> >>
> >> It has been tested with make check on arm-unknown-linux-gnueabihf with
> >> no regressions. I've successfully bootstrapped the next spin.
> >
> >
> > Thanks.
> >
> >>> Also, do you have any benchmarking results for this?
> >>> I agree that generating the addressing modes in the new tests is
> >>> desirable.
> >>> So I'm not objecting to the goal of this patch, but a check to
> make sure
> >>> that this doesn't regress SPEC
> >>> would be great. Further comments on the patch inline.
> >>
> >> SPEC2006 scores are unaffected by this patch on Cortex-A57.
> >
> >
> > Good, thanks for checking :)
> >
> >
> >>>> +/* Helper function for arm_rtx_costs_internal. Calculates the
> cost of
> >>>> a
> >>>> MEM,
> >>>> + considering the costs of the addressing mode and memory access
> >>>> + separately. */
> >>>> +static bool
> >>>> +arm_mem_costs (rtx x, const struct cpu_cost_table *extra_cost,
> >>>> + int *cost, bool speed_p)
> >>>> +{
> >>>> + machine_mode mode = GET_MODE (x);
> >>>> + if (flag_pic
> >>>> + && GET_CODE (XEXP (x, 0)) == PLUS
> >>>> + && will_be_in_index_register (XEXP (XEXP (x, 0), 1)))
> >>>> + /* This will be split into two instructions. Add the cost
> of the
> >>>> + additional instruction here. The cost of the memory
> access is
> >>>> computed
> >>>> + below. See arm.md:calculate_pic_address. */
> >>>> + *cost = COSTS_N_INSNS (1);
> >>>> + else
> >>>> + *cost = 0;
> >>>
> >>>
> >>> For speed_p we want the size cost of the insn (COSTS_N_INSNS (1) for a
> >>> each
> >>> insn)
> >>> plus the appropriate field in extra_cost. So you should
> unconditionally
> >>> initialise the cost
> >>> to COSTS_N_INSNS (1), conditionally increment it by COSTS_N_INSNS (1)
> >>> with
> >>> the condition above.
> >>
> >> OK. I also have to subtract that COSTS_N_INSNS (1) in the if (speed_p)
> >> part because the cost of a single bus transfer is included in that
> >> initial cost.
> >>
> >>>> +
> >>>> + /* Calculate cost of the addressing mode. */
> >>>> + if (speed_p)
> >>>> + {
> >>>> + /* TODO: Add table-driven costs for addressing modes.
> (See patch
> >>>> 2) */
> >>>> + }
> >>>
> >>>
> >>> You mean "patch 3". I recommend you just remove this conditional from
> >>> this
> >>> patch and add the logic
> >>> in patch 3 entirely.
> >>
> >> OK.
> >>
> >>>> +
> >>>> + /* Calculate cost of memory access. */
> >>>> + if (speed_p)
> >>>> + {
> >>>> + /* data transfer is transfer size divided by bus width. */
> >>>> + int bus_width_bytes = current_tune->bus_width / 4;
> >>>
> >>>
> >>> This should be bus_width / BITS_PER_UNIT to get the size in bytes.
> >>> BITS_PER_UNIT is 8 though, so you'll have to double check to make sure
> >>> the
> >>> cost calculation and generated code is still appropriate.
> >>
> >> Oops, I changed the units around and messed this up. I'll fix this.
> >>
> >>>> + *cost += CEIL (GET_MODE_SIZE (mode), bus_width_bytes);
> >>>> + *cost += extra_cost->ldst.load;
> >>>> + }
> >>>> + else
> >>>> + {
> >>>> + *cost += COSTS_N_INSNS (1);
> >>>> + }
> >>>
> >>> Given my first comment above this else would be deleted.
> >>
> >> OK
> >
> >
> > I have a concern about using the bus_width parameter which
> > I explain in the thread for patch 1 (I don't think we need it, we
> should use
> > the fields in extra_cost->ldst
> > more carefully).
>
> I have modified this patch accordingly. Patch 1 is no longer needed.
>
> Passes "make check" (with patch 3) on arm-linux-gnueabihf with no
> regressions. Bootstrap is in progress.
>
> Can I still get this in during stage 3?
>
Thanks, these are ok for trunk.
They were originally posted way before stage 3 and this is just a rework,
so it's acceptable at this stage as far as I'm concerned.
Thank you for working on these,
Kyrill
> gcc/ChangeLog:
>
> <date> Charles Baylis <charles.baylis@linaro.org>
>
> * config/arm/arm.c (arm_mem_costs): New function.
> (arm_rtx_costs_internal): Use arm_mem_costs.
>
> gcc/testsuite/ChangeLog:
>
> <date> Charles Baylis <charles.baylis@linaro.org>
>
> * gcc.target/arm/addr-modes-float.c: New test.
> * gcc.target/arm/addr-modes-int.c: New test.
> * gcc.target/arm/addr-modes.h: New header.
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 2/3] [ARM] Refactor costs calculation for MEM.
2017-11-23 10:50 ` Kyrill Tkachov
@ 2017-11-23 19:13 ` Charles Baylis
0 siblings, 0 replies; 20+ messages in thread
From: Charles Baylis @ 2017-11-23 19:13 UTC (permalink / raw)
To: Kyrill Tkachov
Cc: Richard Earnshaw, Ramana Radhakrishnan, pinskia, gcc-patches
On 23 November 2017 at 10:01, Kyrill Tkachov
<kyrylo.tkachov@foss.arm.com> wrote:
>
> Thanks, these are ok for trunk.
> They were originally posted way before stage 3 and this is just a rework,
> so it's acceptable at this stage as far as I'm concerned.
Thanks. Committed to trunk as r255111.
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 3/3] [ARM] Add table of costs for AAarch32 addressing modes.
2017-09-15 16:57 ` Kyrill Tkachov
@ 2017-11-23 19:25 ` Charles Baylis
0 siblings, 0 replies; 20+ messages in thread
From: Charles Baylis @ 2017-11-23 19:25 UTC (permalink / raw)
To: Kyrill Tkachov
Cc: Richard Earnshaw, Ramana Radhakrishnan, pinskia, gcc-patches
On 15 September 2017 at 17:57, Kyrill Tkachov
<kyrylo.tkachov@foss.arm.com> wrote:
>
> Thanks, this is ok once the prerequisites are sorted.
Patch 1 was abandoned, and a later version of patch 2 has been
committed, so this was applied to trunk as r255112.
^ permalink raw reply [flat|nested] 20+ messages in thread
end of thread, other threads:[~2017-11-23 18:53 UTC | newest]
Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-09-12 8:34 [PATCH 0/3] [ARM] Addressing mode costs v3 charles.baylis
2017-09-12 8:34 ` [PATCH 1/3] [ARM] Add bus_width_bits to tune_params charles.baylis
2017-09-13 9:02 ` Kyrill Tkachov
2017-09-15 15:38 ` Charles Baylis
2017-09-15 17:01 ` Kyrill Tkachov
2017-11-20 21:12 ` Charles Baylis
2017-09-12 8:35 ` [PATCH 3/3] [ARM] Add table of costs for AAarch32 addressing modes charles.baylis
2017-09-13 9:02 ` Kyrill Tkachov
2017-09-15 15:38 ` Charles Baylis
2017-09-15 15:38 ` Charles Baylis
2017-09-15 16:57 ` Kyrill Tkachov
2017-11-23 19:25 ` Charles Baylis
2017-09-12 8:35 ` [PATCH 2/3] [ARM] Refactor costs calculation for MEM charles.baylis
2017-09-13 9:02 ` Kyrill Tkachov
2017-09-15 15:38 ` Charles Baylis
2017-09-15 17:01 ` Kyrill Tkachov
2017-11-20 21:11 ` Charles Baylis
2017-11-21 10:21 ` Charles Baylis
2017-11-23 10:50 ` Kyrill Tkachov
2017-11-23 19:13 ` Charles Baylis
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).