* [PATCH 09/10] [ARC] Update (u)maddsidi patterns.
2017-11-27 11:15 [PATCH 00/10][ARC] Critical fixes Claudiu Zissulescu
2017-11-27 11:14 ` [PATCH 05/10] [ARC] Add trap instruction Claudiu Zissulescu
2017-11-27 11:14 ` [PATCH 02/10] [ARC][ZOL] Update uses for hw-loop labels Claudiu Zissulescu
@ 2017-11-27 11:14 ` Claudiu Zissulescu
2017-12-07 23:35 ` Andrew Burgess
2017-11-27 11:14 ` [PATCH 03/10] [ARC] Don't allow the last ZOL insn to be in a delay slot Claudiu Zissulescu
` (7 subsequent siblings)
10 siblings, 1 reply; 23+ messages in thread
From: Claudiu Zissulescu @ 2017-11-27 11:14 UTC (permalink / raw)
To: gcc-patches; +Cc: Claudiu.Zissulescu, Francois.Bedard, andrew.burgess
From: claziss <claziss@synopsys.com>
The compiler uses the accumulator registers freely. However, a number of
instructions use these registers implicitly. Update the patterns to
inform the compiler which ones do.
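For illustration, the source-level shape these patterns describe is a widening 32x32->64 multiply followed by a 64-bit add. The sketch below is plain C and not part of the patch; the function names are made up, and whether a given compiler actually selects mac/macu for them depends on the target options (the new test uses -mmpy-option=plus_dmpy).

```c
#include <assert.h>

/* Signed widening multiply-accumulate: the operation maddsidi4 models
   (sign_extend of both 32-bit operands, then a 64-bit add).  */
long long mac_signed(long long acc, int a, int b)
{
    return acc + (long long)a * b;
}

/* Unsigned variant: the operation umaddsidi4 models (zero_extend).  */
unsigned long long mac_unsigned(unsigned long long acc, unsigned a, unsigned b)
{
    return acc + (unsigned long long)a * b;
}

/* Hypothetical self-check, assuming a 32-bit int.  */
void check_mac(void)
{
    assert(mac_signed(10, -3, 4) == -2);
    assert(mac_unsigned(1, 0xffffffffu, 2) == 0x1ffffffffULL);
}
```

Because the hardware instruction writes the accumulator as a side effect, the pattern has to say so (here via a clobber), otherwise the register allocator may keep a live value there.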
gcc/
2017-09-19 Claudiu Zissulescu <claziss@synopsys.com>
* config/arc/arc.md (maddsidi4, maddsidi4_split): Update pattern.
(umaddsidi4, umaddsidi4_split): Likewise.
gcc/testsuite
2017-09-19 Claudiu Zissulescu <claziss@synopsys.com>
* gcc.target/arc/tumaddsidi4.c: New test.
---
gcc/config/arc/arc.md | 32 ++++++++++++++++++++++++++----
gcc/testsuite/gcc.target/arc/tumaddsidi4.c | 14 +++++++++++++
2 files changed, 42 insertions(+), 4 deletions(-)
create mode 100755 gcc/testsuite/gcc.target/arc/tumaddsidi4.c
diff --git a/gcc/config/arc/arc.md b/gcc/config/arc/arc.md
index 42c6a23..155ee6c 100644
--- a/gcc/config/arc/arc.md
+++ b/gcc/config/arc/arc.md
@@ -6175,13 +6175,25 @@ archs4xd, archs4xd_slow, core_3"
[(set_attr "length" "0")])
;; MAC and DMPY instructions
-(define_insn_and_split "maddsidi4"
+(define_expand "maddsidi4"
+ [(match_operand:DI 0 "register_operand" "")
+ (match_operand:SI 1 "register_operand" "")
+ (match_operand:SI 2 "extend_operand" "")
+ (match_operand:DI 3 "register_operand" "")]
+ "TARGET_PLUS_DMPY"
+ "{
+ emit_insn (gen_maddsidi4_split (operands[0], operands[1], operands[2], operands[3]));
+ DONE;
+ }")
+
+(define_insn_and_split "maddsidi4_split"
[(set (match_operand:DI 0 "register_operand" "=r")
(plus:DI
(mult:DI
(sign_extend:DI (match_operand:SI 1 "register_operand" "%r"))
(sign_extend:DI (match_operand:SI 2 "extend_operand" "ri")))
- (match_operand:DI 3 "register_operand" "r")))]
+ (match_operand:DI 3 "register_operand" "r")))
+ (clobber (reg:DI ARCV2_ACC))]
"TARGET_PLUS_DMPY"
"#"
"TARGET_PLUS_DMPY && reload_completed"
@@ -6263,13 +6275,25 @@ archs4xd, archs4xd_slow, core_3"
(set_attr "predicable" "no")
(set_attr "cond" "nocond")])
-(define_insn_and_split "umaddsidi4"
+(define_expand "umaddsidi4"
+ [(match_operand:DI 0 "register_operand" "")
+ (match_operand:SI 1 "register_operand" "")
+ (match_operand:SI 2 "extend_operand" "")
+ (match_operand:DI 3 "register_operand" "")]
+ "TARGET_PLUS_DMPY"
+ "{
+ emit_insn (gen_umaddsidi4_split (operands[0], operands[1], operands[2], operands[3]));
+ DONE;
+ }")
+
+(define_insn_and_split "umaddsidi4_split"
[(set (match_operand:DI 0 "register_operand" "=r")
(plus:DI
(mult:DI
(zero_extend:DI (match_operand:SI 1 "register_operand" "%r"))
(zero_extend:DI (match_operand:SI 2 "extend_operand" "ri")))
- (match_operand:DI 3 "register_operand" "r")))]
+ (match_operand:DI 3 "register_operand" "r")))
+ (clobber (reg:DI ARCV2_ACC))]
"TARGET_PLUS_DMPY"
"#"
"TARGET_PLUS_DMPY && reload_completed"
diff --git a/gcc/testsuite/gcc.target/arc/tumaddsidi4.c b/gcc/testsuite/gcc.target/arc/tumaddsidi4.c
new file mode 100755
index 0000000..40d2b33
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arc/tumaddsidi4.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-mcpu=archs -O1 -mmpy-option=plus_dmpy" } */
+
+/* Check how we generate umaddsidi4 patterns. */
+long a;
+long long b;
+unsigned c, d;
+
+void fn1(void)
+{
+ b = d * (long long)c + a;
+}
+
+/* { dg-final { scan-assembler "macu 0,r" } } */
--
1.9.1
* [PATCH 02/10] [ARC][ZOL] Update uses for hw-loop labels.
2017-11-27 11:15 [PATCH 00/10][ARC] Critical fixes Claudiu Zissulescu
2017-11-27 11:14 ` [PATCH 05/10] [ARC] Add trap instruction Claudiu Zissulescu
@ 2017-11-27 11:14 ` Claudiu Zissulescu
2017-11-27 23:29 ` Andrew Burgess
2017-11-27 11:14 ` [PATCH 09/10] [ARC] Update (u)maddsidi patterns Claudiu Zissulescu
` (8 subsequent siblings)
10 siblings, 1 reply; 23+ messages in thread
From: Claudiu Zissulescu @ 2017-11-27 11:14 UTC (permalink / raw)
To: gcc-patches; +Cc: Claudiu.Zissulescu, Francois.Bedard, andrew.burgess
From: claziss <claziss@synopsys.com>
Make sure we mark the hw-loop labels as being used.
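As a rough mental model of why the use count matters: a cleanup pass is free to drop a label nobody references, so a label that the loop instruction refers to must have a non-zero count. The sketch below is a toy in plain C, not GCC's data structures; every name in it is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a code label with a reference count.  */
struct toy_label {
    int nuses;
};

/* In this toy model a label survives cleanup only while referenced.  */
bool toy_label_survives(const struct toy_label *label)
{
    return label->nuses > 0;
}

/* Mirrors what the patch does for the hw-loop start and end labels.  */
void toy_mark_hwloop_labels(struct toy_label *start, struct toy_label *end)
{
    start->nuses++;
    end->nuses++;
}

/* Hypothetical self-check.  */
void check_labels(void)
{
    struct toy_label start = { 0 }, end = { 0 };
    assert(!toy_label_survives(&start) && !toy_label_survives(&end));
    toy_mark_hwloop_labels(&start, &end);
    assert(toy_label_survives(&start) && toy_label_survives(&end));
}
```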
gcc/
2017-09-19 Claudiu Zissulescu <claziss@synopsys.com>
* config/arc/arc.c (hwloop_optimize): Update hw-loop's end/start
labels number of usages.
gcc/testsuite
2017-09-19 Claudiu Zissulescu <claziss@synopsys.com>
* gcc.target/arc/loop-2.cpp: New test.
---
gcc/config/arc/arc.c | 3 +++
gcc/testsuite/gcc.target/arc/loop-2.cpp | 18 ++++++++++++++++++
2 files changed, 21 insertions(+)
create mode 100644 gcc/testsuite/gcc.target/arc/loop-2.cpp
diff --git a/gcc/config/arc/arc.c b/gcc/config/arc/arc.c
index 25f123c..964815a 100644
--- a/gcc/config/arc/arc.c
+++ b/gcc/config/arc/arc.c
@@ -7702,6 +7702,9 @@ hwloop_optimize (hwloop_info loop)
/* Insert the loop end label before the last instruction of the
loop. */
emit_label_after (end_label, loop->last_insn);
+ /* Make sure we mark the beginning and end labels as used. */
+ LABEL_NUSES (loop->end_label)++;
+ LABEL_NUSES (loop->start_label)++;
return true;
}
diff --git a/gcc/testsuite/gcc.target/arc/loop-2.cpp b/gcc/testsuite/gcc.target/arc/loop-2.cpp
new file mode 100644
index 0000000..d1dc917
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arc/loop-2.cpp
@@ -0,0 +1,18 @@
+/* { dg-options "-O2" } */
+/* { dg-do assemble } */
+
+/* This file fails to assemble if we forgot to increase the number of
+ uses for loop's start and end labels. */
+int a, c, d;
+int *b;
+void fn1(int p1) {
+ if (d == 5)
+ for (int i; i < p1; ++i)
+ if (c)
+ b[i] = c;
+ else
+ int t = a = t;
+ else
+ for (int i; i < p1; ++i)
+ b[i] = 0;
+}
--
1.9.1
* [PATCH 05/10] [ARC] Add trap instruction.
2017-11-27 11:15 [PATCH 00/10][ARC] Critical fixes Claudiu Zissulescu
@ 2017-11-27 11:14 ` Claudiu Zissulescu
2017-11-27 23:40 ` Andrew Burgess
2017-11-27 11:14 ` [PATCH 02/10] [ARC][ZOL] Update uses for hw-loop labels Claudiu Zissulescu
` (9 subsequent siblings)
10 siblings, 1 reply; 23+ messages in thread
From: Claudiu Zissulescu @ 2017-11-27 11:14 UTC (permalink / raw)
To: gcc-patches
Cc: Claudiu.Zissulescu, Francois.Bedard, andrew.burgess, Claudiu Zissulescu
From: Claudiu Zissulescu <claziss@gmail.com>
2017-11-07 Claudiu Zissulescu <claziss@synopsys.com>
* config/arc/arc.md (trap): New pattern.
---
gcc/config/arc/arc.md | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/gcc/config/arc/arc.md b/gcc/config/arc/arc.md
index b8fa44e..42c6a23 100644
--- a/gcc/config/arc/arc.md
+++ b/gcc/config/arc/arc.md
@@ -4321,6 +4321,13 @@ archs4xd, archs4xd_slow, core_3"
; use it for lack of inter-procedural branch shortening.
; Link-time relaxation would help...
+(define_insn "trap"
+ [(trap_if (const_int 1) (const_int 0))]
+ "!TARGET_ARC600_FAMILY"
+ "trap_s\\t5"
+ [(set_attr "type" "misc")
+ (set_attr "length" "2")])
+
(define_insn "nop"
[(const_int 0)]
""
--
1.9.1
* [PATCH 03/10] [ARC] Don't allow the last ZOL insn to be in a delay slot.
2017-11-27 11:15 [PATCH 00/10][ARC] Critical fixes Claudiu Zissulescu
` (2 preceding siblings ...)
2017-11-27 11:14 ` [PATCH 09/10] [ARC] Update (u)maddsidi patterns Claudiu Zissulescu
@ 2017-11-27 11:14 ` Claudiu Zissulescu
2017-11-27 23:32 ` Andrew Burgess
2017-11-27 11:15 ` [PATCH 04/10] [ARC] Add ARCv2 core3 tune option Claudiu Zissulescu
` (6 subsequent siblings)
10 siblings, 1 reply; 23+ messages in thread
From: Claudiu Zissulescu @ 2017-11-27 11:14 UTC (permalink / raw)
To: gcc-patches
Cc: Claudiu.Zissulescu, Francois.Bedard, andrew.burgess, Claudiu Zissulescu
From: Claudiu Zissulescu <claziss@gmail.com>
The ARC ZOL implementation doesn't allow the last instruction to be a
control instruction or part of a delay slot. Thus, we add a note to
the last ZOL instruction which prevents it from ending up in a delay
slot.
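The resulting eligibility rule can be sketched as a predicate. This is a toy model in plain C of the in_delay_slot attribute as modified by this patch, not the attribute machinery itself; the struct, enum and function names are hypothetical, and last_zol_note stands in for the REG_SAVE_NOTE check.

```c
#include <assert.h>
#include <stdbool.h>

enum toy_insn_type {
    TOY_ALU, TOY_CALL, TOY_BRANCH, TOY_UNCOND_BRANCH, TOY_JUMP, TOY_BRCC
};

struct toy_insn {
    enum toy_insn_type type;
    bool cond_canuse;    /* the "cond" attribute is "canuse" */
    bool last_zol_note;  /* marked as the last insn of a ZOL body */
    int length;          /* encoded length in bytes */
};

/* Checks are made in the same order as the attribute's cond clauses.  */
bool toy_in_delay_slot(const struct toy_insn *insn)
{
    if (!insn->cond_canuse)
        return false;
    switch (insn->type) {
    case TOY_CALL: case TOY_BRANCH: case TOY_UNCOND_BRANCH:
    case TOY_JUMP: case TOY_BRCC:
        return false;
    default:
        break;
    }
    if (insn->last_zol_note)  /* the clause added by this patch */
        return false;
    return insn->length == 2 || insn->length == 4;
}

/* Hypothetical self-check.  */
void check_delay_slot(void)
{
    struct toy_insn add = { TOY_ALU, true, false, 4 };
    assert(toy_in_delay_slot(&add));
    add.last_zol_note = true;
    assert(!toy_in_delay_slot(&add));
}
```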
2017-10-20 Claudiu Zissulescu <claziss@synopsys.com>
* config/arc/arc.c (hwloop_optimize): Prevent the last
ZOL instruction from ending up in a delay slot.
* config/arc/arc.md (cond_delay_insn): Check if the instruction
can be placed into a delay slot against reg_note.
(in_delay_slot): Likewise.
testsuite/
2017-10-20 Claudiu Zissulescu <claziss@synopsys.com>
* gcc.target/arc/loop-3.c: New test.
* gcc.target/arc/loop-4.c: Likewise.
[FIX][ZOL] fix checking for jumps
---
gcc/config/arc/arc.c | 6 ++++++
gcc/config/arc/arc.md | 4 ++++
gcc/testsuite/gcc.target/arc/loop-3.c | 27 +++++++++++++++++++++++++++
gcc/testsuite/gcc.target/arc/loop-4.c | 14 ++++++++++++++
4 files changed, 51 insertions(+)
create mode 100644 gcc/testsuite/gcc.target/arc/loop-3.c
create mode 100644 gcc/testsuite/gcc.target/arc/loop-4.c
diff --git a/gcc/config/arc/arc.c b/gcc/config/arc/arc.c
index 964815a..1479a8d 100644
--- a/gcc/config/arc/arc.c
+++ b/gcc/config/arc/arc.c
@@ -7609,6 +7609,12 @@ hwloop_optimize (hwloop_info loop)
loop->loop_no);
last_insn = emit_insn_after (gen_nopv (), last_insn);
}
+
+ /* SAVE_NOTE is used by haifa scheduler. However, we are after it
+ and we can use it to indicate the last ZOL instruction cannot be
+ part of a delay slot. */
+ add_reg_note (last_insn, REG_SAVE_NOTE, GEN_INT (2));
+
loop->last_insn = last_insn;
/* Get the loop iteration register. */
diff --git a/gcc/config/arc/arc.md b/gcc/config/arc/arc.md
index 2e0ac52..6239483 100644
--- a/gcc/config/arc/arc.md
+++ b/gcc/config/arc/arc.md
@@ -472,6 +472,8 @@
(symbol_ref "(arc_hazard (prev_active_insn (insn), insn)
+ arc_hazard (insn, next_active_insn (insn)))"))
(const_string "false")
+ (match_test "find_reg_note (insn, REG_SAVE_NOTE, GEN_INT (2))")
+ (const_string "false")
(eq_attr "iscompact" "maybe") (const_string "true")
]
@@ -499,6 +501,8 @@
(cond [(eq_attr "cond" "!canuse") (const_string "no")
(eq_attr "type" "call,branch,uncond_branch,jump,brcc")
(const_string "no")
+ (match_test "find_reg_note (insn, REG_SAVE_NOTE, GEN_INT (2))")
+ (const_string "no")
(eq_attr "length" "2,4") (const_string "yes")]
(const_string "no")))
diff --git a/gcc/testsuite/gcc.target/arc/loop-3.c b/gcc/testsuite/gcc.target/arc/loop-3.c
new file mode 100644
index 0000000..bf7aec9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arc/loop-3.c
@@ -0,0 +1,27 @@
+/* { dg-do assemble } */
+/* { dg-do compile } */
+/* { dg-options "-O2 -mno-sdata" } */
+
+/* This example will fail to assemble if the last instruction is a
+ branch with delay slot. */
+int d;
+extern char * fn2 (void);
+
+void fn1(void)
+{
+ char *a = fn2();
+ for (;;) {
+ long long b;
+ int e = 8;
+ for (; e <= 63; e += 7) {
+ long c = *a++;
+ b += c & e;
+ if (c & 28)
+ break;
+ }
+ d = b;
+ }
+}
+
+/* { dg-final { scan-assembler "bne_s @.L2" } } */
+/* { dg-final { scan-assembler-not "add.eq" } } */
diff --git a/gcc/testsuite/gcc.target/arc/loop-4.c b/gcc/testsuite/gcc.target/arc/loop-4.c
new file mode 100644
index 0000000..99a93a7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arc/loop-4.c
@@ -0,0 +1,14 @@
+/* { dg-do assemble } */
+/* { dg-do compile } */
+/* { dg-options "-Os" } */
+
+
+void fn1(void *p1, int p2, int p3)
+{
+ char *d = p1;
+ do
+ *d++ = p2;
+ while (--p3);
+}
+
+/* { dg-final { scan-assembler "lp_count" } } */
--
1.9.1
* [PATCH 08/10] [ARC] Enable unaligned access.
2017-11-27 11:15 [PATCH 00/10][ARC] Critical fixes Claudiu Zissulescu
` (4 preceding siblings ...)
2017-11-27 11:15 ` [PATCH 04/10] [ARC] Add ARCv2 core3 tune option Claudiu Zissulescu
@ 2017-11-27 11:15 ` Claudiu Zissulescu
2018-01-02 12:05 ` Andrew Burgess
2017-11-27 11:15 ` [PATCH 06/10] [ARC] Update legitimate constant hook Claudiu Zissulescu
` (4 subsequent siblings)
10 siblings, 1 reply; 23+ messages in thread
From: Claudiu Zissulescu @ 2017-11-27 11:15 UTC (permalink / raw)
To: gcc-patches
Cc: Claudiu.Zissulescu, Francois.Bedard, andrew.burgess, Claudiu Zissulescu
From: Claudiu Zissulescu <claziss@gmail.com>
Use -munaligned-access to control whether unaligned accesses are allowed.
For the ARC HS family, unaligned access is always on.
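As a usage sketch, user code could test the new __ARC_UNALIGNED__ define at compile time. The example below is plain C and not part of the patch; the helper names are made up, and it assumes the macro is defined exactly when STRICT_ALIGNMENT is false. The load itself goes through memcpy, which is valid on any target and leaves it to the compiler to pick a single unaligned load where that is allowed.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reports whether the compiler advertised unaligned access support.  */
int arc_unaligned_enabled(void)
{
#ifdef __ARC_UNALIGNED__
    return 1;
#else
    return 0;
#endif
}

/* Portable load from a possibly misaligned address.  */
uint32_t load_u32(const void *p)
{
    uint32_t value;
    memcpy(&value, p, sizeof value);
    return value;
}

/* Hypothetical self-check: round-trip a value through a misaligned slot.  */
void check_unaligned(void)
{
    unsigned char buf[8] = { 0 };
    uint32_t in = 0x11223344u;
    memcpy(buf + 1, &in, sizeof in);
    assert(load_u32(buf + 1) == in);
}
```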
2017-10-19 Claudiu Zissulescu <claziss@synopsys.com>
* config/arc/arc-c.def (__ARC_UNALIGNED__): New define.
* config/arc/arc.h (STRICT_ALIGNMENT): Control this macro using
munaligned-access.
---
gcc/config/arc/arc-c.def | 1 +
gcc/config/arc/arc.h | 2 +-
2 files changed, 2 insertions(+), 1 deletion(-)
diff --git a/gcc/config/arc/arc-c.def b/gcc/config/arc/arc-c.def
index c9443c9..86eab4e 100644
--- a/gcc/config/arc/arc-c.def
+++ b/gcc/config/arc/arc-c.def
@@ -29,6 +29,7 @@ ARC_C_DEF ("__ARC_MUL64__", TARGET_MUL64_SET)
ARC_C_DEF ("__ARC_MUL32BY16__", TARGET_MULMAC_32BY16_SET)
ARC_C_DEF ("__ARC_SIMD__", TARGET_SIMD_SET)
ARC_C_DEF ("__ARC_RF16__", TARGET_RF16)
+ARC_C_DEF ("__ARC_UNALIGNED__", !STRICT_ALIGNMENT)
ARC_C_DEF ("__ARC_BARREL_SHIFTER__", TARGET_BARREL_SHIFTER)
diff --git a/gcc/config/arc/arc.h b/gcc/config/arc/arc.h
index 8d90975..8c31fb2 100644
--- a/gcc/config/arc/arc.h
+++ b/gcc/config/arc/arc.h
@@ -288,7 +288,7 @@ if (GET_MODE_CLASS (MODE) == MODE_INT \
/* On the ARC the lower address bits are masked to 0 as necessary. The chip
won't croak when given an unaligned address, but the insn will still fail
to produce the correct result. */
-#define STRICT_ALIGNMENT 1
+#define STRICT_ALIGNMENT (!unaligned_access && !TARGET_HS)
/* Layout of source language data types. */
--
1.9.1
* [PATCH 00/10][ARC] Critical fixes
@ 2017-11-27 11:15 Claudiu Zissulescu
2017-11-27 11:14 ` [PATCH 05/10] [ARC] Add trap instruction Claudiu Zissulescu
` (10 more replies)
0 siblings, 11 replies; 23+ messages in thread
From: Claudiu Zissulescu @ 2017-11-27 11:15 UTC (permalink / raw)
To: gcc-patches; +Cc: Claudiu.Zissulescu, Francois.Bedard, andrew.burgess
From: claziss <claziss@synopsys.com>
Hi,
This series contains a number of critical fixes for the ARC backend:
- For ZOL: two patches prevent the last ZOL instruction from being placed
in a delay slot and update the number of uses for the ZOL labels.
Also, we enable the DBNZ instruction only for ARC HS Core3 CPUs. Tests are provided.
- Update the legitimate constant hook.
- The trampoline implementation is revamped and tested to work on ARC Linux.
Without this patch, trampolines do not work on ARC Linux.
- The accumulator registers' usage can be controlled via the -ffixed option. Also update
a number of patterns to reflect their use of the accumulator registers.
- Add the trap instruction, needed for ARC Linux.
- Add TARGET_CANNOT_SUBSTITUTE_MEM_EQUIV to avoid LRA issues. A test is provided.
Ok to apply?
Claudiu
Claudiu Zissulescu:
[ARC][LRA] Use TARGET_CANNOT_SUBSTITUTE_MEM_EQUIV.
[ARC] Don't allow the last ZOL insn to be in a delay slot.
[ARC] Add trap instruction.
[ARC] Update legitimate constant hook.
[ARC] Enable unaligned access.
[ARC] Revamp trampoline implementation.
[ARC][ZOL] Update uses for hw-loop labels.
[ARC] Add ARCv2 core3 tune option.
[ARC][FIX] Consider command line ffixed- option.
[ARC] Update (u)maddsidi patterns.
gcc/config/arc/arc-arch.h | 3 +-
gcc/config/arc/arc-c.def | 1 +
gcc/config/arc/arc.c | 197 ++++++++++++++++++-----------
gcc/config/arc/arc.h | 6 +-
gcc/config/arc/arc.md | 74 +++++++----
gcc/config/arc/arc.opt | 40 +++---
gcc/testsuite/gcc.target/arc/loop-2.cpp | 18 +++
gcc/testsuite/gcc.target/arc/loop-3.c | 27 ++++
gcc/testsuite/gcc.target/arc/loop-4.c | 14 ++
gcc/testsuite/gcc.target/arc/lra-1.c | 17 +++
gcc/testsuite/gcc.target/arc/tls-1.c | 26 ++++
gcc/testsuite/gcc.target/arc/tumaddsidi4.c | 14 ++
12 files changed, 320 insertions(+), 117 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/arc/loop-2.cpp
create mode 100644 gcc/testsuite/gcc.target/arc/loop-3.c
create mode 100644 gcc/testsuite/gcc.target/arc/loop-4.c
create mode 100644 gcc/testsuite/gcc.target/arc/lra-1.c
create mode 100644 gcc/testsuite/gcc.target/arc/tls-1.c
create mode 100755 gcc/testsuite/gcc.target/arc/tumaddsidi4.c
--
1.9.1
* [PATCH 04/10] [ARC] Add ARCv2 core3 tune option.
2017-11-27 11:15 [PATCH 00/10][ARC] Critical fixes Claudiu Zissulescu
` (3 preceding siblings ...)
2017-11-27 11:14 ` [PATCH 03/10] [ARC] Don't allow the last ZOL insn to be in a delay slot Claudiu Zissulescu
@ 2017-11-27 11:15 ` Claudiu Zissulescu
2017-11-27 23:35 ` Andrew Burgess
2017-11-27 11:15 ` [PATCH 08/10] [ARC] Enable unaligned access Claudiu Zissulescu
` (5 subsequent siblings)
10 siblings, 1 reply; 23+ messages in thread
From: Claudiu Zissulescu @ 2017-11-27 11:15 UTC (permalink / raw)
To: gcc-patches; +Cc: Claudiu.Zissulescu, Francois.Bedard, andrew.burgess
From: claziss <claziss@synopsys.com>
ARCv2 Core3 CPUs come with dbnz support. Add this feature to
the tune option.
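The gating logic amounts to two small decisions, sketched here as a toy model in plain C. The enum and function names are hypothetical stand-ins for arc_tune, TARGET_V2 and TARGET_DBNZ; the values 3 and 2 are the ones the patch emits for Tag_ARC_CPU_variation.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy subset of the tune values; names are illustrative only.  */
enum toy_tune { TOY_TUNE_NONE, TOY_TUNE_ARCHS4X, TOY_TUNE_CORE_3 };

/* dbnz is used only for ARCv2 targets tuned for core3.  */
bool toy_target_dbnz(bool target_v2, enum toy_tune tune)
{
    return target_v2 && tune == TOY_TUNE_CORE_3;
}

/* Value written for Tag_ARC_CPU_variation on ARCv2 targets.  */
int toy_cpu_variation(enum toy_tune tune)
{
    return tune == TOY_TUNE_CORE_3 ? 3 : 2;
}

/* Hypothetical self-check.  */
void check_dbnz(void)
{
    assert(toy_target_dbnz(true, TOY_TUNE_CORE_3));
    assert(!toy_target_dbnz(true, TOY_TUNE_ARCHS4X));
    assert(!toy_target_dbnz(false, TOY_TUNE_CORE_3));
}
```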
gcc/
2017-09-14 Claudiu Zissulescu <claziss@synopsys.com>
* config/arc/arc-arch.h (arc_tune_attr): Add ARC_TUNE_CORE_3.
* config/arc/arc.c (arc_sched_issue_rate): Use ARC_TUNE_... .
(arc_init): Likewise.
(arc_override_options): Likewise.
(arc_file_start): Choose Tag_ARC_CPU_variation based on arc_tune
value.
(hwloop_fail): Use TARGET_DBNZ when we want to check for dbnz insn
support.
* config/arc/arc.h (TARGET_DBNZ): Define.
* config/arc/arc.md (attr tune): Add core_3, use ARC_TUNE_... to
properly set the tune attribute.
(dbnz): Use TARGET_DBNZ guard.
* config/arc/arc.opt (mtune): Add core3 option.
---
gcc/config/arc/arc-arch.h | 3 ++-
gcc/config/arc/arc.c | 21 ++++++++++++---------
gcc/config/arc/arc.h | 2 ++
gcc/config/arc/arc.md | 22 ++++++++++++----------
gcc/config/arc/arc.opt | 40 ++++++++++++++++++++++------------------
5 files changed, 50 insertions(+), 38 deletions(-)
diff --git a/gcc/config/arc/arc-arch.h b/gcc/config/arc/arc-arch.h
index 7c3f47c..38d2bcb 100644
--- a/gcc/config/arc/arc-arch.h
+++ b/gcc/config/arc/arc-arch.h
@@ -75,7 +75,8 @@ enum arc_tune_attr
ARC_TUNE_ARC700_4_2_XMAC,
ARC_TUNE_ARCHS4X,
ARC_TUNE_ARCHS4XD,
- ARC_TUNE_ARCHS4XD_SLOW
+ ARC_TUNE_ARCHS4XD_SLOW,
+ ARC_TUNE_CORE_3
};
/* CPU specific properties. */
diff --git a/gcc/config/arc/arc.c b/gcc/config/arc/arc.c
index 1479a8d..4d7a282 100644
--- a/gcc/config/arc/arc.c
+++ b/gcc/config/arc/arc.c
@@ -442,8 +442,8 @@ arc_sched_issue_rate (void)
{
switch (arc_tune)
{
- case TUNE_ARCHS4X:
- case TUNE_ARCHS4XD:
+ case ARC_TUNE_ARCHS4X:
+ case ARC_TUNE_ARCHS4XD:
return 3;
default:
break;
@@ -866,21 +866,21 @@ arc_init (void)
if (arc_multcost < 0)
switch (arc_tune)
{
- case TUNE_ARC700_4_2_STD:
+ case ARC_TUNE_ARC700_4_2_STD:
/* latency 7;
max throughput (1 multiply + 4 other insns) / 5 cycles. */
arc_multcost = COSTS_N_INSNS (4);
if (TARGET_NOMPY_SET)
arc_multcost = COSTS_N_INSNS (30);
break;
- case TUNE_ARC700_4_2_XMAC:
+ case ARC_TUNE_ARC700_4_2_XMAC:
/* latency 5;
max throughput (1 multiply + 2 other insns) / 3 cycles. */
arc_multcost = COSTS_N_INSNS (3);
if (TARGET_NOMPY_SET)
arc_multcost = COSTS_N_INSNS (30);
break;
- case TUNE_ARC600:
+ case ARC_TUNE_ARC600:
if (TARGET_MUL64_SET)
{
arc_multcost = COSTS_N_INSNS (4);
@@ -1196,8 +1196,8 @@ arc_override_options (void)
#undef ARC_OPT
/* Set Tune option. */
- if (arc_tune == TUNE_NONE)
- arc_tune = (enum attr_tune) arc_selected_cpu->tune;
+ if (arc_tune == ARC_TUNE_NONE)
+ arc_tune = (enum arc_tune_attr) arc_selected_cpu->tune;
if (arc_size_opt_level == 3)
optimize_size = 1;
@@ -5205,6 +5205,9 @@ static void arc_file_start (void)
TARGET_NO_SDATA_SET ? 0 : 2);
asm_fprintf (asm_out_file, "\t.arc_attribute Tag_ARC_ABI_exceptions, %d\n",
TARGET_OPTFPE ? 1 : 0);
+ if (TARGET_V2)
+ asm_fprintf (asm_out_file, "\t.arc_attribute Tag_ARC_CPU_variation, %d\n",
+ arc_tune == ARC_TUNE_CORE_3 ? 3 : 2);
}
/* Implement `TARGET_ASM_FILE_END'. */
@@ -7389,11 +7392,11 @@ hwloop_fail (hwloop_info loop)
rtx test;
rtx insn = loop->loop_end;
- if (TARGET_V2
+ if (TARGET_DBNZ
&& (loop->length && (loop->length <= ARC_MAX_LOOP_LENGTH))
&& REG_P (loop->iter_reg))
{
- /* TARGET_V2 has dbnz instructions. */
+ /* TARGET_V2 core3 has dbnz instructions. */
test = gen_dbnz (loop->iter_reg, loop->start_label);
insn = emit_jump_insn_before (test, loop->loop_end);
}
diff --git a/gcc/config/arc/arc.h b/gcc/config/arc/arc.h
index b5a8f84..8d90975 100644
--- a/gcc/config/arc/arc.h
+++ b/gcc/config/arc/arc.h
@@ -1628,5 +1628,7 @@ enum
/* Custom FP instructions used by QuarkSE EM cpu. */
#define TARGET_FPX_QUARK (TARGET_EM && TARGET_SPFP \
&& (arc_fpu_build == FPX_QK))
+/* DBNZ support is available for ARCv2 core3 cpus. */
+#define TARGET_DBNZ (TARGET_V2 && (arc_tune == ARC_TUNE_CORE_3))
#endif /* GCC_ARC_H */
diff --git a/gcc/config/arc/arc.md b/gcc/config/arc/arc.md
index 6239483..b8fa44e 100644
--- a/gcc/config/arc/arc.md
+++ b/gcc/config/arc/arc.md
@@ -597,19 +597,21 @@
;; is made that makes conditional execution required.
(define_attr "tune" "none, arc600, arc700_4_2_std, arc700_4_2_xmac, archs4x, \
-archs4xd, archs4xd_slow"
+archs4xd, archs4xd_slow, core_3"
(const
- (cond [(symbol_ref "arc_tune == TUNE_ARC600")
+ (cond [(symbol_ref "arc_tune == ARC_TUNE_ARC600")
(const_string "arc600")
- (symbol_ref "arc_tune == TUNE_ARC700_4_2_STD")
+ (symbol_ref "arc_tune == ARC_TUNE_ARC700_4_2_STD")
(const_string "arc700_4_2_std")
- (symbol_ref "arc_tune == TUNE_ARC700_4_2_XMAC")
+ (symbol_ref "arc_tune == ARC_TUNE_ARC700_4_2_XMAC")
(const_string "arc700_4_2_xmac")
- (symbol_ref "arc_tune == TUNE_ARCHS4X")
+ (symbol_ref "arc_tune == ARC_TUNE_ARCHS4X")
(const_string "archs4x")
- (ior (symbol_ref "arc_tune == TUNE_ARCHS4XD")
- (symbol_ref "arc_tune == TUNE_ARCHS4XD_SLOW"))
- (const_string "archs4xd")]
+ (ior (symbol_ref "arc_tune == ARC_TUNE_ARCHS4XD")
+ (symbol_ref "arc_tune == ARC_TUNE_ARCHS4XD_SLOW"))
+ (const_string "archs4xd")
+ (symbol_ref "arc_tune == ARC_TUNE_CORE_3")
+ (const_string "core_3")]
(const_string "none"))))
(define_attr "tune_arc700" "false,true"
@@ -5200,11 +5202,11 @@ archs4xd, archs4xd_slow"
(plus:SI (match_dup 0)
(const_int -1)))
(clobber (match_scratch:SI 2 "=X,r"))]
- "TARGET_V2"
+ "TARGET_DBNZ"
"@
dbnz%#\\t%0,%l1
#"
- "TARGET_V2 && reload_completed && memory_operand (operands[0], SImode)"
+ "TARGET_DBNZ && reload_completed && memory_operand (operands[0], SImode)"
[(set (match_dup 2) (match_dup 0))
(set (match_dup 2) (plus:SI (match_dup 2) (const_int -1)))
(set (reg:CC CC_REG) (compare:CC (match_dup 2) (const_int 0)))
diff --git a/gcc/config/arc/arc.opt b/gcc/config/arc/arc.opt
index aacb599..6b0104a 100644
--- a/gcc/config/arc/arc.opt
+++ b/gcc/config/arc/arc.opt
@@ -249,29 +249,33 @@ mmultcost=
Target RejectNegative Joined UInteger Var(arc_multcost) Init(-1)
Cost to assume for a multiply instruction, with 4 being equal to a normal insn.
-mtune=ARC600
-Target RejectNegative Var(arc_tune, TUNE_ARC600)
-Tune for ARC600 cpu.
+mtune=
+Target RejectNegative ToLower Joined Var(arc_tune) Enum(arc_tune_attr) Init(ARC_TUNE_NONE)
+-mtune=TUNE Tune code for given ARC variant.
-mtune=ARC601
-Target RejectNegative Var(arc_tune, TUNE_ARC600)
-Tune for ARC601 cpu.
+Enum
+Name(arc_tune_attr) Type(int)
+
+EnumValue
+Enum(arc_tune_attr) String(arc600) Value(ARC_TUNE_ARC600)
-mtune=ARC700
-Target RejectNegative Var(arc_tune, TUNE_ARC700_4_2_STD)
-Tune for ARC700 R4.2 Cpu with standard multiplier block.
+EnumValue
+Enum(arc_tune_attr) String(arc601) Value(ARC_TUNE_ARC600)
-mtune=ARC700-xmac
-Target RejectNegative Var(arc_tune, TUNE_ARC700_4_2_XMAC)
-Tune for ARC700 R4.2 Cpu with XMAC block.
+EnumValue
+Enum(arc_tune_attr) String(arc700) Value(ARC_TUNE_ARC700_4_2_STD)
-mtune=ARC725D
-Target RejectNegative Var(arc_tune, TUNE_ARC700_4_2_XMAC)
-Tune for ARC700 R4.2 Cpu with XMAC block.
+EnumValue
+Enum(arc_tune_attr) String(arc700-xmac) Value(ARC_TUNE_ARC700_4_2_XMAC)
-mtune=ARC750D
-Target RejectNegative Var(arc_tune, TUNE_ARC700_4_2_XMAC)
-Tune for ARC700 R4.2 Cpu with XMAC block.
+EnumValue
+Enum(arc_tune_attr) String(arc725d) Value(ARC_TUNE_ARC700_4_2_XMAC)
+
+EnumValue
+Enum(arc_tune_attr) String(arc750d) Value(ARC_TUNE_ARC700_4_2_XMAC)
+
+EnumValue
+Enum(arc_tune_attr) String(core3) Value(ARC_TUNE_CORE_3)
mindexed-loads
Target Var(TARGET_INDEXED_LOADS) Init(TARGET_INDEXED_LOADS_DEFAULT)
--
1.9.1
* [PATCH 06/10] [ARC] Update legitimate constant hook.
2017-11-27 11:15 [PATCH 00/10][ARC] Critical fixes Claudiu Zissulescu
` (5 preceding siblings ...)
2017-11-27 11:15 ` [PATCH 08/10] [ARC] Enable unaligned access Claudiu Zissulescu
@ 2017-11-27 11:15 ` Claudiu Zissulescu
2017-12-07 23:30 ` Andrew Burgess
2017-11-27 11:16 ` [PATCH 10/10] [ARC] Revamp trampoline implementation Claudiu Zissulescu
` (3 subsequent siblings)
10 siblings, 1 reply; 23+ messages in thread
From: Claudiu Zissulescu @ 2017-11-27 11:15 UTC (permalink / raw)
To: gcc-patches
Cc: Claudiu.Zissulescu, Francois.Bedard, andrew.burgess, Claudiu Zissulescu
From: Claudiu Zissulescu <claziss@gmail.com>
Make sure we check the constants in all cases.
gcc/
2017-10-14 Claudiu Zissulescu <claziss@synopsys.com>
* config/arc/arc.c (arc_legitimate_constant_p): Always check all
constants.
testsuite/
2017-10-14 Claudiu Zissulescu <claziss@synopsys.com>
* gcc.target/arc/tls-1.c: New test.
---
gcc/config/arc/arc.c | 6 ------
gcc/testsuite/gcc.target/arc/tls-1.c | 26 ++++++++++++++++++++++++++
2 files changed, 26 insertions(+), 6 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/arc/tls-1.c
diff --git a/gcc/config/arc/arc.c b/gcc/config/arc/arc.c
index 4d7a282..42ea921 100644
--- a/gcc/config/arc/arc.c
+++ b/gcc/config/arc/arc.c
@@ -6185,12 +6185,6 @@ arc_return_addr_rtx (int count, ATTRIBUTE_UNUSED rtx frame)
bool
arc_legitimate_constant_p (machine_mode mode, rtx x)
{
- if (GET_CODE (x) == SYMBOL_REF && SYMBOL_REF_TLS_MODEL (x))
- return false;
-
- if (!flag_pic && mode != Pmode)
- return true;
-
switch (GET_CODE (x))
{
case CONST:
diff --git a/gcc/testsuite/gcc.target/arc/tls-1.c b/gcc/testsuite/gcc.target/arc/tls-1.c
new file mode 100644
index 0000000..3f7a6d4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arc/tls-1.c
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target tls } */
+/* { dg-options "-O3 -std=gnu99" } */
+
+/* Check if addressing the `pos` member of struct is done via tls
+ mechanism. */
+
+struct callchain_cursor {
+ int last;
+ long long pos;
+} __thread a;
+void fn1(struct callchain_cursor *p1)
+{
+ p1->pos++;
+}
+
+extern void fn3 (void);
+
+void fn2(void) {
+ struct callchain_cursor *b = &a;
+ while (1) {
+ fn3();
+ fn1(b);
+ }
+}
+/* { dg-final { scan-assembler "r25,@a@tpoff" } } */
--
1.9.1
* [PATCH 10/10] [ARC] Revamp trampoline implementation.
2017-11-27 11:15 [PATCH 00/10][ARC] Critical fixes Claudiu Zissulescu
` (6 preceding siblings ...)
2017-11-27 11:15 ` [PATCH 06/10] [ARC] Update legitimate constant hook Claudiu Zissulescu
@ 2017-11-27 11:16 ` Claudiu Zissulescu
2018-01-02 12:16 ` Andrew Burgess
2017-11-27 11:57 ` [PATCH 01/10] [ARC][LRA] Use TARGET_CANNOT_SUBSTITUTE_MEM_EQUIV Claudiu Zissulescu
` (2 subsequent siblings)
10 siblings, 1 reply; 23+ messages in thread
From: Claudiu Zissulescu @ 2017-11-27 11:16 UTC (permalink / raw)
To: gcc-patches
Cc: Claudiu.Zissulescu, Francois.Bedard, andrew.burgess, Claudiu Zissulescu
From: Claudiu Zissulescu <claziss@gmail.com>
The new implementation attempts to clean up the existing trampoline
implementation for ARC, making it work on Linux-type systems.
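The new trampoline layout can be checked with a small model: three instructions occupying 8 bytes, followed by the function address at offset 8 and the static chain value at offset 12, for a total of 16 bytes. The struct below is a sketch in plain C, not code from the patch; its names are hypothetical, and it assumes that pcl is the program counter with the low two bits cleared.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy image of the 16-byte trampoline.  */
struct toy_arc_trampoline {
    uint16_t ld_s;    /* ld_s scratch,[pcl,8]  : 2 bytes */
    uint16_t ld[2];   /* ld   chain,[pcl,12]   : 4 bytes */
    uint16_t j_s;     /* j_s  [scratch]        : 2 bytes */
    uint32_t fnaddr;  /* filled in at run time, offset 8 */
    uint32_t chain;   /* filled in at run time, offset 12 */
};

/* Assumed definition of pcl: the pc rounded down to a 4-byte boundary.  */
uint32_t toy_pcl(uint32_t pc)
{
    return pc & ~(uint32_t)3;
}

/* Hypothetical self-check of sizes and pc-relative offsets.  */
void check_trampoline(void)
{
    assert(sizeof(struct toy_arc_trampoline) == 16);
    assert(offsetof(struct toy_arc_trampoline, fnaddr) == 8);
    assert(offsetof(struct toy_arc_trampoline, chain) == 12);
    /* ld_s sits at offset 0, ld at offset 2; both see pcl == 0.  */
    assert(toy_pcl(0) + 8 == offsetof(struct toy_arc_trampoline, fnaddr));
    assert(toy_pcl(2) + 12 == offsetof(struct toy_arc_trampoline, chain));
}
```

Keeping the code part constant is what allows the template to be emitted once and block-copied, with only the two data words written per trampoline.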
gcc/
2017-11-10 Claudiu Zissulescu <claziss@synopsys.com>
* config/arc/arc.c (TARGET_TRAMPOLINE_ADJUST_ADDRESS): Delete.
(emit_store_direct): Likewise.
(arc_trampoline_adjust_address): Likewise.
(arc_asm_trampoline_template): New function.
(arc_initialize_trampoline): Use asm_trampoline_template.
(TARGET_ASM_TRAMPOLINE_TEMPLATE): Define.
* config/arc/arc.h (TRAMPOLINE_SIZE): Adjust to 16.
* config/arc/arc.md (flush_icache): Delete pattern.
---
gcc/config/arc/arc.c | 89 +++++++++++++++++++++++++--------------------------
gcc/config/arc/arc.h | 2 +-
gcc/config/arc/arc.md | 9 ------
3 files changed, 44 insertions(+), 56 deletions(-)
diff --git a/gcc/config/arc/arc.c b/gcc/config/arc/arc.c
index 0eeeb42..053f3c2 100644
--- a/gcc/config/arc/arc.c
+++ b/gcc/config/arc/arc.c
@@ -588,8 +588,6 @@ static void arc_finalize_pic (void);
#define TARGET_TRAMPOLINE_INIT arc_initialize_trampoline
-#define TARGET_TRAMPOLINE_ADJUST_ADDRESS arc_trampoline_adjust_address
-
#define TARGET_CAN_ELIMINATE arc_can_eliminate
#define TARGET_FRAME_POINTER_REQUIRED arc_frame_pointer_required
@@ -3727,69 +3725,65 @@ output_shift (rtx *operands)
\f
/* Nested function support. */
-/* Directly store VALUE into memory object BLOCK at OFFSET. */
-
-static void
-emit_store_direct (rtx block, int offset, int value)
-{
- emit_insn (gen_store_direct (adjust_address (block, SImode, offset),
- force_reg (SImode,
- gen_int_mode (value, SImode))));
-}
+/* Output assembler code for a block containing the constant parts of
+ a trampoline, leaving space for variable parts.
-/* Emit RTL insns to initialize the variable parts of a trampoline.
- FNADDR is an RTX for the address of the function's pure code.
- CXT is an RTX for the static chain value for the function. */
-/* With potentially multiple shared objects loaded, and multiple stacks
- present for multiple thereds where trampolines might reside, a simple
- range check will likely not suffice for the profiler to tell if a callee
- is a trampoline. We a speedier check by making the trampoline start at
- an address that is not 4-byte aligned.
A trampoline looks like this:
- nop_s 0x78e0
-entry:
ld_s r12,[pcl,12] 0xd403
ld r11,[pcl,12] 0x170c 700b
j_s [r12] 0x7c00
- nop_s 0x78e0
+ .word function's address
+ .word static chain value
+
+*/
+
+static void
+arc_asm_trampoline_template (FILE *f)
+{
+ asm_fprintf (f, "\tld_s\t%s,[pcl,8]\n", ARC_TEMP_SCRATCH_REG);
+ asm_fprintf (f, "\tld\t%s,[pcl,12]\n", reg_names[STATIC_CHAIN_REGNUM]);
+ asm_fprintf (f, "\tj_s\t[%s]\n", ARC_TEMP_SCRATCH_REG);
+ assemble_aligned_integer (UNITS_PER_WORD, const0_rtx);
+ assemble_aligned_integer (UNITS_PER_WORD, const0_rtx);
+}
+
+/* Emit RTL insns to initialize the variable parts of a trampoline.
+ FNADDR is an RTX for the address of the function's pure code. CXT
+ is an RTX for the static chain value for the function.
The fastest trampoline to execute for trampolines within +-8KB of CTX
would be:
+
add2 r11,pcl,s12
j [limm] 0x20200f80 limm
- and that would also be faster to write to the stack by computing the offset
- from CTX to TRAMP at compile time. However, it would really be better to
- get rid of the high cost of cache invalidation when generating trampolines,
- which requires that the code part of trampolines stays constant, and
- additionally either
- - making sure that no executable code but trampolines is on the stack,
- no icache entries linger for the area of the stack from when before the
- stack was allocated, and allocating trampolines in trampoline-only
- cache lines
- or
- - allocate trampolines fram a special pool of pre-allocated trampolines. */
+
+ and that would also be faster to write to the stack by computing
+ the offset from CTX to TRAMP at compile time. However, it would
+ really be better to get rid of the high cost of cache invalidation
+ when generating trampolines, which requires that the code part of
+ trampolines stays constant, and additionally either making sure
+ that no executable code but trampolines is on the stack, no icache
+ entries linger for the area of the stack from when before the stack
+ was allocated, and allocating trampolines in trampoline-only cache
+ lines or allocate trampolines from a special pool of pre-allocated
+ trampolines. */
static void
arc_initialize_trampoline (rtx tramp, tree fndecl, rtx cxt)
{
rtx fnaddr = XEXP (DECL_RTL (fndecl), 0);
- emit_store_direct (tramp, 0, TARGET_BIG_ENDIAN ? 0x78e0d403 : 0xd40378e0);
- emit_store_direct (tramp, 4, TARGET_BIG_ENDIAN ? 0x170c700b : 0x700b170c);
- emit_store_direct (tramp, 8, TARGET_BIG_ENDIAN ? 0x7c0078e0 : 0x78e07c00);
- emit_move_insn (adjust_address (tramp, SImode, 12), fnaddr);
- emit_move_insn (adjust_address (tramp, SImode, 16), cxt);
- emit_insn (gen_flush_icache (adjust_address (tramp, SImode, 0)));
-}
+ emit_block_move (tramp, assemble_trampoline_template (),
+ GEN_INT (TRAMPOLINE_SIZE), BLOCK_OP_NORMAL);
-/* Allow the profiler to easily distinguish trampolines from normal
- functions. */
+ emit_move_insn (adjust_address (tramp, SImode, 8), fnaddr);
+ emit_move_insn (adjust_address (tramp, SImode, 12), cxt);
-static rtx
-arc_trampoline_adjust_address (rtx addr)
-{
- return plus_constant (Pmode, addr, 2);
+ emit_library_call (gen_rtx_SYMBOL_REF (Pmode, "__clear_cache"),
+ LCT_NORMAL, VOIDmode, 2, XEXP (tramp, 0), Pmode,
+ plus_constant (Pmode, XEXP (tramp, 0), TRAMPOLINE_SIZE),
+ Pmode);
}
/* Add the given function declaration to emit code in JLI section. */
@@ -11412,6 +11406,9 @@ arc_cannot_substitute_mem_equiv_p (rtx)
#undef TARGET_CANNOT_SUBSTITUTE_MEM_EQUIV_P
#define TARGET_CANNOT_SUBSTITUTE_MEM_EQUIV_P arc_cannot_substitute_mem_equiv_p
+#undef TARGET_ASM_TRAMPOLINE_TEMPLATE
+#define TARGET_ASM_TRAMPOLINE_TEMPLATE arc_asm_trampoline_template
+
struct gcc_target targetm = TARGET_INITIALIZER;
#include "gt-arc.h"
diff --git a/gcc/config/arc/arc.h b/gcc/config/arc/arc.h
index 8c31fb2..317a653 100644
--- a/gcc/config/arc/arc.h
+++ b/gcc/config/arc/arc.h
@@ -829,7 +829,7 @@ extern int arc_initial_elimination_offset(int from, int to);
/* Trampolines. */
/* Length in units of the trampoline for entering a nested function. */
-#define TRAMPOLINE_SIZE 20
+#define TRAMPOLINE_SIZE 16
/* Alignment required for a trampoline in bits . */
/* For actual data alignment we just need 32, no more than the stack;
diff --git a/gcc/config/arc/arc.md b/gcc/config/arc/arc.md
index 155ee6c..e1418a9 100644
--- a/gcc/config/arc/arc.md
+++ b/gcc/config/arc/arc.md
@@ -4345,15 +4345,6 @@ archs4xd, archs4xd_slow, core_3"
(set_attr "iscompact" "true")
(set_attr "length" "2")])
-;; Special pattern to flush the icache.
-;; ??? Not sure what to do here. Some ARC's are known to support this.
-
-(define_insn "flush_icache"
- [(unspec_volatile [(match_operand:SI 0 "memory_operand" "m")] 0)]
- ""
- "* return \"\";"
- [(set_attr "type" "misc")])
-
;; Split up troublesome insns for better scheduling.
;; Peepholes go at the end.
--
1.9.1
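
As a rough cross-check of the trampoline layout used by the template in the patch above (this is not part of the patch), the sketch below recomputes the offsets that the two pcl-relative loads read and the total size. It assumes that ld_s and j_s are 2-byte encodings, that ld is a 4-byte encoding, that PCL is the current PC rounded down to a 4-byte boundary, and that the trampoline itself starts on a 4-byte boundary. All helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed byte sizes of the three template instructions and of one
   data word.  */
enum { LD_S_SIZE = 2, LD_SIZE = 4, J_S_SIZE = 2, WORD_SIZE = 4 };

/* Assumed PCL semantics: the PC rounded down to a 4-byte boundary.  */
static uint32_t
pcl_of (uint32_t pc)
{
  return pc & ~(uint32_t) 3;
}

/* Offset (from the trampoline start) read by a load that sits at
   INSN_OFFSET and addresses [pcl,DISP].  */
static uint32_t
pcl_load_offset (uint32_t insn_offset, uint32_t disp)
{
  return pcl_of (insn_offset) + disp;
}

/* Code bytes followed by the two data words.  */
static uint32_t
trampoline_size (void)
{
  return LD_S_SIZE + LD_SIZE + J_S_SIZE + 2 * WORD_SIZE;
}

/* Check that the loads hit the words the initializer stores at
   offsets 8 (function address) and 12 (static chain).  */
static void
check_trampoline_layout (void)
{
  assert (pcl_load_offset (0, 8) == 8);           /* ld_s at offset 0 */
  assert (pcl_load_offset (LD_S_SIZE, 12) == 12); /* ld at offset 2 */
  assert (trampoline_size () == 16);              /* TRAMPOLINE_SIZE */
}
```

Under these assumptions the code part occupies bytes 0..7, so the function address and static chain land at offsets 8 and 12, which is where arc_initialize_trampoline stores them.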
* [PATCH 01/10] [ARC][LRA] Use TARGET_CANNOT_SUBSTITUTE_MEM_EQUIV.
2017-11-27 11:15 [PATCH 00/10][ARC] Critical fixes Claudiu Zissulescu
` (7 preceding siblings ...)
2017-11-27 11:16 ` [PATCH 10/10] [ARC] Revamp trampoline implementation Claudiu Zissulescu
@ 2017-11-27 11:57 ` Claudiu Zissulescu
2017-11-27 23:27 ` Andrew Burgess
2017-11-27 12:25 ` [PATCH 07/10] [ARC][FIX] Consider command line ffixed- option Claudiu Zissulescu
2018-01-08 15:23 ` [PATCH 00/10][ARC] Critical fixes Claudiu Zissulescu
10 siblings, 1 reply; 23+ messages in thread
From: Claudiu Zissulescu @ 2017-11-27 11:57 UTC (permalink / raw)
To: gcc-patches
Cc: Claudiu.Zissulescu, Francois.Bedard, andrew.burgess, Claudiu Zissulescu
From: Claudiu Zissulescu <claziss@gmail.com>
Sometimes the memory equivalent is not valid due to a large offset,
for example when the ap register is replaced with its fp/sp equivalent
during the LRA step. To solve this, we define
TARGET_CANNOT_SUBSTITUTE_MEM_EQUIV_P.
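
To illustrate the "large offset" case (this is not part of the patch, whose hook simply returns true), here is a minimal sketch of a range check for a signed 9-bit displacement, the s9 field mentioned in the test case comment. The helper name fits_s9 is hypothetical.

```c
#include <stdbool.h>

/* Hypothetical helper: true if OFFSET is representable as a signed
   9-bit immediate, i.e. lies in the range -256 .. 255.  */
static bool
fits_s9 (long offset)
{
  return offset >= -256 && offset <= 255;
}
```

With a 500-byte member on the stack, as in the new lra-1.c test, a base+offset equivalent can easily exceed this range, which is the situation the hook is meant to avoid.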
gcc/
2017-08-08 Claudiu Zissulescu <claziss@synopsys.com>
* config/arc/arc.c (arc_cannot_substitute_mem_equiv_p): New function.
(TARGET_CANNOT_SUBSTITUTE_MEM_EQUIV_P): Define.
gcc/testsuite
2017-08-08 Claudiu Zissulescu <claziss@synopsys.com>
* gcc.target/arc/lra-1.c: New test.
---
gcc/config/arc/arc.c | 12 ++++++++++++
gcc/testsuite/gcc.target/arc/lra-1.c | 17 +++++++++++++++++
2 files changed, 29 insertions(+)
create mode 100644 gcc/testsuite/gcc.target/arc/lra-1.c
diff --git a/gcc/config/arc/arc.c b/gcc/config/arc/arc.c
index dd922a6..25f123c 100644
--- a/gcc/config/arc/arc.c
+++ b/gcc/config/arc/arc.c
@@ -11352,12 +11352,24 @@ arc_use_anchors_for_symbol_p (const_rtx symbol)
return default_use_anchors_for_symbol_p (symbol);
}
+/* Return true if SUBST can't safely replace its equivalent during RA. */
+static bool
+arc_cannot_substitute_mem_equiv_p (rtx)
+{
+ /* If SUBST is mem[base+index], the address may not fit the ISA,
+ thus return true. */
+ return true;
+}
+
#undef TARGET_USE_ANCHORS_FOR_SYMBOL_P
#define TARGET_USE_ANCHORS_FOR_SYMBOL_P arc_use_anchors_for_symbol_p
#undef TARGET_CONSTANT_ALIGNMENT
#define TARGET_CONSTANT_ALIGNMENT constant_alignment_word_strings
+#undef TARGET_CANNOT_SUBSTITUTE_MEM_EQUIV_P
+#define TARGET_CANNOT_SUBSTITUTE_MEM_EQUIV_P arc_cannot_substitute_mem_equiv_p
+
struct gcc_target targetm = TARGET_INITIALIZER;
#include "gt-arc.h"
diff --git a/gcc/testsuite/gcc.target/arc/lra-1.c b/gcc/testsuite/gcc.target/arc/lra-1.c
new file mode 100644
index 0000000..27336d1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arc/lra-1.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-Os -w -mlra" } */
+
+/* ap is replaced with an address like base+offset by lra,
+ where offset is larger than s9, resulting into an ICE. */
+
+typedef struct { char a[500] } b;
+c;
+struct d {
+ short e;
+ b f
+} g(int h, int i, int j, int k, char l, int m, int n, char *p) {
+again:;
+ struct d o;
+ *p = c = ({ q(o); });
+ goto again;
+}
--
1.9.1
* [PATCH 07/10] [ARC][FIX] Consider command line ffixed- option.
2017-11-27 11:15 [PATCH 00/10][ARC] Critical fixes Claudiu Zissulescu
` (8 preceding siblings ...)
2017-11-27 11:57 ` [PATCH 01/10] [ARC][LRA] Use TARGET_CANNOT_SUBSTITUTE_MEM_EQUIV Claudiu Zissulescu
@ 2017-11-27 12:25 ` Claudiu Zissulescu
2017-12-07 23:32 ` Andrew Burgess
2018-01-08 15:23 ` [PATCH 00/10][ARC] Critical fixes Claudiu Zissulescu
10 siblings, 1 reply; 23+ messages in thread
From: Claudiu Zissulescu @ 2017-11-27 12:25 UTC (permalink / raw)
To: gcc-patches; +Cc: Claudiu.Zissulescu, Francois.Bedard, andrew.burgess
From: claziss <claziss@synopsys.com>
Track which regs are set fixed/call saved/call used from the command
line. Do not override their properties if the user says otherwise.
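
The idea can be sketched outside of GCC as follows. This is only an illustration under stated assumptions: reg_set is a hypothetical 64-bit stand-in for HARD_REG_SET, and the helper names are not GCC APIs.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for HARD_REG_SET: one bit per register,
   valid for register numbers 0 .. 63.  */
typedef uint64_t reg_set;

/* Record that the user mentioned registers REG .. REG+NREGS-1 on the
   command line (-ffixed-, -fcall-used-, -fcall-saved-).  */
static void
mark_overridden (reg_set *set, int reg, int nregs)
{
  for (int j = reg; j < reg + nregs; j++)
    *set |= (reg_set) 1 << j;
}

/* True if the user set the properties of REG explicitly.  */
static bool
is_overridden (reg_set set, int reg)
{
  return (set >> reg) & 1;
}

/* Apply the back end's default to FIXED[REG] unless the user has
   already chosen a setting for that register.  */
static void
set_default_fixed (char *fixed, reg_set overrides, int reg, char value)
{
  if (!is_overridden (overrides, reg))
    fixed[reg] = value;
}
```

In this model, a register named by -ffixed- keeps its user-chosen state, while every other register still receives the back end's default.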
gcc/
2017-06-08 Claudiu Zissulescu <claziss@synopsys.com>
* config/arc/arc.c (overrideregs): New variable.
(arc_override_options): Track fixed/call saved/call options.
(arc_conditional_register_usage): Check against overrideregs
variable whenever we change register properties.
---
gcc/config/arc/arc.c | 60 +++++++++++++++++++++++++++++++++++++++++-----------
1 file changed, 48 insertions(+), 12 deletions(-)
diff --git a/gcc/config/arc/arc.c b/gcc/config/arc/arc.c
index 42ea921..0eeeb42 100644
--- a/gcc/config/arc/arc.c
+++ b/gcc/config/arc/arc.c
@@ -79,6 +79,9 @@ typedef struct GTY (()) _arc_jli_section
static arc_jli_section *arc_jli_sections = NULL;
+/* Track which regs are set fixed/call saved/call used from command line. */
+HARD_REG_SET overrideregs;
+
/* Maximum size of a loop. */
#define ARC_MAX_LOOP_LENGTH 4095
@@ -1144,6 +1147,30 @@ arc_override_options (void)
}
}
+ CLEAR_HARD_REG_SET (overrideregs);
+ if (common_deferred_options)
+ {
+ vec<cl_deferred_option> v =
+ *((vec<cl_deferred_option> *) common_deferred_options);
+ int reg, nregs, j;
+
+ FOR_EACH_VEC_ELT (v, i, opt)
+ {
+ switch (opt->opt_index)
+ {
+ case OPT_ffixed_:
+ case OPT_fcall_used_:
+ case OPT_fcall_saved_:
+ if ((reg = decode_reg_name_and_count (opt->arg, &nregs)) >= 0)
+ for (j = reg; j < reg + nregs; j++)
+ SET_HARD_REG_BIT (overrideregs, j);
+ break;
+ default:
+ break;
+ }
+ }
+ }
+
/* Set cpu flags accordingly to architecture/selected cpu. The cpu
specific flags are set in arc-common.c. The architecture forces
the default hardware configurations in, regardless what command
@@ -1673,14 +1700,20 @@ arc_conditional_register_usage (void)
/* For ARCv2 the core register set is changed. */
strcpy (rname29, "ilink");
strcpy (rname30, "r30");
- call_used_regs[30] = 1;
- fixed_regs[30] = 0;
-
- arc_regno_reg_class[30] = WRITABLE_CORE_REGS;
- SET_HARD_REG_BIT (reg_class_contents[WRITABLE_CORE_REGS], 30);
- SET_HARD_REG_BIT (reg_class_contents[CHEAP_CORE_REGS], 30);
- SET_HARD_REG_BIT (reg_class_contents[GENERAL_REGS], 30);
- SET_HARD_REG_BIT (reg_class_contents[MPY_WRITABLE_CORE_REGS], 30);
+
+ if (!TEST_HARD_REG_BIT (overrideregs, 30))
+ {
+ /* No user interference. Set the r30 to be used by the
+ compiler. */
+ call_used_regs[30] = 1;
+ fixed_regs[30] = 0;
+
+ arc_regno_reg_class[30] = WRITABLE_CORE_REGS;
+ SET_HARD_REG_BIT (reg_class_contents[WRITABLE_CORE_REGS], 30);
+ SET_HARD_REG_BIT (reg_class_contents[CHEAP_CORE_REGS], 30);
+ SET_HARD_REG_BIT (reg_class_contents[GENERAL_REGS], 30);
+ SET_HARD_REG_BIT (reg_class_contents[MPY_WRITABLE_CORE_REGS], 30);
+ }
}
if (TARGET_MUL64_SET)
@@ -1935,11 +1968,14 @@ arc_conditional_register_usage (void)
SET_HARD_REG_BIT (reg_class_contents[MPY_WRITABLE_CORE_REGS], ACCL_REGNO);
SET_HARD_REG_BIT (reg_class_contents[MPY_WRITABLE_CORE_REGS], ACCH_REGNO);
- /* Allow the compiler to freely use them. */
- fixed_regs[ACCL_REGNO] = 0;
- fixed_regs[ACCH_REGNO] = 0;
+ /* Allow the compiler to freely use them. */
+ if (!TEST_HARD_REG_BIT (overrideregs, ACCL_REGNO))
+ fixed_regs[ACCL_REGNO] = 0;
+ if (!TEST_HARD_REG_BIT (overrideregs, ACCH_REGNO))
+ fixed_regs[ACCH_REGNO] = 0;
- arc_hard_regno_modes[ACC_REG_FIRST] = D_MODES;
+ if (!fixed_regs[ACCH_REGNO] && !fixed_regs[ACCL_REGNO])
+ arc_hard_regno_modes[ACC_REG_FIRST] = D_MODES;
}
}
--
1.9.1
* Re: [PATCH 01/10] [ARC][LRA] Use TARGET_CANNOT_SUBSTITUTE_MEM_EQUIV.
2017-11-27 11:57 ` [PATCH 01/10] [ARC][LRA] Use TARGET_CANNOT_SUBSTITUTE_MEM_EQUIV Claudiu Zissulescu
@ 2017-11-27 23:27 ` Andrew Burgess
0 siblings, 0 replies; 23+ messages in thread
From: Andrew Burgess @ 2017-11-27 23:27 UTC (permalink / raw)
To: Claudiu Zissulescu; +Cc: gcc-patches, Francois.Bedard, Claudiu Zissulescu
* Claudiu Zissulescu <Claudiu.Zissulescu@synopsys.com> [2017-11-27 12:09:50 +0100]:
> From: Claudiu Zissulescu <claziss@gmail.com>
>
> Sometimes the memory equivalent is not valid due to a large offset,
> for example when the ap register is replaced with its fp/sp equivalent
> during the LRA step. To solve this, we define
> TARGET_CANNOT_SUBSTITUTE_MEM_EQUIV_P.
>
> gcc/
> 2017-08-08 Claudiu Zissulescu <claziss@synopsys.com>
>
> * config/arc/arc.c (arc_cannot_substitute_mem_equiv_p): New function.
> (TARGET_CANNOT_SUBSTITUTE_MEM_EQUIV_P): Define.
>
> gcc/testsuite
> 2017-08-08 Claudiu Zissulescu <claziss@synopsys.com>
>
> * gcc.target/arc/lra-1.c: New test.
Looks good.
Thanks,
Andrew
> ---
> gcc/config/arc/arc.c | 12 ++++++++++++
> gcc/testsuite/gcc.target/arc/lra-1.c | 17 +++++++++++++++++
> 2 files changed, 29 insertions(+)
> create mode 100644 gcc/testsuite/gcc.target/arc/lra-1.c
>
> diff --git a/gcc/config/arc/arc.c b/gcc/config/arc/arc.c
> index dd922a6..25f123c 100644
> --- a/gcc/config/arc/arc.c
> +++ b/gcc/config/arc/arc.c
> @@ -11352,12 +11352,24 @@ arc_use_anchors_for_symbol_p (const_rtx symbol)
> return default_use_anchors_for_symbol_p (symbol);
> }
>
> +/* Return true if SUBST can't safely replace its equivalent during RA. */
> +static bool
> +arc_cannot_substitute_mem_equiv_p (rtx)
> +{
> + /* If SUBST is mem[base+index], the address may not fit the ISA,
> + thus return true. */
> + return true;
> +}
> +
> #undef TARGET_USE_ANCHORS_FOR_SYMBOL_P
> #define TARGET_USE_ANCHORS_FOR_SYMBOL_P arc_use_anchors_for_symbol_p
>
> #undef TARGET_CONSTANT_ALIGNMENT
> #define TARGET_CONSTANT_ALIGNMENT constant_alignment_word_strings
>
> +#undef TARGET_CANNOT_SUBSTITUTE_MEM_EQUIV_P
> +#define TARGET_CANNOT_SUBSTITUTE_MEM_EQUIV_P arc_cannot_substitute_mem_equiv_p
> +
> struct gcc_target targetm = TARGET_INITIALIZER;
>
> #include "gt-arc.h"
> diff --git a/gcc/testsuite/gcc.target/arc/lra-1.c b/gcc/testsuite/gcc.target/arc/lra-1.c
> new file mode 100644
> index 0000000..27336d1
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arc/lra-1.c
> @@ -0,0 +1,17 @@
> +/* { dg-do compile } */
> +/* { dg-options "-Os -w -mlra" } */
> +
> +/* ap is replaced with an address like base+offset by lra,
> + where offset is larger than s9, resulting into an ICE. */
> +
> +typedef struct { char a[500] } b;
> +c;
> +struct d {
> + short e;
> + b f
> +} g(int h, int i, int j, int k, char l, int m, int n, char *p) {
> +again:;
> + struct d o;
> + *p = c = ({ q(o); });
> + goto again;
> +}
> --
> 1.9.1
>
* Re: [PATCH 02/10] [ARC][ZOL] Update uses for hw-loop labels.
2017-11-27 11:14 ` [PATCH 02/10] [ARC][ZOL] Update uses for hw-loop labels Claudiu Zissulescu
@ 2017-11-27 23:29 ` Andrew Burgess
0 siblings, 0 replies; 23+ messages in thread
From: Andrew Burgess @ 2017-11-27 23:29 UTC (permalink / raw)
To: Claudiu Zissulescu; +Cc: gcc-patches, Francois.Bedard
* Claudiu Zissulescu <Claudiu.Zissulescu@synopsys.com> [2017-11-27 12:09:51 +0100]:
> From: claziss <claziss@synopsys.com>
>
> Make sure we mark the hw-loop labels as being used.
>
> gcc/
> 2017-09-19 Claudiu Zissulescu <claziss@synopsys.com>
>
> * config/arc/arc.c (hwloop_optimize): Update hw-loop's end/start
> labels number of usages.
>
> gcc/testsuite
> 2017-09-19 Claudiu Zissulescu <claziss@synopsys.com>
>
> * gcc.target/arc/loop-2.cpp: New test.
Looks good.
Thanks,
Andrew
> ---
> gcc/config/arc/arc.c | 3 +++
> gcc/testsuite/gcc.target/arc/loop-2.cpp | 18 ++++++++++++++++++
> 2 files changed, 21 insertions(+)
> create mode 100644 gcc/testsuite/gcc.target/arc/loop-2.cpp
>
> diff --git a/gcc/config/arc/arc.c b/gcc/config/arc/arc.c
> index 25f123c..964815a 100644
> --- a/gcc/config/arc/arc.c
> +++ b/gcc/config/arc/arc.c
> @@ -7702,6 +7702,9 @@ hwloop_optimize (hwloop_info loop)
> /* Insert the loop end label before the last instruction of the
> loop. */
> emit_label_after (end_label, loop->last_insn);
> + /* Make sure we mark the beginning and end label as used. */
> + LABEL_NUSES (loop->end_label)++;
> + LABEL_NUSES (loop->start_label)++;
>
> return true;
> }
> diff --git a/gcc/testsuite/gcc.target/arc/loop-2.cpp b/gcc/testsuite/gcc.target/arc/loop-2.cpp
> new file mode 100644
> index 0000000..d1dc917
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arc/loop-2.cpp
> @@ -0,0 +1,18 @@
> +/* { dg-options "-O2" } */
> +/* { dg-do assemble } */
> +
> +/* This file fails to assemble if we forgot to increase the number of
> + uses for loop's start and end labels. */
> +int a, c, d;
> +int *b;
> +void fn1(int p1) {
> + if (d == 5)
> + for (int i; i < p1; ++i)
> + if (c)
> + b[i] = c;
> + else
> + int t = a = t;
> + else
> + for (int i; i < p1; ++i)
> + b[i] = 0;
> +}
> --
> 1.9.1
>
* Re: [PATCH 03/10] [ARC] Don't allow the last ZOL insn to be in a delay slot.
2017-11-27 11:14 ` [PATCH 03/10] [ARC] Don't allow the last ZOL insn to be in a delay slot Claudiu Zissulescu
@ 2017-11-27 23:32 ` Andrew Burgess
0 siblings, 0 replies; 23+ messages in thread
From: Andrew Burgess @ 2017-11-27 23:32 UTC (permalink / raw)
To: Claudiu Zissulescu; +Cc: gcc-patches, Francois.Bedard, Claudiu Zissulescu
* Claudiu Zissulescu <Claudiu.Zissulescu@synopsys.com> [2017-11-27 12:09:52 +0100]:
> From: Claudiu Zissulescu <claziss@gmail.com>
>
> The ARC ZOL implementation doesn't allow the last instruction to be a
> control instruction or part of a delay slot. Thus, we add a note to
> the last ZOL instruction which prevents it from ending up in a delay
> slot.
>
> 2017-10-20 Claudiu Zissulescu <claziss@synopsys.com>
>
> * config/arc/arc.c (hwloop_optimize): Prevent the last
> ZOL instruction to end into a delay slot.
> * config/arc/arc.md (cond_delay_insn): Check if the instruction
> can be placed into a delay slot against reg_note.
> (in_delay_slot): Likewise.
>
> testsuite/
> 2017-10-20 Claudiu Zissulescu <claziss@synopsys.com>
>
> * gcc.target/arc/loop-3.c: New test.
> * gcc.target/arc/loop-4.c: Likewise.
OK.
Thanks,
Andrew
>
> [FIX][ZOL] fix checking for jumps
> ---
> gcc/config/arc/arc.c | 6 ++++++
> gcc/config/arc/arc.md | 4 ++++
> gcc/testsuite/gcc.target/arc/loop-3.c | 27 +++++++++++++++++++++++++++
> gcc/testsuite/gcc.target/arc/loop-4.c | 14 ++++++++++++++
> 4 files changed, 51 insertions(+)
> create mode 100644 gcc/testsuite/gcc.target/arc/loop-3.c
> create mode 100644 gcc/testsuite/gcc.target/arc/loop-4.c
>
> diff --git a/gcc/config/arc/arc.c b/gcc/config/arc/arc.c
> index 964815a..1479a8d 100644
> --- a/gcc/config/arc/arc.c
> +++ b/gcc/config/arc/arc.c
> @@ -7609,6 +7609,12 @@ hwloop_optimize (hwloop_info loop)
> loop->loop_no);
> last_insn = emit_insn_after (gen_nopv (), last_insn);
> }
> +
> + /* SAVE_NOTE is used by haifa scheduler. However, we are after it
> + and we can use it to indicate the last ZOL instruction cannot be
> + part of a delay slot. */
> + add_reg_note (last_insn, REG_SAVE_NOTE, GEN_INT (2));
> +
> loop->last_insn = last_insn;
>
> /* Get the loop iteration register. */
> diff --git a/gcc/config/arc/arc.md b/gcc/config/arc/arc.md
> index 2e0ac52..6239483 100644
> --- a/gcc/config/arc/arc.md
> +++ b/gcc/config/arc/arc.md
> @@ -472,6 +472,8 @@
> (symbol_ref "(arc_hazard (prev_active_insn (insn), insn)
> + arc_hazard (insn, next_active_insn (insn)))"))
> (const_string "false")
> + (match_test "find_reg_note (insn, REG_SAVE_NOTE, GEN_INT (2))")
> + (const_string "false")
> (eq_attr "iscompact" "maybe") (const_string "true")
> ]
>
> @@ -499,6 +501,8 @@
> (cond [(eq_attr "cond" "!canuse") (const_string "no")
> (eq_attr "type" "call,branch,uncond_branch,jump,brcc")
> (const_string "no")
> + (match_test "find_reg_note (insn, REG_SAVE_NOTE, GEN_INT (2))")
> + (const_string "no")
> (eq_attr "length" "2,4") (const_string "yes")]
> (const_string "no")))
>
> diff --git a/gcc/testsuite/gcc.target/arc/loop-3.c b/gcc/testsuite/gcc.target/arc/loop-3.c
> new file mode 100644
> index 0000000..bf7aec9
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arc/loop-3.c
> @@ -0,0 +1,27 @@
> +/* { dg-do assemble } */
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -mno-sdata" } */
> +
> +/* This example will fail to assemble if the last instruction is a
> + branch with delay slot. */
> +int d;
> +extern char * fn2 (void);
> +
> +void fn1(void)
> +{
> + char *a = fn2();
> + for (;;) {
> + long long b;
> + int e = 8;
> + for (; e <= 63; e += 7) {
> + long c = *a++;
> + b += c & e;
> + if (c & 28)
> + break;
> + }
> + d = b;
> + }
> +}
> +
> +/* { dg-final { scan-assembler "bne_s @.L2" } } */
> +/* { dg-final { scan-assembler-not "add.eq" } } */
> diff --git a/gcc/testsuite/gcc.target/arc/loop-4.c b/gcc/testsuite/gcc.target/arc/loop-4.c
> new file mode 100644
> index 0000000..99a93a7
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arc/loop-4.c
> @@ -0,0 +1,14 @@
> +/* { dg-do assemble } */
> +/* { dg-do compile } */
> +/* { dg-options "-Os" } */
> +
> +
> +void fn1(void *p1, int p2, int p3)
> +{
> + char *d = p1;
> + do
> + *d++ = p2;
> + while (--p3);
> +}
> +
> +/* { dg-final { scan-assembler "lp_count" } } */
> --
> 1.9.1
>
* Re: [PATCH 04/10] [ARC] Add ARCv2 core3 tune option.
2017-11-27 11:15 ` [PATCH 04/10] [ARC] Add ARCv2 core3 tune option Claudiu Zissulescu
@ 2017-11-27 23:35 ` Andrew Burgess
0 siblings, 0 replies; 23+ messages in thread
From: Andrew Burgess @ 2017-11-27 23:35 UTC (permalink / raw)
To: Claudiu Zissulescu; +Cc: gcc-patches, Francois.Bedard
* Claudiu Zissulescu <Claudiu.Zissulescu@synopsys.com> [2017-11-27 12:09:53 +0100]:
> From: claziss <claziss@synopsys.com>
>
> ARCv2 Core3 CPUs come with dbnz support. Add this feature to the
> tune option.
>
> gcc/
> 2017-09-14 Claudiu Zissulescu <claziss@synopsys.com>
>
> * config/arc/arc-arch.h (arc_tune_attr): Add ARC_TUNE_CORE_3.
> * config/arc/arc.c (arc_sched_issue_rate): Use ARC_TUNE_... .
> (arc_init): Likewise.
> (arc_override_options): Likewise.
> (arc_file_start): Choose Tag_ARC_CPU_variation based on arc_tune
> value.
> (hwloop_fail): Use TARGET_DBNZ when we want to check for dbnz insn
> support.
> * config/arc/arc.h (TARGET_DBNZ): Define.
> * config/arc/arc.md (attr tune): Add core_3, use ARC_TUNE_... to
> properly set the tune attribute.
> (dbnz): Use TARGET_DBNZ guard.
> * config/arc/arc.opt (mtune): Add core3 option.
OK.
Thanks,
Andrew
> ---
> gcc/config/arc/arc-arch.h | 3 ++-
> gcc/config/arc/arc.c | 21 ++++++++++++---------
> gcc/config/arc/arc.h | 2 ++
> gcc/config/arc/arc.md | 22 ++++++++++++----------
> gcc/config/arc/arc.opt | 40 ++++++++++++++++++++++------------------
> 5 files changed, 50 insertions(+), 38 deletions(-)
>
> diff --git a/gcc/config/arc/arc-arch.h b/gcc/config/arc/arc-arch.h
> index 7c3f47c..38d2bcb 100644
> --- a/gcc/config/arc/arc-arch.h
> +++ b/gcc/config/arc/arc-arch.h
> @@ -75,7 +75,8 @@ enum arc_tune_attr
> ARC_TUNE_ARC700_4_2_XMAC,
> ARC_TUNE_ARCHS4X,
> ARC_TUNE_ARCHS4XD,
> - ARC_TUNE_ARCHS4XD_SLOW
> + ARC_TUNE_ARCHS4XD_SLOW,
> + ARC_TUNE_CORE_3
> };
>
> /* CPU specific properties. */
> diff --git a/gcc/config/arc/arc.c b/gcc/config/arc/arc.c
> index 1479a8d..4d7a282 100644
> --- a/gcc/config/arc/arc.c
> +++ b/gcc/config/arc/arc.c
> @@ -442,8 +442,8 @@ arc_sched_issue_rate (void)
> {
> switch (arc_tune)
> {
> - case TUNE_ARCHS4X:
> - case TUNE_ARCHS4XD:
> + case ARC_TUNE_ARCHS4X:
> + case ARC_TUNE_ARCHS4XD:
> return 3;
> default:
> break;
> @@ -866,21 +866,21 @@ arc_init (void)
> if (arc_multcost < 0)
> switch (arc_tune)
> {
> - case TUNE_ARC700_4_2_STD:
> + case ARC_TUNE_ARC700_4_2_STD:
> /* latency 7;
> max throughput (1 multiply + 4 other insns) / 5 cycles. */
> arc_multcost = COSTS_N_INSNS (4);
> if (TARGET_NOMPY_SET)
> arc_multcost = COSTS_N_INSNS (30);
> break;
> - case TUNE_ARC700_4_2_XMAC:
> + case ARC_TUNE_ARC700_4_2_XMAC:
> /* latency 5;
> max throughput (1 multiply + 2 other insns) / 3 cycles. */
> arc_multcost = COSTS_N_INSNS (3);
> if (TARGET_NOMPY_SET)
> arc_multcost = COSTS_N_INSNS (30);
> break;
> - case TUNE_ARC600:
> + case ARC_TUNE_ARC600:
> if (TARGET_MUL64_SET)
> {
> arc_multcost = COSTS_N_INSNS (4);
> @@ -1196,8 +1196,8 @@ arc_override_options (void)
> #undef ARC_OPT
>
> /* Set Tune option. */
> - if (arc_tune == TUNE_NONE)
> - arc_tune = (enum attr_tune) arc_selected_cpu->tune;
> + if (arc_tune == ARC_TUNE_NONE)
> + arc_tune = (enum arc_tune_attr) arc_selected_cpu->tune;
>
> if (arc_size_opt_level == 3)
> optimize_size = 1;
> @@ -5205,6 +5205,9 @@ static void arc_file_start (void)
> TARGET_NO_SDATA_SET ? 0 : 2);
> asm_fprintf (asm_out_file, "\t.arc_attribute Tag_ARC_ABI_exceptions, %d\n",
> TARGET_OPTFPE ? 1 : 0);
> + if (TARGET_V2)
> + asm_fprintf (asm_out_file, "\t.arc_attribute Tag_ARC_CPU_variation, %d\n",
> + arc_tune == ARC_TUNE_CORE_3 ? 3 : 2);
> }
>
> /* Implement `TARGET_ASM_FILE_END'. */
> @@ -7389,11 +7392,11 @@ hwloop_fail (hwloop_info loop)
> rtx test;
> rtx insn = loop->loop_end;
>
> - if (TARGET_V2
> + if (TARGET_DBNZ
> && (loop->length && (loop->length <= ARC_MAX_LOOP_LENGTH))
> && REG_P (loop->iter_reg))
> {
> - /* TARGET_V2 has dbnz instructions. */
> + /* TARGET_V2 core3 has dbnz instructions. */
> test = gen_dbnz (loop->iter_reg, loop->start_label);
> insn = emit_jump_insn_before (test, loop->loop_end);
> }
> diff --git a/gcc/config/arc/arc.h b/gcc/config/arc/arc.h
> index b5a8f84..8d90975 100644
> --- a/gcc/config/arc/arc.h
> +++ b/gcc/config/arc/arc.h
> @@ -1628,5 +1628,7 @@ enum
> /* Custom FP instructions used by QuarkSE EM cpu. */
> #define TARGET_FPX_QUARK (TARGET_EM && TARGET_SPFP \
> && (arc_fpu_build == FPX_QK))
> +/* DBNZ support is available for ARCv2 core3 cpus. */
> +#define TARGET_DBNZ (TARGET_V2 && (arc_tune == ARC_TUNE_CORE_3))
>
> #endif /* GCC_ARC_H */
> diff --git a/gcc/config/arc/arc.md b/gcc/config/arc/arc.md
> index 6239483..b8fa44e 100644
> --- a/gcc/config/arc/arc.md
> +++ b/gcc/config/arc/arc.md
> @@ -597,19 +597,21 @@
> ;; is made that makes conditional execution required.
>
> (define_attr "tune" "none, arc600, arc700_4_2_std, arc700_4_2_xmac, archs4x, \
> -archs4xd, archs4xd_slow"
> +archs4xd, archs4xd_slow, core_3"
> (const
> - (cond [(symbol_ref "arc_tune == TUNE_ARC600")
> + (cond [(symbol_ref "arc_tune == ARC_TUNE_ARC600")
> (const_string "arc600")
> - (symbol_ref "arc_tune == TUNE_ARC700_4_2_STD")
> + (symbol_ref "arc_tune == ARC_TUNE_ARC700_4_2_STD")
> (const_string "arc700_4_2_std")
> - (symbol_ref "arc_tune == TUNE_ARC700_4_2_XMAC")
> + (symbol_ref "arc_tune == ARC_TUNE_ARC700_4_2_XMAC")
> (const_string "arc700_4_2_xmac")
> - (symbol_ref "arc_tune == TUNE_ARCHS4X")
> + (symbol_ref "arc_tune == ARC_TUNE_ARCHS4X")
> (const_string "archs4x")
> - (ior (symbol_ref "arc_tune == TUNE_ARCHS4XD")
> - (symbol_ref "arc_tune == TUNE_ARCHS4XD_SLOW"))
> - (const_string "archs4xd")]
> + (ior (symbol_ref "arc_tune == ARC_TUNE_ARCHS4XD")
> + (symbol_ref "arc_tune == ARC_TUNE_ARCHS4XD_SLOW"))
> + (const_string "archs4xd")
> + (symbol_ref "arc_tune == ARC_TUNE_CORE_3")
> + (const_string "core_3")]
> (const_string "none"))))
>
> (define_attr "tune_arc700" "false,true"
> @@ -5200,11 +5202,11 @@ archs4xd, archs4xd_slow"
> (plus:SI (match_dup 0)
> (const_int -1)))
> (clobber (match_scratch:SI 2 "=X,r"))]
> - "TARGET_V2"
> + "TARGET_DBNZ"
> "@
> dbnz%#\\t%0,%l1
> #"
> - "TARGET_V2 && reload_completed && memory_operand (operands[0], SImode)"
> + "TARGET_DBNZ && reload_completed && memory_operand (operands[0], SImode)"
> [(set (match_dup 2) (match_dup 0))
> (set (match_dup 2) (plus:SI (match_dup 2) (const_int -1)))
> (set (reg:CC CC_REG) (compare:CC (match_dup 2) (const_int 0)))
> diff --git a/gcc/config/arc/arc.opt b/gcc/config/arc/arc.opt
> index aacb599..6b0104a 100644
> --- a/gcc/config/arc/arc.opt
> +++ b/gcc/config/arc/arc.opt
> @@ -249,29 +249,33 @@ mmultcost=
> Target RejectNegative Joined UInteger Var(arc_multcost) Init(-1)
> Cost to assume for a multiply instruction, with 4 being equal to a normal insn.
>
> -mtune=ARC600
> -Target RejectNegative Var(arc_tune, TUNE_ARC600)
> -Tune for ARC600 cpu.
> +mtune=
> +Target RejectNegative ToLower Joined Var(arc_tune) Enum(arc_tune_attr) Init(ARC_TUNE_NONE)
> +-mtune=TUNE Tune code for given ARC variant.
>
> -mtune=ARC601
> -Target RejectNegative Var(arc_tune, TUNE_ARC600)
> -Tune for ARC601 cpu.
> +Enum
> +Name(arc_tune_attr) Type(int)
> +
> +EnumValue
> +Enum(arc_tune_attr) String(arc600) Value(ARC_TUNE_ARC600)
>
> -mtune=ARC700
> -Target RejectNegative Var(arc_tune, TUNE_ARC700_4_2_STD)
> -Tune for ARC700 R4.2 Cpu with standard multiplier block.
> +EnumValue
> +Enum(arc_tune_attr) String(arc601) Value(ARC_TUNE_ARC600)
>
> -mtune=ARC700-xmac
> -Target RejectNegative Var(arc_tune, TUNE_ARC700_4_2_XMAC)
> -Tune for ARC700 R4.2 Cpu with XMAC block.
> +EnumValue
> +Enum(arc_tune_attr) String(arc700) Value(ARC_TUNE_ARC700_4_2_STD)
>
> -mtune=ARC725D
> -Target RejectNegative Var(arc_tune, TUNE_ARC700_4_2_XMAC)
> -Tune for ARC700 R4.2 Cpu with XMAC block.
> +EnumValue
> +Enum(arc_tune_attr) String(arc700-xmac) Value(ARC_TUNE_ARC700_4_2_XMAC)
>
> -mtune=ARC750D
> -Target RejectNegative Var(arc_tune, TUNE_ARC700_4_2_XMAC)
> -Tune for ARC700 R4.2 Cpu with XMAC block.
> +EnumValue
> +Enum(arc_tune_attr) String(arc725d) Value(ARC_TUNE_ARC700_4_2_XMAC)
> +
> +EnumValue
> +Enum(arc_tune_attr) String(arc750d) Value(ARC_TUNE_ARC700_4_2_XMAC)
> +
> +EnumValue
> +Enum(arc_tune_attr) String(core3) Value(ARC_TUNE_CORE_3)
>
> mindexed-loads
> Target Var(TARGET_INDEXED_LOADS) Init(TARGET_INDEXED_LOADS_DEFAULT)
> --
> 1.9.1
>
* Re: [PATCH 05/10] [ARC] Add trap instruction.
2017-11-27 11:14 ` [PATCH 05/10] [ARC] Add trap instruction Claudiu Zissulescu
@ 2017-11-27 23:40 ` Andrew Burgess
0 siblings, 0 replies; 23+ messages in thread
From: Andrew Burgess @ 2017-11-27 23:40 UTC (permalink / raw)
To: Claudiu Zissulescu; +Cc: gcc-patches, Francois.Bedard, Claudiu Zissulescu
* Claudiu Zissulescu <Claudiu.Zissulescu@synopsys.com> [2017-11-27 12:09:54 +0100]:
> From: Claudiu Zissulescu <claziss@gmail.com>
>
> 2017-11-07 Claudiu Zissulescu <claziss@synopsys.com>
>
> * config/arc/arc.md (trap): New pattern.
Looks good.
Thanks,
Andrew
> ---
> gcc/config/arc/arc.md | 7 +++++++
> 1 file changed, 7 insertions(+)
>
> diff --git a/gcc/config/arc/arc.md b/gcc/config/arc/arc.md
> index b8fa44e..42c6a23 100644
> --- a/gcc/config/arc/arc.md
> +++ b/gcc/config/arc/arc.md
> @@ -4321,6 +4321,13 @@ archs4xd, archs4xd_slow, core_3"
> ; use it for lack of inter-procedural branch shortening.
> ; Link-time relaxation would help...
>
> +(define_insn "trap"
> + [(trap_if (const_int 1) (const_int 0))]
> + "!TARGET_ARC600_FAMILY"
> + "trap_s\\t5"
> + [(set_attr "type" "misc")
> + (set_attr "length" "2")])
> +
> (define_insn "nop"
> [(const_int 0)]
> ""
> --
> 1.9.1
>
* Re: [PATCH 06/10] [ARC] Update legitimate constant hook.
2017-11-27 11:15 ` [PATCH 06/10] [ARC] Update legitimate constant hook Claudiu Zissulescu
@ 2017-12-07 23:30 ` Andrew Burgess
0 siblings, 0 replies; 23+ messages in thread
From: Andrew Burgess @ 2017-12-07 23:30 UTC (permalink / raw)
To: Claudiu Zissulescu; +Cc: gcc-patches, Francois.Bedard, Claudiu Zissulescu
* Claudiu Zissulescu <Claudiu.Zissulescu@synopsys.com> [2017-11-27 12:09:55 +0100]:
> From: Claudiu Zissulescu <claziss@gmail.com>
>
> Make sure we check the constants in all cases.
>
> gcc/
> 2017-10-14 Claudiu Zissulescu <claziss@synopsys.com>
>
> * config/arc/arc.c (arc_legitimate_constant_p): Always check all
> constants.
>
> testsuite/
> 2017-10-14 Claudiu Zissulescu <claziss@synopsys.com>
>
> * gcc.target/arc/tls-1.c: New test.
Looks good.
Thanks,
Andrew
> ---
> gcc/config/arc/arc.c | 6 ------
> gcc/testsuite/gcc.target/arc/tls-1.c | 26 ++++++++++++++++++++++++++
> 2 files changed, 26 insertions(+), 6 deletions(-)
> create mode 100644 gcc/testsuite/gcc.target/arc/tls-1.c
>
> diff --git a/gcc/config/arc/arc.c b/gcc/config/arc/arc.c
> index 4d7a282..42ea921 100644
> --- a/gcc/config/arc/arc.c
> +++ b/gcc/config/arc/arc.c
> @@ -6185,12 +6185,6 @@ arc_return_addr_rtx (int count, ATTRIBUTE_UNUSED rtx frame)
> bool
> arc_legitimate_constant_p (machine_mode mode, rtx x)
> {
> - if (GET_CODE (x) == SYMBOL_REF && SYMBOL_REF_TLS_MODEL (x))
> - return false;
> -
> - if (!flag_pic && mode != Pmode)
> - return true;
> -
> switch (GET_CODE (x))
> {
> case CONST:
> diff --git a/gcc/testsuite/gcc.target/arc/tls-1.c b/gcc/testsuite/gcc.target/arc/tls-1.c
> new file mode 100644
> index 0000000..3f7a6d4
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arc/tls-1.c
> @@ -0,0 +1,26 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target tls } */
> +/* { dg-options "-O3 -std=gnu99" } */
> +
> +/* Check if addressing the `pos` member of struct is done via tls
> + mechanism. */
> +
> +struct callchain_cursor {
> + int last;
> + long long pos;
> +} __thread a;
> +void fn1(struct callchain_cursor *p1)
> +{
> + p1->pos++;
> +}
> +
> +extern void fn3 (void);
> +
> +void fn2(void) {
> + struct callchain_cursor *b = &a;
> + while (1) {
> + fn3();
> + fn1(b);
> + }
> +}
> +/* { dg-final { scan-assembler "r25,@a@tpoff" } } */
> --
> 1.9.1
>
* Re: [PATCH 07/10] [ARC][FIX] Consider command line ffixed- option.
2017-11-27 12:25 ` [PATCH 07/10] [ARC][FIX] Consider command line ffixed- option Claudiu Zissulescu
@ 2017-12-07 23:32 ` Andrew Burgess
0 siblings, 0 replies; 23+ messages in thread
From: Andrew Burgess @ 2017-12-07 23:32 UTC (permalink / raw)
To: Claudiu Zissulescu; +Cc: gcc-patches, Francois.Bedard
* Claudiu Zissulescu <Claudiu.Zissulescu@synopsys.com> [2017-11-27 12:09:56 +0100]:
> From: claziss <claziss@synopsys.com>
>
> Track which regs are set fixed/call saved/call used from command line.
> Do not try to override their properties if user says otherwise.
>
> gcc/
> 2017-06-08 Claudiu Zissulescu <claziss@synopsys.com>
>
> * config/arc/arc.c (overrideregs): New variable.
> (arc_override_options): Track fixed/call saved/call options.
> (arc_conditional_register_usage): Check against overrideregs
> variable whenever we change register properties.
Looks good.
Thanks,
Andrew
> ---
> gcc/config/arc/arc.c | 60 +++++++++++++++++++++++++++++++++++++++++-----------
> 1 file changed, 48 insertions(+), 12 deletions(-)
>
> diff --git a/gcc/config/arc/arc.c b/gcc/config/arc/arc.c
> index 42ea921..0eeeb42 100644
> --- a/gcc/config/arc/arc.c
> +++ b/gcc/config/arc/arc.c
> @@ -79,6 +79,9 @@ typedef struct GTY (()) _arc_jli_section
>
> static arc_jli_section *arc_jli_sections = NULL;
>
> +/* Track which regs are set fixed/call saved/call used from commnad line. */
> +HARD_REG_SET overrideregs;
> +
> /* Maximum size of a loop. */
> #define ARC_MAX_LOOP_LENGTH 4095
>
> @@ -1144,6 +1147,30 @@ arc_override_options (void)
> }
> }
>
> + CLEAR_HARD_REG_SET (overrideregs);
> + if (common_deferred_options)
> + {
> + vec<cl_deferred_option> v =
> + *((vec<cl_deferred_option> *) common_deferred_options);
> + int reg, nregs, j;
> +
> + FOR_EACH_VEC_ELT (v, i, opt)
> + {
> + switch (opt->opt_index)
> + {
> + case OPT_ffixed_:
> + case OPT_fcall_used_:
> + case OPT_fcall_saved_:
> + if ((reg = decode_reg_name_and_count (opt->arg, &nregs)) >= 0)
> + for (j = reg; j < reg + nregs; j++)
> + SET_HARD_REG_BIT (overrideregs, j);
> + break;
> + default:
> + break;
> + }
> + }
> + }
> +
> /* Set cpu flags accordingly to architecture/selected cpu. The cpu
> specific flags are set in arc-common.c. The architecture forces
> the default hardware configurations in, regardless what command
> @@ -1673,14 +1700,20 @@ arc_conditional_register_usage (void)
> /* For ARCv2 the core register set is changed. */
> strcpy (rname29, "ilink");
> strcpy (rname30, "r30");
> - call_used_regs[30] = 1;
> - fixed_regs[30] = 0;
> -
> - arc_regno_reg_class[30] = WRITABLE_CORE_REGS;
> - SET_HARD_REG_BIT (reg_class_contents[WRITABLE_CORE_REGS], 30);
> - SET_HARD_REG_BIT (reg_class_contents[CHEAP_CORE_REGS], 30);
> - SET_HARD_REG_BIT (reg_class_contents[GENERAL_REGS], 30);
> - SET_HARD_REG_BIT (reg_class_contents[MPY_WRITABLE_CORE_REGS], 30);
> +
> + if (!TEST_HARD_REG_BIT (overrideregs, 30))
> + {
> + /* No user interference. Set the r30 to be used by the
> + compiler. */
> + call_used_regs[30] = 1;
> + fixed_regs[30] = 0;
> +
> + arc_regno_reg_class[30] = WRITABLE_CORE_REGS;
> + SET_HARD_REG_BIT (reg_class_contents[WRITABLE_CORE_REGS], 30);
> + SET_HARD_REG_BIT (reg_class_contents[CHEAP_CORE_REGS], 30);
> + SET_HARD_REG_BIT (reg_class_contents[GENERAL_REGS], 30);
> + SET_HARD_REG_BIT (reg_class_contents[MPY_WRITABLE_CORE_REGS], 30);
> + }
> }
>
> if (TARGET_MUL64_SET)
> @@ -1935,11 +1968,14 @@ arc_conditional_register_usage (void)
> SET_HARD_REG_BIT (reg_class_contents[MPY_WRITABLE_CORE_REGS], ACCL_REGNO);
> SET_HARD_REG_BIT (reg_class_contents[MPY_WRITABLE_CORE_REGS], ACCH_REGNO);
>
> - /* Allow the compiler to freely use them. */
> - fixed_regs[ACCL_REGNO] = 0;
> - fixed_regs[ACCH_REGNO] = 0;
> + /* Allow the compiler to freely use them. */
> + if (!TEST_HARD_REG_BIT (overrideregs, ACCL_REGNO))
> + fixed_regs[ACCL_REGNO] = 0;
> + if (!TEST_HARD_REG_BIT (overrideregs, ACCH_REGNO))
> + fixed_regs[ACCH_REGNO] = 0;
>
> - arc_hard_regno_modes[ACC_REG_FIRST] = D_MODES;
> + if (!fixed_regs[ACCH_REGNO] && !fixed_regs[ACCL_REGNO])
> + arc_hard_regno_modes[ACC_REG_FIRST] = D_MODES;
> }
> }
>
> --
> 1.9.1
>
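The override tracking in the patch above can be modelled outside GCC. What follows is only a minimal sketch in plain C, not the back-end code: `override_mask`, `model_fixed_regs`, `note_user_override`, `user_fixes_reg` and `backend_unfix_reg` are hypothetical names standing in for `overrideregs`, `fixed_regs` and the logic in `arc_override_options` and `arc_conditional_register_usage`. It assumes a register file small enough for a single 64-bit mask.

```c
#include <stdbool.h>
#include <stdint.h>

#define MODEL_NUM_REGS 64  /* hypothetical; fits in one 64-bit mask */

static uint64_t override_mask;                 /* stands in for overrideregs */
static bool model_fixed_regs[MODEL_NUM_REGS];  /* stands in for fixed_regs */

/* Record that the user named registers [reg, reg + nregs) on the
   command line (-ffixed-, -fcall-used-, -fcall-saved-).  */
static void note_user_override(int reg, int nregs)
{
  if (reg < 0)
    return;  /* unknown register name: nothing to record */
  for (int j = reg; j < reg + nregs && j < MODEL_NUM_REGS; j++)
    override_mask |= (uint64_t) 1 << j;
}

/* Model of the user passing -ffixed-<reg>.  */
static void user_fixes_reg(int reg)
{
  model_fixed_regs[reg] = true;
  note_user_override(reg, 1);
}

/* Back-end default: release REG to the allocator, but only when the
   user did not state a preference for it.  */
static void backend_unfix_reg(int reg)
{
  if (!((override_mask >> reg) & 1))
    model_fixed_regs[reg] = false;
}
```

With this model, a register the user fixed stays fixed after the back-end applies its defaults, while an untouched register is released as before.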
* Re: [PATCH 09/10] [ARC] Update (u)maddsidi patterns.
2017-11-27 11:14 ` [PATCH 09/10] [ARC] Update (u)maddsidi patterns Claudiu Zissulescu
@ 2017-12-07 23:35 ` Andrew Burgess
0 siblings, 0 replies; 23+ messages in thread
From: Andrew Burgess @ 2017-12-07 23:35 UTC (permalink / raw)
To: Claudiu Zissulescu; +Cc: gcc-patches, Francois.Bedard
* Claudiu Zissulescu <Claudiu.Zissulescu@synopsys.com> [2017-11-27 12:09:58 +0100]:
> From: claziss <claziss@synopsys.com>
>
> The accumulator registers are freely used by the compiler. However,
> a number of instructions use these registers implicitly. Update the
> patterns to tell the compiler which ones.
>
> gcc/
> 2017-09-19 Claudiu Zissulescu <claziss@synopsys.com>
>
> * config/arc/arc.md (maddsidi4, maddsidi4_split): Update pattern.
> 	(umaddsidi4, umaddsidi4_split): Likewise.
>
> gcc/testsuite
> 2017-09-19 Claudiu Zissulescu <claziss@synopsys.com>
>
> * gcc.target/arc/tumaddsidi4.c: New test.
Looks good.
Thanks,
Andrew
> ---
> gcc/config/arc/arc.md | 32 ++++++++++++++++++++++++++----
> gcc/testsuite/gcc.target/arc/tumaddsidi4.c | 14 +++++++++++++
> 2 files changed, 42 insertions(+), 4 deletions(-)
> create mode 100755 gcc/testsuite/gcc.target/arc/tumaddsidi4.c
>
> diff --git a/gcc/config/arc/arc.md b/gcc/config/arc/arc.md
> index 42c6a23..155ee6c 100644
> --- a/gcc/config/arc/arc.md
> +++ b/gcc/config/arc/arc.md
> @@ -6175,13 +6175,25 @@ archs4xd, archs4xd_slow, core_3"
> [(set_attr "length" "0")])
>
> ;; MAC and DMPY instructions
> -(define_insn_and_split "maddsidi4"
> +(define_expand "maddsidi4"
> + [(match_operand:DI 0 "register_operand" "")
> + (match_operand:SI 1 "register_operand" "")
> + (match_operand:SI 2 "extend_operand" "")
> + (match_operand:DI 3 "register_operand" "")]
> + "TARGET_PLUS_DMPY"
> + "{
> + emit_insn (gen_maddsidi4_split (operands[0], operands[1], operands[2], operands[3]));
> + DONE;
> + }")
> +
> +(define_insn_and_split "maddsidi4_split"
> [(set (match_operand:DI 0 "register_operand" "=r")
> (plus:DI
> (mult:DI
> (sign_extend:DI (match_operand:SI 1 "register_operand" "%r"))
> (sign_extend:DI (match_operand:SI 2 "extend_operand" "ri")))
> - (match_operand:DI 3 "register_operand" "r")))]
> + (match_operand:DI 3 "register_operand" "r")))
> + (clobber (reg:DI ARCV2_ACC))]
> "TARGET_PLUS_DMPY"
> "#"
> "TARGET_PLUS_DMPY && reload_completed"
> @@ -6263,13 +6275,25 @@ archs4xd, archs4xd_slow, core_3"
> (set_attr "predicable" "no")
> (set_attr "cond" "nocond")])
>
> -(define_insn_and_split "umaddsidi4"
> +(define_expand "umaddsidi4"
> + [(match_operand:DI 0 "register_operand" "")
> + (match_operand:SI 1 "register_operand" "")
> + (match_operand:SI 2 "extend_operand" "")
> + (match_operand:DI 3 "register_operand" "")]
> + "TARGET_PLUS_DMPY"
> + "{
> + emit_insn (gen_umaddsidi4_split (operands[0], operands[1], operands[2], operands[3]));
> + DONE;
> + }")
> +
> +(define_insn_and_split "umaddsidi4_split"
> [(set (match_operand:DI 0 "register_operand" "=r")
> (plus:DI
> (mult:DI
> (zero_extend:DI (match_operand:SI 1 "register_operand" "%r"))
> (zero_extend:DI (match_operand:SI 2 "extend_operand" "ri")))
> - (match_operand:DI 3 "register_operand" "r")))]
> + (match_operand:DI 3 "register_operand" "r")))
> + (clobber (reg:DI ARCV2_ACC))]
> "TARGET_PLUS_DMPY"
> "#"
> "TARGET_PLUS_DMPY && reload_completed"
> diff --git a/gcc/testsuite/gcc.target/arc/tumaddsidi4.c b/gcc/testsuite/gcc.target/arc/tumaddsidi4.c
> new file mode 100755
> index 0000000..40d2b33
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arc/tumaddsidi4.c
> @@ -0,0 +1,14 @@
> +/* { dg-do compile } */
> +/* { dg-options "-mcpu=archs -O1 -mmpy-option=plus_dmpy" } */
> +
> +/* Check how we generate umaddsidi4 patterns. */
> +long a;
> +long long b;
> +unsigned c, d;
> +
> +void fn1(void)
> +{
> + b = d * (long long)c + a;
> +}
> +
> +/* { dg-final { scan-assembler "macu 0,r" } } */
> --
> 1.9.1
>
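For readers less familiar with the RTL above, the arithmetic described by the `maddsidi4` and `umaddsidi4` patterns can be written in portable C. This is only a sketch of the semantics (extend two 32-bit operands to 64 bits, multiply, add a 64-bit addend); the helper names `umadd_si_di` and `smadd_si_di` are hypothetical and say nothing about which instructions are selected.

```c
#include <stdint.h>

/* Unsigned form: zero_extend both SI operands to DI, multiply, add.  */
static uint64_t umadd_si_di(uint32_t x, uint32_t y, uint64_t addend)
{
  return (uint64_t) x * (uint64_t) y + addend;
}

/* Signed form: sign_extend both SI operands to DI, multiply, add.  */
static int64_t smadd_si_di(int32_t x, int32_t y, int64_t addend)
{
  return (int64_t) x * (int64_t) y + addend;
}
```

The expression `d * (long long)c + a` in the new test has the unsigned shape, since both `c` and `d` are `unsigned` and are widened before the multiply.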
* Re: [PATCH 08/10] [ARC] Enable unaligned access.
2017-11-27 11:15 ` [PATCH 08/10] [ARC] Enable unaligned access Claudiu Zissulescu
@ 2018-01-02 12:05 ` Andrew Burgess
0 siblings, 0 replies; 23+ messages in thread
From: Andrew Burgess @ 2018-01-02 12:05 UTC (permalink / raw)
To: Claudiu Zissulescu; +Cc: gcc-patches, Francois.Bedard, Claudiu Zissulescu
* Claudiu Zissulescu <Claudiu.Zissulescu@synopsys.com> [2017-11-27 12:09:57 +0100]:
> From: Claudiu Zissulescu <claziss@gmail.com>
>
> Use munaligned-access to control if we can have unaligned accesses. For ARC
> HS family unaligned access is always on.
>
> 2017-10-19 Claudiu Zissulescu <claziss@synopsys.com>
>
> * config/arc/arc-c.def (__ARC_UNALIGNED__): New define.
> * config/arc/arc.h (STRICT_ALIGNMENT): Control this macro using
> munaligned-access.
This looks fine,
Thanks,
Andrew
> ---
> gcc/config/arc/arc-c.def | 1 +
> gcc/config/arc/arc.h | 2 +-
> 2 files changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/config/arc/arc-c.def b/gcc/config/arc/arc-c.def
> index c9443c9..86eab4e 100644
> --- a/gcc/config/arc/arc-c.def
> +++ b/gcc/config/arc/arc-c.def
> @@ -29,6 +29,7 @@ ARC_C_DEF ("__ARC_MUL64__", TARGET_MUL64_SET)
> ARC_C_DEF ("__ARC_MUL32BY16__", TARGET_MULMAC_32BY16_SET)
> ARC_C_DEF ("__ARC_SIMD__", TARGET_SIMD_SET)
> ARC_C_DEF ("__ARC_RF16__", TARGET_RF16)
> +ARC_C_DEF ("__ARC_UNALIGNED__", !STRICT_ALIGNMENT)
>
> ARC_C_DEF ("__ARC_BARREL_SHIFTER__", TARGET_BARREL_SHIFTER)
>
> diff --git a/gcc/config/arc/arc.h b/gcc/config/arc/arc.h
> index 8d90975..8c31fb2 100644
> --- a/gcc/config/arc/arc.h
> +++ b/gcc/config/arc/arc.h
> @@ -288,7 +288,7 @@ if (GET_MODE_CLASS (MODE) == MODE_INT \
> /* On the ARC the lower address bits are masked to 0 as necessary. The chip
> won't croak when given an unaligned address, but the insn will still fail
> to produce the correct result. */
> -#define STRICT_ALIGNMENT 1
> +#define STRICT_ALIGNMENT (!unaligned_access && !TARGET_HS)
>
> /* Layout of source language data types. */
>
> --
> 1.9.1
>
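User code could test the new `__ARC_UNALIGNED__` define roughly as follows. This is a hedged sketch, not part of the patch: the helper names `unaligned_access_enabled` and `load_u32_any_alignment` are hypothetical. The load goes through `memcpy`, which is valid C for any alignment; whether the compiler lowers it to a single load is up to the target and its `STRICT_ALIGNMENT` setting.

```c
#include <stdint.h>
#include <string.h>

/* Report whether the compiler advertised unaligned access support.
   On non-ARC hosts the macro is simply absent and this returns 0.  */
static int unaligned_access_enabled(void)
{
#ifdef __ARC_UNALIGNED__
  return 1;
#else
  return 0;
#endif
}

/* Read a 32-bit value in host byte order from a possibly unaligned
   address without invoking undefined behaviour.  */
static uint32_t load_u32_any_alignment(const unsigned char *p)
{
  uint32_t v;
  memcpy(&v, p, sizeof v);
  return v;
}
```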
* Re: [PATCH 10/10] [ARC] Revamp trampoline implementation.
2017-11-27 11:16 ` [PATCH 10/10] [ARC] Revamp trampoline implementation Claudiu Zissulescu
@ 2018-01-02 12:16 ` Andrew Burgess
0 siblings, 0 replies; 23+ messages in thread
From: Andrew Burgess @ 2018-01-02 12:16 UTC (permalink / raw)
To: Claudiu Zissulescu; +Cc: gcc-patches, Francois.Bedard, Claudiu Zissulescu
* Claudiu Zissulescu <Claudiu.Zissulescu@synopsys.com> [2017-11-27 12:09:59 +0100]:
> From: Claudiu Zissulescu <claziss@gmail.com>
>
> The new implementation attempts to clean up the existing trampoline
> implementation for ARC, making it work on Linux-type systems.
>
> gcc/
> 2017-11-10 Claudiu Zissulescu <claziss@synopsys.com>
>
> * config/arc/arc.c (TARGET_TRAMPOLINE_ADJUST_ADDRESS): Delete.
> (emit_store_direct): Likewise.
> (arc_trampoline_adjust_address): Likewise.
> (arc_asm_trampoline_template): New function.
> (arc_initialize_trampoline): Use asm_trampoline_template.
> (TARGET_ASM_TRAMPOLINE_TEMPLATE): Define.
> * config/arc/arc.h (TRAMPOLINE_SIZE): Adjust to 16.
> *config/arc/arc.md (flush_icache): Delete pattern.
^-- Missing space here.
Otherwise, looks fine.
Thanks,
Andrew
> ---
> gcc/config/arc/arc.c | 89 +++++++++++++++++++++++++--------------------------
> gcc/config/arc/arc.h | 2 +-
> gcc/config/arc/arc.md | 9 ------
> 3 files changed, 44 insertions(+), 56 deletions(-)
>
> diff --git a/gcc/config/arc/arc.c b/gcc/config/arc/arc.c
> index 0eeeb42..053f3c2 100644
> --- a/gcc/config/arc/arc.c
> +++ b/gcc/config/arc/arc.c
> @@ -588,8 +588,6 @@ static void arc_finalize_pic (void);
>
> #define TARGET_TRAMPOLINE_INIT arc_initialize_trampoline
>
> -#define TARGET_TRAMPOLINE_ADJUST_ADDRESS arc_trampoline_adjust_address
> -
> #define TARGET_CAN_ELIMINATE arc_can_eliminate
>
> #define TARGET_FRAME_POINTER_REQUIRED arc_frame_pointer_required
> @@ -3727,69 +3725,65 @@ output_shift (rtx *operands)
> \f
> /* Nested function support. */
>
> -/* Directly store VALUE into memory object BLOCK at OFFSET. */
> -
> -static void
> -emit_store_direct (rtx block, int offset, int value)
> -{
> - emit_insn (gen_store_direct (adjust_address (block, SImode, offset),
> - force_reg (SImode,
> - gen_int_mode (value, SImode))));
> -}
> +/* Output assembler code for a block containing the constant parts of
> + a trampoline, leaving space for variable parts.
>
> -/* Emit RTL insns to initialize the variable parts of a trampoline.
> - FNADDR is an RTX for the address of the function's pure code.
> - CXT is an RTX for the static chain value for the function. */
> -/* With potentially multiple shared objects loaded, and multiple stacks
> - present for multiple thereds where trampolines might reside, a simple
> - range check will likely not suffice for the profiler to tell if a callee
> - is a trampoline. We a speedier check by making the trampoline start at
> - an address that is not 4-byte aligned.
> A trampoline looks like this:
>
> - nop_s 0x78e0
> -entry:
> ld_s r12,[pcl,12] 0xd403
> ld r11,[pcl,12] 0x170c 700b
> j_s [r12] 0x7c00
> - nop_s 0x78e0
> + .word function's address
> + .word static chain value
> +
> +*/
> +
> +static void
> +arc_asm_trampoline_template (FILE *f)
> +{
> + asm_fprintf (f, "\tld_s\t%s,[pcl,8]\n", ARC_TEMP_SCRATCH_REG);
> + asm_fprintf (f, "\tld\t%s,[pcl,12]\n", reg_names[STATIC_CHAIN_REGNUM]);
> + asm_fprintf (f, "\tj_s\t[%s]\n", ARC_TEMP_SCRATCH_REG);
> + assemble_aligned_integer (UNITS_PER_WORD, const0_rtx);
> + assemble_aligned_integer (UNITS_PER_WORD, const0_rtx);
> +}
> +
> +/* Emit RTL insns to initialize the variable parts of a trampoline.
> + FNADDR is an RTX for the address of the function's pure code. CXT
> + is an RTX for the static chain value for the function.
>
> The fastest trampoline to execute for trampolines within +-8KB of CTX
> would be:
> +
> add2 r11,pcl,s12
> j [limm] 0x20200f80 limm
> - and that would also be faster to write to the stack by computing the offset
> - from CTX to TRAMP at compile time. However, it would really be better to
> - get rid of the high cost of cache invalidation when generating trampolines,
> - which requires that the code part of trampolines stays constant, and
> - additionally either
> - - making sure that no executable code but trampolines is on the stack,
> - no icache entries linger for the area of the stack from when before the
> - stack was allocated, and allocating trampolines in trampoline-only
> - cache lines
> - or
> - - allocate trampolines fram a special pool of pre-allocated trampolines. */
> +
> + and that would also be faster to write to the stack by computing
> + the offset from CTX to TRAMP at compile time. However, it would
> + really be better to get rid of the high cost of cache invalidation
> + when generating trampolines, which requires that the code part of
> + trampolines stays constant, and additionally either making sure
> + that no executable code but trampolines is on the stack, no icache
> + entries linger for the area of the stack from when before the stack
> + was allocated, and allocating trampolines in trampoline-only cache
> + lines or allocate trampolines fram a special pool of pre-allocated
> + trampolines. */
>
> static void
> arc_initialize_trampoline (rtx tramp, tree fndecl, rtx cxt)
> {
> rtx fnaddr = XEXP (DECL_RTL (fndecl), 0);
>
> - emit_store_direct (tramp, 0, TARGET_BIG_ENDIAN ? 0x78e0d403 : 0xd40378e0);
> - emit_store_direct (tramp, 4, TARGET_BIG_ENDIAN ? 0x170c700b : 0x700b170c);
> - emit_store_direct (tramp, 8, TARGET_BIG_ENDIAN ? 0x7c0078e0 : 0x78e07c00);
> - emit_move_insn (adjust_address (tramp, SImode, 12), fnaddr);
> - emit_move_insn (adjust_address (tramp, SImode, 16), cxt);
> - emit_insn (gen_flush_icache (adjust_address (tramp, SImode, 0)));
> -}
> + emit_block_move (tramp, assemble_trampoline_template (),
> + GEN_INT (TRAMPOLINE_SIZE), BLOCK_OP_NORMAL);
>
> -/* Allow the profiler to easily distinguish trampolines from normal
> - functions. */
> + emit_move_insn (adjust_address (tramp, SImode, 8), fnaddr);
> + emit_move_insn (adjust_address (tramp, SImode, 12), cxt);
>
> -static rtx
> -arc_trampoline_adjust_address (rtx addr)
> -{
> - return plus_constant (Pmode, addr, 2);
> + emit_library_call (gen_rtx_SYMBOL_REF (Pmode, "__clear_cache"),
> + LCT_NORMAL, VOIDmode, 2, XEXP (tramp, 0), Pmode,
> + plus_constant (Pmode, XEXP (tramp, 0), TRAMPOLINE_SIZE),
> + Pmode);
> }
>
> /* Add the given function declaration to emit code in JLI section. */
> @@ -11412,6 +11406,9 @@ arc_cannot_substitute_mem_equiv_p (rtx)
> #undef TARGET_CANNOT_SUBSTITUTE_MEM_EQUIV_P
> #define TARGET_CANNOT_SUBSTITUTE_MEM_EQUIV_P arc_cannot_substitute_mem_equiv_p
>
> +#undef TARGET_ASM_TRAMPOLINE_TEMPLATE
> +#define TARGET_ASM_TRAMPOLINE_TEMPLATE arc_asm_trampoline_template
> +
> struct gcc_target targetm = TARGET_INITIALIZER;
>
> #include "gt-arc.h"
> diff --git a/gcc/config/arc/arc.h b/gcc/config/arc/arc.h
> index 8c31fb2..317a653 100644
> --- a/gcc/config/arc/arc.h
> +++ b/gcc/config/arc/arc.h
> @@ -829,7 +829,7 @@ extern int arc_initial_elimination_offset(int from, int to);
> /* Trampolines. */
>
> /* Length in units of the trampoline for entering a nested function. */
> -#define TRAMPOLINE_SIZE 20
> +#define TRAMPOLINE_SIZE 16
>
> /* Alignment required for a trampoline in bits . */
> /* For actual data alignment we just need 32, no more than the stack;
> diff --git a/gcc/config/arc/arc.md b/gcc/config/arc/arc.md
> index 155ee6c..e1418a9 100644
> --- a/gcc/config/arc/arc.md
> +++ b/gcc/config/arc/arc.md
> @@ -4345,15 +4345,6 @@ archs4xd, archs4xd_slow, core_3"
> (set_attr "iscompact" "true")
> (set_attr "length" "2")])
>
> -;; Special pattern to flush the icache.
> -;; ??? Not sure what to do here. Some ARC's are known to support this.
> -
> -(define_insn "flush_icache"
> - [(unspec_volatile [(match_operand:SI 0 "memory_operand" "m")] 0)]
> - ""
> - "* return \"\";"
> - [(set_attr "type" "misc")])
> -
> ;; Split up troublesome insns for better scheduling.
>
> ;; Peepholes go at the end.
> --
> 1.9.1
>
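The 16-byte layout implied by the new template and by the stores at offsets 8 and 12 can be checked with a small C model. It is a sketch under two assumptions: that `pcl` is the program counter with its low two bits cleared, and that the trampoline is 4-byte aligned. The struct and the helper `pcl_of` are hypothetical illustrations, and the register names in the comments are taken from the comment in the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Model of the trampoline: 8 bytes of code, then two data words.  */
struct arc_trampoline_model {
  uint16_t ld_s;    /* ld_s r12,[pcl,8]   2 bytes at offset 0  */
  uint16_t ld[2];   /* ld   r11,[pcl,12]  4 bytes at offset 2  */
  uint16_t j_s;     /* j_s  [r12]         2 bytes at offset 6  */
  uint32_t fnaddr;  /* function's address, patched at offset 8  */
  uint32_t chain;   /* static chain value, patched at offset 12 */
};

_Static_assert(sizeof(struct arc_trampoline_model) == 16,
               "model must match TRAMPOLINE_SIZE");

/* Assumed definition of PCL: the PC rounded down to 4 bytes.  */
static uint32_t pcl_of(uint32_t pc)
{
  return pc & ~(uint32_t) 3;
}
```

Under these assumptions both loads see the same `pcl`, the start of the trampoline, because the `ld` at offset 2 rounds down to offset 0. The displacements 8 and 12 therefore address the two data words directly.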
* RE: [PATCH 00/10][ARC] Critical fixes
2017-11-27 11:15 [PATCH 00/10][ARC] Critical fixes Claudiu Zissulescu
` (9 preceding siblings ...)
2017-11-27 12:25 ` [PATCH 07/10] [ARC][FIX] Consider command line ffixed- option Claudiu Zissulescu
@ 2018-01-08 15:23 ` Claudiu Zissulescu
2018-01-16 10:20 ` Andrew Burgess
10 siblings, 1 reply; 23+ messages in thread
From: Claudiu Zissulescu @ 2018-01-08 15:23 UTC (permalink / raw)
To: gcc-patches; +Cc: Francois.Bedard, andrew.burgess
> [ARC][LRA] Use TARGET_CANNOT_SUBSTITUTE_MEM_EQUIV.
> [ARC] Don't allow the last ZOL insn to be in a delay slot.
> [ARC] Add trap instruction.
> [ARC] Update legitimate constant hook.
> [ARC] Enable unaligned access.
> [ARC] Revamp trampoline implementation.
> [ARC][ZOL] Update uses for hw-loop labels.
> [ARC] Add ARCv2 core3 tune option.
> [ARC][FIX] Consider command line ffixed- option.
> [ARC] Update (u)maddsidi patterns.
Hi Andrew,
Thank you for reviewing this batch of fixes. Could you also take a look at these? They have been pending for a long time now:
https://gcc.gnu.org/ml/gcc-patches/2017-11/msg00078.html
https://gcc.gnu.org/ml/gcc-patches/2017-11/msg00081.html
https://gcc.gnu.org/ml/gcc-patches/2017-11/msg00080.html
https://gcc.gnu.org/ml/gcc-patches/2017-11/msg00079.html
https://gcc.gnu.org/ml/gcc-patches/2017-11/msg00084.html
https://gcc.gnu.org/ml/gcc-patches/2017-11/msg00083.html
https://gcc.gnu.org/ml/gcc-patches/2017-11/msg00082.html
Thank you,
Claudiu
* Re: [PATCH 00/10][ARC] Critical fixes
2018-01-08 15:23 ` [PATCH 00/10][ARC] Critical fixes Claudiu Zissulescu
@ 2018-01-16 10:20 ` Andrew Burgess
0 siblings, 0 replies; 23+ messages in thread
From: Andrew Burgess @ 2018-01-16 10:20 UTC (permalink / raw)
To: Claudiu Zissulescu; +Cc: gcc-patches, Francois.Bedard
* Claudiu Zissulescu <Claudiu.Zissulescu@synopsys.com> [2018-01-08 15:18:30 +0000]:
> > [ARC][LRA] Use TARGET_CANNOT_SUBSTITUTE_MEM_EQUIV.
> > [ARC] Don't allow the last ZOL insn to be in a delay slot.
> > [ARC] Add trap instruction.
> > [ARC] Update legitimate constant hook.
> > [ARC] Enable unaligned access.
> > [ARC] Revamp trampoline implementation.
> > [ARC][ZOL] Update uses for hw-loop labels.
> > [ARC] Add ARCv2 core3 tune option.
> > [ARC][FIX] Consider command line ffixed- option.
> > [ARC] Update (u)maddsidi patterns.
>
> Hi Andrew,
>
> Thank you for reviewing this batch of fixes. Could you also take a look at these? They have been pending for a long time now:
>
> https://gcc.gnu.org/ml/gcc-patches/2017-11/msg00078.html
> https://gcc.gnu.org/ml/gcc-patches/2017-11/msg00081.html
> https://gcc.gnu.org/ml/gcc-patches/2017-11/msg00080.html
> https://gcc.gnu.org/ml/gcc-patches/2017-11/msg00079.html
> https://gcc.gnu.org/ml/gcc-patches/2017-11/msg00084.html
> https://gcc.gnu.org/ml/gcc-patches/2017-11/msg00083.html
> https://gcc.gnu.org/ml/gcc-patches/2017-11/msg00082.html
Sorry for missing these, they somehow didn't make it onto my todo
list.
I'll review these over the next couple of days.
Thanks,
Andrew
Thread overview: 23+ messages
2017-11-27 11:15 [PATCH 00/10][ARC] Critical fixes Claudiu Zissulescu
2017-11-27 11:14 ` [PATCH 05/10] [ARC] Add trap instruction Claudiu Zissulescu
2017-11-27 23:40 ` Andrew Burgess
2017-11-27 11:14 ` [PATCH 02/10] [ARC][ZOL] Update uses for hw-loop labels Claudiu Zissulescu
2017-11-27 23:29 ` Andrew Burgess
2017-11-27 11:14 ` [PATCH 09/10] [ARC] Update (u)maddsidi patterns Claudiu Zissulescu
2017-12-07 23:35 ` Andrew Burgess
2017-11-27 11:14 ` [PATCH 03/10] [ARC] Don't allow the last ZOL insn to be in a delay slot Claudiu Zissulescu
2017-11-27 23:32 ` Andrew Burgess
2017-11-27 11:15 ` [PATCH 04/10] [ARC] Add ARCv2 core3 tune option Claudiu Zissulescu
2017-11-27 23:35 ` Andrew Burgess
2017-11-27 11:15 ` [PATCH 08/10] [ARC] Enable unaligned access Claudiu Zissulescu
2018-01-02 12:05 ` Andrew Burgess
2017-11-27 11:15 ` [PATCH 06/10] [ARC] Update legitimate constant hook Claudiu Zissulescu
2017-12-07 23:30 ` Andrew Burgess
2017-11-27 11:16 ` [PATCH 10/10] [ARC] Revamp trampoline implementation Claudiu Zissulescu
2018-01-02 12:16 ` Andrew Burgess
2017-11-27 11:57 ` [PATCH 01/10] [ARC][LRA] Use TARGET_CANNOT_SUBSTITUTE_MEM_EQUIV Claudiu Zissulescu
2017-11-27 23:27 ` Andrew Burgess
2017-11-27 12:25 ` [PATCH 07/10] [ARC][FIX] Consider command line ffixed- option Claudiu Zissulescu
2017-12-07 23:32 ` Andrew Burgess
2018-01-08 15:23 ` [PATCH 00/10][ARC] Critical fixes Claudiu Zissulescu
2018-01-16 10:20 ` Andrew Burgess