public inbox for gcc-patches@gcc.gnu.org
* [PATCH 09/10] [ARC] Update (u)maddsidi patterns.
  2017-11-27 11:15 [PATCH 00/10][ARC] Critical fixes Claudiu Zissulescu
  2017-11-27 11:14 ` [PATCH 02/10] [ARC][ZOL] Update uses for hw-loop labels Claudiu Zissulescu
@ 2017-11-27 11:14 ` Claudiu Zissulescu
  2017-12-07 23:35   ` Andrew Burgess
  2017-11-27 11:14 ` [PATCH 03/10] [ARC] Don't allow the last ZOL insn to be in a delay slot Claudiu Zissulescu
                   ` (8 subsequent siblings)
  10 siblings, 1 reply; 23+ messages in thread
From: Claudiu Zissulescu @ 2017-11-27 11:14 UTC (permalink / raw)
  To: gcc-patches; +Cc: Claudiu.Zissulescu, Francois.Bedard, andrew.burgess

From: claziss <claziss@synopsys.com>

The accumulator registers are freely used by the compiler. However,
a number of instructions implicitly use these registers. Update the
patterns to tell the compiler which ones do.
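
As an illustration, here is a minimal C sketch of source code with the
widening multiply-accumulate shape that the maddsidi4 and umaddsidi4
patterns describe. The function names are hypothetical, and whether a
MAC instruction is actually selected depends on the target options
(the new test uses -mcpu=archs -mmpy-option=plus_dmpy) and on the
optimization level.

```c
#include <stdint.h>

/* Signed 32x32 -> 64 multiply-accumulate: sign-extend both operands,
   multiply in 64 bits, then add the 64-bit accumulator value.
   Hypothetical helper, not part of the patch.  */
static int64_t mac_signed(int32_t a, int32_t b, int64_t acc)
{
  return (int64_t)a * b + acc;
}

/* Unsigned variant: zero-extend both operands before multiplying.
   Hypothetical helper, not part of the patch.  */
static uint64_t mac_unsigned(uint32_t a, uint32_t b, uint64_t acc)
{
  return (uint64_t)a * b + acc;
}
```

The cast on the first operand matters: without it the multiplication
would be carried out in 32 bits and the widening form would not apply.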

gcc/
2017-09-19  Claudiu Zissulescu  <claziss@synopsys.com>

	* config/arc/arc.md (maddsidi4, maddsidi4_split): Update pattern.
	(umaddsidi4, umaddsidi4_split): Likewise.

gcc/testsuite
2017-09-19  Claudiu Zissulescu  <claziss@synopsys.com>

	* gcc.target/arc/tumaddsidi4.c: New test.
---
 gcc/config/arc/arc.md                      | 32 ++++++++++++++++++++++++++----
 gcc/testsuite/gcc.target/arc/tumaddsidi4.c | 14 +++++++++++++
 2 files changed, 42 insertions(+), 4 deletions(-)
 create mode 100755 gcc/testsuite/gcc.target/arc/tumaddsidi4.c

diff --git a/gcc/config/arc/arc.md b/gcc/config/arc/arc.md
index 42c6a23..155ee6c 100644
--- a/gcc/config/arc/arc.md
+++ b/gcc/config/arc/arc.md
@@ -6175,13 +6175,25 @@ archs4xd, archs4xd_slow, core_3"
   [(set_attr "length" "0")])
 
 ;; MAC and DMPY instructions
-(define_insn_and_split "maddsidi4"
+(define_expand "maddsidi4"
+  [(match_operand:DI 0 "register_operand" "")
+   (match_operand:SI 1 "register_operand" "")
+   (match_operand:SI 2 "extend_operand"   "")
+   (match_operand:DI 3 "register_operand" "")]
+  "TARGET_PLUS_DMPY"
+  "{
+   emit_insn (gen_maddsidi4_split (operands[0], operands[1], operands[2], operands[3]));
+   DONE;
+  }")
+
+(define_insn_and_split "maddsidi4_split"
   [(set (match_operand:DI 0 "register_operand" "=r")
 	(plus:DI
 	 (mult:DI
 	  (sign_extend:DI (match_operand:SI 1 "register_operand" "%r"))
 	  (sign_extend:DI (match_operand:SI 2 "extend_operand" "ri")))
-	 (match_operand:DI 3 "register_operand" "r")))]
+	 (match_operand:DI 3 "register_operand" "r")))
+   (clobber (reg:DI ARCV2_ACC))]
   "TARGET_PLUS_DMPY"
   "#"
   "TARGET_PLUS_DMPY && reload_completed"
@@ -6263,13 +6275,25 @@ archs4xd, archs4xd_slow, core_3"
    (set_attr "predicable" "no")
    (set_attr "cond" "nocond")])
 
-(define_insn_and_split "umaddsidi4"
+(define_expand "umaddsidi4"
+  [(match_operand:DI 0 "register_operand" "")
+   (match_operand:SI 1 "register_operand" "")
+   (match_operand:SI 2 "extend_operand"   "")
+   (match_operand:DI 3 "register_operand" "")]
+  "TARGET_PLUS_DMPY"
+  "{
+   emit_insn (gen_umaddsidi4_split (operands[0], operands[1], operands[2], operands[3]));
+   DONE;
+  }")
+
+(define_insn_and_split "umaddsidi4_split"
   [(set (match_operand:DI 0 "register_operand" "=r")
 	(plus:DI
 	 (mult:DI
 	  (zero_extend:DI (match_operand:SI 1 "register_operand" "%r"))
 	  (zero_extend:DI (match_operand:SI 2 "extend_operand" "ri")))
-	 (match_operand:DI 3 "register_operand" "r")))]
+	 (match_operand:DI 3 "register_operand" "r")))
+   (clobber (reg:DI ARCV2_ACC))]
   "TARGET_PLUS_DMPY"
   "#"
   "TARGET_PLUS_DMPY && reload_completed"
diff --git a/gcc/testsuite/gcc.target/arc/tumaddsidi4.c b/gcc/testsuite/gcc.target/arc/tumaddsidi4.c
new file mode 100755
index 0000000..40d2b33
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arc/tumaddsidi4.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-mcpu=archs -O1 -mmpy-option=plus_dmpy" } */
+
+/* Check how we generate umaddsidi4 patterns.  */
+long a;
+long long b;
+unsigned c, d;
+
+void fn1(void)
+{
+  b = d * (long long)c + a;
+}
+
+/* { dg-final { scan-assembler "macu 0,r" } } */
-- 
1.9.1


* [PATCH 02/10] [ARC][ZOL] Update uses for hw-loop labels.
  2017-11-27 11:15 [PATCH 00/10][ARC] Critical fixes Claudiu Zissulescu
@ 2017-11-27 11:14 ` Claudiu Zissulescu
  2017-11-27 23:29   ` Andrew Burgess
  2017-11-27 11:14 ` [PATCH 09/10] [ARC] Update (u)maddsidi patterns Claudiu Zissulescu
                   ` (9 subsequent siblings)
  10 siblings, 1 reply; 23+ messages in thread
From: Claudiu Zissulescu @ 2017-11-27 11:14 UTC (permalink / raw)
  To: gcc-patches; +Cc: Claudiu.Zissulescu, Francois.Bedard, andrew.burgess

From: claziss <claziss@synopsys.com>

Make sure we mark the hw-loop labels as being used.

gcc/
2017-09-19  Claudiu Zissulescu  <claziss@synopsys.com>

	* config/arc/arc.c (hwloop_optimize): Update hw-loop's end/start
	labels number of usages.

gcc/testsuite
2017-09-19  Claudiu Zissulescu  <claziss@synopsys.com>

	* gcc.target/arc/loop-2.cpp: New test.
---
 gcc/config/arc/arc.c                    |  3 +++
 gcc/testsuite/gcc.target/arc/loop-2.cpp | 18 ++++++++++++++++++
 2 files changed, 21 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/arc/loop-2.cpp

diff --git a/gcc/config/arc/arc.c b/gcc/config/arc/arc.c
index 25f123c..964815a 100644
--- a/gcc/config/arc/arc.c
+++ b/gcc/config/arc/arc.c
@@ -7702,6 +7702,9 @@ hwloop_optimize (hwloop_info loop)
   /* Insert the loop end label before the last instruction of the
      loop.  */
   emit_label_after (end_label, loop->last_insn);
+  /* Make sure we mark the beginning and end labels as used.  */
+  LABEL_NUSES (loop->end_label)++;
+  LABEL_NUSES (loop->start_label)++;
 
   return true;
 }
diff --git a/gcc/testsuite/gcc.target/arc/loop-2.cpp b/gcc/testsuite/gcc.target/arc/loop-2.cpp
new file mode 100644
index 0000000..d1dc917
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arc/loop-2.cpp
@@ -0,0 +1,18 @@
+/* { dg-options "-O2" } */
+/* { dg-do assemble } */
+
+/* This file fails to assemble if we forgot to increase the number of
+   uses for loop's start and end labels.  */
+int a, c, d;
+int *b;
+void fn1(int p1) {
+  if (d == 5)
+    for (int i; i < p1; ++i)
+      if (c)
+        b[i] = c;
+      else
+        int t = a = t;
+  else
+    for (int i; i < p1; ++i)
+      b[i] = 0;
+}
-- 
1.9.1


* [PATCH 05/10] [ARC] Add trap instruction.
  2017-11-27 11:15 [PATCH 00/10][ARC] Critical fixes Claudiu Zissulescu
                   ` (2 preceding siblings ...)
  2017-11-27 11:14 ` [PATCH 03/10] [ARC] Don't allow the last ZOL insn to be in a delay slot Claudiu Zissulescu
@ 2017-11-27 11:14 ` Claudiu Zissulescu
  2017-11-27 23:40   ` Andrew Burgess
  2017-11-27 11:15 ` [PATCH 04/10] [ARC] Add ARCv2 core3 tune option Claudiu Zissulescu
                   ` (6 subsequent siblings)
  10 siblings, 1 reply; 23+ messages in thread
From: Claudiu Zissulescu @ 2017-11-27 11:14 UTC (permalink / raw)
  To: gcc-patches
  Cc: Claudiu.Zissulescu, Francois.Bedard, andrew.burgess, Claudiu Zissulescu

From: Claudiu Zissulescu <claziss@gmail.com>

2017-11-07  Claudiu Zissulescu  <claziss@synopsys.com>

	* config/arc/arc.md (trap): New pattern.
---
 gcc/config/arc/arc.md | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/gcc/config/arc/arc.md b/gcc/config/arc/arc.md
index b8fa44e..42c6a23 100644
--- a/gcc/config/arc/arc.md
+++ b/gcc/config/arc/arc.md
@@ -4321,6 +4321,13 @@ archs4xd, archs4xd_slow, core_3"
 ; use it for lack of inter-procedural branch shortening.
 ; Link-time relaxation would help...
 
+(define_insn "trap"
+  [(trap_if (const_int 1) (const_int 0))]
+  "!TARGET_ARC600_FAMILY"
+  "trap_s\\t5"
+  [(set_attr "type" "misc")
+   (set_attr "length" "2")])
+
 (define_insn "nop"
   [(const_int 0)]
   ""
-- 
1.9.1


* [PATCH 03/10] [ARC] Don't allow the last ZOL insn to be in a delay slot.
  2017-11-27 11:15 [PATCH 00/10][ARC] Critical fixes Claudiu Zissulescu
  2017-11-27 11:14 ` [PATCH 02/10] [ARC][ZOL] Update uses for hw-loop labels Claudiu Zissulescu
  2017-11-27 11:14 ` [PATCH 09/10] [ARC] Update (u)maddsidi patterns Claudiu Zissulescu
@ 2017-11-27 11:14 ` Claudiu Zissulescu
  2017-11-27 23:32   ` Andrew Burgess
  2017-11-27 11:14 ` [PATCH 05/10] [ARC] Add trap instruction Claudiu Zissulescu
                   ` (7 subsequent siblings)
  10 siblings, 1 reply; 23+ messages in thread
From: Claudiu Zissulescu @ 2017-11-27 11:14 UTC (permalink / raw)
  To: gcc-patches
  Cc: Claudiu.Zissulescu, Francois.Bedard, andrew.burgess, Claudiu Zissulescu

From: Claudiu Zissulescu <claziss@gmail.com>

The ARC ZOL implementation doesn't allow the last instruction to be a
control instruction or part of a delay slot.  Thus, we add a note to
the last ZOL instruction which prevents it from ending up in a delay
slot.

2017-10-20  Claudiu Zissulescu  <claziss@synopsys.com>

	* config/arc/arc.c (hwloop_optimize): Prevent the last
	ZOL instruction from ending up in a delay slot.
	* config/arc/arc.md (cond_delay_insn): Check if the instruction
	can be placed into a delay slot against reg_note.
	(in_delay_slot): Likewise.

testsuite/
2017-10-20  Claudiu Zissulescu  <claziss@synopsys.com>

	* gcc.target/arc/loop-3.c: New test.
	* gcc.target/arc/loop-4.c: Likewise.

[FIX][ZOL] fix checking for jumps
---
 gcc/config/arc/arc.c                  |  6 ++++++
 gcc/config/arc/arc.md                 |  4 ++++
 gcc/testsuite/gcc.target/arc/loop-3.c | 27 +++++++++++++++++++++++++++
 gcc/testsuite/gcc.target/arc/loop-4.c | 14 ++++++++++++++
 4 files changed, 51 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/arc/loop-3.c
 create mode 100644 gcc/testsuite/gcc.target/arc/loop-4.c

diff --git a/gcc/config/arc/arc.c b/gcc/config/arc/arc.c
index 964815a..1479a8d 100644
--- a/gcc/config/arc/arc.c
+++ b/gcc/config/arc/arc.c
@@ -7609,6 +7609,12 @@ hwloop_optimize (hwloop_info loop)
 		 loop->loop_no);
       last_insn = emit_insn_after (gen_nopv (), last_insn);
     }
+
+  /* SAVE_NOTE is used by haifa scheduler.  However, we are after it
+     and we can use it to indicate the last ZOL instruction cannot be
+     part of a delay slot.  */
+  add_reg_note (last_insn, REG_SAVE_NOTE, GEN_INT (2));
+
   loop->last_insn = last_insn;
 
   /* Get the loop iteration register.  */
diff --git a/gcc/config/arc/arc.md b/gcc/config/arc/arc.md
index 2e0ac52..6239483 100644
--- a/gcc/config/arc/arc.md
+++ b/gcc/config/arc/arc.md
@@ -472,6 +472,8 @@
 	     (symbol_ref "(arc_hazard (prev_active_insn (insn), insn)
 			   + arc_hazard (insn, next_active_insn (insn)))"))
 	 (const_string "false")
+	 (match_test "find_reg_note (insn, REG_SAVE_NOTE, GEN_INT (2))")
+	 (const_string "false")
 	 (eq_attr "iscompact" "maybe") (const_string "true")
 	 ]
 
@@ -499,6 +501,8 @@
   (cond [(eq_attr "cond" "!canuse") (const_string "no")
 	 (eq_attr "type" "call,branch,uncond_branch,jump,brcc")
 	 (const_string "no")
+	 (match_test "find_reg_note (insn, REG_SAVE_NOTE, GEN_INT (2))")
+	 (const_string "no")
 	 (eq_attr "length" "2,4") (const_string "yes")]
 	(const_string "no")))
 
diff --git a/gcc/testsuite/gcc.target/arc/loop-3.c b/gcc/testsuite/gcc.target/arc/loop-3.c
new file mode 100644
index 0000000..bf7aec9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arc/loop-3.c
@@ -0,0 +1,27 @@
+/* { dg-do assemble } */
+/* { dg-do compile } */
+/* { dg-options "-O2 -mno-sdata" } */
+
+/* This example will fail to assemble if the last instruction is a
+   branch with delay slot.  */
+int d;
+extern char * fn2 (void);
+
+void fn1(void)
+{
+  char *a = fn2();
+  for (;;) {
+    long long b;
+    int e = 8;
+    for (; e <= 63; e += 7) {
+      long c = *a++;
+      b += c & e;
+      if (c & 28)
+        break;
+    }
+    d = b;
+  }
+}
+
+/* { dg-final { scan-assembler "bne_s @.L2" } } */
+/* { dg-final { scan-assembler-not "add.eq" } } */
diff --git a/gcc/testsuite/gcc.target/arc/loop-4.c b/gcc/testsuite/gcc.target/arc/loop-4.c
new file mode 100644
index 0000000..99a93a7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arc/loop-4.c
@@ -0,0 +1,14 @@
+/* { dg-do assemble } */
+/* { dg-do compile } */
+/* { dg-options "-Os" } */
+
+
+void fn1(void *p1, int p2, int p3)
+{
+  char *d = p1;
+  do
+    *d++ = p2;
+  while (--p3);
+}
+
+/* { dg-final { scan-assembler "lp_count" } } */
-- 
1.9.1


* [PATCH 08/10] [ARC] Enable unaligned access.
  2017-11-27 11:15 [PATCH 00/10][ARC] Critical fixes Claudiu Zissulescu
                   ` (5 preceding siblings ...)
  2017-11-27 11:15 ` [PATCH 06/10] [ARC] Update legitimate constant hook Claudiu Zissulescu
@ 2017-11-27 11:15 ` Claudiu Zissulescu
  2018-01-02 12:05   ` Andrew Burgess
  2017-11-27 11:16 ` [PATCH 10/10] [ARC] Revamp trampoline implementation Claudiu Zissulescu
                   ` (3 subsequent siblings)
  10 siblings, 1 reply; 23+ messages in thread
From: Claudiu Zissulescu @ 2017-11-27 11:15 UTC (permalink / raw)
  To: gcc-patches
  Cc: Claudiu.Zissulescu, Francois.Bedard, andrew.burgess, Claudiu Zissulescu

From: Claudiu Zissulescu <claziss@gmail.com>

Use -munaligned-access to control whether unaligned accesses are
allowed.  For the ARC HS family, unaligned access is always on.
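
As a rough sketch of how user code might react to the new
__ARC_UNALIGNED__ define, the helper below picks between a packed
access and a byte-wise copy. It assumes the macro is defined only when
unaligned accesses are enabled; the helper name load_u32 and the packed
wrapper struct are hypothetical and not part of this patch.

```c
#include <stdint.h>
#include <string.h>

/* Load a 32-bit value from an address that may not be 4-byte aligned.
   Hypothetical helper for illustration only.  */
static uint32_t load_u32(const void *p)
{
#ifdef __ARC_UNALIGNED__
  /* Unaligned accesses are enabled: a packed wrapper lets the compiler
     emit a single load while keeping the access well defined.  */
  struct __attribute__((packed)) unaligned_u32 { uint32_t v; };
  return ((const struct unaligned_u32 *)p)->v;
#else
  /* Strict alignment (or a non-ARC host): copy byte-wise and let the
     compiler choose a safe instruction sequence.  */
  uint32_t v;
  memcpy(&v, p, sizeof v);
  return v;
#endif
}
```

Both branches return the same value; the macro only changes which
access form the source asks for.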

2017-10-19  Claudiu Zissulescu  <claziss@synopsys.com>

	* config/arc/arc-c.def (__ARC_UNALIGNED__): New define.
	* config/arc/arc.h (STRICT_ALIGNMENT): Control this macro using
	munaligned-access.
---
 gcc/config/arc/arc-c.def | 1 +
 gcc/config/arc/arc.h     | 2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/config/arc/arc-c.def b/gcc/config/arc/arc-c.def
index c9443c9..86eab4e 100644
--- a/gcc/config/arc/arc-c.def
+++ b/gcc/config/arc/arc-c.def
@@ -29,6 +29,7 @@ ARC_C_DEF ("__ARC_MUL64__",	TARGET_MUL64_SET)
 ARC_C_DEF ("__ARC_MUL32BY16__", TARGET_MULMAC_32BY16_SET)
 ARC_C_DEF ("__ARC_SIMD__",	TARGET_SIMD_SET)
 ARC_C_DEF ("__ARC_RF16__",	TARGET_RF16)
+ARC_C_DEF ("__ARC_UNALIGNED__",	!STRICT_ALIGNMENT)
 
 ARC_C_DEF ("__ARC_BARREL_SHIFTER__", TARGET_BARREL_SHIFTER)
 
diff --git a/gcc/config/arc/arc.h b/gcc/config/arc/arc.h
index 8d90975..8c31fb2 100644
--- a/gcc/config/arc/arc.h
+++ b/gcc/config/arc/arc.h
@@ -288,7 +288,7 @@ if (GET_MODE_CLASS (MODE) == MODE_INT		\
 /* On the ARC the lower address bits are masked to 0 as necessary.  The chip
    won't croak when given an unaligned address, but the insn will still fail
    to produce the correct result.  */
-#define STRICT_ALIGNMENT 1
+#define STRICT_ALIGNMENT (!unaligned_access && !TARGET_HS)
 
 /* Layout of source language data types.  */
 
-- 
1.9.1


* [PATCH 00/10][ARC] Critical fixes
@ 2017-11-27 11:15 Claudiu Zissulescu
  2017-11-27 11:14 ` [PATCH 02/10] [ARC][ZOL] Update uses for hw-loop labels Claudiu Zissulescu
                   ` (10 more replies)
  0 siblings, 11 replies; 23+ messages in thread
From: Claudiu Zissulescu @ 2017-11-27 11:15 UTC (permalink / raw)
  To: gcc-patches; +Cc: Claudiu.Zissulescu, Francois.Bedard, andrew.burgess

From: claziss <claziss@synopsys.com>

Hi,

This series contains a number of critical fixes for the ARC backend:
     - For ZOL: two patches that prevent the last ZOL instruction from being
placed into a delay slot and update the number of uses for the ZOL labels.
Also, the DBNZ instruction is enabled only for ARC HS Core3 CPUs. Tests are provided.
     - Update the legitimate constant hook.
     - The trampoline implementation is revamped and tested to work on ARC Linux.
Without this patch, trampolines do not work on ARC Linux.
     - The accumulator registers' usage can be controlled via the -ffixed option. Also update
a number of patterns to reflect the usage of the accumulator registers.
     - Add a trap instruction, needed for ARC Linux.
     - Add TARGET_CANNOT_SUBSTITUTE_MEM_EQUIV to avoid LRA issues. A test is provided.

Ok to apply?
Claudiu

Claudiu Zissulescu:
  [ARC][LRA] Use TARGET_CANNOT_SUBSTITUTE_MEM_EQUIV.
  [ARC] Don't allow the last ZOL insn to be in a delay slot.
  [ARC] Add trap instruction.
  [ARC] Update legitimate constant hook.
  [ARC] Enable unaligned access.
  [ARC] Revamp trampoline implementation.
  [ARC][ZOL] Update uses for hw-loop labels.
  [ARC] Add ARCv2 core3 tune option.
  [ARC][FIX] Consider command line ffixed- option.
  [ARC] Update (u)maddsidi patterns.

 gcc/config/arc/arc-arch.h                  |   3 +-
 gcc/config/arc/arc-c.def                   |   1 +
 gcc/config/arc/arc.c                       | 197 ++++++++++++++++++-----------
 gcc/config/arc/arc.h                       |   6 +-
 gcc/config/arc/arc.md                      |  74 +++++++----
 gcc/config/arc/arc.opt                     |  40 +++---
 gcc/testsuite/gcc.target/arc/loop-2.cpp    |  18 +++
 gcc/testsuite/gcc.target/arc/loop-3.c      |  27 ++++
 gcc/testsuite/gcc.target/arc/loop-4.c      |  14 ++
 gcc/testsuite/gcc.target/arc/lra-1.c       |  17 +++
 gcc/testsuite/gcc.target/arc/tls-1.c       |  26 ++++
 gcc/testsuite/gcc.target/arc/tumaddsidi4.c |  14 ++
 12 files changed, 320 insertions(+), 117 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/arc/loop-2.cpp
 create mode 100644 gcc/testsuite/gcc.target/arc/loop-3.c
 create mode 100644 gcc/testsuite/gcc.target/arc/loop-4.c
 create mode 100644 gcc/testsuite/gcc.target/arc/lra-1.c
 create mode 100644 gcc/testsuite/gcc.target/arc/tls-1.c
 create mode 100755 gcc/testsuite/gcc.target/arc/tumaddsidi4.c

-- 
1.9.1


* [PATCH 04/10] [ARC] Add ARCv2 core3 tune option.
  2017-11-27 11:15 [PATCH 00/10][ARC] Critical fixes Claudiu Zissulescu
                   ` (3 preceding siblings ...)
  2017-11-27 11:14 ` [PATCH 05/10] [ARC] Add trap instruction Claudiu Zissulescu
@ 2017-11-27 11:15 ` Claudiu Zissulescu
  2017-11-27 23:35   ` Andrew Burgess
  2017-11-27 11:15 ` [PATCH 06/10] [ARC] Update legitimate constant hook Claudiu Zissulescu
                   ` (5 subsequent siblings)
  10 siblings, 1 reply; 23+ messages in thread
From: Claudiu Zissulescu @ 2017-11-27 11:15 UTC (permalink / raw)
  To: gcc-patches; +Cc: Claudiu.Zissulescu, Francois.Bedard, andrew.burgess

From: claziss <claziss@synopsys.com>

ARCv2 Core3 CPUs come with dbnz support. Add this feature to the
tune option.
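
For context, a decrement-and-branch-if-nonzero instruction fits
count-down loops of the following shape, modelled on the loop-4.c test
from this series. This is only a sketch: the function name is
hypothetical, and whether the compiler emits dbnz, a zero-overhead
loop, or a plain compare-and-branch depends on the tune setting and on
the loop body.

```c
/* Store VALUE into COUNT consecutive bytes starting at DST.
   COUNT must be nonzero, because the body runs before the test.
   Hypothetical helper for illustration only.  */
static void fill_bytes(void *dst, int value, unsigned count)
{
  unsigned char *d = dst;

  do
    *d++ = (unsigned char)value;
  while (--count);  /* decrement the counter, loop while nonzero */
}
```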

gcc/
2017-09-14  Claudiu Zissulescu  <claziss@synopsys.com>

	* config/arc/arc-arch.h (arc_tune_attr): Add ARC_TUNE_CORE_3.
	* config/arc/arc.c (arc_sched_issue_rate): Use ARC_TUNE_... .
	(arc_init): Likewise.
	(arc_override_options): Likewise.
	(arc_file_start): Choose Tag_ARC_CPU_variation based on arc_tune
	value.
	(hwloop_fail): Use TARGET_DBNZ when we want to check for dbnz insn
	support.
	* config/arc/arc.h (TARGET_DBNZ): Define.
	* config/arc/arc.md (attr tune): Add core_3, use ARC_TUNE_... to
	properly set the tune attribute.
	(dbnz): Use TARGET_DBNZ guard.
	* config/arc/arc.opt (mtune): Add core3 option.
---
 gcc/config/arc/arc-arch.h |  3 ++-
 gcc/config/arc/arc.c      | 21 ++++++++++++---------
 gcc/config/arc/arc.h      |  2 ++
 gcc/config/arc/arc.md     | 22 ++++++++++++----------
 gcc/config/arc/arc.opt    | 40 ++++++++++++++++++++++------------------
 5 files changed, 50 insertions(+), 38 deletions(-)

diff --git a/gcc/config/arc/arc-arch.h b/gcc/config/arc/arc-arch.h
index 7c3f47c..38d2bcb 100644
--- a/gcc/config/arc/arc-arch.h
+++ b/gcc/config/arc/arc-arch.h
@@ -75,7 +75,8 @@ enum arc_tune_attr
     ARC_TUNE_ARC700_4_2_XMAC,
     ARC_TUNE_ARCHS4X,
     ARC_TUNE_ARCHS4XD,
-    ARC_TUNE_ARCHS4XD_SLOW
+    ARC_TUNE_ARCHS4XD_SLOW,
+    ARC_TUNE_CORE_3
   };
 
 /* CPU specific properties.  */
diff --git a/gcc/config/arc/arc.c b/gcc/config/arc/arc.c
index 1479a8d..4d7a282 100644
--- a/gcc/config/arc/arc.c
+++ b/gcc/config/arc/arc.c
@@ -442,8 +442,8 @@ arc_sched_issue_rate (void)
 {
   switch (arc_tune)
     {
-    case TUNE_ARCHS4X:
-    case TUNE_ARCHS4XD:
+    case ARC_TUNE_ARCHS4X:
+    case ARC_TUNE_ARCHS4XD:
       return 3;
     default:
       break;
@@ -866,21 +866,21 @@ arc_init (void)
   if (arc_multcost < 0)
     switch (arc_tune)
       {
-      case TUNE_ARC700_4_2_STD:
+      case ARC_TUNE_ARC700_4_2_STD:
 	/* latency 7;
 	   max throughput (1 multiply + 4 other insns) / 5 cycles.  */
 	arc_multcost = COSTS_N_INSNS (4);
 	if (TARGET_NOMPY_SET)
 	  arc_multcost = COSTS_N_INSNS (30);
 	break;
-      case TUNE_ARC700_4_2_XMAC:
+      case ARC_TUNE_ARC700_4_2_XMAC:
 	/* latency 5;
 	   max throughput (1 multiply + 2 other insns) / 3 cycles.  */
 	arc_multcost = COSTS_N_INSNS (3);
 	if (TARGET_NOMPY_SET)
 	  arc_multcost = COSTS_N_INSNS (30);
 	break;
-      case TUNE_ARC600:
+      case ARC_TUNE_ARC600:
 	if (TARGET_MUL64_SET)
 	  {
 	    arc_multcost = COSTS_N_INSNS (4);
@@ -1196,8 +1196,8 @@ arc_override_options (void)
 #undef ARC_OPT
 
   /* Set Tune option.  */
-  if (arc_tune == TUNE_NONE)
-    arc_tune = (enum attr_tune) arc_selected_cpu->tune;
+  if (arc_tune == ARC_TUNE_NONE)
+    arc_tune = (enum arc_tune_attr) arc_selected_cpu->tune;
 
   if (arc_size_opt_level == 3)
     optimize_size = 1;
@@ -5205,6 +5205,9 @@ static void arc_file_start (void)
 	       TARGET_NO_SDATA_SET ? 0 : 2);
   asm_fprintf (asm_out_file, "\t.arc_attribute Tag_ARC_ABI_exceptions, %d\n",
 	       TARGET_OPTFPE ? 1 : 0);
+  if (TARGET_V2)
+    asm_fprintf (asm_out_file, "\t.arc_attribute Tag_ARC_CPU_variation, %d\n",
+		 arc_tune == ARC_TUNE_CORE_3 ? 3 : 2);
 }
 
 /* Implement `TARGET_ASM_FILE_END'.  */
@@ -7389,11 +7392,11 @@ hwloop_fail (hwloop_info loop)
   rtx test;
   rtx insn = loop->loop_end;
 
-  if (TARGET_V2
+  if (TARGET_DBNZ
       && (loop->length && (loop->length <= ARC_MAX_LOOP_LENGTH))
       && REG_P (loop->iter_reg))
     {
-      /* TARGET_V2 has dbnz instructions.  */
+      /* TARGET_V2 core3 has dbnz instructions.  */
       test = gen_dbnz (loop->iter_reg, loop->start_label);
       insn = emit_jump_insn_before (test, loop->loop_end);
     }
diff --git a/gcc/config/arc/arc.h b/gcc/config/arc/arc.h
index b5a8f84..8d90975 100644
--- a/gcc/config/arc/arc.h
+++ b/gcc/config/arc/arc.h
@@ -1628,5 +1628,7 @@ enum
 /* Custom FP instructions used by QuarkSE EM cpu.  */
 #define TARGET_FPX_QUARK    (TARGET_EM && TARGET_SPFP		\
 			     && (arc_fpu_build == FPX_QK))
+/* DBNZ support is available for ARCv2 core3 cpus.  */
+#define TARGET_DBNZ (TARGET_V2 && (arc_tune == ARC_TUNE_CORE_3))
 
 #endif /* GCC_ARC_H */
diff --git a/gcc/config/arc/arc.md b/gcc/config/arc/arc.md
index 6239483..b8fa44e 100644
--- a/gcc/config/arc/arc.md
+++ b/gcc/config/arc/arc.md
@@ -597,19 +597,21 @@
 ;;   is made that makes conditional execution required.
 
 (define_attr "tune" "none, arc600, arc700_4_2_std, arc700_4_2_xmac, archs4x, \
-archs4xd, archs4xd_slow"
+archs4xd, archs4xd_slow, core_3"
   (const
-   (cond [(symbol_ref "arc_tune == TUNE_ARC600")
+   (cond [(symbol_ref "arc_tune == ARC_TUNE_ARC600")
 	  (const_string "arc600")
-	  (symbol_ref "arc_tune == TUNE_ARC700_4_2_STD")
+	  (symbol_ref "arc_tune == ARC_TUNE_ARC700_4_2_STD")
 	  (const_string "arc700_4_2_std")
-	  (symbol_ref "arc_tune == TUNE_ARC700_4_2_XMAC")
+	  (symbol_ref "arc_tune == ARC_TUNE_ARC700_4_2_XMAC")
 	  (const_string "arc700_4_2_xmac")
-	  (symbol_ref "arc_tune == TUNE_ARCHS4X")
+	  (symbol_ref "arc_tune == ARC_TUNE_ARCHS4X")
 	  (const_string "archs4x")
-	  (ior (symbol_ref "arc_tune == TUNE_ARCHS4XD")
-	       (symbol_ref "arc_tune == TUNE_ARCHS4XD_SLOW"))
-	  (const_string "archs4xd")]
+	  (ior (symbol_ref "arc_tune == ARC_TUNE_ARCHS4XD")
+	       (symbol_ref "arc_tune == ARC_TUNE_ARCHS4XD_SLOW"))
+	  (const_string "archs4xd")
+	  (symbol_ref "arc_tune == ARC_TUNE_CORE_3")
+	  (const_string "core_3")]
 	 (const_string "none"))))
 
 (define_attr "tune_arc700" "false,true"
@@ -5200,11 +5202,11 @@ archs4xd, archs4xd_slow"
 	(plus:SI (match_dup 0)
 		 (const_int -1)))
    (clobber (match_scratch:SI 2 "=X,r"))]
-  "TARGET_V2"
+  "TARGET_DBNZ"
   "@
    dbnz%#\\t%0,%l1
    #"
-  "TARGET_V2 && reload_completed && memory_operand (operands[0], SImode)"
+  "TARGET_DBNZ && reload_completed && memory_operand (operands[0], SImode)"
   [(set (match_dup 2) (match_dup 0))
    (set (match_dup 2) (plus:SI (match_dup 2) (const_int -1)))
    (set (reg:CC CC_REG) (compare:CC (match_dup 2) (const_int 0)))
diff --git a/gcc/config/arc/arc.opt b/gcc/config/arc/arc.opt
index aacb599..6b0104a 100644
--- a/gcc/config/arc/arc.opt
+++ b/gcc/config/arc/arc.opt
@@ -249,29 +249,33 @@ mmultcost=
 Target RejectNegative Joined UInteger Var(arc_multcost) Init(-1)
 Cost to assume for a multiply instruction, with 4 being equal to a normal insn.
 
-mtune=ARC600
-Target RejectNegative Var(arc_tune, TUNE_ARC600)
-Tune for ARC600 cpu.
+mtune=
+Target RejectNegative ToLower Joined Var(arc_tune) Enum(arc_tune_attr) Init(ARC_TUNE_NONE)
+-mtune=TUNE Tune code for given ARC variant.
 
-mtune=ARC601
-Target RejectNegative Var(arc_tune, TUNE_ARC600)
-Tune for ARC601 cpu.
+Enum
+Name(arc_tune_attr) Type(int)
+
+EnumValue
+Enum(arc_tune_attr) String(arc600) Value(ARC_TUNE_ARC600)
 
-mtune=ARC700
-Target RejectNegative Var(arc_tune, TUNE_ARC700_4_2_STD)
-Tune for ARC700 R4.2 Cpu with standard multiplier block.
+EnumValue
+Enum(arc_tune_attr) String(arc601) Value(ARC_TUNE_ARC600)
 
-mtune=ARC700-xmac
-Target RejectNegative Var(arc_tune, TUNE_ARC700_4_2_XMAC)
-Tune for ARC700 R4.2 Cpu with XMAC block.
+EnumValue
+Enum(arc_tune_attr) String(arc700) Value(ARC_TUNE_ARC700_4_2_STD)
 
-mtune=ARC725D
-Target RejectNegative Var(arc_tune, TUNE_ARC700_4_2_XMAC)
-Tune for ARC700 R4.2 Cpu with XMAC block.
+EnumValue
+Enum(arc_tune_attr) String(arc700-xmac) Value(ARC_TUNE_ARC700_4_2_XMAC)
 
-mtune=ARC750D
-Target RejectNegative Var(arc_tune, TUNE_ARC700_4_2_XMAC)
-Tune for ARC700 R4.2 Cpu with XMAC block.
+EnumValue
+Enum(arc_tune_attr) String(arc725d) Value(ARC_TUNE_ARC700_4_2_XMAC)
+
+EnumValue
+Enum(arc_tune_attr) String(arc750d) Value(ARC_TUNE_ARC700_4_2_XMAC)
+
+EnumValue
+Enum(arc_tune_attr) String(core3) Value(ARC_TUNE_CORE_3)
 
 mindexed-loads
 Target Var(TARGET_INDEXED_LOADS) Init(TARGET_INDEXED_LOADS_DEFAULT)
-- 
1.9.1


* [PATCH 06/10] [ARC] Update legitimate constant hook.
  2017-11-27 11:15 [PATCH 00/10][ARC] Critical fixes Claudiu Zissulescu
                   ` (4 preceding siblings ...)
  2017-11-27 11:15 ` [PATCH 04/10] [ARC] Add ARCv2 core3 tune option Claudiu Zissulescu
@ 2017-11-27 11:15 ` Claudiu Zissulescu
  2017-12-07 23:30   ` Andrew Burgess
  2017-11-27 11:15 ` [PATCH 08/10] [ARC] Enable unaligned access Claudiu Zissulescu
                   ` (4 subsequent siblings)
  10 siblings, 1 reply; 23+ messages in thread
From: Claudiu Zissulescu @ 2017-11-27 11:15 UTC (permalink / raw)
  To: gcc-patches
  Cc: Claudiu.Zissulescu, Francois.Bedard, andrew.burgess, Claudiu Zissulescu

From: Claudiu Zissulescu <claziss@gmail.com>

Make sure we check the constants in all cases.

gcc/
2017-10-14  Claudiu Zissulescu  <claziss@synopsys.com>

	* config/arc/arc.c (arc_legitimate_constant_p): Always check all
	constants.

testsuite/
2017-10-14  Claudiu Zissulescu  <claziss@synopsys.com>

	* gcc.target/arc/tls-1.c: New test.
---
 gcc/config/arc/arc.c                 |  6 ------
 gcc/testsuite/gcc.target/arc/tls-1.c | 26 ++++++++++++++++++++++++++
 2 files changed, 26 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/arc/tls-1.c

diff --git a/gcc/config/arc/arc.c b/gcc/config/arc/arc.c
index 4d7a282..42ea921 100644
--- a/gcc/config/arc/arc.c
+++ b/gcc/config/arc/arc.c
@@ -6185,12 +6185,6 @@ arc_return_addr_rtx (int count, ATTRIBUTE_UNUSED rtx frame)
 bool
 arc_legitimate_constant_p (machine_mode mode, rtx x)
 {
-  if (GET_CODE (x) == SYMBOL_REF && SYMBOL_REF_TLS_MODEL (x))
-    return false;
-
-  if (!flag_pic && mode != Pmode)
-    return true;
-
   switch (GET_CODE (x))
     {
     case CONST:
diff --git a/gcc/testsuite/gcc.target/arc/tls-1.c b/gcc/testsuite/gcc.target/arc/tls-1.c
new file mode 100644
index 0000000..3f7a6d4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arc/tls-1.c
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target tls } */
+/* { dg-options "-O3 -std=gnu99" } */
+
+/* Check if addressing the `pos` member of struct is done via tls
+   mechanism.  */
+
+struct callchain_cursor {
+  int last;
+  long long pos;
+} __thread a;
+void fn1(struct callchain_cursor *p1)
+{
+  p1->pos++;
+}
+
+extern void fn3 (void);
+
+void fn2(void) {
+  struct callchain_cursor *b = &a;
+  while (1) {
+    fn3();
+    fn1(b);
+  }
+}
+/* { dg-final { scan-assembler "r25,@a@tpoff" } } */
-- 
1.9.1


* [PATCH 10/10] [ARC] Revamp trampoline implementation.
  2017-11-27 11:15 [PATCH 00/10][ARC] Critical fixes Claudiu Zissulescu
                   ` (6 preceding siblings ...)
  2017-11-27 11:15 ` [PATCH 08/10] [ARC] Enable unaligned access Claudiu Zissulescu
@ 2017-11-27 11:16 ` Claudiu Zissulescu
  2018-01-02 12:16   ` Andrew Burgess
  2017-11-27 11:57 ` [PATCH 01/10] [ARC][LRA] Use TARGET_CANNOT_SUBSTITUTE_MEM_EQUIV Claudiu Zissulescu
                   ` (2 subsequent siblings)
  10 siblings, 1 reply; 23+ messages in thread
From: Claudiu Zissulescu @ 2017-11-27 11:16 UTC (permalink / raw)
  To: gcc-patches
  Cc: Claudiu.Zissulescu, Francois.Bedard, andrew.burgess, Claudiu Zissulescu

From: Claudiu Zissulescu <claziss@gmail.com>

The new implementation attempts to clean up the existing trampoline
implementation for ARC, making it work on Linux-type systems.
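
The layout arithmetic behind the new template can be checked with a
small sketch. It assumes that the trampoline starts on a 4-byte
boundary and that PCL is the address of the current instruction rounded
down to a multiple of 4; the instruction sizes are read off the
encodings quoted in the patch comment. All names below are hypothetical.

```c
/* Instruction and data sizes in bytes, taken from the encodings shown
   in the trampoline comment (ld_s and j_s are 16-bit, ld is 32-bit).  */
enum {
  LD_S_SIZE = 2,
  LD_SIZE   = 4,
  J_S_SIZE  = 2,
  WORD_SIZE = 4
};

/* Assumed PCL value for an instruction at INSN_OFFSET bytes from the
   start of a 4-byte-aligned trampoline.  */
static unsigned pcl_of(unsigned insn_offset)
{
  return insn_offset & ~3u;
}

/* Offset read by "ld_s reg,[pcl,8]", the first instruction.  */
static unsigned fnaddr_slot(void)
{
  return pcl_of(0) + 8;
}

/* Offset read by "ld reg,[pcl,12]", which follows the 2-byte ld_s.  */
static unsigned chain_slot(void)
{
  return pcl_of(LD_S_SIZE) + 12;
}

/* Code bytes followed by the two data words.  */
static unsigned trampoline_size(void)
{
  return LD_S_SIZE + LD_SIZE + J_S_SIZE + 2 * WORD_SIZE;
}
```

Under these assumptions the function address sits at offset 8, the
static chain value at offset 12, and the total comes to 16 bytes, which
agrees with the adjusted TRAMPOLINE_SIZE.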

gcc/
2017-11-10  Claudiu Zissulescu  <claziss@synopsys.com>

	* config/arc/arc.c (TARGET_TRAMPOLINE_ADJUST_ADDRESS): Delete.
	(emit_store_direct): Likewise.
	(arc_trampoline_adjust_address): Likewise.
	(arc_asm_trampoline_template): New function.
	(arc_initialize_trampoline): Use asm_trampoline_template.
	(TARGET_ASM_TRAMPOLINE_TEMPLATE): Define.
	* config/arc/arc.h (TRAMPOLINE_SIZE): Adjust to 16.
	* config/arc/arc.md (flush_icache): Delete pattern.
---
 gcc/config/arc/arc.c  | 89 +++++++++++++++++++++++++--------------------------
 gcc/config/arc/arc.h  |  2 +-
 gcc/config/arc/arc.md |  9 ------
 3 files changed, 44 insertions(+), 56 deletions(-)

diff --git a/gcc/config/arc/arc.c b/gcc/config/arc/arc.c
index 0eeeb42..053f3c2 100644
--- a/gcc/config/arc/arc.c
+++ b/gcc/config/arc/arc.c
@@ -588,8 +588,6 @@ static void arc_finalize_pic (void);
 
 #define TARGET_TRAMPOLINE_INIT arc_initialize_trampoline
 
-#define TARGET_TRAMPOLINE_ADJUST_ADDRESS arc_trampoline_adjust_address
-
 #define TARGET_CAN_ELIMINATE arc_can_eliminate
 
 #define TARGET_FRAME_POINTER_REQUIRED arc_frame_pointer_required
@@ -3727,69 +3725,65 @@ output_shift (rtx *operands)
 \f
 /* Nested function support.  */
 
-/* Directly store VALUE into memory object BLOCK at OFFSET.  */
-
-static void
-emit_store_direct (rtx block, int offset, int value)
-{
-  emit_insn (gen_store_direct (adjust_address (block, SImode, offset),
-			       force_reg (SImode,
-					  gen_int_mode (value, SImode))));
-}
+/* Output assembler code for a block containing the constant parts of
+   a trampoline, leaving space for variable parts.
 
-/* Emit RTL insns to initialize the variable parts of a trampoline.
-   FNADDR is an RTX for the address of the function's pure code.
-   CXT is an RTX for the static chain value for the function.  */
-/* With potentially multiple shared objects loaded, and multiple stacks
-   present for multiple thereds where trampolines might reside, a simple
-   range check will likely not suffice for the profiler to tell if a callee
-   is a trampoline.  We a speedier check by making the trampoline start at
-   an address that is not 4-byte aligned.
    A trampoline looks like this:
 
-   nop_s	     0x78e0
-entry:
    ld_s r12,[pcl,12] 0xd403
    ld   r11,[pcl,12] 0x170c 700b
    j_s [r12]         0x7c00
-   nop_s	     0x78e0
+   .word function's address
+   .word static chain value
+
+*/
+
+static void
+arc_asm_trampoline_template (FILE *f)
+{
+  asm_fprintf (f, "\tld_s\t%s,[pcl,8]\n", ARC_TEMP_SCRATCH_REG);
+  asm_fprintf (f, "\tld\t%s,[pcl,12]\n", reg_names[STATIC_CHAIN_REGNUM]);
+  asm_fprintf (f, "\tj_s\t[%s]\n", ARC_TEMP_SCRATCH_REG);
+  assemble_aligned_integer (UNITS_PER_WORD, const0_rtx);
+  assemble_aligned_integer (UNITS_PER_WORD, const0_rtx);
+}
+
+/* Emit RTL insns to initialize the variable parts of a trampoline.
+   FNADDR is an RTX for the address of the function's pure code.  CXT
+   is an RTX for the static chain value for the function.
 
    The fastest trampoline to execute for trampolines within +-8KB of CTX
    would be:
+
    add2 r11,pcl,s12
    j [limm]           0x20200f80 limm
-   and that would also be faster to write to the stack by computing the offset
-   from CTX to TRAMP at compile time.  However, it would really be better to
-   get rid of the high cost of cache invalidation when generating trampolines,
-   which requires that the code part of trampolines stays constant, and
-   additionally either
-   - making sure that no executable code but trampolines is on the stack,
-     no icache entries linger for the area of the stack from when before the
-     stack was allocated, and allocating trampolines in trampoline-only
-     cache lines
-  or
-   - allocate trampolines fram a special pool of pre-allocated trampolines.  */
+
+   and that would also be faster to write to the stack by computing
+   the offset from CTX to TRAMP at compile time.  However, it would
+   really be better to get rid of the high cost of cache invalidation
+   when generating trampolines, which requires that the code part of
+   trampolines stays constant, and additionally either making sure
+   that no executable code but trampolines is on the stack, no icache
+   entries linger for the area of the stack from when before the stack
+   was allocated, and allocating trampolines in trampoline-only cache
+   lines or allocate trampolines fram a special pool of pre-allocated
+   trampolines.  */
 
 static void
 arc_initialize_trampoline (rtx tramp, tree fndecl, rtx cxt)
 {
   rtx fnaddr = XEXP (DECL_RTL (fndecl), 0);
 
-  emit_store_direct (tramp, 0, TARGET_BIG_ENDIAN ? 0x78e0d403 : 0xd40378e0);
-  emit_store_direct (tramp, 4, TARGET_BIG_ENDIAN ? 0x170c700b : 0x700b170c);
-  emit_store_direct (tramp, 8, TARGET_BIG_ENDIAN ? 0x7c0078e0 : 0x78e07c00);
-  emit_move_insn (adjust_address (tramp, SImode, 12), fnaddr);
-  emit_move_insn (adjust_address (tramp, SImode, 16), cxt);
-  emit_insn (gen_flush_icache (adjust_address (tramp, SImode, 0)));
-}
+  emit_block_move (tramp, assemble_trampoline_template (),
+		   GEN_INT (TRAMPOLINE_SIZE), BLOCK_OP_NORMAL);
 
-/* Allow the profiler to easily distinguish trampolines from normal
-  functions.  */
+  emit_move_insn (adjust_address (tramp, SImode, 8), fnaddr);
+  emit_move_insn (adjust_address (tramp, SImode, 12), cxt);
 
-static rtx
-arc_trampoline_adjust_address (rtx addr)
-{
-  return plus_constant (Pmode, addr, 2);
+  emit_library_call (gen_rtx_SYMBOL_REF (Pmode, "__clear_cache"),
+		     LCT_NORMAL, VOIDmode, 2, XEXP (tramp, 0), Pmode,
+		     plus_constant (Pmode, XEXP (tramp, 0), TRAMPOLINE_SIZE),
+		     Pmode);
 }
 
 /* Add the given function declaration to emit code in JLI section.  */
@@ -11412,6 +11406,9 @@ arc_cannot_substitute_mem_equiv_p (rtx)
 #undef TARGET_CANNOT_SUBSTITUTE_MEM_EQUIV_P
 #define TARGET_CANNOT_SUBSTITUTE_MEM_EQUIV_P arc_cannot_substitute_mem_equiv_p
 
+#undef TARGET_ASM_TRAMPOLINE_TEMPLATE
+#define TARGET_ASM_TRAMPOLINE_TEMPLATE arc_asm_trampoline_template
+
 struct gcc_target targetm = TARGET_INITIALIZER;
 
 #include "gt-arc.h"
diff --git a/gcc/config/arc/arc.h b/gcc/config/arc/arc.h
index 8c31fb2..317a653 100644
--- a/gcc/config/arc/arc.h
+++ b/gcc/config/arc/arc.h
@@ -829,7 +829,7 @@ extern int arc_initial_elimination_offset(int from, int to);
 /* Trampolines.  */
 
 /* Length in units of the trampoline for entering a nested function.  */
-#define TRAMPOLINE_SIZE 20
+#define TRAMPOLINE_SIZE 16
 
 /* Alignment required for a trampoline in bits .  */
 /* For actual data alignment we just need 32, no more than the stack;
diff --git a/gcc/config/arc/arc.md b/gcc/config/arc/arc.md
index 155ee6c..e1418a9 100644
--- a/gcc/config/arc/arc.md
+++ b/gcc/config/arc/arc.md
@@ -4345,15 +4345,6 @@ archs4xd, archs4xd_slow, core_3"
    (set_attr "iscompact" "true")
    (set_attr "length" "2")])
 
-;; Special pattern to flush the icache.
-;; ??? Not sure what to do here.  Some ARC's are known to support this.
-
-(define_insn "flush_icache"
-  [(unspec_volatile [(match_operand:SI 0 "memory_operand" "m")] 0)]
-  ""
-  "* return \"\";"
-  [(set_attr "type" "misc")])
-
 ;; Split up troublesome insns for better scheduling.
 
 ;; Peepholes go at the end.
-- 
1.9.1

* [PATCH 01/10] [ARC][LRA] Use TARGET_CANNOT_SUBSTITUTE_MEM_EQUIV.
  2017-11-27 11:15 [PATCH 00/10][ARC] Critical fixes Claudiu Zissulescu
                   ` (7 preceding siblings ...)
  2017-11-27 11:16 ` [PATCH 10/10] [ARC] Revamp trampoline implementation Claudiu Zissulescu
@ 2017-11-27 11:57 ` Claudiu Zissulescu
  2017-11-27 23:27   ` Andrew Burgess
  2017-11-27 12:25 ` [PATCH 07/10] [ARC][FIX] Consider command line ffixed- option Claudiu Zissulescu
  2018-01-08 15:23 ` [PATCH 00/10][ARC] Critical fixes Claudiu Zissulescu
  10 siblings, 1 reply; 23+ messages in thread
From: Claudiu Zissulescu @ 2017-11-27 11:57 UTC (permalink / raw)
  To: gcc-patches
  Cc: Claudiu.Zissulescu, Francois.Bedard, andrew.burgess, Claudiu Zissulescu

From: Claudiu Zissulescu <claziss@gmail.com>

Sometimes the memory equivalent is not valid due to a large offset,
for example when the ap register is replaced with its fp/sp equivalent
during the LRA step.  To solve this, we define
TARGET_CANNOT_SUBSTITUTE_MEM_EQUIV_P for ARC.
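
As a rough illustration of the large-offset case (a sketch only; the
helper names fits_signed_field and fits_s9 are hypothetical and are not
part of GCC or of this patch), the s9 range mentioned in the new test
covers offsets from -256 to 255, so a frame offset past a 500-byte
array cannot be encoded directly:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper, not part of GCC: return true if OFFSET fits
   a signed immediate field that is BITS bits wide.  */
static bool
fits_signed_field (long offset, int bits)
{
  assert (bits > 0 && bits < 32);
  long limit = 1L << (bits - 1);
  return offset >= -limit && offset < limit;
}

/* Illustrative check for the signed 9-bit (s9) range, i.e. -256..255.  */
static bool
fits_s9 (long offset)
{
  return fits_signed_field (offset, 9);
}
```

An offset of 500, as could arise from the char a[500] member in
lra-1.c, falls outside this range, which is the situation the hook is
meant to keep LRA away from.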

gcc/
2017-08-08  Claudiu Zissulescu  <claziss@synopsys.com>

	* config/arc/arc.c (arc_cannot_substitute_mem_equiv_p): New function.
	(TARGET_CANNOT_SUBSTITUTE_MEM_EQUIV_P): Define.

gcc/testsuite
2017-08-08  Claudiu Zissulescu  <claziss@synopsys.com>

	* gcc.target/arc/lra-1.c: New test.
---
 gcc/config/arc/arc.c                 | 12 ++++++++++++
 gcc/testsuite/gcc.target/arc/lra-1.c | 17 +++++++++++++++++
 2 files changed, 29 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/arc/lra-1.c

diff --git a/gcc/config/arc/arc.c b/gcc/config/arc/arc.c
index dd922a6..25f123c 100644
--- a/gcc/config/arc/arc.c
+++ b/gcc/config/arc/arc.c
@@ -11352,12 +11352,24 @@ arc_use_anchors_for_symbol_p (const_rtx symbol)
   return default_use_anchors_for_symbol_p (symbol);
 }
 
+/* Return true if SUBST can't safely replace its equivalent during RA.  */
+static bool
+arc_cannot_substitute_mem_equiv_p (rtx)
+{
+  /* If SUBST is mem[base+index], the address may not fit iSA, 
+     thus return true. */
+  return true;
+}
+
 #undef TARGET_USE_ANCHORS_FOR_SYMBOL_P
 #define TARGET_USE_ANCHORS_FOR_SYMBOL_P arc_use_anchors_for_symbol_p
 
 #undef TARGET_CONSTANT_ALIGNMENT
 #define TARGET_CONSTANT_ALIGNMENT constant_alignment_word_strings
 
+#undef TARGET_CANNOT_SUBSTITUTE_MEM_EQUIV_P
+#define TARGET_CANNOT_SUBSTITUTE_MEM_EQUIV_P arc_cannot_substitute_mem_equiv_p
+
 struct gcc_target targetm = TARGET_INITIALIZER;
 
 #include "gt-arc.h"
diff --git a/gcc/testsuite/gcc.target/arc/lra-1.c b/gcc/testsuite/gcc.target/arc/lra-1.c
new file mode 100644
index 0000000..27336d1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arc/lra-1.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-Os -w -mlra" } */
+
+/* ap is replaced with an address like base+offset by lra,
+   where offset is larger than s9, resulting into an ICE.  */
+
+typedef struct { char a[500] } b;
+c;
+struct d {
+  short e;
+  b f
+} g(int h, int i, int j, int k, char l, int m, int n, char *p) {
+again:;
+  struct d o;
+  *p = c = ({ q(o); });
+  goto again;
+}
-- 
1.9.1

* [PATCH 07/10] [ARC][FIX] Consider command line ffixed- option.
  2017-11-27 11:15 [PATCH 00/10][ARC] Critical fixes Claudiu Zissulescu
                   ` (8 preceding siblings ...)
  2017-11-27 11:57 ` [PATCH 01/10] [ARC][LRA] Use TARGET_CANNOT_SUBSTITUTE_MEM_EQUIV Claudiu Zissulescu
@ 2017-11-27 12:25 ` Claudiu Zissulescu
  2017-12-07 23:32   ` Andrew Burgess
  2018-01-08 15:23 ` [PATCH 00/10][ARC] Critical fixes Claudiu Zissulescu
  10 siblings, 1 reply; 23+ messages in thread
From: Claudiu Zissulescu @ 2017-11-27 12:25 UTC (permalink / raw)
  To: gcc-patches; +Cc: Claudiu.Zissulescu, Francois.Bedard, andrew.burgess

From: claziss <claziss@synopsys.com>

Track which regs are set fixed/call saved/call used from the command
line.  Do not try to override their properties if the user says
otherwise.
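
The idea can be sketched in isolation as follows.  This is a simplified
model, not the code in the patch: a plain 64-bit mask stands in for
HARD_REG_SET, and NUM_REGS, note_user_override, user_overrode and
set_default_fixed are names assumed for the illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_REGS 64		/* Assumed register count for the sketch.  */

/* One bit per register the user named on the command line.  */
static uint64_t override_mask;

/* Stand-in for the back end's fixed_regs array.  */
static bool fixed[NUM_REGS];

/* Record that the user mentioned REGNO in a -ffixed-, -fcall-used-
   or -fcall-saved- style option.  */
static void
note_user_override (int regno)
{
  assert (regno >= 0 && regno < NUM_REGS);
  override_mask |= (uint64_t) 1 << regno;
}

/* Return true if the user mentioned REGNO.  */
static bool
user_overrode (int regno)
{
  assert (regno >= 0 && regno < NUM_REGS);
  return (override_mask >> regno) & 1;
}

/* Apply a back-end default only when the user expressed no choice.  */
static void
set_default_fixed (int regno, bool value)
{
  if (!user_overrode (regno))
    fixed[regno] = value;
}
```

With this shape, a register the user marked stays as the user left it,
while every other register still receives the back-end default.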

gcc/
2017-06-08  Claudiu Zissulescu  <claziss@synopsys.com>

	* config/arc/arc.c (overrideregs): New variable.
	(arc_override_options): Track fixed/call saved/call used options.
	(arc_conditional_register_usage): Check against overrideregs
	variable whenever we change register properties.
---
 gcc/config/arc/arc.c | 60 +++++++++++++++++++++++++++++++++++++++++-----------
 1 file changed, 48 insertions(+), 12 deletions(-)

diff --git a/gcc/config/arc/arc.c b/gcc/config/arc/arc.c
index 42ea921..0eeeb42 100644
--- a/gcc/config/arc/arc.c
+++ b/gcc/config/arc/arc.c
@@ -79,6 +79,9 @@ typedef struct GTY (()) _arc_jli_section
 
 static arc_jli_section *arc_jli_sections = NULL;
 
+/* Track which regs are set fixed/call saved/call used from commnad line.  */
+HARD_REG_SET overrideregs;
+
 /* Maximum size of a loop.  */
 #define ARC_MAX_LOOP_LENGTH 4095
 
@@ -1144,6 +1147,30 @@ arc_override_options (void)
 	  }
       }
 
+  CLEAR_HARD_REG_SET (overrideregs);
+  if (common_deferred_options)
+    {
+      vec<cl_deferred_option> v =
+	*((vec<cl_deferred_option> *) common_deferred_options);
+      int reg, nregs, j;
+
+      FOR_EACH_VEC_ELT (v, i, opt)
+	{
+	  switch (opt->opt_index)
+	    {
+	    case OPT_ffixed_:
+	    case OPT_fcall_used_:
+	    case OPT_fcall_saved_:
+	      if ((reg = decode_reg_name_and_count (opt->arg, &nregs)) >= 0)
+		for (j = reg;  j < reg + nregs; j++)
+		  SET_HARD_REG_BIT (overrideregs, j);
+	      break;
+	    default:
+	      break;
+	    }
+	}
+    }
+
   /* Set cpu flags accordingly to architecture/selected cpu.  The cpu
      specific flags are set in arc-common.c.  The architecture forces
      the default hardware configurations in, regardless what command
@@ -1673,14 +1700,20 @@ arc_conditional_register_usage (void)
       /* For ARCv2 the core register set is changed.  */
       strcpy (rname29, "ilink");
       strcpy (rname30, "r30");
-      call_used_regs[30] = 1;
-      fixed_regs[30] = 0;
-
-      arc_regno_reg_class[30] = WRITABLE_CORE_REGS;
-      SET_HARD_REG_BIT (reg_class_contents[WRITABLE_CORE_REGS], 30);
-      SET_HARD_REG_BIT (reg_class_contents[CHEAP_CORE_REGS], 30);
-      SET_HARD_REG_BIT (reg_class_contents[GENERAL_REGS], 30);
-      SET_HARD_REG_BIT (reg_class_contents[MPY_WRITABLE_CORE_REGS], 30);
+
+      if (!TEST_HARD_REG_BIT (overrideregs, 30))
+	{
+	  /* No user interference.  Set the r30 to be used by the
+	     compiler.  */
+	  call_used_regs[30] = 1;
+	  fixed_regs[30] = 0;
+
+	  arc_regno_reg_class[30] = WRITABLE_CORE_REGS;
+	  SET_HARD_REG_BIT (reg_class_contents[WRITABLE_CORE_REGS], 30);
+	  SET_HARD_REG_BIT (reg_class_contents[CHEAP_CORE_REGS], 30);
+	  SET_HARD_REG_BIT (reg_class_contents[GENERAL_REGS], 30);
+	  SET_HARD_REG_BIT (reg_class_contents[MPY_WRITABLE_CORE_REGS], 30);
+	}
    }
 
   if (TARGET_MUL64_SET)
@@ -1935,11 +1968,14 @@ arc_conditional_register_usage (void)
     SET_HARD_REG_BIT (reg_class_contents[MPY_WRITABLE_CORE_REGS], ACCL_REGNO);
     SET_HARD_REG_BIT (reg_class_contents[MPY_WRITABLE_CORE_REGS], ACCH_REGNO);
 
-     /* Allow the compiler to freely use them.  */
-    fixed_regs[ACCL_REGNO] = 0;
-    fixed_regs[ACCH_REGNO] = 0;
+    /* Allow the compiler to freely use them.  */
+    if (!TEST_HARD_REG_BIT (overrideregs, ACCL_REGNO))
+      fixed_regs[ACCL_REGNO] = 0;
+    if (!TEST_HARD_REG_BIT (overrideregs, ACCH_REGNO))
+      fixed_regs[ACCH_REGNO] = 0;
 
-    arc_hard_regno_modes[ACC_REG_FIRST] = D_MODES;
+    if (!fixed_regs[ACCH_REGNO] && !fixed_regs[ACCL_REGNO])
+      arc_hard_regno_modes[ACC_REG_FIRST] = D_MODES;
   }
 }
 
-- 
1.9.1

* Re: [PATCH 01/10] [ARC][LRA] Use TARGET_CANNOT_SUBSTITUTE_MEM_EQUIV.
  2017-11-27 11:57 ` [PATCH 01/10] [ARC][LRA] Use TARGET_CANNOT_SUBSTITUTE_MEM_EQUIV Claudiu Zissulescu
@ 2017-11-27 23:27   ` Andrew Burgess
  0 siblings, 0 replies; 23+ messages in thread
From: Andrew Burgess @ 2017-11-27 23:27 UTC (permalink / raw)
  To: Claudiu Zissulescu; +Cc: gcc-patches, Francois.Bedard, Claudiu Zissulescu

* Claudiu Zissulescu <Claudiu.Zissulescu@synopsys.com> [2017-11-27 12:09:50 +0100]:

> From: Claudiu Zissulescu <claziss@gmail.com>
> 
> Sometimes the memory equivalent is not valid due to a large offset,
> for example when the ap register is replaced with its fp/sp equivalent
> during the LRA step.  To solve this, we define
> TARGET_CANNOT_SUBSTITUTE_MEM_EQUIV_P for ARC.
> 
> gcc/
> 2017-08-08  Claudiu Zissulescu  <claziss@synopsys.com>
> 
> 	* config/arc/arc.c (arc_cannot_substitute_mem_equiv_p): New function.
> 	(TARGET_CANNOT_SUBSTITUTE_MEM_EQUIV_P): Define.
> 
> gcc/testsuite
> 2017-08-08  Claudiu Zissulescu  <claziss@synopsys.com>
> 
> 	* gcc.target/arc/lra-1.c: New test.

Looks good.

Thanks,
Andrew


> ---
>  gcc/config/arc/arc.c                 | 12 ++++++++++++
>  gcc/testsuite/gcc.target/arc/lra-1.c | 17 +++++++++++++++++
>  2 files changed, 29 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/arc/lra-1.c
> 
> diff --git a/gcc/config/arc/arc.c b/gcc/config/arc/arc.c
> index dd922a6..25f123c 100644
> --- a/gcc/config/arc/arc.c
> +++ b/gcc/config/arc/arc.c
> @@ -11352,12 +11352,24 @@ arc_use_anchors_for_symbol_p (const_rtx symbol)
>    return default_use_anchors_for_symbol_p (symbol);
>  }
>  
> +/* Return true if SUBST can't safely replace its equivalent during RA.  */
> +static bool
> +arc_cannot_substitute_mem_equiv_p (rtx)
> +{
> +  /* If SUBST is mem[base+index], the address may not fit iSA, 
> +     thus return true. */
> +  return true;
> +}
> +
>  #undef TARGET_USE_ANCHORS_FOR_SYMBOL_P
>  #define TARGET_USE_ANCHORS_FOR_SYMBOL_P arc_use_anchors_for_symbol_p
>  
>  #undef TARGET_CONSTANT_ALIGNMENT
>  #define TARGET_CONSTANT_ALIGNMENT constant_alignment_word_strings
>  
> +#undef TARGET_CANNOT_SUBSTITUTE_MEM_EQUIV_P
> +#define TARGET_CANNOT_SUBSTITUTE_MEM_EQUIV_P arc_cannot_substitute_mem_equiv_p
> +
>  struct gcc_target targetm = TARGET_INITIALIZER;
>  
>  #include "gt-arc.h"
> diff --git a/gcc/testsuite/gcc.target/arc/lra-1.c b/gcc/testsuite/gcc.target/arc/lra-1.c
> new file mode 100644
> index 0000000..27336d1
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arc/lra-1.c
> @@ -0,0 +1,17 @@
> +/* { dg-do compile } */
> +/* { dg-options "-Os -w -mlra" } */
> +
> +/* ap is replaced with an address like base+offset by lra,
> +   where offset is larger than s9, resulting into an ICE.  */
> +
> +typedef struct { char a[500] } b;
> +c;
> +struct d {
> +  short e;
> +  b f
> +} g(int h, int i, int j, int k, char l, int m, int n, char *p) {
> +again:;
> +  struct d o;
> +  *p = c = ({ q(o); });
> +  goto again;
> +}
> -- 
> 1.9.1
> 

* Re: [PATCH 02/10] [ARC][ZOL] Update uses for hw-loop labels.
  2017-11-27 11:14 ` [PATCH 02/10] [ARC][ZOL] Update uses for hw-loop labels Claudiu Zissulescu
@ 2017-11-27 23:29   ` Andrew Burgess
  0 siblings, 0 replies; 23+ messages in thread
From: Andrew Burgess @ 2017-11-27 23:29 UTC (permalink / raw)
  To: Claudiu Zissulescu; +Cc: gcc-patches, Francois.Bedard

* Claudiu Zissulescu <Claudiu.Zissulescu@synopsys.com> [2017-11-27 12:09:51 +0100]:

> From: claziss <claziss@synopsys.com>
> 
> Make sure we mark the hw-loop labels as being used.
> 
> gcc/
> 2017-09-19  Claudiu Zissulescu  <claziss@synopsys.com>
> 
> 	* config/arc/arc.c (hwloop_optimize): Update hw-loop's end/start
> 	labels number of usages.
> 
> gcc/testsuite
> 2017-09-19  Claudiu Zissulescu  <claziss@synopsys.com>
> 
> 	* gcc.target/arc/loop-2.cpp: New test.

Looks good.

Thanks,
Andrew

> ---
>  gcc/config/arc/arc.c                    |  3 +++
>  gcc/testsuite/gcc.target/arc/loop-2.cpp | 18 ++++++++++++++++++
>  2 files changed, 21 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/arc/loop-2.cpp
> 
> diff --git a/gcc/config/arc/arc.c b/gcc/config/arc/arc.c
> index 25f123c..964815a 100644
> --- a/gcc/config/arc/arc.c
> +++ b/gcc/config/arc/arc.c
> @@ -7702,6 +7702,9 @@ hwloop_optimize (hwloop_info loop)
>    /* Insert the loop end label before the last instruction of the
>       loop.  */
>    emit_label_after (end_label, loop->last_insn);
> +  /* Make sure we mark the begining and end label as used.  */
> +  LABEL_NUSES (loop->end_label)++;
> +  LABEL_NUSES (loop->start_label)++;
>  
>    return true;
>  }
> diff --git a/gcc/testsuite/gcc.target/arc/loop-2.cpp b/gcc/testsuite/gcc.target/arc/loop-2.cpp
> new file mode 100644
> index 0000000..d1dc917
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arc/loop-2.cpp
> @@ -0,0 +1,18 @@
> +/* { dg-options "-O2" } *
> +/* { dg-do assemble } */
> +
> +/* This file fails to assemble if we forgot to increase the number of
> +   uses for loop's start and end labels.  */
> +int a, c, d;
> +int *b;
> +void fn1(int p1) {
> +  if (d == 5)
> +    for (int i; i < p1; ++i)
> +      if (c)
> +        b[i] = c;
> +      else
> +        int t = a = t;
> +  else
> +    for (int i; i < p1; ++i)
> +      b[i] = 0;
> +}
> -- 
> 1.9.1
> 

* Re: [PATCH 03/10] [ARC] Don't allow the last ZOL insn to be in a delay slot.
  2017-11-27 11:14 ` [PATCH 03/10] [ARC] Don't allow the last ZOL insn to be in a delay slot Claudiu Zissulescu
@ 2017-11-27 23:32   ` Andrew Burgess
  0 siblings, 0 replies; 23+ messages in thread
From: Andrew Burgess @ 2017-11-27 23:32 UTC (permalink / raw)
  To: Claudiu Zissulescu; +Cc: gcc-patches, Francois.Bedard, Claudiu Zissulescu

* Claudiu Zissulescu <Claudiu.Zissulescu@synopsys.com> [2017-11-27 12:09:52 +0100]:

> From: Claudiu Zissulescu <claziss@gmail.com>
> 
> The ARC ZOL implementation doesn't allow the last instruction to be a
> control instruction or part of a delay slot.  Thus, we add a note to
> the last ZOL instruction which prevents it from ending up in a delay
> slot.
> 
> 2017-10-20  Claudiu Zissulescu  <claziss@synopsys.com>
> 
> 	* config/arc/arc.c (hwloop_optimize): Prevent the last
> 	ZOL instruction from ending up in a delay slot.
> 	* config/arc/arc.md (cond_delay_insn): Check if the instruction
> 	can be placed into a delay slot against reg_note.
> 	(in_delay_slot): Likewise.
> 
> testsuite/
> 2017-10-20  Claudiu Zissulescu  <claziss@synopsys.com>
> 
> 	* gcc.target/arc/loop-3.c: New test.
> 	* gcc.target/arc/loop-4.c: Likewise.

OK.

Thanks,
Andrew



> 
> [FIX][ZOL] fix checking for jumps
> ---
>  gcc/config/arc/arc.c                  |  6 ++++++
>  gcc/config/arc/arc.md                 |  4 ++++
>  gcc/testsuite/gcc.target/arc/loop-3.c | 27 +++++++++++++++++++++++++++
>  gcc/testsuite/gcc.target/arc/loop-4.c | 14 ++++++++++++++
>  4 files changed, 51 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/arc/loop-3.c
>  create mode 100644 gcc/testsuite/gcc.target/arc/loop-4.c
> 
> diff --git a/gcc/config/arc/arc.c b/gcc/config/arc/arc.c
> index 964815a..1479a8d 100644
> --- a/gcc/config/arc/arc.c
> +++ b/gcc/config/arc/arc.c
> @@ -7609,6 +7609,12 @@ hwloop_optimize (hwloop_info loop)
>  		 loop->loop_no);
>        last_insn = emit_insn_after (gen_nopv (), last_insn);
>      }
> +
> +  /* SAVE_NOTE is used by haifa scheduler.  However, we are after it
> +     and we can use it to indicate the last ZOL instruction cannot be
> +     part of a delay slot.  */
> +  add_reg_note (last_insn, REG_SAVE_NOTE, GEN_INT (2));
> +
>    loop->last_insn = last_insn;
>  
>    /* Get the loop iteration register.  */
> diff --git a/gcc/config/arc/arc.md b/gcc/config/arc/arc.md
> index 2e0ac52..6239483 100644
> --- a/gcc/config/arc/arc.md
> +++ b/gcc/config/arc/arc.md
> @@ -472,6 +472,8 @@
>  	     (symbol_ref "(arc_hazard (prev_active_insn (insn), insn)
>  			   + arc_hazard (insn, next_active_insn (insn)))"))
>  	 (const_string "false")
> +	 (match_test "find_reg_note (insn, REG_SAVE_NOTE, GEN_INT (2))")
> +	 (const_string "false")
>  	 (eq_attr "iscompact" "maybe") (const_string "true")
>  	 ]
>  
> @@ -499,6 +501,8 @@
>    (cond [(eq_attr "cond" "!canuse") (const_string "no")
>  	 (eq_attr "type" "call,branch,uncond_branch,jump,brcc")
>  	 (const_string "no")
> +	 (match_test "find_reg_note (insn, REG_SAVE_NOTE, GEN_INT (2))")
> +	 (const_string "no")
>  	 (eq_attr "length" "2,4") (const_string "yes")]
>  	(const_string "no")))
>  
> diff --git a/gcc/testsuite/gcc.target/arc/loop-3.c b/gcc/testsuite/gcc.target/arc/loop-3.c
> new file mode 100644
> index 0000000..bf7aec9
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arc/loop-3.c
> @@ -0,0 +1,27 @@
> +/* { dg-do assemble } */
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -mno-sdata" } *
> +
> +/* This example will fail to assemble if the last instruction is a
> +   branch with delay slot.  */
> +int d;
> +extern char * fn2 (void);
> +
> +void fn1(void)
> +{
> +  char *a = fn2();
> +  for (;;) {
> +    long long b;
> +    int e = 8;
> +    for (; e <= 63; e += 7) {
> +      long c = *a++;
> +      b += c & e;
> +      if (c & 28)
> +        break;
> +    }
> +    d = b;
> +  }
> +}
> +
> +/* { dg-final { scan-assembler "bne_s @.L2" } } */
> +/* { dg-final { scan-assembler-not "add.eq" } } */
> diff --git a/gcc/testsuite/gcc.target/arc/loop-4.c b/gcc/testsuite/gcc.target/arc/loop-4.c
> new file mode 100644
> index 0000000..99a93a7
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arc/loop-4.c
> @@ -0,0 +1,14 @@
> +/* { dg-do assemble } */
> +/* { dg-do compile } */
> +/* { dg-options "-Os" } */
> +
> +
> +void fn1(void *p1, int p2, int p3)
> +{
> +  char *d = p1;
> +  do
> +    *d++ = p2;
> +  while (--p3);
> +}
> +
> +/* { dg-final { scan-assembler "lp_count" } } */
> -- 
> 1.9.1
> 

* Re: [PATCH 04/10] [ARC] Add ARCv2 core3 tune option.
  2017-11-27 11:15 ` [PATCH 04/10] [ARC] Add ARCv2 core3 tune option Claudiu Zissulescu
@ 2017-11-27 23:35   ` Andrew Burgess
  0 siblings, 0 replies; 23+ messages in thread
From: Andrew Burgess @ 2017-11-27 23:35 UTC (permalink / raw)
  To: Claudiu Zissulescu; +Cc: gcc-patches, Francois.Bedard

* Claudiu Zissulescu <Claudiu.Zissulescu@synopsys.com> [2017-11-27 12:09:53 +0100]:

> From: claziss <claziss@synopsys.com>
> 
> ARCv2 Core3 CPUs come with dbnz support.  Add this feature to
> the tune option.
> 
> gcc/
> 2017-09-14  Claudiu Zissulescu  <claziss@synopsys.com>
> 
> 	* config/arc/arc-arch.h (arc_tune_attr): Add ARC_TUNE_CORE_3.
> 	* config/arc/arc.c (arc_sched_issue_rate): Use ARC_TUNE_... .
> 	(arc_init): Likewise.
> 	(arc_override_options): Likewise.
> 	(arc_file_start): Choose Tag_ARC_CPU_variation based on arc_tune
> 	value.
> 	(hwloop_fail): Use TARGET_DBNZ when we want to check for dbnz insn
> 	support.
> 	* config/arc/arc.h (TARGET_DBNZ): Define.
> 	* config/arc/arc.md (attr tune): Add core_3, use ARC_TUNE_... to
> 	properly set the tune attribute.
> 	(dbnz): Use TARGET_DBNZ guard.
> 	* config/arc/arc.opt (mtune): Add core3 option.

OK.

Thanks,
Andrew


> ---
>  gcc/config/arc/arc-arch.h |  3 ++-
>  gcc/config/arc/arc.c      | 21 ++++++++++++---------
>  gcc/config/arc/arc.h      |  2 ++
>  gcc/config/arc/arc.md     | 22 ++++++++++++----------
>  gcc/config/arc/arc.opt    | 40 ++++++++++++++++++++++------------------
>  5 files changed, 50 insertions(+), 38 deletions(-)
> 
> diff --git a/gcc/config/arc/arc-arch.h b/gcc/config/arc/arc-arch.h
> index 7c3f47c..38d2bcb 100644
> --- a/gcc/config/arc/arc-arch.h
> +++ b/gcc/config/arc/arc-arch.h
> @@ -75,7 +75,8 @@ enum arc_tune_attr
>      ARC_TUNE_ARC700_4_2_XMAC,
>      ARC_TUNE_ARCHS4X,
>      ARC_TUNE_ARCHS4XD,
> -    ARC_TUNE_ARCHS4XD_SLOW
> +    ARC_TUNE_ARCHS4XD_SLOW,
> +    ARC_TUNE_CORE_3
>    };
>  
>  /* CPU specific properties.  */
> diff --git a/gcc/config/arc/arc.c b/gcc/config/arc/arc.c
> index 1479a8d..4d7a282 100644
> --- a/gcc/config/arc/arc.c
> +++ b/gcc/config/arc/arc.c
> @@ -442,8 +442,8 @@ arc_sched_issue_rate (void)
>  {
>    switch (arc_tune)
>      {
> -    case TUNE_ARCHS4X:
> -    case TUNE_ARCHS4XD:
> +    case ARC_TUNE_ARCHS4X:
> +    case ARC_TUNE_ARCHS4XD:
>        return 3;
>      default:
>        break;
> @@ -866,21 +866,21 @@ arc_init (void)
>    if (arc_multcost < 0)
>      switch (arc_tune)
>        {
> -      case TUNE_ARC700_4_2_STD:
> +      case ARC_TUNE_ARC700_4_2_STD:
>  	/* latency 7;
>  	   max throughput (1 multiply + 4 other insns) / 5 cycles.  */
>  	arc_multcost = COSTS_N_INSNS (4);
>  	if (TARGET_NOMPY_SET)
>  	  arc_multcost = COSTS_N_INSNS (30);
>  	break;
> -      case TUNE_ARC700_4_2_XMAC:
> +      case ARC_TUNE_ARC700_4_2_XMAC:
>  	/* latency 5;
>  	   max throughput (1 multiply + 2 other insns) / 3 cycles.  */
>  	arc_multcost = COSTS_N_INSNS (3);
>  	if (TARGET_NOMPY_SET)
>  	  arc_multcost = COSTS_N_INSNS (30);
>  	break;
> -      case TUNE_ARC600:
> +      case ARC_TUNE_ARC600:
>  	if (TARGET_MUL64_SET)
>  	  {
>  	    arc_multcost = COSTS_N_INSNS (4);
> @@ -1196,8 +1196,8 @@ arc_override_options (void)
>  #undef ARC_OPT
>  
>    /* Set Tune option.  */
> -  if (arc_tune == TUNE_NONE)
> -    arc_tune = (enum attr_tune) arc_selected_cpu->tune;
> +  if (arc_tune == ARC_TUNE_NONE)
> +    arc_tune = (enum arc_tune_attr) arc_selected_cpu->tune;
>  
>    if (arc_size_opt_level == 3)
>      optimize_size = 1;
> @@ -5205,6 +5205,9 @@ static void arc_file_start (void)
>  	       TARGET_NO_SDATA_SET ? 0 : 2);
>    asm_fprintf (asm_out_file, "\t.arc_attribute Tag_ARC_ABI_exceptions, %d\n",
>  	       TARGET_OPTFPE ? 1 : 0);
> +  if (TARGET_V2)
> +    asm_fprintf (asm_out_file, "\t.arc_attribute Tag_ARC_CPU_variation, %d\n",
> +		 arc_tune == ARC_TUNE_CORE_3 ? 3 : 2);
>  }
>  
>  /* Implement `TARGET_ASM_FILE_END'.  */
> @@ -7389,11 +7392,11 @@ hwloop_fail (hwloop_info loop)
>    rtx test;
>    rtx insn = loop->loop_end;
>  
> -  if (TARGET_V2
> +  if (TARGET_DBNZ
>        && (loop->length && (loop->length <= ARC_MAX_LOOP_LENGTH))
>        && REG_P (loop->iter_reg))
>      {
> -      /* TARGET_V2 has dbnz instructions.  */
> +      /* TARGET_V2 core3 has dbnz instructions.  */
>        test = gen_dbnz (loop->iter_reg, loop->start_label);
>        insn = emit_jump_insn_before (test, loop->loop_end);
>      }
> diff --git a/gcc/config/arc/arc.h b/gcc/config/arc/arc.h
> index b5a8f84..8d90975 100644
> --- a/gcc/config/arc/arc.h
> +++ b/gcc/config/arc/arc.h
> @@ -1628,5 +1628,7 @@ enum
>  /* Custom FP instructions used by QuarkSE EM cpu.  */
>  #define TARGET_FPX_QUARK    (TARGET_EM && TARGET_SPFP		\
>  			     && (arc_fpu_build == FPX_QK))
> +/* DBNZ support is available for ARCv2 core3 cpus.  */
> +#define TARGET_DBNZ (TARGET_V2 && (arc_tune == ARC_TUNE_CORE_3))
>  
>  #endif /* GCC_ARC_H */
> diff --git a/gcc/config/arc/arc.md b/gcc/config/arc/arc.md
> index 6239483..b8fa44e 100644
> --- a/gcc/config/arc/arc.md
> +++ b/gcc/config/arc/arc.md
> @@ -597,19 +597,21 @@
>  ;;   is made that makes conditional execution required.
>  
>  (define_attr "tune" "none, arc600, arc700_4_2_std, arc700_4_2_xmac, archs4x, \
> -archs4xd, archs4xd_slow"
> +archs4xd, archs4xd_slow, core_3"
>    (const
> -   (cond [(symbol_ref "arc_tune == TUNE_ARC600")
> +   (cond [(symbol_ref "arc_tune == ARC_TUNE_ARC600")
>  	  (const_string "arc600")
> -	  (symbol_ref "arc_tune == TUNE_ARC700_4_2_STD")
> +	  (symbol_ref "arc_tune == ARC_TUNE_ARC700_4_2_STD")
>  	  (const_string "arc700_4_2_std")
> -	  (symbol_ref "arc_tune == TUNE_ARC700_4_2_XMAC")
> +	  (symbol_ref "arc_tune == ARC_TUNE_ARC700_4_2_XMAC")
>  	  (const_string "arc700_4_2_xmac")
> -	  (symbol_ref "arc_tune == TUNE_ARCHS4X")
> +	  (symbol_ref "arc_tune == ARC_TUNE_ARCHS4X")
>  	  (const_string "archs4x")
> -	  (ior (symbol_ref "arc_tune == TUNE_ARCHS4XD")
> -	       (symbol_ref "arc_tune == TUNE_ARCHS4XD_SLOW"))
> -	  (const_string "archs4xd")]
> +	  (ior (symbol_ref "arc_tune == ARC_TUNE_ARCHS4XD")
> +	       (symbol_ref "arc_tune == ARC_TUNE_ARCHS4XD_SLOW"))
> +	  (const_string "archs4xd")
> +	  (symbol_ref "arc_tune == ARC_TUNE_CORE_3")
> +	  (const_string "core_3")]
>  	 (const_string "none"))))
>  
>  (define_attr "tune_arc700" "false,true"
> @@ -5200,11 +5202,11 @@ archs4xd, archs4xd_slow"
>  	(plus:SI (match_dup 0)
>  		 (const_int -1)))
>     (clobber (match_scratch:SI 2 "=X,r"))]
> -  "TARGET_V2"
> +  "TARGET_DBNZ"
>    "@
>     dbnz%#\\t%0,%l1
>     #"
> -  "TARGET_V2 && reload_completed && memory_operand (operands[0], SImode)"
> +  "TARGET_DBNZ && reload_completed && memory_operand (operands[0], SImode)"
>    [(set (match_dup 2) (match_dup 0))
>     (set (match_dup 2) (plus:SI (match_dup 2) (const_int -1)))
>     (set (reg:CC CC_REG) (compare:CC (match_dup 2) (const_int 0)))
> diff --git a/gcc/config/arc/arc.opt b/gcc/config/arc/arc.opt
> index aacb599..6b0104a 100644
> --- a/gcc/config/arc/arc.opt
> +++ b/gcc/config/arc/arc.opt
> @@ -249,29 +249,33 @@ mmultcost=
>  Target RejectNegative Joined UInteger Var(arc_multcost) Init(-1)
>  Cost to assume for a multiply instruction, with 4 being equal to a normal insn.
>  
> -mtune=ARC600
> -Target RejectNegative Var(arc_tune, TUNE_ARC600)
> -Tune for ARC600 cpu.
> +mtune=
> +Target RejectNegative ToLower Joined Var(arc_tune) Enum(arc_tune_attr) Init(ARC_TUNE_NONE)
> +-mcpu=TUNE Tune code for given ARC variant.
>  
> -mtune=ARC601
> -Target RejectNegative Var(arc_tune, TUNE_ARC600)
> -Tune for ARC601 cpu.
> +Enum
> +Name(arc_tune_attr) Type(int)
> +
> +EnumValue
> +Enum(arc_tune_attr) String(arc600) Value(ARC_TUNE_ARC600)
>  
> -mtune=ARC700
> -Target RejectNegative Var(arc_tune, TUNE_ARC700_4_2_STD)
> -Tune for ARC700 R4.2 Cpu with standard multiplier block.
> +EnumValue
> +Enum(arc_tune_attr) String(arc601) Value(ARC_TUNE_ARC600)
>  
> -mtune=ARC700-xmac
> -Target RejectNegative Var(arc_tune, TUNE_ARC700_4_2_XMAC)
> -Tune for ARC700 R4.2 Cpu with XMAC block.
> +EnumValue
> +Enum(arc_tune_attr) String(arc700) Value(ARC_TUNE_ARC700_4_2_STD)
>  
> -mtune=ARC725D
> -Target RejectNegative Var(arc_tune, TUNE_ARC700_4_2_XMAC)
> -Tune for ARC700 R4.2 Cpu with XMAC block.
> +EnumValue
> +Enum(arc_tune_attr) String(arc700-xmac) Value(ARC_TUNE_ARC700_4_2_XMAC)
>  
> -mtune=ARC750D
> -Target RejectNegative Var(arc_tune, TUNE_ARC700_4_2_XMAC)
> -Tune for ARC700 R4.2 Cpu with XMAC block.
> +EnumValue
> +Enum(arc_tune_attr) String(arc725d) Value(ARC_TUNE_ARC700_4_2_XMAC)
> +
> +EnumValue
> +Enum(arc_tune_attr) String(arc750d) Value(ARC_TUNE_ARC700_4_2_XMAC)
> +
> +EnumValue
> +Enum(arc_tune_attr) String(core3) Value(ARC_TUNE_CORE_3)
>  
>  mindexed-loads
>  Target Var(TARGET_INDEXED_LOADS) Init(TARGET_INDEXED_LOADS_DEFAULT)
> -- 
> 1.9.1
> 

* Re: [PATCH 05/10] [ARC] Add trap instruction.
  2017-11-27 11:14 ` [PATCH 05/10] [ARC] Add trap instruction Claudiu Zissulescu
@ 2017-11-27 23:40   ` Andrew Burgess
  0 siblings, 0 replies; 23+ messages in thread
From: Andrew Burgess @ 2017-11-27 23:40 UTC (permalink / raw)
  To: Claudiu Zissulescu; +Cc: gcc-patches, Francois.Bedard, Claudiu Zissulescu

* Claudiu Zissulescu <Claudiu.Zissulescu@synopsys.com> [2017-11-27 12:09:54 +0100]:

> From: Claudiu Zissulescu <claziss@gmail.com>
> 
> 2017-11-07  Claudiu Zissulescu  <claziss@synopsys.com>
> 
> 	* config/arc/arc.md (trap): New pattern.

Looks good.

Thanks,
Andrew

> ---
>  gcc/config/arc/arc.md | 7 +++++++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/gcc/config/arc/arc.md b/gcc/config/arc/arc.md
> index b8fa44e..42c6a23 100644
> --- a/gcc/config/arc/arc.md
> +++ b/gcc/config/arc/arc.md
> @@ -4321,6 +4321,13 @@ archs4xd, archs4xd_slow, core_3"
>  ; use it for lack of inter-procedural branch shortening.
>  ; Link-time relaxation would help...
>  
> +(define_insn "trap"
> +  [(trap_if (const_int 1) (const_int 0))]
> +  "!TARGET_ARC600_FAMILY"
> +  "trap_s\\t5"
> +  [(set_attr "type" "misc")
> +   (set_attr "length" "2")])
> +
>  (define_insn "nop"
>    [(const_int 0)]
>    ""
> -- 
> 1.9.1
> 
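
For readers following the thread: in GCC the named "trap" pattern is what
__builtin_trap () expands through when a target provides one, so with this
patch a call like the one below would presumably assemble to trap_s 5 on
non-ARC600 cores. A minimal sketch, where checked_div is a hypothetical
helper that is not part of the patch:

```c
#include <limits.h>

/* Divide A by B, trapping on the two inputs for which C division is
   undefined.  checked_div is a hypothetical name used for illustration
   only.  */
int
checked_div (int a, int b)
{
  if (b == 0 || (a == INT_MIN && b == -1))
    __builtin_trap ();	/* Uses the target's "trap" pattern when one exists.  */
  return a / b;
}
```

On targets without a "trap" pattern GCC falls back to calling abort, so the
source stays portable either way.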


* Re: [PATCH 06/10] [ARC] Update legitimate constant hook.
  2017-11-27 11:15 ` [PATCH 06/10] [ARC] Update legitimate constant hook Claudiu Zissulescu
@ 2017-12-07 23:30   ` Andrew Burgess
  0 siblings, 0 replies; 23+ messages in thread
From: Andrew Burgess @ 2017-12-07 23:30 UTC (permalink / raw)
  To: Claudiu Zissulescu; +Cc: gcc-patches, Francois.Bedard, Claudiu Zissulescu

* Claudiu Zissulescu <Claudiu.Zissulescu@synopsys.com> [2017-11-27 12:09:55 +0100]:

> From: Claudiu Zissulescu <claziss@gmail.com>
> 
> Make sure we check the constants in all cases.
> 
> gcc/
> 2017-10-14  Claudiu Zissulescu  <claziss@synopsys.com>
> 
> 	* config/arc/arc.c (arc_legitimate_constant_p): Always check all
> 	constants.
> 
> testsuite/
> 2017-10-14  Claudiu Zissulescu  <claziss@synopsys.com>
> 
> 	* gcc.target/arc/tls-1.c: New test.

Looks good.

Thanks,
Andrew


> ---
>  gcc/config/arc/arc.c                 |  6 ------
>  gcc/testsuite/gcc.target/arc/tls-1.c | 26 ++++++++++++++++++++++++++
>  2 files changed, 26 insertions(+), 6 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/arc/tls-1.c
> 
> diff --git a/gcc/config/arc/arc.c b/gcc/config/arc/arc.c
> index 4d7a282..42ea921 100644
> --- a/gcc/config/arc/arc.c
> +++ b/gcc/config/arc/arc.c
> @@ -6185,12 +6185,6 @@ arc_return_addr_rtx (int count, ATTRIBUTE_UNUSED rtx frame)
>  bool
>  arc_legitimate_constant_p (machine_mode mode, rtx x)
>  {
> -  if (GET_CODE (x) == SYMBOL_REF && SYMBOL_REF_TLS_MODEL (x))
> -    return false;
> -
> -  if (!flag_pic && mode != Pmode)
> -    return true;
> -
>    switch (GET_CODE (x))
>      {
>      case CONST:
> diff --git a/gcc/testsuite/gcc.target/arc/tls-1.c b/gcc/testsuite/gcc.target/arc/tls-1.c
> new file mode 100644
> index 0000000..3f7a6d4
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arc/tls-1.c
> @@ -0,0 +1,26 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target tls } */
> +/* { dg-options "-O3 -std=gnu99" } */
> +
> +/* Check if addressing the `pos` member of struct is done via tls
> +   mechanism.  */
> +
> +struct callchain_cursor {
> +  int last;
> +  long long pos;
> +} __thread a;
> +void fn1(struct callchain_cursor *p1)
> +{
> +  p1->pos++;
> +}
> +
> +extern void fn3 (void);
> +
> +void fn2(void) {
> +  struct callchain_cursor *b = &a;
> +  while (1) {
> +    fn3();
> +    fn1(b);
> +  }
> +}
> +/* { dg-final { scan-assembler "r25,@a@tpoff" } } */
> -- 
> 1.9.1
> 


* Re: [PATCH 07/10] [ARC][FIX] Consider command line ffixed- option.
  2017-11-27 12:25 ` [PATCH 07/10] [ARC][FIX] Consider command line ffixed- option Claudiu Zissulescu
@ 2017-12-07 23:32   ` Andrew Burgess
  0 siblings, 0 replies; 23+ messages in thread
From: Andrew Burgess @ 2017-12-07 23:32 UTC (permalink / raw)
  To: Claudiu Zissulescu; +Cc: gcc-patches, Francois.Bedard

* Claudiu Zissulescu <Claudiu.Zissulescu@synopsys.com> [2017-11-27 12:09:56 +0100]:

> From: claziss <claziss@synopsys.com>
> 
> Track which regs are set fixed/call saved/call used from command line.
> Do not try to override their properties if user says otherwise.
> 
> gcc/
> 2017-06-08  Claudiu Zissulescu  <claziss@synopsys.com>
> 
> 	* config/arc/arc.c (overrideregs): New variable.
> 	(arc_override_options): Track fixed/call saved/call options.
> 	(arc_conditional_register_usage): Check against overrideregs
> 	variable whenever we change register properties.

Looks good.

Thanks,
Andrew


> ---
>  gcc/config/arc/arc.c | 60 +++++++++++++++++++++++++++++++++++++++++-----------
>  1 file changed, 48 insertions(+), 12 deletions(-)
> 
> diff --git a/gcc/config/arc/arc.c b/gcc/config/arc/arc.c
> index 42ea921..0eeeb42 100644
> --- a/gcc/config/arc/arc.c
> +++ b/gcc/config/arc/arc.c
> @@ -79,6 +79,9 @@ typedef struct GTY (()) _arc_jli_section
>  
>  static arc_jli_section *arc_jli_sections = NULL;
>  
> +/* Track which regs are set fixed/call saved/call used from command line.  */
> +HARD_REG_SET overrideregs;
> +
>  /* Maximum size of a loop.  */
>  #define ARC_MAX_LOOP_LENGTH 4095
>  
> @@ -1144,6 +1147,30 @@ arc_override_options (void)
>  	  }
>        }
>  
> +  CLEAR_HARD_REG_SET (overrideregs);
> +  if (common_deferred_options)
> +    {
> +      vec<cl_deferred_option> v =
> +	*((vec<cl_deferred_option> *) common_deferred_options);
> +      int reg, nregs, j;
> +
> +      FOR_EACH_VEC_ELT (v, i, opt)
> +	{
> +	  switch (opt->opt_index)
> +	    {
> +	    case OPT_ffixed_:
> +	    case OPT_fcall_used_:
> +	    case OPT_fcall_saved_:
> +	      if ((reg = decode_reg_name_and_count (opt->arg, &nregs)) >= 0)
> +		for (j = reg;  j < reg + nregs; j++)
> +		  SET_HARD_REG_BIT (overrideregs, j);
> +	      break;
> +	    default:
> +	      break;
> +	    }
> +	}
> +    }
> +
>    /* Set cpu flags accordingly to architecture/selected cpu.  The cpu
>       specific flags are set in arc-common.c.  The architecture forces
>       the default hardware configurations in, regardless what command
> @@ -1673,14 +1700,20 @@ arc_conditional_register_usage (void)
>        /* For ARCv2 the core register set is changed.  */
>        strcpy (rname29, "ilink");
>        strcpy (rname30, "r30");
> -      call_used_regs[30] = 1;
> -      fixed_regs[30] = 0;
> -
> -      arc_regno_reg_class[30] = WRITABLE_CORE_REGS;
> -      SET_HARD_REG_BIT (reg_class_contents[WRITABLE_CORE_REGS], 30);
> -      SET_HARD_REG_BIT (reg_class_contents[CHEAP_CORE_REGS], 30);
> -      SET_HARD_REG_BIT (reg_class_contents[GENERAL_REGS], 30);
> -      SET_HARD_REG_BIT (reg_class_contents[MPY_WRITABLE_CORE_REGS], 30);
> +
> +      if (!TEST_HARD_REG_BIT (overrideregs, 30))
> +	{
> +	  /* No user interference.  Set the r30 to be used by the
> +	     compiler.  */
> +	  call_used_regs[30] = 1;
> +	  fixed_regs[30] = 0;
> +
> +	  arc_regno_reg_class[30] = WRITABLE_CORE_REGS;
> +	  SET_HARD_REG_BIT (reg_class_contents[WRITABLE_CORE_REGS], 30);
> +	  SET_HARD_REG_BIT (reg_class_contents[CHEAP_CORE_REGS], 30);
> +	  SET_HARD_REG_BIT (reg_class_contents[GENERAL_REGS], 30);
> +	  SET_HARD_REG_BIT (reg_class_contents[MPY_WRITABLE_CORE_REGS], 30);
> +	}
>     }
>  
>    if (TARGET_MUL64_SET)
> @@ -1935,11 +1968,14 @@ arc_conditional_register_usage (void)
>      SET_HARD_REG_BIT (reg_class_contents[MPY_WRITABLE_CORE_REGS], ACCL_REGNO);
>      SET_HARD_REG_BIT (reg_class_contents[MPY_WRITABLE_CORE_REGS], ACCH_REGNO);
>  
> -     /* Allow the compiler to freely use them.  */
> -    fixed_regs[ACCL_REGNO] = 0;
> -    fixed_regs[ACCH_REGNO] = 0;
> +    /* Allow the compiler to freely use them.  */
> +    if (!TEST_HARD_REG_BIT (overrideregs, ACCL_REGNO))
> +      fixed_regs[ACCL_REGNO] = 0;
> +    if (!TEST_HARD_REG_BIT (overrideregs, ACCH_REGNO))
> +      fixed_regs[ACCH_REGNO] = 0;
>  
> -    arc_hard_regno_modes[ACC_REG_FIRST] = D_MODES;
> +    if (!fixed_regs[ACCH_REGNO] && !fixed_regs[ACCL_REGNO])
> +      arc_hard_regno_modes[ACC_REG_FIRST] = D_MODES;
>    }
>  }
>  
> -- 
> 1.9.1
> 
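
The idea in this patch can be modelled outside of GCC: remember which
registers the user named on the command line, and skip the target's own
defaults for exactly those registers. The sketch below is a standalone
model only; struct reg_model and both functions are hypothetical names,
and a uint64_t bitmask stands in for HARD_REG_SET.

```c
#include <stdbool.h>
#include <stdint.h>

#define MODEL_NUM_REGS 64

struct reg_model
{
  bool fixed[MODEL_NUM_REGS];
  bool call_used[MODEL_NUM_REGS];
  uint64_t user_override;	/* One bit per register named by the user.  */
};

/* Record that the user named registers [REG, REG + NREGS), as the loop
   over the deferred -ffixed-/-fcall-used-/-fcall-saved- options does.  */
void
model_note_user_regs (struct reg_model *m, int reg, int nregs)
{
  if (reg < 0)
    return;
  for (int j = reg; j < reg + nregs && j < MODEL_NUM_REGS; j++)
    m->user_override |= (uint64_t) 1 << j;
}

/* Apply a target default to REG unless the user claimed it.  Return
   true when the default was applied.  */
bool
model_apply_default (struct reg_model *m, int reg, bool fixed, bool call_used)
{
  if (m->user_override & ((uint64_t) 1 << reg))
    return false;
  m->fixed[reg] = fixed;
  m->call_used[reg] = call_used;
  return true;
}
```

In this model, a register marked by model_note_user_regs keeps whatever
state the option handling gave it, which mirrors the
TEST_HARD_REG_BIT (overrideregs, ...) guards added for r30 and the
accumulator registers.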


* Re: [PATCH 09/10] [ARC] Update (u)maddsidi patterns.
  2017-11-27 11:14 ` [PATCH 09/10] [ARC] Update (u)maddsidi patterns Claudiu Zissulescu
@ 2017-12-07 23:35   ` Andrew Burgess
  0 siblings, 0 replies; 23+ messages in thread
From: Andrew Burgess @ 2017-12-07 23:35 UTC (permalink / raw)
  To: Claudiu Zissulescu; +Cc: gcc-patches, Francois.Bedard

* Claudiu Zissulescu <Claudiu.Zissulescu@synopsys.com> [2017-11-27 12:09:58 +0100]:

> From: claziss <claziss@synopsys.com>
> 
> The accumulator registers are freely used by the compiler. However,
> a number of instructions use these registers implicitly. Update the
> patterns to tell the compiler which ones.
> 
> gcc/
> 2017-09-19  Claudiu Zissulescu  <claziss@synopsys.com>
> 
> 	* config/arc/arc.md (maddsidi4, maddsidi4_split): Update pattern.
> 	(umaddsidi4, umaddsidi4_split): Likewise.
> 
> gcc/testsuite
> 2017-09-19  Claudiu Zissulescu  <claziss@synopsys.com>
> 
> 	* gcc.target/arc/tumaddsidi4.c: New test.

Looks good.

Thanks,
Andrew


> ---
>  gcc/config/arc/arc.md                      | 32 ++++++++++++++++++++++++++----
>  gcc/testsuite/gcc.target/arc/tumaddsidi4.c | 14 +++++++++++++
>  2 files changed, 42 insertions(+), 4 deletions(-)
>  create mode 100755 gcc/testsuite/gcc.target/arc/tumaddsidi4.c
> 
> diff --git a/gcc/config/arc/arc.md b/gcc/config/arc/arc.md
> index 42c6a23..155ee6c 100644
> --- a/gcc/config/arc/arc.md
> +++ b/gcc/config/arc/arc.md
> @@ -6175,13 +6175,25 @@ archs4xd, archs4xd_slow, core_3"
>    [(set_attr "length" "0")])
>  
>  ;; MAC and DMPY instructions
> -(define_insn_and_split "maddsidi4"
> +(define_expand "maddsidi4"
> +  [(match_operand:DI 0 "register_operand" "")
> +   (match_operand:SI 1 "register_operand" "")
> +   (match_operand:SI 2 "extend_operand"   "")
> +   (match_operand:DI 3 "register_operand" "")]
> +  "TARGET_PLUS_DMPY"
> +  "{
> +   emit_insn (gen_maddsidi4_split (operands[0], operands[1], operands[2], operands[3]));
> +   DONE;
> +  }")
> +
> +(define_insn_and_split "maddsidi4_split"
>    [(set (match_operand:DI 0 "register_operand" "=r")
>  	(plus:DI
>  	 (mult:DI
>  	  (sign_extend:DI (match_operand:SI 1 "register_operand" "%r"))
>  	  (sign_extend:DI (match_operand:SI 2 "extend_operand" "ri")))
> -	 (match_operand:DI 3 "register_operand" "r")))]
> +	 (match_operand:DI 3 "register_operand" "r")))
> +   (clobber (reg:DI ARCV2_ACC))]
>    "TARGET_PLUS_DMPY"
>    "#"
>    "TARGET_PLUS_DMPY && reload_completed"
> @@ -6263,13 +6275,25 @@ archs4xd, archs4xd_slow, core_3"
>     (set_attr "predicable" "no")
>     (set_attr "cond" "nocond")])
>  
> -(define_insn_and_split "umaddsidi4"
> +(define_expand "umaddsidi4"
> +  [(match_operand:DI 0 "register_operand" "")
> +   (match_operand:SI 1 "register_operand" "")
> +   (match_operand:SI 2 "extend_operand"   "")
> +   (match_operand:DI 3 "register_operand" "")]
> +  "TARGET_PLUS_DMPY"
> +  "{
> +   emit_insn (gen_umaddsidi4_split (operands[0], operands[1], operands[2], operands[3]));
> +   DONE;
> +  }")
> +
> +(define_insn_and_split "umaddsidi4_split"
>    [(set (match_operand:DI 0 "register_operand" "=r")
>  	(plus:DI
>  	 (mult:DI
>  	  (zero_extend:DI (match_operand:SI 1 "register_operand" "%r"))
>  	  (zero_extend:DI (match_operand:SI 2 "extend_operand" "ri")))
> -	 (match_operand:DI 3 "register_operand" "r")))]
> +	 (match_operand:DI 3 "register_operand" "r")))
> +   (clobber (reg:DI ARCV2_ACC))]
>    "TARGET_PLUS_DMPY"
>    "#"
>    "TARGET_PLUS_DMPY && reload_completed"
> diff --git a/gcc/testsuite/gcc.target/arc/tumaddsidi4.c b/gcc/testsuite/gcc.target/arc/tumaddsidi4.c
> new file mode 100755
> index 0000000..40d2b33
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arc/tumaddsidi4.c
> @@ -0,0 +1,14 @@
> +/* { dg-do compile } */
> +/* { dg-options "-mcpu=archs -O1 -mmpy-option=plus_dmpy" } */
> +
> +/* Check how we generate umaddsidi4 patterns.  */
> +long a;
> +long long b;
> +unsigned c, d;
> +
> +void fn1(void)
> +{
> +  b = d * (long long)c + a;
> +}
> +
> +/* { dg-final { scan-assembler "macu 0,r" } } */
> -- 
> 1.9.1
> 
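
As a point of reference, the source-level shape that the maddsidi4 and
umaddsidi4 patterns describe is a widening 32x32 multiply added to a
64-bit value. A minimal sketch follows; macs32 and macu32 are hypothetical
names, and whether GCC actually selects mac/macu for them depends on the
target options (the new test uses -mcpu=archs -mmpy-option=plus_dmpy).

```c
#include <stdint.h>

/* Signed form: sign-extend both 32-bit operands, multiply, add ACC.  */
int64_t
macs32 (int32_t a, int32_t b, int64_t acc)
{
  return (int64_t) a * b + acc;
}

/* Unsigned form: zero-extend both 32-bit operands, multiply, add ACC.  */
uint64_t
macu32 (uint32_t a, uint32_t b, uint64_t acc)
{
  return (uint64_t) a * b + acc;
}
```

The cast on the first operand is what makes the multiplication happen in
64 bits; without it the product would be computed in 32 bits first.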


* Re: [PATCH 08/10] [ARC] Enable unaligned access.
  2017-11-27 11:15 ` [PATCH 08/10] [ARC] Enable unaligned access Claudiu Zissulescu
@ 2018-01-02 12:05   ` Andrew Burgess
  0 siblings, 0 replies; 23+ messages in thread
From: Andrew Burgess @ 2018-01-02 12:05 UTC (permalink / raw)
  To: Claudiu Zissulescu; +Cc: gcc-patches, Francois.Bedard, Claudiu Zissulescu

* Claudiu Zissulescu <Claudiu.Zissulescu@synopsys.com> [2017-11-27 12:09:57 +0100]:

> From: Claudiu Zissulescu <claziss@gmail.com>
> 
> Use -munaligned-access to control whether unaligned accesses are allowed.
> For the ARC HS family, unaligned access is always on.
> 
> 2017-10-19  Claudiu Zissulescu  <claziss@synopsys.com>
> 
> 	* config/arc/arc-c.def (__ARC_UNALIGNED__): New define.
> 	* config/arc/arc.h (STRICT_ALIGNMENT): Control this macro using
> 	munaligned-access.

This looks fine,

Thanks,
Andrew


> ---
>  gcc/config/arc/arc-c.def | 1 +
>  gcc/config/arc/arc.h     | 2 +-
>  2 files changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/gcc/config/arc/arc-c.def b/gcc/config/arc/arc-c.def
> index c9443c9..86eab4e 100644
> --- a/gcc/config/arc/arc-c.def
> +++ b/gcc/config/arc/arc-c.def
> @@ -29,6 +29,7 @@ ARC_C_DEF ("__ARC_MUL64__",	TARGET_MUL64_SET)
>  ARC_C_DEF ("__ARC_MUL32BY16__", TARGET_MULMAC_32BY16_SET)
>  ARC_C_DEF ("__ARC_SIMD__",	TARGET_SIMD_SET)
>  ARC_C_DEF ("__ARC_RF16__",	TARGET_RF16)
> +ARC_C_DEF ("__ARC_UNALIGNED__",	!STRICT_ALIGNMENT)
>  
>  ARC_C_DEF ("__ARC_BARREL_SHIFTER__", TARGET_BARREL_SHIFTER)
>  
> diff --git a/gcc/config/arc/arc.h b/gcc/config/arc/arc.h
> index 8d90975..8c31fb2 100644
> --- a/gcc/config/arc/arc.h
> +++ b/gcc/config/arc/arc.h
> @@ -288,7 +288,7 @@ if (GET_MODE_CLASS (MODE) == MODE_INT		\
>  /* On the ARC the lower address bits are masked to 0 as necessary.  The chip
>     won't croak when given an unaligned address, but the insn will still fail
>     to produce the correct result.  */
> -#define STRICT_ALIGNMENT 1
> +#define STRICT_ALIGNMENT (!unaligned_access && !TARGET_HS)
>  
>  /* Layout of source language data types.  */
>  
> -- 
> 1.9.1
> 
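
One way user code might consume the new __ARC_UNALIGNED__ define is to pick
between a single (possibly unaligned) load and a byte-wise fallback. This is
a sketch under assumptions: read_le32 is a hypothetical helper, and the
fast path additionally checks the generic GCC __BYTE_ORDER__ macro so that
both branches return the same little-endian value.

```c
#include <stdint.h>
#include <string.h>

/* Read a 32-bit little-endian value from P, which need not be aligned.  */
uint32_t
read_le32 (const unsigned char *p)
{
#if defined (__ARC_UNALIGNED__) \
    && defined (__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
  /* Unaligned access is available: memcpy lets the compiler emit a
     plain word load without invoking undefined behaviour.  */
  uint32_t v;
  memcpy (&v, p, sizeof v);
  return v;
#else
  /* Strict alignment (or any other target): assemble byte by byte.  */
  return (uint32_t) p[0]
	 | ((uint32_t) p[1] << 8)
	 | ((uint32_t) p[2] << 16)
	 | ((uint32_t) p[3] << 24);
#endif
}
```

On a host compiler the macro is not defined, so the byte-wise branch is
taken and the function still behaves the same.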


* Re: [PATCH 10/10] [ARC] Revamp trampoline implementation.
  2017-11-27 11:16 ` [PATCH 10/10] [ARC] Revamp trampoline implementation Claudiu Zissulescu
@ 2018-01-02 12:16   ` Andrew Burgess
  0 siblings, 0 replies; 23+ messages in thread
From: Andrew Burgess @ 2018-01-02 12:16 UTC (permalink / raw)
  To: Claudiu Zissulescu; +Cc: gcc-patches, Francois.Bedard, Claudiu Zissulescu

* Claudiu Zissulescu <Claudiu.Zissulescu@synopsys.com> [2017-11-27 12:09:59 +0100]:

> From: Claudiu Zissulescu <claziss@gmail.com>
> 
> The new implementation attempts to clean up the existing trampoline
> implementation for ARC so that it works on Linux-type systems.
> 
> gcc/
> 2017-11-10  Claudiu Zissulescu  <claziss@synopsys.com>
> 
> 	* config/arc/arc.c (TARGET_TRAMPOLINE_ADJUST_ADDRESS): Delete.
> 	(emit_store_direct): Likewise.
> 	(arc_trampoline_adjust_address): Likewise.
> 	(arc_asm_trampoline_template): New function.
> 	(arc_initialize_trampoline): Use asm_trampoline_template.
> 	(TARGET_ASM_TRAMPOLINE_TEMPLATE): Define.
> 	* config/arc/arc.h (TRAMPOLINE_SIZE): Adjust to 16.
> 	*config/arc/arc.md (flush_icache): Delete pattern.

         ^-- Missing space here.

Otherwise, looks fine.

Thanks,
Andrew



> ---
>  gcc/config/arc/arc.c  | 89 +++++++++++++++++++++++++--------------------------
>  gcc/config/arc/arc.h  |  2 +-
>  gcc/config/arc/arc.md |  9 ------
>  3 files changed, 44 insertions(+), 56 deletions(-)
> 
> diff --git a/gcc/config/arc/arc.c b/gcc/config/arc/arc.c
> index 0eeeb42..053f3c2 100644
> --- a/gcc/config/arc/arc.c
> +++ b/gcc/config/arc/arc.c
> @@ -588,8 +588,6 @@ static void arc_finalize_pic (void);
>  
>  #define TARGET_TRAMPOLINE_INIT arc_initialize_trampoline
>  
> -#define TARGET_TRAMPOLINE_ADJUST_ADDRESS arc_trampoline_adjust_address
> -
>  #define TARGET_CAN_ELIMINATE arc_can_eliminate
>  
>  #define TARGET_FRAME_POINTER_REQUIRED arc_frame_pointer_required
> @@ -3727,69 +3725,65 @@ output_shift (rtx *operands)
>  \f
>  /* Nested function support.  */
>  
> -/* Directly store VALUE into memory object BLOCK at OFFSET.  */
> -
> -static void
> -emit_store_direct (rtx block, int offset, int value)
> -{
> -  emit_insn (gen_store_direct (adjust_address (block, SImode, offset),
> -			       force_reg (SImode,
> -					  gen_int_mode (value, SImode))));
> -}
> +/* Output assembler code for a block containing the constant parts of
> +   a trampoline, leaving space for variable parts.
>  
> -/* Emit RTL insns to initialize the variable parts of a trampoline.
> -   FNADDR is an RTX for the address of the function's pure code.
> -   CXT is an RTX for the static chain value for the function.  */
> -/* With potentially multiple shared objects loaded, and multiple stacks
> -   present for multiple thereds where trampolines might reside, a simple
> -   range check will likely not suffice for the profiler to tell if a callee
> -   is a trampoline.  We a speedier check by making the trampoline start at
> -   an address that is not 4-byte aligned.
>     A trampoline looks like this:
>  
> -   nop_s	     0x78e0
> -entry:
>     ld_s r12,[pcl,12] 0xd403
>     ld   r11,[pcl,12] 0x170c 700b
>     j_s [r12]         0x7c00
> -   nop_s	     0x78e0
> +   .word function's address
> +   .word static chain value
> +
> +*/
> +
> +static void
> +arc_asm_trampoline_template (FILE *f)
> +{
> +  asm_fprintf (f, "\tld_s\t%s,[pcl,8]\n", ARC_TEMP_SCRATCH_REG);
> +  asm_fprintf (f, "\tld\t%s,[pcl,12]\n", reg_names[STATIC_CHAIN_REGNUM]);
> +  asm_fprintf (f, "\tj_s\t[%s]\n", ARC_TEMP_SCRATCH_REG);
> +  assemble_aligned_integer (UNITS_PER_WORD, const0_rtx);
> +  assemble_aligned_integer (UNITS_PER_WORD, const0_rtx);
> +}
> +
> +/* Emit RTL insns to initialize the variable parts of a trampoline.
> +   FNADDR is an RTX for the address of the function's pure code.  CXT
> +   is an RTX for the static chain value for the function.
>  
>     The fastest trampoline to execute for trampolines within +-8KB of CTX
>     would be:
> +
>     add2 r11,pcl,s12
>     j [limm]           0x20200f80 limm
> -   and that would also be faster to write to the stack by computing the offset
> -   from CTX to TRAMP at compile time.  However, it would really be better to
> -   get rid of the high cost of cache invalidation when generating trampolines,
> -   which requires that the code part of trampolines stays constant, and
> -   additionally either
> -   - making sure that no executable code but trampolines is on the stack,
> -     no icache entries linger for the area of the stack from when before the
> -     stack was allocated, and allocating trampolines in trampoline-only
> -     cache lines
> -  or
> -   - allocate trampolines fram a special pool of pre-allocated trampolines.  */
> +
> +   and that would also be faster to write to the stack by computing
> +   the offset from CTX to TRAMP at compile time.  However, it would
> +   really be better to get rid of the high cost of cache invalidation
> +   when generating trampolines, which requires that the code part of
> +   trampolines stays constant, and additionally either making sure
> +   that no executable code but trampolines is on the stack, no icache
> +   entries linger for the area of the stack from when before the stack
> +   was allocated, and allocating trampolines in trampoline-only cache
> +   lines or allocate trampolines from a special pool of pre-allocated
> +   trampolines.  */
>  
>  static void
>  arc_initialize_trampoline (rtx tramp, tree fndecl, rtx cxt)
>  {
>    rtx fnaddr = XEXP (DECL_RTL (fndecl), 0);
>  
> -  emit_store_direct (tramp, 0, TARGET_BIG_ENDIAN ? 0x78e0d403 : 0xd40378e0);
> -  emit_store_direct (tramp, 4, TARGET_BIG_ENDIAN ? 0x170c700b : 0x700b170c);
> -  emit_store_direct (tramp, 8, TARGET_BIG_ENDIAN ? 0x7c0078e0 : 0x78e07c00);
> -  emit_move_insn (adjust_address (tramp, SImode, 12), fnaddr);
> -  emit_move_insn (adjust_address (tramp, SImode, 16), cxt);
> -  emit_insn (gen_flush_icache (adjust_address (tramp, SImode, 0)));
> -}
> +  emit_block_move (tramp, assemble_trampoline_template (),
> +		   GEN_INT (TRAMPOLINE_SIZE), BLOCK_OP_NORMAL);
>  
> -/* Allow the profiler to easily distinguish trampolines from normal
> -  functions.  */
> +  emit_move_insn (adjust_address (tramp, SImode, 8), fnaddr);
> +  emit_move_insn (adjust_address (tramp, SImode, 12), cxt);
>  
> -static rtx
> -arc_trampoline_adjust_address (rtx addr)
> -{
> -  return plus_constant (Pmode, addr, 2);
> +  emit_library_call (gen_rtx_SYMBOL_REF (Pmode, "__clear_cache"),
> +		     LCT_NORMAL, VOIDmode, 2, XEXP (tramp, 0), Pmode,
> +		     plus_constant (Pmode, XEXP (tramp, 0), TRAMPOLINE_SIZE),
> +		     Pmode);
>  }
>  
>  /* Add the given function declaration to emit code in JLI section.  */
> @@ -11412,6 +11406,9 @@ arc_cannot_substitute_mem_equiv_p (rtx)
>  #undef TARGET_CANNOT_SUBSTITUTE_MEM_EQUIV_P
>  #define TARGET_CANNOT_SUBSTITUTE_MEM_EQUIV_P arc_cannot_substitute_mem_equiv_p
>  
> +#undef TARGET_ASM_TRAMPOLINE_TEMPLATE
> +#define TARGET_ASM_TRAMPOLINE_TEMPLATE arc_asm_trampoline_template
> +
>  struct gcc_target targetm = TARGET_INITIALIZER;
>  
>  #include "gt-arc.h"
> diff --git a/gcc/config/arc/arc.h b/gcc/config/arc/arc.h
> index 8c31fb2..317a653 100644
> --- a/gcc/config/arc/arc.h
> +++ b/gcc/config/arc/arc.h
> @@ -829,7 +829,7 @@ extern int arc_initial_elimination_offset(int from, int to);
>  /* Trampolines.  */
>  
>  /* Length in units of the trampoline for entering a nested function.  */
> -#define TRAMPOLINE_SIZE 20
> +#define TRAMPOLINE_SIZE 16
>  
>  /* Alignment required for a trampoline in bits .  */
>  /* For actual data alignment we just need 32, no more than the stack;
> diff --git a/gcc/config/arc/arc.md b/gcc/config/arc/arc.md
> index 155ee6c..e1418a9 100644
> --- a/gcc/config/arc/arc.md
> +++ b/gcc/config/arc/arc.md
> @@ -4345,15 +4345,6 @@ archs4xd, archs4xd_slow, core_3"
>     (set_attr "iscompact" "true")
>     (set_attr "length" "2")])
>  
> -;; Special pattern to flush the icache.
> -;; ??? Not sure what to do here.  Some ARC's are known to support this.
> -
> -(define_insn "flush_icache"
> -  [(unspec_volatile [(match_operand:SI 0 "memory_operand" "m")] 0)]
> -  ""
> -  "* return \"\";"
> -  [(set_attr "type" "misc")])
> -
>  ;; Split up troublesome insns for better scheduling.
>  
>  ;; Peepholes go at the end.
> -- 
> 1.9.1
> 
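
The offsets emitted by arc_asm_trampoline_template, [pcl,8] and [pcl,12],
can be checked with a little arithmetic. The sketch below assumes that PCL
is the address of the current instruction with its low two bits cleared,
that ld_s and j_s are 2 bytes and ld is 4 bytes (as the encodings in the
comment suggest), and that the trampoline itself is 4-byte aligned. The
helper names are hypothetical.

```c
#include <stdint.h>

/* PCL as assumed here: the instruction address rounded down to a
   multiple of 4.  */
static uint32_t
pcl_of (uint32_t pc)
{
  return pc & ~(uint32_t) 3;
}

/* Address read by a PCL-relative load located at PC with offset OFF.  */
uint32_t
pcl_load_addr (uint32_t pc, uint32_t off)
{
  return pcl_of (pc) + off;
}
```

For a trampoline at address T, ld_s sits at T + 0 and ld at T + 2, so both
see PCL == T. The loads therefore read T + 8 and T + 12, which are the two
words that arc_initialize_trampoline fills with the function address and
the static chain, and 8 bytes of code plus two 4-byte words gives the new
TRAMPOLINE_SIZE of 16.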


* RE: [PATCH 00/10][ARC] Critical fixes
  2017-11-27 11:15 [PATCH 00/10][ARC] Critical fixes Claudiu Zissulescu
                   ` (9 preceding siblings ...)
  2017-11-27 12:25 ` [PATCH 07/10] [ARC][FIX] Consider command line ffixed- option Claudiu Zissulescu
@ 2018-01-08 15:23 ` Claudiu Zissulescu
  2018-01-16 10:20   ` Andrew Burgess
  10 siblings, 1 reply; 23+ messages in thread
From: Claudiu Zissulescu @ 2018-01-08 15:23 UTC (permalink / raw)
  To: gcc-patches; +Cc: Francois.Bedard, andrew.burgess

>   [ARC][LRA] Use TARGET_CANNOT_SUBSTITUTE_MEM_EQUIV.
>   [ARC] Don't allow the last ZOL insn to be in a delay slot.
>   [ARC] Add trap instruction.
>   [ARC] Update legitimate constant hook.
>   [ARC] Enable unaligned access.
>   [ARC] Revamp trampoline implementation.
>   [ARC][ZOL] Update uses for hw-loop labels.
>   [ARC] Add ARCv2 core3 tune option.
>   [ARC][FIX] Consider command line ffixed- option.
>   [ARC] Update (u)maddsidi patterns.

Hi Andrew,

Thank you for reviewing this batch of fixes. Is there any chance you could also check these? They have been pending for a long time now:

https://gcc.gnu.org/ml/gcc-patches/2017-11/msg00078.html
https://gcc.gnu.org/ml/gcc-patches/2017-11/msg00081.html
https://gcc.gnu.org/ml/gcc-patches/2017-11/msg00080.html
https://gcc.gnu.org/ml/gcc-patches/2017-11/msg00079.html
https://gcc.gnu.org/ml/gcc-patches/2017-11/msg00084.html
https://gcc.gnu.org/ml/gcc-patches/2017-11/msg00083.html
https://gcc.gnu.org/ml/gcc-patches/2017-11/msg00082.html

Thank you,
Claudiu


* Re: [PATCH 00/10][ARC] Critical fixes
  2018-01-08 15:23 ` [PATCH 00/10][ARC] Critical fixes Claudiu Zissulescu
@ 2018-01-16 10:20   ` Andrew Burgess
  0 siblings, 0 replies; 23+ messages in thread
From: Andrew Burgess @ 2018-01-16 10:20 UTC (permalink / raw)
  To: Claudiu Zissulescu; +Cc: gcc-patches, Francois.Bedard

* Claudiu Zissulescu <Claudiu.Zissulescu@synopsys.com> [2018-01-08 15:18:30 +0000]:

> >   [ARC][LRA] Use TARGET_CANNOT_SUBSTITUTE_MEM_EQUIV.
> >   [ARC] Don't allow the last ZOL insn to be in a delay slot.
> >   [ARC] Add trap instruction.
> >   [ARC] Update legitimate constant hook.
> >   [ARC] Enable unaligned access.
> >   [ARC] Revamp trampoline implementation.
> >   [ARC][ZOL] Update uses for hw-loop labels.
> >   [ARC] Add ARCv2 core3 tune option.
> >   [ARC][FIX] Consider command line ffixed- option.
> >   [ARC] Update (u)maddsidi patterns.
> 
> Hi Andrew,
> 
> Thank you for reviewing this batch of fixes. Is there any chance you could also check these? They have been pending for a long time now:
> 
> https://gcc.gnu.org/ml/gcc-patches/2017-11/msg00078.html
> https://gcc.gnu.org/ml/gcc-patches/2017-11/msg00081.html
> https://gcc.gnu.org/ml/gcc-patches/2017-11/msg00080.html
> https://gcc.gnu.org/ml/gcc-patches/2017-11/msg00079.html
> https://gcc.gnu.org/ml/gcc-patches/2017-11/msg00084.html
> https://gcc.gnu.org/ml/gcc-patches/2017-11/msg00083.html
> https://gcc.gnu.org/ml/gcc-patches/2017-11/msg00082.html

Sorry for missing these, they somehow didn't make it onto my todo
list.

I'll review these over the next couple of days.

Thanks,
Andrew


end of thread, other threads:[~2018-01-16 10:16 UTC | newest]

Thread overview: 23+ messages
2017-11-27 11:15 [PATCH 00/10][ARC] Critical fixes Claudiu Zissulescu
2017-11-27 11:14 ` [PATCH 02/10] [ARC][ZOL] Update uses for hw-loop labels Claudiu Zissulescu
2017-11-27 23:29   ` Andrew Burgess
2017-11-27 11:14 ` [PATCH 09/10] [ARC] Update (u)maddsidi patterns Claudiu Zissulescu
2017-12-07 23:35   ` Andrew Burgess
2017-11-27 11:14 ` [PATCH 03/10] [ARC] Don't allow the last ZOL insn to be in a delay slot Claudiu Zissulescu
2017-11-27 23:32   ` Andrew Burgess
2017-11-27 11:14 ` [PATCH 05/10] [ARC] Add trap instruction Claudiu Zissulescu
2017-11-27 23:40   ` Andrew Burgess
2017-11-27 11:15 ` [PATCH 04/10] [ARC] Add ARCv2 core3 tune option Claudiu Zissulescu
2017-11-27 23:35   ` Andrew Burgess
2017-11-27 11:15 ` [PATCH 06/10] [ARC] Update legitimate constant hook Claudiu Zissulescu
2017-12-07 23:30   ` Andrew Burgess
2017-11-27 11:15 ` [PATCH 08/10] [ARC] Enable unaligned access Claudiu Zissulescu
2018-01-02 12:05   ` Andrew Burgess
2017-11-27 11:16 ` [PATCH 10/10] [ARC] Revamp trampoline implementation Claudiu Zissulescu
2018-01-02 12:16   ` Andrew Burgess
2017-11-27 11:57 ` [PATCH 01/10] [ARC][LRA] Use TARGET_CANNOT_SUBSTITUTE_MEM_EQUIV Claudiu Zissulescu
2017-11-27 23:27   ` Andrew Burgess
2017-11-27 12:25 ` [PATCH 07/10] [ARC][FIX] Consider command line ffixed- option Claudiu Zissulescu
2017-12-07 23:32   ` Andrew Burgess
2018-01-08 15:23 ` [PATCH 00/10][ARC] Critical fixes Claudiu Zissulescu
2018-01-16 10:20   ` Andrew Burgess
