public inbox for gcc-patches@gcc.gnu.org
* [PATCH 0/7]  ira/lra: Support subreg coalesce
@ 2023-11-08  3:47 Lehua Ding
  2023-11-08  3:47 ` [PATCH 1/7] ira: Refactor the handling of register conflicts to make it more general Lehua Ding
                   ` (10 more replies)
  0 siblings, 11 replies; 37+ messages in thread
From: Lehua Ding @ 2023-11-08  3:47 UTC (permalink / raw)
  To: gcc-patches; +Cc: vmakarov, richard.sandiford, juzhe.zhong, lehua.ding

Hi,

These patches add support for subreg coalescing in the
register allocation passes (ira and lra).

Let's consider a RISC-V program (https://godbolt.org/z/ec51d91aT):

```
#include <riscv_vector.h>

void
foo (int32_t *in, int32_t *out, size_t m)
{
  vint32m2_t result = __riscv_vle32_v_i32m2 (in, 32);
  vint32m1_t v0 = __riscv_vget_v_i32m2_i32m1 (result, 0);
  vint32m1_t v1 = __riscv_vget_v_i32m2_i32m1 (result, 1);
  for (size_t i = 0; i < m; i++)
    {
      v0 = __riscv_vadd_vv_i32m1(v0, v0, 4);
      v1 = __riscv_vmul_vv_i32m1(v1, v1, 4);
    }
  *(vint32m1_t*)(out+4*0) = v0;
  *(vint32m1_t*)(out+4*1) = v1;
}
```

Before these patches:

```
foo:
	li	a5,32
	vsetvli	zero,a5,e32,m2,ta,ma
	vle32.v	v4,0(a0)
	vmv1r.v	v2,v4
	vmv1r.v	v1,v5
	beq	a2,zero,.L2
	li	a5,0
	vsetivli	zero,4,e32,m1,ta,ma
.L3:
	addi	a5,a5,1
	vadd.vv	v2,v2,v2
	vmul.vv	v1,v1,v1
	bne	a2,a5,.L3
.L2:
	vs1r.v	v2,0(a1)
	addi	a1,a1,16
	vs1r.v	v1,0(a1)
	ret
```

After these patches:

```
foo:
	li	a5,32
	vsetvli	zero,a5,e32,m2,ta,ma
	vle32.v	v2,0(a0)
	beq	a2,zero,.L2
	li	a5,0
	vsetivli	zero,4,e32,m1,ta,ma
.L3:
	addi	a5,a5,1
	vadd.vv	v2,v2,v2
	vmul.vv	v3,v3,v3
	bne	a2,a5,.L3
.L2:
	vs1r.v	v2,0(a1)
	addi	a1,a1,16
	vs1r.v	v3,0(a1)
	ret
```

As you can see, the two redundant vmv1r.v instructions were removed.
The reason for the two redundant vmv1r.v instructions is that the current
ira pass is conservative when calculating the live ranges of pseudo
registers that occupy multiple hard registers, as in the following two
RTL instructions, where r134 occupies two physical registers while r135
and r136 each occupy one. At insn 12, ira considers the entire r134
pseudo register to be live, so r135 conflicts with r134, as shown in the
ira dump below. When the physical registers are then allocated, r135 and
r136 are allocated first because they are inside the loop body and
therefore have higher priority. This makes it difficult to assign r136 so
that it overlaps with r134 (i.e., to assign r136 to hr100), which would
have eliminated the need for the vmv1r.v instruction. Hence the two
vmv1r.v instructions.

If we refine the live information of r134 down to each subreg, we can
remove this conflict. We can then create copies for the sets that contain
a subreg reference, which increases the priority of the r134 allocation
and lets pseudo registers with larger alignment requirements claim their
physical registers first. In RVV, pseudo registers occupying two physical
registers need to be 2-aligned (a small sketch of this constraint follows
after the ira dump below).

```
(insn 11 10 12 2 (set (reg/v:RVVM1SI 135 [ v0 ])
        (subreg:RVVM1SI (reg/v:RVVM2SI 134 [ result ]) 0)) "/app/example.c":7:19 998 {*movrvvm1si_whole}
     (nil))
(insn 12 11 13 2 (set (reg/v:RVVM1SI 136 [ v1 ])
        (subreg:RVVM1SI (reg/v:RVVM2SI 134 [ result ]) [16, 16])) "/app/example.c":8:19 998 {*movrvvm1si_whole}
     (expr_list:REG_DEAD (reg/v:RVVM2SI 134 [ result ])
        (nil)))
```

ira dump:

;; a1(r136,l0) conflicts: a3(r135,l0)
;;     total conflict hard regs:
;;     conflict hard regs:
;; a3(r135,l0) conflicts: a1(r136,l0) a6(r134,l0)
;;     total conflict hard regs:
;;     conflict hard regs:
;; a6(r134,l0) conflicts: a3(r135,l0)
;;     total conflict hard regs:
;;     conflict hard regs:
;;
;; ...
      Popping a1(r135,l0)  --         assign reg 97
      Popping a3(r136,l0)  --         assign reg 98
      Popping a4(r137,l0)  --         assign reg 15
      Popping a5(r140,l0)  --         assign reg 12
      Popping a10(r145,l0)  --         assign reg 12
      Popping a2(r139,l0)  --         assign reg 11
      Popping a9(r144,l0)  --         assign reg 11
      Popping a0(r142,l0)  --         assign reg 11
      Popping a6(r134,l0)  --         assign reg 100
      Popping a7(r143,l0)  --         assign reg 10
      Popping a8(r141,l0)  --         assign reg 15

AArch64 SVE has the same problem. Consider the following
code (https://godbolt.org/z/MYrK7Ghaj):

```
#include <arm_sve.h>

int bar (svbool_t pg, int64_t* base, int n, int64_t *in1, int64_t *in2, int64_t*out)
{
  svint64x4_t result = svld4_s64 (pg, base);
  svint64_t v0 = svget4_s64(result, 0);
  svint64_t v1 = svget4_s64(result, 1);
  svint64_t v2 = svget4_s64(result, 2);
  svint64_t v3 = svget4_s64(result, 3);

  for (int i = 0; i < n; i += 1)
    {
        svint64_t v18 = svld1_s64(pg, in1);
        svint64_t v19 = svld1_s64(pg, in2);
        v0 = svmad_s64_z(pg, v0, v18, v19);
        v1 = svmad_s64_z(pg, v1, v18, v19);
        v2 = svmad_s64_z(pg, v2, v18, v19);
        v3 = svmad_s64_z(pg, v3, v18, v19);
    }
  svst1_s64(pg, out+0,v0);
  svst1_s64(pg, out+1,v1);
  svst1_s64(pg, out+2,v2);
  svst1_s64(pg, out+3,v3);
}
```

Before these patches:

```
bar:
	ld4d	{z4.d - z7.d}, p0/z, [x0]
	mov	z26.d, z4.d
	mov	z27.d, z5.d
	mov	z28.d, z6.d
	mov	z29.d, z7.d
	cmp	w1, 0
	...
```

After these patches:

```
bar:
	ld4d	{z28.d - z31.d}, p0/z, [x0]
	cmp	w1, 0
	...
```

Lehua Ding (7):
  ira: Refactor the handling of register conflicts to make it more
    general
  ira: Add live_subreg problem and apply to ira pass
  ira: Support subreg live range track
  ira: Support subreg copy
  ira: Add all nregs >= 2 pseudos to track subreg list
  lra: Apply live_subreg df_problem to lra pass
  lra: Support subreg live range track and conflict detect

 gcc/Makefile.in          |   1 +
 gcc/df-problems.cc       | 889 ++++++++++++++++++++++++++++++++++++++-
 gcc/df.h                 |  93 +++-
 gcc/hard-reg-set.h       |  33 ++
 gcc/ira-build.cc         | 458 ++++++++++++++++----
 gcc/ira-color.cc         | 851 ++++++++++++++++++++++++++-----------
 gcc/ira-conflicts.cc     | 221 +++++++---
 gcc/ira-emit.cc          |  24 +-
 gcc/ira-int.h            |  67 ++-
 gcc/ira-lives.cc         | 527 +++++++++++++++++------
 gcc/ira.cc               |  77 ++--
 gcc/lra-assigns.cc       | 111 ++++-
 gcc/lra-coalesce.cc      |  20 +-
 gcc/lra-constraints.cc   | 111 +++--
 gcc/lra-int.h            |  33 ++
 gcc/lra-lives.cc         | 661 ++++++++++++++++++++++++-----
 gcc/lra-remat.cc         |  13 +-
 gcc/lra-spills.cc        |  22 +-
 gcc/lra.cc               | 139 +++++-
 gcc/reginfo.cc           |  14 +
 gcc/rtl.h                |  14 +
 gcc/subreg-live-range.cc | 649 ++++++++++++++++++++++++++++
 gcc/subreg-live-range.h  | 343 +++++++++++++++
 gcc/timevar.def          |   1 +
 24 files changed, 4564 insertions(+), 808 deletions(-)
 create mode 100644 gcc/subreg-live-range.cc
 create mode 100644 gcc/subreg-live-range.h

-- 
2.36.3



* [PATCH 1/7] ira: Refactor the handling of register conflicts to make it more general
  2023-11-08  3:47 [PATCH 0/7] ira/lra: Support subreg coalesce Lehua Ding
@ 2023-11-08  3:47 ` Lehua Ding
  2023-11-08  7:57   ` Richard Biener
  2023-11-08  3:47 ` [PATCH 2/7] ira: Add live_subreg problem and apply to ira pass Lehua Ding
                   ` (9 subsequent siblings)
  10 siblings, 1 reply; 37+ messages in thread
From: Lehua Ding @ 2023-11-08  3:47 UTC (permalink / raw)
  To: gcc-patches; +Cc: vmakarov, richard.sandiford, juzhe.zhong, lehua.ding

This patch does not make any functional changes.  It mainly refactors two parts:

1. The ira_allocno's objects field is expanded into a scalable array, and
   multi-word pseudo registers are split and tracked only when necessary.
2. Since the objects array has been expanded, later code can see more subreg
   objects than the previous fixed two.  The detection of whether two objects
   conflict therefore needs to be adjusted: the registers occupied by each
   object are mapped back to the first register of its allocno before the
   overlap is judged (a short sketch of this check follows below).
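
A minimal sketch of the check described in point 2 (illustrative only,
with made-up types; the real code in ira.cc and ira-color.cc also handles
REG_WORDS_BIG_ENDIAN and works on per-object conflict hard register
sets):

```
/* Hypothetical, simplified types -- not the real ira_object/ira_allocno.  */
struct toy_object
{
  int allocno_hard_regno;  /* First hard reg assigned to the whole allocno.  */
  int start;               /* First reg of the allocno covered by this object.  */
  int nregs;               /* Number of regs covered by this object.  */
};

/* Map both objects back to absolute hard registers and test whether the
   occupied ranges overlap (assuming !REG_WORDS_BIG_ENDIAN).  */
static bool
toy_objects_overlap_p (const toy_object &a, const toy_object &b)
{
  int a_first = a.allocno_hard_regno + a.start;
  int b_first = b.allocno_hard_regno + b.start;
  return a_first < b_first + b.nregs && b_first < a_first + a.nregs;
}
```

For instance, a one-register object covering reg 1 of an allocno assigned
hard reg 100 occupies only hard reg 101, so it does not overlap another
allocno's object that occupies hard reg 100.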

gcc/ChangeLog:

	* hard-reg-set.h (struct HARD_REG_SET): Add operator>>.
	* ira-build.cc (init_object_start_and_nregs): New function.
	(find_object): Ditto.
	(ira_create_allocno): Adjust.
	(ira_set_allocno_class): Set subreg info.
	(ira_create_allocno_objects): Adjust.
	(init_regs_with_subreg): Collect registers accessed through subregs.
	(ira_build): Call init_regs_with_subreg.
	(ira_destroy): Clear regs_with_subreg.
	* ira-color.cc (setup_profitable_hard_regs): Adjust.
	(get_conflict_and_start_profitable_regs): Adjust.
	(check_hard_reg_p): Adjust.
	(assign_hard_reg): Adjust.
	(improve_allocation): Adjust.
	* ira-int.h (struct ira_object): Adjust fields.
	(struct ira_allocno): Adjust objects filed.
	(ALLOCNO_NUM_OBJECTS): Adjust.
	(ALLOCNO_UNIT_SIZE): New.
	(ALLOCNO_TRACK_SUBREG_P): New.
	(ALLOCNO_NREGS): New.
	(OBJECT_SIZE): New.
	(OBJECT_OFFSET): New.
	(OBJECT_START): New.
	(OBJECT_NREGS): New.
	(find_object): New.
	(has_subreg_object_p): New.
	(get_full_object): New.
	* ira.cc (check_allocation): Adjust.

---
 gcc/hard-reg-set.h |  33 +++++++
 gcc/ira-build.cc   | 106 +++++++++++++++++++-
 gcc/ira-color.cc   | 234 ++++++++++++++++++++++++++++++---------------
 gcc/ira-int.h      |  45 ++++++++-
 gcc/ira.cc         |  52 ++++------
 5 files changed, 349 insertions(+), 121 deletions(-)

diff --git a/gcc/hard-reg-set.h b/gcc/hard-reg-set.h
index b0bb9bce074..760eadba186 100644
--- a/gcc/hard-reg-set.h
+++ b/gcc/hard-reg-set.h
@@ -113,6 +113,39 @@ struct HARD_REG_SET
     return !operator== (other);
   }
 
+  HARD_REG_SET
+  operator>> (unsigned int shift_amount) const
+  {
+    if (shift_amount == 0)
+      return *this;
+
+    HARD_REG_SET res = *this;
+    unsigned int total_bits = sizeof (HARD_REG_ELT_TYPE) * 8;
+    if (shift_amount >= total_bits)
+      {
+	unsigned int n_elt = shift_amount / total_bits;
+	shift_amount -= n_elt * total_bits;
+	for (unsigned int i = 0; i < ARRAY_SIZE (elts) - n_elt; i += 1)
+	  res.elts[i] = elts[i + n_elt];
+	/* Clear the upper n_elt elements.  */
+	for (unsigned int i = 0; i < n_elt; i += 1)
+	  res.elts[ARRAY_SIZE (elts) - 1 - i] = 0;
+      }
+
+    if (shift_amount > 0)
+      {
+	/* Bits carried in from the element above.  */
+	HARD_REG_ELT_TYPE left = 0;
+	for (int i = ARRAY_SIZE (elts) - 1; i >= 0; --i)
+	  {
+	    HARD_REG_ELT_TYPE elt = res.elts[i];
+	    res.elts[i] = (elt >> shift_amount) | left;
+	    left = elt << (total_bits - shift_amount);
+	  }
+      }
+    return res;
+  }
+
   HARD_REG_ELT_TYPE elts[HARD_REG_SET_LONGS];
 };
 typedef const HARD_REG_SET &const_hard_reg_set;
diff --git a/gcc/ira-build.cc b/gcc/ira-build.cc
index 93e46033170..07aba27c1c9 100644
--- a/gcc/ira-build.cc
+++ b/gcc/ira-build.cc
@@ -440,6 +440,40 @@ initiate_allocnos (void)
   memset (ira_regno_allocno_map, 0, max_reg_num () * sizeof (ira_allocno_t));
 }
 
+/* Update OBJ's start and nregs fields according to the info in A and OBJ.  */
+static void
+init_object_start_and_nregs (ira_allocno_t a, ira_object_t obj)
+{
+  enum reg_class aclass = ALLOCNO_CLASS (a);
+  gcc_assert (aclass != NO_REGS);
+
+  machine_mode mode = ALLOCNO_MODE (a);
+  int nregs = ira_reg_class_max_nregs[aclass][mode];
+  if (ALLOCNO_TRACK_SUBREG_P (a))
+    {
+      poly_int64 end = OBJECT_OFFSET (obj) + OBJECT_SIZE (obj);
+      for (int i = 0; i < nregs; i += 1)
+	{
+	  poly_int64 right = ALLOCNO_UNIT_SIZE (a) * (i + 1);
+	  if (OBJECT_START (obj) < 0 && maybe_lt (OBJECT_OFFSET (obj), right))
+	    {
+	      OBJECT_START (obj) = i;
+	    }
+	  if (OBJECT_NREGS (obj) < 0 && maybe_le (end, right))
+	    {
+	      OBJECT_NREGS (obj) = i + 1 - OBJECT_START (obj);
+	      break;
+	    }
+	}
+      gcc_assert (OBJECT_START (obj) >= 0 && OBJECT_NREGS (obj) > 0);
+    }
+  else
+    {
+      OBJECT_START (obj) = 0;
+      OBJECT_NREGS (obj) = nregs;
+    }
+}
+
 /* Create and return an object corresponding to a new allocno A.  */
 static ira_object_t
 ira_create_object (ira_allocno_t a, int subword)
@@ -460,15 +494,36 @@ ira_create_object (ira_allocno_t a, int subword)
   OBJECT_MIN (obj) = INT_MAX;
   OBJECT_MAX (obj) = -1;
   OBJECT_LIVE_RANGES (obj) = NULL;
+  OBJECT_SIZE (obj) = UNITS_PER_WORD;
+  OBJECT_OFFSET (obj) = subword * UNITS_PER_WORD;
+  OBJECT_START (obj) = -1;
+  OBJECT_NREGS (obj) = -1;
 
   ira_object_id_map_vec.safe_push (obj);
   ira_object_id_map
     = ira_object_id_map_vec.address ();
   ira_objects_num = ira_object_id_map_vec.length ();
 
+  if (aclass != NO_REGS)
+    init_object_start_and_nregs (a, obj);
+
+  a->objects.push_back (obj);
+
   return obj;
 }
 
+/* Return the object in allocno A which matches START and NREGS.  */
+ira_object_t
+find_object (ira_allocno_t a, int start, int nregs)
+{
+  for (ira_object_t obj : a->objects)
+    {
+      if (OBJECT_START (obj) == start && OBJECT_NREGS (obj) == nregs)
+	return obj;
+    }
+  return NULL;
+}
+
 /* Create and return the allocno corresponding to REGNO in
    LOOP_TREE_NODE.  Add the allocno to the list of allocnos with the
    same regno if CAP_P is FALSE.  */
@@ -525,7 +580,8 @@ ira_create_allocno (int regno, bool cap_p,
   ALLOCNO_MEMORY_COST (a) = 0;
   ALLOCNO_UPDATED_MEMORY_COST (a) = 0;
   ALLOCNO_EXCESS_PRESSURE_POINTS_NUM (a) = 0;
-  ALLOCNO_NUM_OBJECTS (a) = 0;
+  ALLOCNO_UNIT_SIZE (a) = 0;
+  ALLOCNO_TRACK_SUBREG_P (a) = false;
 
   ALLOCNO_ADD_DATA (a) = NULL;
   allocno_vec.safe_push (a);
@@ -535,6 +591,9 @@ ira_create_allocno (int regno, bool cap_p,
   return a;
 }
 
+/* Record the regs referenced by subreg.  */
+static bitmap_head regs_with_subreg;
+
 /* Set up register class for A and update its conflict hard
    registers.  */
 void
@@ -549,6 +608,19 @@ ira_set_allocno_class (ira_allocno_t a, enum reg_class aclass)
       OBJECT_CONFLICT_HARD_REGS (obj) |= ~reg_class_contents[aclass];
       OBJECT_TOTAL_CONFLICT_HARD_REGS (obj) |= ~reg_class_contents[aclass];
     }
+
+  if (aclass == NO_REGS)
+    return;
+  /* Set the unit_size of one register.  */
+  machine_mode mode = ALLOCNO_MODE (a);
+  int nregs = ira_reg_class_max_nregs[aclass][mode];
+  if (nregs == 2 && maybe_eq (GET_MODE_SIZE (mode), nregs * UNITS_PER_WORD)
+      && bitmap_bit_p (&regs_with_subreg, ALLOCNO_REGNO (a)))
+    {
+      ALLOCNO_UNIT_SIZE (a) = UNITS_PER_WORD;
+      ALLOCNO_TRACK_SUBREG_P (a) = true;
+      return;
+    }
 }
 
 /* Determine the number of objects we should associate with allocno A
@@ -561,12 +633,12 @@ ira_create_allocno_objects (ira_allocno_t a)
   int n = ira_reg_class_max_nregs[aclass][mode];
   int i;
 
-  if (n != 2 || maybe_ne (GET_MODE_SIZE (mode), n * UNITS_PER_WORD))
+  if (n != 2 || maybe_ne (GET_MODE_SIZE (mode), n * UNITS_PER_WORD)
+      || !bitmap_bit_p (&regs_with_subreg, ALLOCNO_REGNO (a)))
     n = 1;
 
-  ALLOCNO_NUM_OBJECTS (a) = n;
   for (i = 0; i < n; i++)
-    ALLOCNO_OBJECT (a, i) = ira_create_object (a, i);
+    ira_create_object (a, i);
 }
 
 /* For each allocno, set ALLOCNO_NUM_OBJECTS and create the
@@ -3460,6 +3532,30 @@ update_conflict_hard_reg_costs (void)
     }
 }
 
+/* Traverse all instructions to record which registers are accessed
+   through subregs.  */
+static void
+init_regs_with_subreg ()
+{
+  bitmap_initialize (&regs_with_subreg, &reg_obstack);
+  basic_block bb;
+  rtx_insn *insn;
+  df_ref def, use;
+  FOR_ALL_BB_FN (bb, cfun)
+    FOR_BB_INSNS (bb, insn)
+      {
+	if (!NONDEBUG_INSN_P (insn))
+	  continue;
+	df_insn_info *insn_info = DF_INSN_INFO_GET (insn);
+	FOR_EACH_INSN_INFO_DEF (def, insn_info)
+	  if (DF_REF_FLAGS (def) & (DF_REF_PARTIAL | DF_REF_SUBREG))
+	    bitmap_set_bit (&regs_with_subreg, DF_REF_REGNO (def));
+	FOR_EACH_INSN_INFO_USE (use, insn_info)
+	  if (DF_REF_FLAGS (use) & (DF_REF_PARTIAL | DF_REF_SUBREG))
+	    bitmap_set_bit (&regs_with_subreg, DF_REF_REGNO (use));
+      }
+}
+
 /* Create a internal representation (IR) for IRA (allocnos, copies,
    loop tree nodes).  The function returns TRUE if we generate loop
    structure (besides nodes representing all function and the basic
@@ -3475,6 +3571,7 @@ ira_build (void)
   initiate_allocnos ();
   initiate_prefs ();
   initiate_copies ();
+  init_regs_with_subreg ();
   create_loop_tree_nodes ();
   form_loop_tree ();
   create_allocnos ();
@@ -3565,4 +3662,5 @@ ira_destroy (void)
   finish_allocnos ();
   finish_cost_vectors ();
   ira_finish_allocno_live_ranges ();
+  bitmap_clear (&regs_with_subreg);
 }
diff --git a/gcc/ira-color.cc b/gcc/ira-color.cc
index f2e8ea34152..6af8318e5f5 100644
--- a/gcc/ira-color.cc
+++ b/gcc/ira-color.cc
@@ -1031,7 +1031,7 @@ static void
 setup_profitable_hard_regs (void)
 {
   unsigned int i;
-  int j, k, nobj, hard_regno, nregs, class_size;
+  int j, k, nobj, hard_regno, class_size;
   ira_allocno_t a;
   bitmap_iterator bi;
   enum reg_class aclass;
@@ -1076,7 +1076,6 @@ setup_profitable_hard_regs (void)
 	  || (hard_regno = ALLOCNO_HARD_REGNO (a)) < 0)
 	continue;
       mode = ALLOCNO_MODE (a);
-      nregs = hard_regno_nregs (hard_regno, mode);
       nobj = ALLOCNO_NUM_OBJECTS (a);
       for (k = 0; k < nobj; k++)
 	{
@@ -1088,24 +1087,39 @@ setup_profitable_hard_regs (void)
 	    {
 	      ira_allocno_t conflict_a = OBJECT_ALLOCNO (conflict_obj);
 
-	      /* We can process the conflict allocno repeatedly with
-		 the same result.  */
-	      if (nregs == nobj && nregs > 1)
+	      if (!has_subreg_object_p (a))
 		{
-		  int num = OBJECT_SUBWORD (conflict_obj);
-		  
-		  if (REG_WORDS_BIG_ENDIAN)
-		    CLEAR_HARD_REG_BIT
-		      (ALLOCNO_COLOR_DATA (conflict_a)->profitable_hard_regs,
-		       hard_regno + nobj - num - 1);
-		  else
-		    CLEAR_HARD_REG_BIT
-		      (ALLOCNO_COLOR_DATA (conflict_a)->profitable_hard_regs,
-		       hard_regno + num);
+		  ALLOCNO_COLOR_DATA (conflict_a)->profitable_hard_regs
+		    &= ~ira_reg_mode_hard_regset[hard_regno][mode];
+		  continue;
+		}
+
+	      /* Clear all hard regs occupied by obj.  */
+	      if (REG_WORDS_BIG_ENDIAN)
+		{
+		  int start_regno
+		    = hard_regno + ALLOCNO_NREGS (a) - 1 - OBJECT_START (obj);
+		  for (int i = 0; i < OBJECT_NREGS (obj); i += 1)
+		    {
+		      int regno = start_regno - i;
+		      if (regno >= 0 && regno < FIRST_PSEUDO_REGISTER)
+			CLEAR_HARD_REG_BIT (
+			  ALLOCNO_COLOR_DATA (conflict_a)->profitable_hard_regs,
+			  regno);
+		    }
 		}
 	      else
-		ALLOCNO_COLOR_DATA (conflict_a)->profitable_hard_regs
-		  &= ~ira_reg_mode_hard_regset[hard_regno][mode];
+		{
+		  int start_regno = hard_regno + OBJECT_START (obj);
+		  for (int i = 0; i < OBJECT_NREGS (obj); i += 1)
+		    {
+		      int regno = start_regno + i;
+		      if (regno >= 0 && regno < FIRST_PSEUDO_REGISTER)
+			CLEAR_HARD_REG_BIT (
+			  ALLOCNO_COLOR_DATA (conflict_a)->profitable_hard_regs,
+			  regno);
+		    }
+		}
 	    }
 	}
     }
@@ -1677,18 +1691,25 @@ update_conflict_hard_regno_costs (int *costs, enum reg_class aclass,
    aligned.  */
 static inline void
 get_conflict_and_start_profitable_regs (ira_allocno_t a, bool retry_p,
-					HARD_REG_SET *conflict_regs,
+					HARD_REG_SET *start_conflict_regs,
 					HARD_REG_SET *start_profitable_regs)
 {
   int i, nwords;
   ira_object_t obj;
 
   nwords = ALLOCNO_NUM_OBJECTS (a);
-  for (i = 0; i < nwords; i++)
-    {
-      obj = ALLOCNO_OBJECT (a, i);
-      conflict_regs[i] = OBJECT_TOTAL_CONFLICT_HARD_REGS (obj);
-    }
+  CLEAR_HARD_REG_SET (*start_conflict_regs);
+  if (has_subreg_object_p (a))
+    for (i = 0; i < nwords; i++)
+      {
+	obj = ALLOCNO_OBJECT (a, i);
+	for (int j = 0; j < OBJECT_NREGS (obj); j += 1)
+	  *start_conflict_regs |= OBJECT_TOTAL_CONFLICT_HARD_REGS (obj)
+				  >> (OBJECT_START (obj) + j);
+      }
+  else
+    *start_conflict_regs
+      = OBJECT_TOTAL_CONFLICT_HARD_REGS (get_full_object (a));
   if (retry_p)
     *start_profitable_regs
       = (reg_class_contents[ALLOCNO_CLASS (a)]
@@ -1702,9 +1723,9 @@ get_conflict_and_start_profitable_regs (ira_allocno_t a, bool retry_p,
    PROFITABLE_REGS and whose objects have CONFLICT_REGS.  */
 static inline bool
 check_hard_reg_p (ira_allocno_t a, int hard_regno,
-		  HARD_REG_SET *conflict_regs, HARD_REG_SET profitable_regs)
+		  HARD_REG_SET start_conflict_regs,
+		  HARD_REG_SET profitable_regs)
 {
-  int j, nwords, nregs;
   enum reg_class aclass;
   machine_mode mode;
 
@@ -1716,28 +1737,17 @@ check_hard_reg_p (ira_allocno_t a, int hard_regno,
   /* Checking only profitable hard regs.  */
   if (! TEST_HARD_REG_BIT (profitable_regs, hard_regno))
     return false;
-  nregs = hard_regno_nregs (hard_regno, mode);
-  nwords = ALLOCNO_NUM_OBJECTS (a);
-  for (j = 0; j < nregs; j++)
+
+  if (has_subreg_object_p (a))
+    return !TEST_HARD_REG_BIT (start_conflict_regs, hard_regno);
+  else
     {
-      int k;
-      int set_to_test_start = 0, set_to_test_end = nwords;
-      
-      if (nregs == nwords)
-	{
-	  if (REG_WORDS_BIG_ENDIAN)
-	    set_to_test_start = nwords - j - 1;
-	  else
-	    set_to_test_start = j;
-	  set_to_test_end = set_to_test_start + 1;
-	}
-      for (k = set_to_test_start; k < set_to_test_end; k++)
-	if (TEST_HARD_REG_BIT (conflict_regs[k], hard_regno + j))
-	  break;
-      if (k != set_to_test_end)
-	break;
+      int nregs = hard_regno_nregs (hard_regno, mode);
+      for (int i = 0; i < nregs; i += 1)
+	if (TEST_HARD_REG_BIT (start_conflict_regs, hard_regno + i))
+	  return false;
+      return true;
     }
-  return j == nregs;
 }
 
 /* Return number of registers needed to be saved and restored at
@@ -1945,7 +1955,7 @@ spill_soft_conflicts (ira_allocno_t a, bitmap allocnos_to_spill,
 static bool
 assign_hard_reg (ira_allocno_t a, bool retry_p)
 {
-  HARD_REG_SET conflicting_regs[2], profitable_hard_regs;
+  HARD_REG_SET start_conflicting_regs, profitable_hard_regs;
   int i, j, hard_regno, best_hard_regno, class_size;
   int cost, mem_cost, min_cost, full_cost, min_full_cost, nwords, word;
   int *a_costs;
@@ -1962,8 +1972,7 @@ assign_hard_reg (ira_allocno_t a, bool retry_p)
   HARD_REG_SET soft_conflict_regs = {};
 
   ira_assert (! ALLOCNO_ASSIGNED_P (a));
-  get_conflict_and_start_profitable_regs (a, retry_p,
-					  conflicting_regs,
+  get_conflict_and_start_profitable_regs (a, retry_p, &start_conflicting_regs,
 					  &profitable_hard_regs);
   aclass = ALLOCNO_CLASS (a);
   class_size = ira_class_hard_regs_num[aclass];
@@ -2041,7 +2050,6 @@ assign_hard_reg (ira_allocno_t a, bool retry_p)
 		      (hard_regno, ALLOCNO_MODE (conflict_a),
 		       reg_class_contents[aclass])))
 		{
-		  int n_objects = ALLOCNO_NUM_OBJECTS (conflict_a);
 		  int conflict_nregs;
 
 		  mode = ALLOCNO_MODE (conflict_a);
@@ -2076,24 +2084,95 @@ assign_hard_reg (ira_allocno_t a, bool retry_p)
 			    note_conflict (r);
 			}
 		    }
+		  else if (has_subreg_object_p (a))
+		    {
+		      /* Set bits of start_conflicting_regs that would make
+			 obj and conflict_obj overlap.  The overlap positions:
+					   +--------------+
+					   | conflict_obj |
+					   +--------------+
+
+			       +-----------+              +-----------+
+			       |   obj     |     ...      |   obj     |
+			       +-----------+              +-----------+
+
+			Point: A                  B       C
+
+			Hard regs from point A to point C will cause an overlap.
+			For REG_WORDS_BIG_ENDIAN:
+			   A = hard_regno + ALLOCNO_NREGS (conflict_a) - 1
+			       - OBJECT_START (conflict_obj)
+			       - OBJECT_NREGS (obj) + 1
+			   C = A + OBJECT_NREGS (obj)
+			       + OBJECT_NREGS (conflict_obj) - 2
+			For !REG_WORDS_BIG_ENDIAN:
+			   A = hard_regno + OBJECT_START (conflict_obj)
+			       - OBJECT_NREGS (obj) + 1
+			   C = A + OBJECT_NREGS (obj)
+			       + OBJECT_NREGS (conflict_obj) - 2
+			 */
+		      int start_regno;
+		      int conflict_allocno_nregs, conflict_object_nregs,
+			conflict_object_start;
+		      if (has_subreg_object_p (conflict_a))
+			{
+			  conflict_allocno_nregs = ALLOCNO_NREGS (conflict_a);
+			  conflict_object_nregs = OBJECT_NREGS (conflict_obj);
+			  conflict_object_start = OBJECT_START (conflict_obj);
+			}
+		      else
+			{
+			  conflict_allocno_nregs = conflict_object_nregs
+			    = hard_regno_nregs (hard_regno, mode);
+			  conflict_object_start = 0;
+			}
+		      if (REG_WORDS_BIG_ENDIAN)
+			{
+			  int A = hard_regno + conflict_allocno_nregs - 1
+				  - conflict_object_start - OBJECT_NREGS (obj)
+				  + 1;
+			  start_regno = A + OBJECT_NREGS (obj) - 1
+					+ OBJECT_START (obj) - ALLOCNO_NREGS (a)
+					+ 1;
+			}
+		      else
+			{
+			  int A = hard_regno + conflict_object_start
+				  - OBJECT_NREGS (obj) + 1;
+			  start_regno = A - OBJECT_START (obj);
+			}
+
+		      for (int i = 0;
+			   i <= OBJECT_NREGS (obj) + conflict_object_nregs - 2;
+			   i += 1)
+			{
+			  int regno = start_regno + i;
+			  if (regno >= 0 && regno < FIRST_PSEUDO_REGISTER)
+			    SET_HARD_REG_BIT (start_conflicting_regs, regno);
+			}
+		      if (hard_reg_set_subset_p (profitable_hard_regs,
+						 start_conflicting_regs))
+			goto fail;
+		    }
 		  else
 		    {
-		      if (conflict_nregs == n_objects && conflict_nregs > 1)
+		      if (has_subreg_object_p (conflict_a))
 			{
-			  int num = OBJECT_SUBWORD (conflict_obj);
-
-			  if (REG_WORDS_BIG_ENDIAN)
-			    SET_HARD_REG_BIT (conflicting_regs[word],
-					      hard_regno + n_objects - num - 1);
-			  else
-			    SET_HARD_REG_BIT (conflicting_regs[word],
-					      hard_regno + num);
+			  int start_hard_regno
+			    = REG_WORDS_BIG_ENDIAN
+				? hard_regno + ALLOCNO_NREGS (conflict_a)
+				    - OBJECT_START (conflict_obj)
+				: hard_regno + OBJECT_START (conflict_obj);
+			  for (int i = 0; i < OBJECT_NREGS (conflict_obj);
+			       i += 1)
+			    SET_HARD_REG_BIT (start_conflicting_regs,
+					      start_hard_regno + i);
 			}
 		      else
-			conflicting_regs[word]
+			start_conflicting_regs
 			  |= ira_reg_mode_hard_regset[hard_regno][mode];
 		      if (hard_reg_set_subset_p (profitable_hard_regs,
-						 conflicting_regs[word]))
+						 start_conflicting_regs))
 			goto fail;
 		    }
 		}
@@ -2160,8 +2239,8 @@ assign_hard_reg (ira_allocno_t a, bool retry_p)
 	  && FIRST_STACK_REG <= hard_regno && hard_regno <= LAST_STACK_REG)
 	continue;
 #endif
-      if (! check_hard_reg_p (a, hard_regno,
-			      conflicting_regs, profitable_hard_regs))
+      if (!check_hard_reg_p (a, hard_regno, start_conflicting_regs,
+			     profitable_hard_regs))
 	continue;
       cost = costs[i];
       full_cost = full_costs[i];
@@ -3154,7 +3233,7 @@ improve_allocation (void)
   machine_mode mode;
   int *allocno_costs;
   int costs[FIRST_PSEUDO_REGISTER];
-  HARD_REG_SET conflicting_regs[2], profitable_hard_regs;
+  HARD_REG_SET start_conflicting_regs, profitable_hard_regs;
   ira_allocno_t a;
   bitmap_iterator bi;
   int saved_nregs;
@@ -3193,7 +3272,7 @@ improve_allocation (void)
 		     - allocno_copy_cost_saving (a, hregno));
       try_p = false;
       get_conflict_and_start_profitable_regs (a, false,
-					      conflicting_regs,
+					      &start_conflicting_regs,
 					      &profitable_hard_regs);
       class_size = ira_class_hard_regs_num[aclass];
       mode = ALLOCNO_MODE (a);
@@ -3202,8 +3281,8 @@ improve_allocation (void)
       for (j = 0; j < class_size; j++)
 	{
 	  hregno = ira_class_hard_regs[aclass][j];
-	  if (! check_hard_reg_p (a, hregno,
-				  conflicting_regs, profitable_hard_regs))
+	  if (!check_hard_reg_p (a, hregno, start_conflicting_regs,
+				 profitable_hard_regs))
 	    continue;
 	  ira_assert (ira_class_hard_reg_index[aclass][hregno] == j);
 	  k = allocno_costs == NULL ? 0 : j;
@@ -3287,16 +3366,15 @@ improve_allocation (void)
 		}
 	      conflict_nregs = hard_regno_nregs (conflict_hregno,
 						 ALLOCNO_MODE (conflict_a));
-	      auto note_conflict = [&](int r)
-		{
-		  if (check_hard_reg_p (a, r,
-					conflicting_regs, profitable_hard_regs))
-		    {
-		      if (spill_a)
-			SET_HARD_REG_BIT (soft_conflict_regs, r);
-		      costs[r] += spill_cost;
-		    }
-		};
+	      auto note_conflict = [&] (int r) {
+		if (check_hard_reg_p (a, r, start_conflicting_regs,
+				      profitable_hard_regs))
+		  {
+		    if (spill_a)
+		      SET_HARD_REG_BIT (soft_conflict_regs, r);
+		    costs[r] += spill_cost;
+		  }
+	      };
 	      for (r = conflict_hregno;
 		   r >= 0 && (int) end_hard_regno (mode, r) > conflict_hregno;
 		   r--)
@@ -3314,8 +3392,8 @@ improve_allocation (void)
       for (j = 0; j < class_size; j++)
 	{
 	  hregno = ira_class_hard_regs[aclass][j];
-	  if (check_hard_reg_p (a, hregno,
-				conflicting_regs, profitable_hard_regs)
+	  if (check_hard_reg_p (a, hregno, start_conflicting_regs,
+				profitable_hard_regs)
 	      && min_cost > costs[hregno])
 	    {
 	      best = hregno;
diff --git a/gcc/ira-int.h b/gcc/ira-int.h
index 0685e1f4e8d..b6281d3df6d 100644
--- a/gcc/ira-int.h
+++ b/gcc/ira-int.h
@@ -23,6 +23,7 @@ along with GCC; see the file COPYING3.  If not see
 
 #include "recog.h"
 #include "function-abi.h"
+#include <vector>
 
 /* To provide consistency in naming, all IRA external variables,
    functions, common typedefs start with prefix ira_.  */
@@ -240,6 +241,13 @@ struct ira_object
      Zero means the lowest-order subword (or the entire allocno in case
      it is not being tracked in subwords).  */
   int subword;
+  /* Represent that OBJECT occupies registers [start, start + nregs) of
+     its ALLOCNO.  */
+  int start, nregs;
+  /* Represent the size and offset of the current object, used to track
+     subreg ranges.  For a full reg, the size is
+     GET_MODE_SIZE (ALLOCNO_MODE (allocno)) and the offset is 0.  */
+  poly_int64 size, offset;
   /* Allocated size of the conflicts array.  */
   unsigned int conflicts_array_size;
   /* A unique number for every instance of this structure, which is used
@@ -295,6 +303,11 @@ struct ira_allocno
      reload (at this point pseudo-register has only one allocno) which
      did not get stack slot yet.  */
   signed int hard_regno : 16;
+  /* Unit size of one register allocated for the allocno.  Only used to
+     compute the start and nregs of the subregs being tracked.  */
+  poly_int64 unit_size;
+  /* True if subreg live ranges need to be tracked for the allocno.  */
+  bool track_subreg_p;
   /* A bitmask of the ABIs used by calls that occur while the allocno
      is live.  */
   unsigned int crossed_calls_abis : NUM_ABI_IDS;
@@ -353,8 +366,6 @@ struct ira_allocno
      register class living at the point than number of hard-registers
      of the class available for the allocation.  */
   int excess_pressure_points_num;
-  /* The number of objects tracked in the following array.  */
-  int num_objects;
   /* Accumulated frequency of calls which given allocno
      intersects.  */
   int call_freq;
@@ -387,8 +398,8 @@ struct ira_allocno
   /* An array of structures describing conflict information and live
      ranges for each object associated with the allocno.  There may be
      more than one such object in cases where the allocno represents a
-     multi-word register.  */
-  ira_object_t objects[2];
+     multi-hardreg pseudo.  */
+  std::vector<ira_object_t> objects;
   /* Registers clobbered by intersected calls.  */
    HARD_REG_SET crossed_calls_clobbered_regs;
   /* Array of usage costs (accumulated and the one updated during
@@ -468,8 +479,12 @@ struct ira_allocno
 #define ALLOCNO_EXCESS_PRESSURE_POINTS_NUM(A) \
   ((A)->excess_pressure_points_num)
 #define ALLOCNO_OBJECT(A,N) ((A)->objects[N])
-#define ALLOCNO_NUM_OBJECTS(A) ((A)->num_objects)
+#define ALLOCNO_NUM_OBJECTS(A) ((int) (A)->objects.size ())
 #define ALLOCNO_ADD_DATA(A) ((A)->add_data)
+#define ALLOCNO_UNIT_SIZE(A) ((A)->unit_size)
+#define ALLOCNO_TRACK_SUBREG_P(A) ((A)->track_subreg_p)
+#define ALLOCNO_NREGS(A)                                                       \
+  (ira_reg_class_max_nregs[ALLOCNO_CLASS (A)][ALLOCNO_MODE (A)])
 
 /* Typedef for pointer to the subsequent structure.  */
 typedef struct ira_emit_data *ira_emit_data_t;
@@ -511,6 +526,8 @@ allocno_emit_reg (ira_allocno_t a)
 }
 
 #define OBJECT_ALLOCNO(O) ((O)->allocno)
+#define OBJECT_SIZE(O) ((O)->size)
+#define OBJECT_OFFSET(O) ((O)->offset)
 #define OBJECT_SUBWORD(O) ((O)->subword)
 #define OBJECT_CONFLICT_ARRAY(O) ((O)->conflicts_array)
 #define OBJECT_CONFLICT_VEC(O) ((ira_object_t *)(O)->conflicts_array)
@@ -524,6 +541,8 @@ allocno_emit_reg (ira_allocno_t a)
 #define OBJECT_MAX(O) ((O)->max)
 #define OBJECT_CONFLICT_ID(O) ((O)->id)
 #define OBJECT_LIVE_RANGES(O) ((O)->live_ranges)
+#define OBJECT_START(O) ((O)->start)
+#define OBJECT_NREGS(O) ((O)->nregs)
 
 /* Map regno -> allocnos with given regno (see comments for
    allocno member `next_regno_allocno').  */
@@ -1041,6 +1060,8 @@ extern void ira_free_cost_vector (int *, reg_class_t);
 extern void ira_flattening (int, int);
 extern bool ira_build (void);
 extern void ira_destroy (void);
+extern ira_object_t
+find_object (ira_allocno_t, int, int);
 
 /* ira-costs.cc */
 extern void ira_init_costs_once (void);
@@ -1708,4 +1729,18 @@ ira_caller_save_loop_spill_p (ira_allocno_t a, ira_allocno_t subloop_a,
   return call_cost && call_cost >= spill_cost;
 }
 
+/* Return true if allocno A has subreg object.  */
+inline bool
+has_subreg_object_p (ira_allocno_t a)
+{
+  return ALLOCNO_NUM_OBJECTS (a) > 1;
+}
+
+/* Return the full object of allocno A.  */
+inline ira_object_t
+get_full_object (ira_allocno_t a)
+{
+  return find_object (a, 0, ALLOCNO_NREGS (a));
+}
+
 #endif /* GCC_IRA_INT_H */
diff --git a/gcc/ira.cc b/gcc/ira.cc
index d7530f01380..2fa6e0e5c94 100644
--- a/gcc/ira.cc
+++ b/gcc/ira.cc
@@ -2623,7 +2623,7 @@ static void
 check_allocation (void)
 {
   ira_allocno_t a;
-  int hard_regno, nregs, conflict_nregs;
+  int hard_regno;
   ira_allocno_iterator ai;
 
   FOR_EACH_ALLOCNO (a, ai)
@@ -2634,28 +2634,18 @@ check_allocation (void)
       if (ALLOCNO_CAP_MEMBER (a) != NULL
 	  || (hard_regno = ALLOCNO_HARD_REGNO (a)) < 0)
 	continue;
-      nregs = hard_regno_nregs (hard_regno, ALLOCNO_MODE (a));
-      if (nregs == 1)
-	/* We allocated a single hard register.  */
-	n = 1;
-      else if (n > 1)
-	/* We allocated multiple hard registers, and we will test
-	   conflicts in a granularity of single hard regs.  */
-	nregs = 1;
 
       for (i = 0; i < n; i++)
 	{
 	  ira_object_t obj = ALLOCNO_OBJECT (a, i);
 	  ira_object_t conflict_obj;
 	  ira_object_conflict_iterator oci;
-	  int this_regno = hard_regno;
-	  if (n > 1)
-	    {
-	      if (REG_WORDS_BIG_ENDIAN)
-		this_regno += n - i - 1;
-	      else
-		this_regno += i;
-	    }
+	  int this_regno;
+	  if (REG_WORDS_BIG_ENDIAN)
+	    this_regno = hard_regno + ALLOCNO_NREGS (a) - 1 - OBJECT_START (obj)
+			 - OBJECT_NREGS (obj) + 1;
+	  else
+	    this_regno = hard_regno + OBJECT_START (obj);
 	  FOR_EACH_OBJECT_CONFLICT (obj, conflict_obj, oci)
 	    {
 	      ira_allocno_t conflict_a = OBJECT_ALLOCNO (conflict_obj);
@@ -2665,24 +2655,18 @@ check_allocation (void)
 	      if (ira_soft_conflict (a, conflict_a))
 		continue;
 
-	      conflict_nregs = hard_regno_nregs (conflict_hard_regno,
-						 ALLOCNO_MODE (conflict_a));
-
-	      if (ALLOCNO_NUM_OBJECTS (conflict_a) > 1
-		  && conflict_nregs == ALLOCNO_NUM_OBJECTS (conflict_a))
-		{
-		  if (REG_WORDS_BIG_ENDIAN)
-		    conflict_hard_regno += (ALLOCNO_NUM_OBJECTS (conflict_a)
-					    - OBJECT_SUBWORD (conflict_obj) - 1);
-		  else
-		    conflict_hard_regno += OBJECT_SUBWORD (conflict_obj);
-		  conflict_nregs = 1;
-		}
+	      if (REG_WORDS_BIG_ENDIAN)
+		conflict_hard_regno = conflict_hard_regno
+				      + ALLOCNO_NREGS (conflict_a) - 1
+				      - OBJECT_START (conflict_obj)
+				      - OBJECT_NREGS (conflict_obj) + 1;
+	      else
+		conflict_hard_regno
+		  = conflict_hard_regno + OBJECT_START (conflict_obj);
 
-	      if ((conflict_hard_regno <= this_regno
-		 && this_regno < conflict_hard_regno + conflict_nregs)
-		|| (this_regno <= conflict_hard_regno
-		    && conflict_hard_regno < this_regno + nregs))
+	      if (!(this_regno + OBJECT_NREGS (obj) <= conflict_hard_regno
+		    || conflict_hard_regno + OBJECT_NREGS (conflict_obj)
+			 <= this_regno))
 		{
 		  fprintf (stderr, "bad allocation for %d and %d\n",
 			   ALLOCNO_REGNO (a), ALLOCNO_REGNO (conflict_a));
-- 
2.36.3



* [PATCH 2/7] ira: Add live_subreg problem and apply to ira pass
  2023-11-08  3:47 [PATCH 0/7] ira/lra: Support subreg coalesce Lehua Ding
  2023-11-08  3:47 ` [PATCH 1/7] ira: Refactor the handling of register conflicts to make it more general Lehua Ding
@ 2023-11-08  3:47 ` Lehua Ding
  2023-11-08  3:47 ` [PATCH 3/7] ira: Support subreg live range track Lehua Ding
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 37+ messages in thread
From: Lehua Ding @ 2023-11-08  3:47 UTC (permalink / raw)
  To: gcc-patches; +Cc: vmakarov, richard.sandiford, juzhe.zhong, lehua.ding

This patch adds a live_subreg problem that extends the original
live-register (LR) information to track the liveness of individual
subregs.  The old live data is then replaced by the new live data in the
ira pass.
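
The sketch below is a rough, self-contained illustration of the kind of
bookkeeping the new problem does (toy types only; the real classes are
subregs_live and subreg_ranges in subreg-live-range.h, and the real
problem additionally keeps full_*/partial_* bitmaps alongside the
ranges).  A pseudo is split into natural-size blocks and its liveness is
a set of block ranges; walking a basic block backwards, a use makes
blocks live and a def kills the blocks it fully covers:

```
#include <cstdio>
#include <set>

/* Toy per-pseudo subreg liveness: NBLOCKS natural-size blocks, with the
   currently live blocks kept in a set.  */
struct toy_subreg_live
{
  int nblocks;
  std::set<int> live_blocks;

  explicit toy_subreg_live (int n) : nblocks (n) {}

  /* A use of blocks [start, end) makes them live.  */
  void add_range (int start, int end)
  {
    for (int i = start; i < end; ++i)
      live_blocks.insert (i);
  }
  /* A def of blocks [start, end) kills them.  */
  void remove_range (int start, int end)
  {
    for (int i = start; i < end; ++i)
      live_blocks.erase (i);
  }
  bool full_p () const { return (int) live_blocks.size () == nblocks; }
  bool empty_p () const { return live_blocks.empty (); }
};

int
main ()
{
  /* r134 from the RISC-V example in the cover letter: two blocks.  */
  toy_subreg_live r134 (2);
  r134.add_range (1, 2);      /* insn 12 uses block 1 of r134.  */
  std::printf ("after insn 12: full=%d\n", r134.full_p ());    /* 0 */
  r134.add_range (0, 1);      /* insn 11 uses block 0.  */
  std::printf ("after insn 11: full=%d\n", r134.full_p ());    /* 1 */
  r134.remove_range (0, 2);   /* insn 10 defines all of r134.  */
  std::printf ("after insn 10: empty=%d\n", r134.empty_p ());  /* 1 */
  return 0;
}
```

In the actual patch, this backward walk corresponds to
df_live_subreg_bb_local_compute in the hunk below, and the
full_*/partial_* bitmaps keep the common whole-register case on the
cheap bitmap path.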

gcc/ChangeLog:

	* Makefile.in: Add subreg-live-range.o
	* df-problems.cc (struct df_live_subreg_problem_data): New df problem.
	(need_track_subreg): New helper function.
	(get_range): New helper function.
	(remove_subreg_range): New helper function.
	(add_subreg_range): New helper function.
	(df_live_subreg_free_bb_info): df function.
	(df_live_subreg_alloc): Ditto.
	(df_live_subreg_reset): Ditto.
	(df_live_subreg_bb_local_compute): Ditto.
	(df_live_subreg_local_compute): Ditto.
	(df_live_subreg_init): Ditto.
	(df_live_subreg_check_result): Ditto.
	(df_live_subreg_confluence_0): Ditto.
	(df_live_subreg_confluence_n): Ditto.
	(df_live_subreg_transfer_function): Ditto.
	(df_live_subreg_finalize): Ditto.
	(df_live_subreg_free): Ditto.
	(df_live_subreg_top_dump): Ditto.
	(df_live_subreg_bottom_dump): Ditto.
	(df_live_subreg_add_problem): Ditto.
	* df.h (enum df_problem_id): New df problem.
	(DF_LIVE_SUBREG_INFO): New macro.
	(DF_LIVE_SUBREG_IN): Ditto.
	(DF_LIVE_SUBREG_OUT): Ditto.
	(DF_LIVE_SUBREG_FULL_IN): Ditto.
	(DF_LIVE_SUBREG_FULL_OUT): Ditto.
	(DF_LIVE_SUBREG_PARTIAL_IN): Ditto.
	(DF_LIVE_SUBREG_PARTIAL_OUT): Ditto.
	(DF_LIVE_SUBREG_RANGE_IN): Ditto.
	(DF_LIVE_SUBREG_RANGE_OUT): Ditto.
	(class subregs_live): New class.
	(class basic_block_subreg_live_info): New class.
	(class df_live_subreg_bb_info): New class.
	(df_live_subreg): New function.
	(df_live_subreg_add_problem): Ditto.
	(df_live_subreg_finalize): Ditto.
	(class subreg_range): New class.
	(need_track_subreg): Exported.
	(remove_subreg_range): Exported.
	(add_subreg_range): Exported.
	(df_live_subreg_get_bb_info): Exported.
	* ira-build.cc (create_bb_allocnos): Use new live data.
	(create_loop_allocnos): Ditto.
	* ira-color.cc (ira_loop_edge_freq): Ditto.
	* ira-emit.cc (generate_edge_moves): Ditto.
	(add_ranges_and_copies): Ditto.
	* ira-lives.cc (process_out_of_region_eh_regs): Ditto.
	(process_bb_node_lives): Ditto.
	* ira.cc (find_moveable_pseudos): Ditto.
	(interesting_dest_for_shprep_1): Ditto.
	(allocate_initial_values): Ditto.
	(ira): Ditto.
	* reginfo.cc (get_nblocks_slow): New helper function.
	* rtl.h (get_nblocks_slow): New helper function.
	(get_nblocks): New helper function.
	* timevar.def (TV_DF_LIVE_SUBREG): New timevar.
	* subreg-live-range.cc: New file.
	* subreg-live-range.h: New file.

---
 gcc/Makefile.in          |   1 +
 gcc/df-problems.cc       | 889 ++++++++++++++++++++++++++++++++++++++-
 gcc/df.h                 |  93 +++-
 gcc/ira-build.cc         |  14 +-
 gcc/ira-color.cc         |   8 +-
 gcc/ira-emit.cc          |  12 +-
 gcc/ira-lives.cc         |   7 +-
 gcc/ira.cc               |  20 +-
 gcc/reginfo.cc           |  14 +
 gcc/rtl.h                |  14 +
 gcc/subreg-live-range.cc | 649 ++++++++++++++++++++++++++++
 gcc/subreg-live-range.h  | 326 ++++++++++++++
 gcc/timevar.def          |   1 +
 13 files changed, 2008 insertions(+), 40 deletions(-)
 create mode 100644 gcc/subreg-live-range.cc
 create mode 100644 gcc/subreg-live-range.h

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 29cec21c825..e4403b5a30c 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1675,6 +1675,7 @@ OBJS = \
 	store-motion.o \
 	streamer-hooks.o \
 	stringpool.o \
+        subreg-live-range.o \
 	substring-locations.o \
 	target-globals.o \
 	targhooks.o \
diff --git a/gcc/df-problems.cc b/gcc/df-problems.cc
index d2cfaf7f50f..2585c762fd1 100644
--- a/gcc/df-problems.cc
+++ b/gcc/df-problems.cc
@@ -28,6 +28,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "target.h"
 #include "rtl.h"
 #include "df.h"
+#include "subreg-live-range.h"
 #include "memmodel.h"
 #include "tm_p.h"
 #include "insn-config.h"
@@ -1344,8 +1345,894 @@ df_lr_verify_transfer_functions (void)
   bitmap_clear (&all_blocks);
 }
 
+/*----------------------------------------------------------------------------
+   REGISTER AND SUBREG LIVES
+   Like DF_LR, but with fine-grained tracking of subreg liveness.
+   ----------------------------------------------------------------------------*/
+
+/* Private data used to compute the solution for this problem.  */
+struct df_live_subreg_problem_data
+{
+  /* An obstack for the bitmaps we need for this problem.  */
+  bitmap_obstack live_subreg_bitmaps;
+  bool has_subreg_live_p;
+};
+
+/* Helper functions.  */
+
+/* Return true if REGNO is a pseudo and REG_MODE spans multiple registers.  */
+bool
+need_track_subreg (int regno, machine_mode reg_mode)
+{
+  poly_int64 total_size = GET_MODE_SIZE (reg_mode);
+  poly_int64 natural_size = REGMODE_NATURAL_SIZE (reg_mode);
+  return maybe_gt (total_size, natural_size)
+	 && multiple_p (total_size, natural_size)
+	 && regno >= FIRST_PSEUDO_REGISTER;
+}
+
+/* Return subreg_range of REF.  */
+static subreg_range
+get_range (df_ref ref)
+{
+  rtx reg = DF_REF_REAL_REG (ref);
+  machine_mode reg_mode = GET_MODE (reg);
+
+  if (!read_modify_subreg_p (DF_REF_REG (ref)))
+    return subreg_range (0, get_nblocks (reg_mode));
+
+  rtx subreg = DF_REF_REG (ref);
+  machine_mode subreg_mode = GET_MODE (subreg);
+  poly_int64 offset = SUBREG_BYTE (subreg);
+  int nblocks = get_nblocks (reg_mode);
+  poly_int64 unit_size = REGMODE_NATURAL_SIZE (reg_mode);
+  poly_int64 subreg_size = GET_MODE_SIZE (subreg_mode);
+  poly_int64 left = offset + subreg_size;
+
+  int subreg_start = -1;
+  int subreg_nblocks = -1;
+  for (int i = 0; i < nblocks; i += 1)
+    {
+      poly_int64 right = unit_size * (i + 1);
+      if (subreg_start < 0 && maybe_lt (offset, right))
+	subreg_start = i;
+      if (subreg_nblocks < 0 && maybe_le (left, right))
+	{
+	  subreg_nblocks = i + 1 - subreg_start;
+	  break;
+	}
+    }
+  gcc_assert (subreg_start >= 0 && subreg_nblocks > 0);
+
+  return subreg_range (subreg_start, subreg_start + subreg_nblocks);
+}
+
+/* Remove RANGE of REGNO from BB_INFO's use sets.  */
+void
+remove_subreg_range (basic_block_subreg_live_info *bb_info, unsigned int regno,
+		     machine_mode mode, const subreg_range &range)
+{
+  int max = get_nblocks (mode);
+  bitmap full = &bb_info->full_use;
+  bitmap partial = &bb_info->partial_use;
+  subregs_live *range_live = bb_info->range_use;
+
+  if (!range.full_p (max))
+    {
+      if (bitmap_bit_p (full, regno))
+	{
+	  bitmap_clear_bit (full, regno);
+	  gcc_assert (!bitmap_bit_p (partial, regno));
+	  gcc_assert (range_live->empty_p (regno));
+	  subreg_ranges temp = subreg_ranges (max);
+	  temp.make_full ();
+	  temp.remove_range (max, range);
+	  range_live->add_ranges (regno, temp);
+	  bitmap_set_bit (partial, regno);
+	  return;
+	}
+      else if (bitmap_bit_p (partial, regno))
+	{
+	  range_live->remove_range (regno, max, range);
+	  if (range_live->empty_p (regno))
+	    bitmap_clear_bit (partial, regno);
+	}
+    }
+  else if (bitmap_bit_p (full, regno))
+    {
+      bitmap_clear_bit (full, regno);
+      gcc_assert (!bitmap_bit_p (partial, regno));
+    }
+  else if (bitmap_bit_p (partial, regno))
+    {
+      bitmap_clear_bit (partial, regno);
+      range_live->remove_live (regno);
+    }
+}
+
+/* Remove REF from BB_INFO use; return true if REF is a tracked subreg.  */
+bool
+remove_subreg_range (basic_block_subreg_live_info *bb_info, df_ref ref)
+{
+  unsigned int regno = DF_REF_REGNO (ref);
+  machine_mode mode = GET_MODE (DF_REF_REAL_REG (ref));
+  bool subreg_p = read_modify_subreg_p (DF_REF_REG (ref));
+
+  if (need_track_subreg (regno, mode))
+    {
+      remove_subreg_range (bb_info, regno, mode, get_range (ref));
+      return subreg_p;
+    }
+  else
+    {
+      bitmap_clear_bit (&bb_info->full_use, regno);
+      gcc_assert (!bitmap_bit_p (&bb_info->partial_use, regno));
+      gcc_assert (!bitmap_bit_p (&bb_info->partial_def, regno));
+      return false;
+    }
+}
+
+/* Add RANGE of REGNO to BB_INFO's def or use sets, according to IS_DEF.  */
+void
+add_subreg_range (basic_block_subreg_live_info *bb_info, unsigned int regno,
+		  machine_mode mode, const subreg_range &range, bool is_def)
+{
+  int max = get_nblocks (mode);
+  bitmap full = is_def ? &bb_info->full_def : &bb_info->full_use;
+  bitmap partial = is_def ? &bb_info->partial_def : &bb_info->partial_use;
+  subregs_live *range_live = is_def ? bb_info->range_def : bb_info->range_use;
+
+  if (!range.full_p (max))
+    {
+      if (bitmap_bit_p (full, regno))
+	return;
+      range_live->add_range (regno, max, range);
+      if (range_live->full_p (regno))
+	{
+	  bitmap_set_bit (full, regno);
+	  gcc_assert (bitmap_bit_p (partial, regno));
+	  bitmap_clear_bit (partial, regno);
+	  range_live->remove_live (regno);
+	}
+      else if (!bitmap_bit_p (partial, regno))
+	bitmap_set_bit (partial, regno);
+    }
+  else if (!bitmap_bit_p (full, regno))
+    {
+      bitmap_set_bit (full, regno);
+      if (bitmap_bit_p (partial, regno))
+	{
+	  bitmap_clear_bit (partial, regno);
+	  range_live->remove_live (regno);
+	}
+    }
+}
+
+/* Add REF to BB_INFO def/use; return true if REF is a tracked subreg.  */
+bool
+add_subreg_range (basic_block_subreg_live_info *bb_info, df_ref ref,
+		  bool is_def)
+{
+  unsigned int regno = DF_REF_REGNO (ref);
+  machine_mode mode = GET_MODE (DF_REF_REAL_REG (ref));
+  bool subreg_p = read_modify_subreg_p (DF_REF_REG (ref));
+
+  if (need_track_subreg (regno, mode))
+    {
+      add_subreg_range (bb_info, regno, mode, get_range (ref), is_def);
+      return subreg_p;
+    }
+  else
+    {
+      bitmap full = is_def ? &bb_info->full_def : &bb_info->full_use;
+      bitmap partial = is_def ? &bb_info->partial_def : &bb_info->partial_use;
+      bitmap_set_bit (full, regno);
+      gcc_assert (!bitmap_bit_p (partial, regno));
+
+      if (is_def && DF_REF_FLAGS (ref) & (DF_REF_PARTIAL | DF_REF_CONDITIONAL))
+	add_subreg_range (bb_info, ref, false);
+      return false;
+    }
+}
+
+/* Free basic block info.  */
+
+static void
+df_live_subreg_free_bb_info (basic_block bb ATTRIBUTE_UNUSED, void *vbb_info)
+{
+  df_live_subreg_bb_info *bb_info = (df_live_subreg_bb_info *) vbb_info;
+  if (bb_info)
+    {
+      delete bb_info->range_def;
+      bb_info->range_def = NULL;
+      delete bb_info->range_use;
+      bb_info->range_use = NULL;
+      delete bb_info->range_in;
+      bb_info->range_in = NULL;
+      delete bb_info->range_out;
+      bb_info->range_out = NULL;
+
+      bitmap_clear (&bb_info->full_use);
+      bitmap_clear (&bb_info->partial_use);
+      bitmap_clear (&bb_info->full_def);
+      bitmap_clear (&bb_info->partial_def);
+      bitmap_clear (&bb_info->all_in);
+      bitmap_clear (&bb_info->full_in);
+      bitmap_clear (&bb_info->partial_in);
+      bitmap_clear (&bb_info->all_out);
+      bitmap_clear (&bb_info->full_out);
+      bitmap_clear (&bb_info->partial_out);
+    }
+}
+
+/* Allocate or reset bitmaps for DF_LIVE_SUBREG blocks. The solution bits are
+   not touched unless the block is new.  */
+
+static void
+df_live_subreg_alloc (bitmap all_blocks ATTRIBUTE_UNUSED)
+{
+  struct df_live_subreg_problem_data *problem_data;
+  df_grow_bb_info (df_live_subreg);
+  if (df_live_subreg->problem_data)
+    problem_data
+      = (struct df_live_subreg_problem_data *) df_live_subreg->problem_data;
+  else
+    {
+      problem_data = XNEW (struct df_live_subreg_problem_data);
+      df_live_subreg->problem_data = problem_data;
+
+      bitmap_obstack_initialize (&problem_data->live_subreg_bitmaps);
+      problem_data->has_subreg_live_p = false;
+    }
+
+  basic_block bb;
+  FOR_EACH_BB_FN (bb, cfun)
+    bitmap_set_bit (df_live_subreg->out_of_date_transfer_functions, bb->index);
+
+  bitmap_set_bit (df_live_subreg->out_of_date_transfer_functions, ENTRY_BLOCK);
+  bitmap_set_bit (df_live_subreg->out_of_date_transfer_functions, EXIT_BLOCK);
+
+  unsigned int bb_index;
+  bitmap_iterator bi;
+  EXECUTE_IF_SET_IN_BITMAP (df_live_subreg->out_of_date_transfer_functions, 0,
+			    bb_index, bi)
+    {
+      df_live_subreg_bb_info *bb_info = df_live_subreg_get_bb_info (bb_index);
+
+      /* When bitmaps are already initialized, just clear them.  */
+      if (bb_info->full_use.obstack)
+	{
+	  bitmap_clear (&bb_info->full_def);
+	  bitmap_clear (&bb_info->partial_def);
+	  bitmap_clear (&bb_info->full_use);
+	  bitmap_clear (&bb_info->partial_use);
+	  bitmap_clear (&bb_info->all_in);
+	  bitmap_clear (&bb_info->full_in);
+	  bitmap_clear (&bb_info->partial_in);
+	  bitmap_clear (&bb_info->all_out);
+	  bitmap_clear (&bb_info->full_out);
+	  bitmap_clear (&bb_info->partial_out);
+	}
+      else
+	{
+	  bitmap_initialize (&bb_info->full_def,
+			     &problem_data->live_subreg_bitmaps);
+	  bitmap_initialize (&bb_info->partial_def,
+			     &problem_data->live_subreg_bitmaps);
+	  bitmap_initialize (&bb_info->full_use,
+			     &problem_data->live_subreg_bitmaps);
+	  bitmap_initialize (&bb_info->partial_use,
+			     &problem_data->live_subreg_bitmaps);
+	  bitmap_initialize (&bb_info->all_in,
+			     &problem_data->live_subreg_bitmaps);
+	  bitmap_initialize (&bb_info->full_in,
+			     &problem_data->live_subreg_bitmaps);
+	  bitmap_initialize (&bb_info->partial_in,
+			     &problem_data->live_subreg_bitmaps);
+	  bitmap_initialize (&bb_info->all_out,
+			     &problem_data->live_subreg_bitmaps);
+	  bitmap_initialize (&bb_info->full_out,
+			     &problem_data->live_subreg_bitmaps);
+	  bitmap_initialize (&bb_info->partial_out,
+			     &problem_data->live_subreg_bitmaps);
+	}
+
+      if (bb_info->range_def)
+	{
+	  bb_info->range_def->clear ();
+	  bb_info->range_use->clear ();
+	  bb_info->range_in->clear ();
+	  bb_info->range_out->clear ();
+	}
+      else
+	{
+	  bb_info->range_def = new subregs_live ();
+	  bb_info->range_use = new subregs_live ();
+	  bb_info->range_in = new subregs_live ();
+	  bb_info->range_out = new subregs_live ();
+	}
+    }
+  df_live_subreg->optional_p = true;
+}
+
+/* Reset the global solution for recalculation.  */
+
+static void
+df_live_subreg_reset (bitmap all_blocks)
+{
+  unsigned int bb_index;
+  bitmap_iterator bi;
+
+  EXECUTE_IF_SET_IN_BITMAP (all_blocks, 0, bb_index, bi)
+    {
+      df_live_subreg_bb_info *bb_info = df_live_subreg_get_bb_info (bb_index);
+      gcc_assert (bb_info);
+      bitmap_clear (&bb_info->all_in);
+      bitmap_clear (&bb_info->full_in);
+      bitmap_clear (&bb_info->partial_in);
+      bitmap_clear (&bb_info->all_out);
+      bitmap_clear (&bb_info->full_out);
+      bitmap_clear (&bb_info->partial_out);
+      bb_info->range_in->clear ();
+      bb_info->range_out->clear ();
+    }
+}
+
+/* Compute local live register info for basic block BB.  */
+
+static void
+df_live_subreg_bb_local_compute (unsigned int bb_index)
+{
+  basic_block bb = BASIC_BLOCK_FOR_FN (cfun, bb_index);
+  df_live_subreg_bb_info *bb_info = df_live_subreg_get_bb_info (bb_index);
+  df_live_subreg_problem_data *problem_data
+    = (df_live_subreg_problem_data *) df_live_subreg->problem_data;
+  rtx_insn *insn;
+  df_ref def, use;
+
+  /* Process the registers set in an exception handler.  */
+  FOR_EACH_ARTIFICIAL_DEF (def, bb_index)
+    if ((DF_REF_FLAGS (def) & DF_REF_AT_TOP) == 0)
+      {
+	problem_data->has_subreg_live_p
+	  |= add_subreg_range (bb_info, def, true);
+	problem_data->has_subreg_live_p |= remove_subreg_range (bb_info, def);
+      }
+
+  /* Process the hardware registers that are always live.  */
+  FOR_EACH_ARTIFICIAL_USE (use, bb_index)
+    /* Add use to set of uses in this BB.  */
+    if ((DF_REF_FLAGS (use) & DF_REF_AT_TOP) == 0)
+      problem_data->has_subreg_live_p |= add_subreg_range (bb_info, use);
+
+  FOR_BB_INSNS_REVERSE (bb, insn)
+    {
+      if (!NONDEBUG_INSN_P (insn))
+	continue;
+
+      df_insn_info *insn_info = DF_INSN_INFO_GET (insn);
+      FOR_EACH_INSN_INFO_DEF (def, insn_info)
+	{
+	  problem_data->has_subreg_live_p |= remove_subreg_range (bb_info, def);
+	  problem_data->has_subreg_live_p
+	    |= add_subreg_range (bb_info, def, true);
+	}
+
+      FOR_EACH_INSN_INFO_USE (use, insn_info)
+	{
+	  unsigned int regno = DF_REF_REGNO (use);
+	  machine_mode mode = GET_MODE (DF_REF_REAL_REG (use));
+	  /* Ignore a use inside a SET_DEST of the form (subreg (reg) offset).  */
+	  if (need_track_subreg (regno, mode)
+	      && DF_REF_FLAGS (use) & (DF_REF_READ_WRITE | DF_REF_SUBREG))
+	    continue;
+	  problem_data->has_subreg_live_p |= add_subreg_range (bb_info, use);
+	}
+    }
+
+  /* Process the registers set in an exception handler or the hard
+     frame pointer if this block is the target of a non local
+     goto.  */
+  FOR_EACH_ARTIFICIAL_DEF (def, bb_index)
+    if (DF_REF_FLAGS (def) & DF_REF_AT_TOP)
+      {
+	problem_data->has_subreg_live_p
+	  |= add_subreg_range (bb_info, def, true);
+	problem_data->has_subreg_live_p |= remove_subreg_range (bb_info, def);
+      }
+
+#ifdef EH_USES
+  /* Process the uses that are live into an exception handler.  */
+  FOR_EACH_ARTIFICIAL_USE (use, bb_index)
+    /* Add use to set of uses in this BB.  */
+    if (DF_REF_FLAGS (use) & DF_REF_AT_TOP)
+      problem_data->has_subreg_live_p |= add_subreg_range (bb_info, use);
+#endif
+}
+
+/* Compute local live register info for each basic block within BLOCKS.  */
+
+static void
+df_live_subreg_local_compute (bitmap all_blocks ATTRIBUTE_UNUSED)
+{
+  unsigned int bb_index, i;
+  bitmap_iterator bi;
+
+  bitmap_clear (&df->hardware_regs_used);
+
+  /* The all-important stack pointer must always be live.  */
+  bitmap_set_bit (&df->hardware_regs_used, STACK_POINTER_REGNUM);
+
+  /* Global regs are always live, too.  */
+  for (i = 0; i < FIRST_PSEUDO_REGISTER; i++)
+    if (global_regs[i])
+      bitmap_set_bit (&df->hardware_regs_used, i);
+
+  /* Before reload, there are a few registers that must be forced
+     live everywhere -- which might not already be the case for
+     blocks within infinite loops.  */
+  if (!reload_completed)
+    {
+      unsigned int pic_offset_table_regnum = PIC_OFFSET_TABLE_REGNUM;
+      /* Any reference to any pseudo before reload is a potential
+	 reference of the frame pointer.  */
+      bitmap_set_bit (&df->hardware_regs_used, FRAME_POINTER_REGNUM);
+
+      /* Pseudos with argument area equivalences may require
+	 reloading via the argument pointer.  */
+      if (FRAME_POINTER_REGNUM != ARG_POINTER_REGNUM
+	  && fixed_regs[ARG_POINTER_REGNUM])
+	bitmap_set_bit (&df->hardware_regs_used, ARG_POINTER_REGNUM);
+
+      /* Any constant, or pseudo with constant equivalences, may
+	 require reloading from memory using the pic register.  */
+      if (pic_offset_table_regnum != INVALID_REGNUM
+	  && fixed_regs[pic_offset_table_regnum])
+	bitmap_set_bit (&df->hardware_regs_used, pic_offset_table_regnum);
+    }
+
+  EXECUTE_IF_SET_IN_BITMAP (df_live_subreg->out_of_date_transfer_functions, 0,
+			    bb_index, bi)
+    {
+      if (bb_index == EXIT_BLOCK)
+	{
+	  /* The exit block is special for this problem and its bits are
+	     computed from thin air.  */
+	  class df_live_subreg_bb_info *bb_info
+	    = df_live_subreg_get_bb_info (EXIT_BLOCK);
+	  bitmap_copy (&bb_info->full_use, df->exit_block_uses);
+	}
+      else
+	df_live_subreg_bb_local_compute (bb_index);
+    }
+
+  bitmap_clear (df_live_subreg->out_of_date_transfer_functions);
+}
+
+/* Initialize the solution vectors.  */
+
+static void
+df_live_subreg_init (bitmap all_blocks)
+{
+  unsigned int bb_index;
+  bitmap_iterator bi;
+
+  EXECUTE_IF_SET_IN_BITMAP (all_blocks, 0, bb_index, bi)
+    {
+      df_live_subreg_bb_info *bb_info = df_live_subreg_get_bb_info (bb_index);
+      bitmap_copy (&bb_info->full_in, &bb_info->full_use);
+      bitmap_copy (&bb_info->partial_in, &bb_info->partial_use);
+      bb_info->range_in->copy_lives (*bb_info->range_use);
+      bitmap_clear (&bb_info->full_out);
+      bitmap_clear (&bb_info->partial_out);
+      bb_info->range_out->clear ();
+    }
+}
+
+/* Check that the computed full/partial live sets are consistent.  */
+static void
+df_live_subreg_check_result (bitmap full, bitmap partial,
+			     subregs_live *partial_live)
+{
+  unsigned int regno;
+  bitmap_iterator bi;
+  gcc_assert (!bitmap_intersect_p (full, partial));
+  EXECUTE_IF_SET_IN_BITMAP (full, 0, regno, bi)
+    gcc_assert (partial_live->empty_p (regno));
+  EXECUTE_IF_SET_IN_BITMAP (partial, 0, regno, bi)
+    gcc_assert (!partial_live->empty_p (regno));
+}
+
+/* Confluence function that processes infinite loops.  This might be a
+   noreturn function that throws.  And even if it isn't, getting the
+   unwind info right helps debugging.  */
+static void
+df_live_subreg_confluence_0 (basic_block bb)
+{
+  bitmap full_out = &df_live_subreg_get_bb_info (bb->index)->full_out;
+  if (bb != EXIT_BLOCK_PTR_FOR_FN (cfun))
+    bitmap_copy (full_out, &df->hardware_regs_used);
+}
+
+/* Confluence function that ignores fake edges.  */
+
+static bool
+df_live_subreg_confluence_n (edge e)
+{
+  df_live_subreg_problem_data *problem_data
+    = (df_live_subreg_problem_data *) df_live_subreg->problem_data;
+  class df_live_subreg_bb_info *src_bb_info
+    = df_live_subreg_get_bb_info (e->src->index);
+  class df_live_subreg_bb_info *dest_bb_info
+    = df_live_subreg_get_bb_info (e->dest->index);
+
+  if (!problem_data->has_subreg_live_p)
+    {
+      bool changed = false;
+
+      /* Call-clobbered registers die across exception and call edges.
+	 Conservatively treat partially-clobbered registers as surviving
+	 across the edges; they might or might not, depending on what
+	 mode they have.  */
+      /* ??? Abnormal call edges ignored for the moment, as this gets
+	 confused by sibling call edges, which crashes reg-stack.  */
+      if (e->flags & EDGE_EH)
+	{
+	  bitmap_view<HARD_REG_SET> eh_kills (eh_edge_abi.full_reg_clobbers ());
+	  changed
+	    = bitmap_ior_and_compl_into (&src_bb_info->full_out,
+					 &dest_bb_info->full_in, eh_kills);
+	}
+      else
+	changed
+	  = bitmap_ior_into (&src_bb_info->full_out, &dest_bb_info->full_in);
+
+      changed
+	|= bitmap_ior_into (&src_bb_info->full_out, &df->hardware_regs_used);
+      return changed;
+    }
+
+  /* Subreg liveness needs to be tracked here.  Calculation formula:
+       temp_full means:
+	 1. regnos partial in one of out/in and full in the other
+	 2. regnos partial in both out and in whose merged range becomes full
+       temp_range means:
+	 the ranges of the regnos that remain partially live
+       src_bb_info->partial_out = (src_bb_info->partial_out
+				   | dest_bb_info->partial_in) & ~temp_full
+       src_bb_info->range_out = copy (temp_range)
+       src_bb_info->full_out |= dest_bb_info->full_in | temp_full  */
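+  /* For example (hypothetical values): if a two-block pseudo is live as
+     [0, 1) in src_bb_info->range_out and as [1, 2) in dest_bb_info->range_in,
+     and is only partially live on both sides, the merged range is full, so
+     the regno is added to temp_full and ends up in full_out rather than
+     partial_out.  */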
+  subregs_live temp_range;
+  temp_range.add_lives (*src_bb_info->range_out);
+  temp_range.add_lives (*dest_bb_info->range_in);
+
+  bitmap_head temp_partial_all;
+  bitmap_initialize (&temp_partial_all, &bitmap_default_obstack);
+  bitmap_ior (&temp_partial_all, &src_bb_info->partial_out,
+	      &dest_bb_info->partial_in);
+
+  bitmap_head temp_full;
+  bitmap_initialize (&temp_full, &bitmap_default_obstack);
+
+  /* Collect the regnos that become full after merging src_bb_info->partial_out
+     and dest_bb_info->partial_in.  */
+  unsigned int regno;
+  bitmap_iterator bi;
+  EXECUTE_IF_SET_IN_BITMAP (&temp_partial_all, FIRST_PSEUDO_REGISTER, regno, bi)
+    {
+      if (bitmap_bit_p (&src_bb_info->full_out, regno)
+	  || bitmap_bit_p (&dest_bb_info->full_in, regno))
+	{
+	  bitmap_set_bit (&temp_full, regno);
+	  temp_range.remove_live (regno);
+	  continue;
+	}
+      else if (!bitmap_bit_p (&src_bb_info->partial_out, regno)
+	       || !bitmap_bit_p (&dest_bb_info->partial_in, regno))
+	continue;
+
+      subreg_ranges temp = src_bb_info->range_out->lives.at (regno);
+      temp.add_ranges (dest_bb_info->range_in->lives.at (regno));
+      if (temp.full_p ())
+	{
+	  bitmap_set_bit (&temp_full, regno);
+	  temp_range.remove_live (regno);
+	}
+    }
+
+  /* Calculating src_bb_info->partial_out and src_bb_info->range_out.  */
+  bool changed = bitmap_and_compl (&src_bb_info->partial_out, &temp_partial_all,
+				   &temp_full);
+  changed |= src_bb_info->range_out->copy_lives (temp_range);
+
+  /* Calculating src_bb_info->full_out.  */
+  bitmap_ior_into (&temp_full, &dest_bb_info->full_in);
+
+  /* Call-clobbered registers die across exception and call edges.
+     Conservatively treat partially-clobbered registers as surviving
+     across the edges; they might or might not, depending on what
+     mode they have.  */
+  /* ??? Abnormal call edges ignored for the moment, as this gets
+     confused by sibling call edges, which crashes reg-stack.  */
+  if (e->flags & EDGE_EH)
+    {
+      bitmap_view<HARD_REG_SET> eh_kills (eh_edge_abi.full_reg_clobbers ());
+      changed |= bitmap_ior_and_compl_into (&src_bb_info->full_out, &temp_full,
+					    eh_kills);
+    }
+  else
+    changed |= bitmap_ior_into (&src_bb_info->full_out, &temp_full);
+
+  changed |= bitmap_ior_into (&src_bb_info->full_out, &df->hardware_regs_used);
+
+  bitmap_clear (&temp_full);
+  bitmap_clear (&temp_partial_all);
+
+  df_live_subreg_check_result (&src_bb_info->full_out,
+			       &src_bb_info->partial_out,
+			       src_bb_info->range_out);
+  return changed;
+}
+
+/* Transfer function.  */
+
+static bool
+df_live_subreg_transfer_function (int bb_index)
+{
+  class df_live_subreg_bb_info *bb_info = df_live_subreg_get_bb_info (bb_index);
+  df_live_subreg_problem_data *problem_data
+    = (df_live_subreg_problem_data *) df_live_subreg->problem_data;
+  if (!problem_data->has_subreg_live_p)
+    {
+      bitmap in = &bb_info->full_in;
+      bitmap out = &bb_info->full_out;
+      bitmap use = &bb_info->full_use;
+      bitmap def = &bb_info->full_def;
+
+      return bitmap_ior_and_compl (in, use, out, def);
+    }
+
+  /* Subreg liveness needs to be tracked here; follow the calculation
+     formula below:
+       all_def = full_def | partial_def
+       temp_partial_out = ((full_out & partial_def)
+			   | (partial_out & ~all_def)
+			   | (partial_out with partial_def removed,
+			      if not empty))
+			  & ~full_use
+       temp_partial_be_full = regnos of (temp_partial_out & partial_use)
+			      whose merged range becomes full
+       full_in = full_use | (full_out & ~all_def) | temp_partial_be_full
+       partial_in = (temp_partial_out | partial_use)
+		    & ~temp_partial_be_full  */
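+  /* For example (hypothetical values): if partial_out has a two-block pseudo
+     live as [0, 1), and this block defines [0, 1) (partial_def/range_def)
+     while using [1, 2) (partial_use/range_use), the defined half is killed
+     and the upward-exposed use makes the pseudo partially live into the
+     block, i.e. partial_in gets the regno with range_in = [1, 2).  */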
+  unsigned int regno;
+  bitmap_iterator bi;
+  bool changed = false;
+  bitmap_head temp_partial_out;
+  bitmap_head temp_partial_be_full;
+  bitmap_head all_def;
+  subregs_live temp_range_out;
+  bitmap_initialize (&temp_partial_out, &bitmap_default_obstack);
+  bitmap_initialize (&temp_partial_be_full, &bitmap_default_obstack);
+  bitmap_initialize (&all_def, &bitmap_default_obstack);
+
+  bitmap_ior (&all_def, &bb_info->full_def, &bb_info->partial_def);
+
+  /* temp_partial_out = (full_out & partial_def) */
+  bitmap_and (&temp_partial_out, &bb_info->full_out, &bb_info->partial_def);
+  EXECUTE_IF_SET_IN_BITMAP (&temp_partial_out, FIRST_PSEUDO_REGISTER, regno, bi)
+    {
+      subreg_ranges temp (bb_info->range_def->lives.at (regno).max);
+      temp.make_full ();
+      temp.remove_ranges (bb_info->range_def->lives.at (regno));
+      temp_range_out.add_ranges (regno, temp);
+    }
+
+  /* temp_partial_out |= (partial_out & ~all_def) */
+  bitmap_ior_and_compl_into (&temp_partial_out, &bb_info->partial_out,
+			     &all_def);
+  EXECUTE_IF_AND_COMPL_IN_BITMAP (&bb_info->partial_out, &all_def,
+				  FIRST_PSEUDO_REGISTER, regno, bi)
+    {
+      temp_range_out.add_ranges (regno, bb_info->range_out->lives.at (regno));
+    }
+
+  /* temp_partial_out |= (partial_out with partial_def removed, if not empty) */
+  EXECUTE_IF_AND_IN_BITMAP (&bb_info->partial_out, &bb_info->partial_def, 0,
+			    regno, bi)
+    {
+      subreg_ranges temp = bb_info->range_out->lives.at (regno);
+      temp.remove_ranges (bb_info->range_def->lives.at (regno));
+      if (!temp.empty_p ())
+	{
+	  bitmap_set_bit (&temp_partial_out, regno);
+	  temp_range_out.add_ranges (regno, temp);
+	}
+    }
+
+  /* temp_partial_out = temp_partial_out & ~full_use */
+  bitmap_and_compl_into (&temp_partial_out, &bb_info->full_use);
+  EXECUTE_IF_SET_IN_BITMAP (&bb_info->full_use, 0, regno, bi)
+    if (!temp_range_out.empty_p (regno))
+      temp_range_out.remove_live (regno);
+
+  /* temp_partial_be_full = regnos of (temp_partial_out & partial_use) whose
+     merged range becomes full.  */
+  temp_range_out.add_lives (*bb_info->range_use);
+  /* Remove all ranges that are in partial_use and in full_out but not in
+     all_def.  */
+  EXECUTE_IF_SET_IN_BITMAP (&bb_info->full_out, 0, regno, bi)
+    if (!bitmap_bit_p (&all_def, regno) && !temp_range_out.empty_p (regno))
+      temp_range_out.remove_live (regno);
+
+  EXECUTE_IF_AND_IN_BITMAP (&temp_partial_out, &bb_info->partial_use, 0, regno,
+			    bi)
+    {
+      subreg_ranges temp = temp_range_out.lives.at (regno);
+      temp.add_ranges (bb_info->range_use->lives.at (regno));
+      if (temp.full_p ())
+	{
+	  bitmap_set_bit (&temp_partial_be_full, regno);
+	  temp_range_out.remove_live (regno);
+	}
+    }
+
+  /* Calculating full_in.  */
+  bitmap_ior_and_compl_into (&temp_partial_be_full, &bb_info->full_out,
+			     &all_def);
+  changed |= bitmap_ior (&bb_info->full_in, &temp_partial_be_full,
+			 &bb_info->full_use);
+
+  /* Calculating partial_in and range_in.  */
+  bitmap_ior_into (&temp_partial_out, &bb_info->partial_use);
+  changed |= bitmap_and_compl (&bb_info->partial_in, &temp_partial_out,
+			       &temp_partial_be_full);
+  changed |= bb_info->range_in->copy_lives (temp_range_out);
+
+  bitmap_clear (&temp_partial_out);
+  bitmap_clear (&temp_partial_be_full);
+  bitmap_clear (&all_def);
+
+  df_live_subreg_check_result (&bb_info->full_in, &bb_info->partial_in,
+			       bb_info->range_in);
+
+  return changed;
+}
+
+/* Finalize the solution: compute ALL_IN and ALL_OUT as the union of the
+   full and partial sets.  */
+
+void
+df_live_subreg_finalize (bitmap all_blocks)
+{
+  unsigned int bb_index;
+  bitmap_iterator bi;
+  EXECUTE_IF_SET_IN_BITMAP (all_blocks, 0, bb_index, bi)
+    {
+      class df_live_subreg_bb_info *bb_info
+	= df_live_subreg_get_bb_info (bb_index);
+      gcc_assert (bb_info);
+      bitmap_ior (&bb_info->all_in, &bb_info->full_in, &bb_info->partial_in);
+      bitmap_ior (&bb_info->all_out, &bb_info->full_out, &bb_info->partial_out);
+    }
+}
+
+/* Free all storage associated with the problem.  */
+
+static void
+df_live_subreg_free (void)
+{
+  df_live_subreg_problem_data *problem_data
+    = (df_live_subreg_problem_data *) df_live_subreg->problem_data;
+  if (df_live_subreg->block_info)
+    {
+      df_live_subreg->block_info_size = 0;
+      free (df_live_subreg->block_info);
+      df_live_subreg->block_info = NULL;
+      bitmap_obstack_release (&problem_data->live_subreg_bitmaps);
+      free (df_live_subreg->problem_data);
+      df_live_subreg->problem_data = NULL;
+    }
+
+  BITMAP_FREE (df_live_subreg->out_of_date_transfer_functions);
+  free (df_live_subreg);
+}
+
+/* Debugging info at top of bb.  */
+
+static void
+df_live_subreg_top_dump (basic_block bb, FILE *file)
+{
+  df_live_subreg_bb_info *bb_info = df_live_subreg_get_bb_info (bb->index);
+  if (!bb_info)
+    return;
+
+  fprintf (file, ";; subreg live all in  \t");
+  df_print_regset (file, &bb_info->all_in);
+  fprintf (file, ";;   subreg live full in  \t");
+  df_print_regset (file, &bb_info->full_in);
+  fprintf (file, ";;   subreg live partial in  \t");
+  df_print_regset (file, &bb_info->partial_in);
+  fprintf (file, ";;   subreg live range in  \t");
+  bb_info->range_in->dump (file, "");
+
+  fprintf (file, "\n;;   subreg live full use  \t");
+  df_print_regset (file, &bb_info->full_use);
+  fprintf (file, ";;   subreg live partial use  \t");
+  df_print_regset (file, &bb_info->partial_use);
+  fprintf (file, ";;   subreg live range use  \t");
+  bb_info->range_use->dump (file, "");
+
+  fprintf (file, "\n;;   subreg live full def  \t");
+  df_print_regset (file, &bb_info->full_def);
+  fprintf (file, ";;   subreg live partial def  \t");
+  df_print_regset (file, &bb_info->partial_def);
+  fprintf (file, ";;   subreg live range def \t");
+  bb_info->range_def->dump (file, "");
+}
+
+/* Debugging info at bottom of bb.  */
+
+static void
+df_live_subreg_bottom_dump (basic_block bb, FILE *file)
+{
+  df_live_subreg_bb_info *bb_info = df_live_subreg_get_bb_info (bb->index);
+  if (!bb_info)
+    return;
+
+  fprintf (file, ";; subreg live all out  \t");
+  df_print_regset (file, &bb_info->all_out);
+  fprintf (file, ";;   subreg live full out  \t");
+  df_print_regset (file, &bb_info->full_out);
+  fprintf (file, ";;   subreg live partial out  \t");
+  df_print_regset (file, &bb_info->partial_out);
+  fprintf (file, ";;   subreg live range out  \t");
+  bb_info->range_out->dump (file, "");
+}
+
+/* All of the information associated with every instance of the problem.  */
+
+static const struct df_problem problem_LIVE_SUBREG = {
+  DF_LIVE_SUBREG,		    /* Problem id.  */
+  DF_BACKWARD,			    /* Direction.  */
+  df_live_subreg_alloc,		    /* Allocate the problem specific data.  */
+  df_live_subreg_reset,		    /* Reset global information.  */
+  df_live_subreg_free_bb_info,	    /* Free basic block info.  */
+  df_live_subreg_local_compute,	    /* Local compute function.  */
+  df_live_subreg_init,		    /* Init the solution specific data.  */
+  df_worklist_dataflow,		    /* Worklist solver.  */
+  df_live_subreg_confluence_0,	    /* Confluence operator 0.  */
+  df_live_subreg_confluence_n,	    /* Confluence operator n.  */
+  df_live_subreg_transfer_function, /* Transfer function.  */
+  df_live_subreg_finalize,	    /* Finalize function.  */
+  df_live_subreg_free,		    /* Free all of the problem information.  */
+  df_live_subreg_free,	      /* Remove this problem from the stack of dataflow
+				 problems.  */
+  NULL,			      /* Debugging.  */
+  df_live_subreg_top_dump,    /* Debugging start block.  */
+  df_live_subreg_bottom_dump, /* Debugging end block.  */
+  NULL,			      /* Debugging start insn.  */
+  NULL,			      /* Debugging end insn.  */
+  NULL,			      /* Incremental solution verify start.  */
+  NULL,			      /* Incremental solution verify end.  */
+  &problem_LR,		      /* Dependent problem.  */
+  sizeof (df_live_subreg_bb_info), /* Size of entry of block_info array. */
+  TV_DF_LIVE_SUBREG,		   /* Timing variable.  */
+  false /* Reset blocks on dropping out of blocks_to_analyze.  */
+};
+
+/* Create a new DATAFLOW instance and add it to an existing instance
+   of DF.  The returned structure is what is used to get at the
+   solution.  */
+
+void
+df_live_subreg_add_problem (void)
+{
+  df_add_problem (&problem_LIVE_SUBREG);
+
+  /* These will be initialized when df_scan_blocks processes each
+     block.  */
+  df_live_subreg->out_of_date_transfer_functions
+    = BITMAP_ALLOC (&df_bitmap_obstack);
+}
 
-\f
 /*----------------------------------------------------------------------------
    LIVE AND MAY-INITIALIZED REGISTERS.
 
diff --git a/gcc/df.h b/gcc/df.h
index 402657a7076..e5daf91a25e 100644
--- a/gcc/df.h
+++ b/gcc/df.h
@@ -44,19 +44,20 @@ union df_ref_d;
    at any time are always defined (though LIVE is always there at -O2
    or higher); the others are always there.  */
 enum df_problem_id
-  {
-    DF_SCAN,
-    DF_LR,                /* Live Registers backward. */
-    DF_LIVE,              /* Live Registers & Uninitialized Registers */
-    DF_RD,                /* Reaching Defs. */
-    DF_CHAIN,             /* Def-Use and/or Use-Def Chains. */
-    DF_WORD_LR,           /* Subreg tracking lr.  */
-    DF_NOTE,              /* REG_DEAD and REG_UNUSED notes.  */
-    DF_MD,                /* Multiple Definitions. */
-    DF_MIR,               /* Must-initialized Registers.  */
-
-    DF_LAST_PROBLEM_PLUS1
-  };
+{
+  DF_SCAN,
+  DF_LR,	  /* Live Registers backward. */
+  DF_LIVE,	  /* Live Registers & Uninitialized Registers */
+  DF_LIVE_SUBREG, /* Live Ranges and Live Subreg */
+  DF_RD,	  /* Reaching Defs. */
+  DF_CHAIN,	  /* Def-Use and/or Use-Def Chains. */
+  DF_WORD_LR,	  /* Subreg tracking lr.  */
+  DF_NOTE,	  /* REG_DEAD and REG_UNUSED notes.  */
+  DF_MD,	  /* Multiple Definitions. */
+  DF_MIR,	  /* Must-initialized Registers.  */
+
+  DF_LAST_PROBLEM_PLUS1
+};
 
 /* Dataflow direction.  */
 enum df_flow_dir
@@ -619,6 +620,7 @@ public:
 #define DF_SCAN_BB_INFO(BB) (df_scan_get_bb_info ((BB)->index))
 #define DF_RD_BB_INFO(BB) (df_rd_get_bb_info ((BB)->index))
 #define DF_LR_BB_INFO(BB) (df_lr_get_bb_info ((BB)->index))
+#define DF_LIVE_SUBREG_INFO(BB) (df_live_subreg_get_bb_info ((BB)->index))
 #define DF_LIVE_BB_INFO(BB) (df_live_get_bb_info ((BB)->index))
 #define DF_WORD_LR_BB_INFO(BB) (df_word_lr_get_bb_info ((BB)->index))
 #define DF_MD_BB_INFO(BB) (df_md_get_bb_info ((BB)->index))
@@ -632,6 +634,15 @@ public:
 #define DF_MIR_IN(BB) (&DF_MIR_BB_INFO (BB)->in)
 #define DF_MIR_OUT(BB) (&DF_MIR_BB_INFO (BB)->out)
 
+#define DF_LIVE_SUBREG_IN(BB) (&DF_LIVE_SUBREG_INFO (BB)->all_in)
+#define DF_LIVE_SUBREG_OUT(BB) (&DF_LIVE_SUBREG_INFO (BB)->all_out)
+#define DF_LIVE_SUBREG_FULL_IN(BB) (&DF_LIVE_SUBREG_INFO (BB)->full_in)
+#define DF_LIVE_SUBREG_FULL_OUT(BB) (&DF_LIVE_SUBREG_INFO (BB)->full_out)
+#define DF_LIVE_SUBREG_PARTIAL_IN(BB) (&DF_LIVE_SUBREG_INFO (BB)->partial_in)
+#define DF_LIVE_SUBREG_PARTIAL_OUT(BB) (&DF_LIVE_SUBREG_INFO (BB)->partial_out)
+#define DF_LIVE_SUBREG_RANGE_IN(BB) (DF_LIVE_SUBREG_INFO (BB)->range_in)
+#define DF_LIVE_SUBREG_RANGE_OUT(BB) (DF_LIVE_SUBREG_INFO (BB)->range_out)
+
 /* These macros are used by passes that are not tolerant of
    uninitialized variables.  This intolerance should eventually
    be fixed.  */
@@ -878,6 +889,32 @@ public:
   bitmap_head out;   /* At the bottom of the block.  */
 };
 
+class subregs_live;
+
+class basic_block_subreg_live_info
+{
+public:
+  bitmap_head full_def;
+  bitmap_head full_use;
+  /* Only for pseudo registers.  */
+  bitmap_head partial_def;
+  bitmap_head partial_use;
+  subregs_live *range_def = NULL;
+  subregs_live *range_use = NULL;
+};
+
+/* Live registers and live ranges, including special subregs.  */
+class df_live_subreg_bb_info : public basic_block_subreg_live_info
+{
+public:
+  bitmap_head all_in, full_in;
+  bitmap_head all_out, full_out;
+  /* Only for pseudo registers.  */
+  bitmap_head partial_in;
+  bitmap_head partial_out;
+  subregs_live *range_in = NULL;
+  subregs_live *range_out = NULL;
+};
 
 /* Uninitialized registers.  All bitmaps are referenced by the
    register number.  Anded results of the forwards and backward live
@@ -946,6 +983,7 @@ extern class df_d *df;
 #define df_note    (df->problems_by_index[DF_NOTE])
 #define df_md      (df->problems_by_index[DF_MD])
 #define df_mir     (df->problems_by_index[DF_MIR])
+#define df_live_subreg (df->problems_by_index[DF_LIVE_SUBREG])
 
 /* This symbol turns on checking that each modification of the cfg has
   been identified to the appropriate df routines.  It is not part of
@@ -1031,6 +1069,25 @@ extern void df_lr_add_problem (void);
 extern void df_lr_verify_transfer_functions (void);
 extern void df_live_verify_transfer_functions (void);
 extern void df_live_add_problem (void);
+extern void
+df_live_subreg_add_problem (void);
+extern void
+df_live_subreg_finalize (bitmap all_blocks);
+class subreg_range;
+extern bool
+need_track_subreg (int regno, machine_mode mode);
+extern void
+remove_subreg_range (basic_block_subreg_live_info *bb_info, unsigned int regno,
+		     machine_mode mode, const subreg_range &range);
+extern bool
+remove_subreg_range (basic_block_subreg_live_info *bb_info, df_ref ref);
+extern void
+add_subreg_range (basic_block_subreg_live_info *bb_info, unsigned int regno,
+		  machine_mode mode, const subreg_range &range,
+		  bool is_def = false);
+extern bool
+add_subreg_range (basic_block_subreg_live_info *bb_info, df_ref ref,
+		  bool is_def = false);
 extern void df_live_set_all_dirty (void);
 extern void df_chain_add_problem (unsigned int);
 extern void df_word_lr_add_problem (void);
@@ -1124,6 +1181,16 @@ df_lr_get_bb_info (unsigned int index)
     return NULL;
 }
 
+inline class df_live_subreg_bb_info *
+df_live_subreg_get_bb_info (unsigned int index)
+{
+  if (index < df_live_subreg->block_info_size)
+    return &(
+      (class df_live_subreg_bb_info *) df_live_subreg->block_info)[index];
+  else
+    return NULL;
+}
+
 inline class df_md_bb_info *
 df_md_get_bb_info (unsigned int index)
 {
diff --git a/gcc/ira-build.cc b/gcc/ira-build.cc
index 07aba27c1c9..7df98164503 100644
--- a/gcc/ira-build.cc
+++ b/gcc/ira-build.cc
@@ -1991,7 +1991,13 @@ create_bb_allocnos (ira_loop_tree_node_t bb_node)
       create_insn_allocnos (PATTERN (insn), NULL, false);
   /* It might be a allocno living through from one subloop to
      another.  */
-  EXECUTE_IF_SET_IN_REG_SET (df_get_live_in (bb), FIRST_PSEUDO_REGISTER, i, bi)
+  EXECUTE_IF_SET_IN_REG_SET (DF_LIVE_SUBREG_FULL_IN (bb), FIRST_PSEUDO_REGISTER,
+			     i, bi)
+    if (ira_curr_regno_allocno_map[i] == NULL)
+      ira_create_allocno (i, false, ira_curr_loop_tree_node);
+
+  EXECUTE_IF_SET_IN_REG_SET (DF_LIVE_SUBREG_PARTIAL_IN (bb),
+			     FIRST_PSEUDO_REGISTER, i, bi)
     if (ira_curr_regno_allocno_map[i] == NULL)
       ira_create_allocno (i, false, ira_curr_loop_tree_node);
 }
@@ -2007,10 +2013,10 @@ create_loop_allocnos (edge e)
   bitmap_iterator bi;
   ira_loop_tree_node_t parent;
 
-  live_in_regs = df_get_live_in (e->dest);
+  live_in_regs = DF_LIVE_SUBREG_IN (e->dest);
   border_allocnos = ira_curr_loop_tree_node->border_allocnos;
-  EXECUTE_IF_SET_IN_REG_SET (df_get_live_out (e->src),
-			     FIRST_PSEUDO_REGISTER, i, bi)
+  EXECUTE_IF_SET_IN_REG_SET (DF_LIVE_SUBREG_OUT (e->src), FIRST_PSEUDO_REGISTER,
+			     i, bi)
     if (bitmap_bit_p (live_in_regs, i))
       {
 	if (ira_curr_regno_allocno_map[i] == NULL)
diff --git a/gcc/ira-color.cc b/gcc/ira-color.cc
index 6af8318e5f5..f1b96d1aee6 100644
--- a/gcc/ira-color.cc
+++ b/gcc/ira-color.cc
@@ -2862,8 +2862,8 @@ ira_loop_edge_freq (ira_loop_tree_node_t loop_node, int regno, bool exit_p)
       FOR_EACH_EDGE (e, ei, loop_node->loop->header->preds)
 	if (e->src != loop_node->loop->latch
 	    && (regno < 0
-		|| (bitmap_bit_p (df_get_live_out (e->src), regno)
-		    && bitmap_bit_p (df_get_live_in (e->dest), regno))))
+		|| (bitmap_bit_p (DF_LIVE_SUBREG_OUT (e->src), regno)
+		    && bitmap_bit_p (DF_LIVE_SUBREG_IN (e->dest), regno))))
 	  freq += EDGE_FREQUENCY (e);
     }
   else
@@ -2871,8 +2871,8 @@ ira_loop_edge_freq (ira_loop_tree_node_t loop_node, int regno, bool exit_p)
       auto_vec<edge> edges = get_loop_exit_edges (loop_node->loop);
       FOR_EACH_VEC_ELT (edges, i, e)
 	if (regno < 0
-	    || (bitmap_bit_p (df_get_live_out (e->src), regno)
-		&& bitmap_bit_p (df_get_live_in (e->dest), regno)))
+	    || (bitmap_bit_p (DF_LIVE_SUBREG_OUT (e->src), regno)
+		&& bitmap_bit_p (DF_LIVE_SUBREG_IN (e->dest), regno)))
 	  freq += EDGE_FREQUENCY (e);
     }
 
diff --git a/gcc/ira-emit.cc b/gcc/ira-emit.cc
index bcc4f09f7c4..84ed482e568 100644
--- a/gcc/ira-emit.cc
+++ b/gcc/ira-emit.cc
@@ -510,8 +510,8 @@ generate_edge_moves (edge e)
     return;
   src_map = src_loop_node->regno_allocno_map;
   dest_map = dest_loop_node->regno_allocno_map;
-  regs_live_in_dest = df_get_live_in (e->dest);
-  regs_live_out_src = df_get_live_out (e->src);
+  regs_live_in_dest = DF_LIVE_SUBREG_IN (e->dest);
+  regs_live_out_src = DF_LIVE_SUBREG_OUT (e->src);
   EXECUTE_IF_SET_IN_REG_SET (regs_live_in_dest,
 			     FIRST_PSEUDO_REGISTER, regno, bi)
     if (bitmap_bit_p (regs_live_out_src, regno))
@@ -1229,16 +1229,16 @@ add_ranges_and_copies (void)
 	 destination block) to use for searching allocnos by their
 	 regnos because of subsequent IR flattening.  */
       node = IRA_BB_NODE (bb)->parent;
-      bitmap_copy (live_through, df_get_live_in (bb));
+      bitmap_copy (live_through, DF_LIVE_SUBREG_IN (bb));
       add_range_and_copies_from_move_list
 	(at_bb_start[bb->index], node, live_through, REG_FREQ_FROM_BB (bb));
-      bitmap_copy (live_through, df_get_live_out (bb));
+      bitmap_copy (live_through, DF_LIVE_SUBREG_OUT (bb));
       add_range_and_copies_from_move_list
 	(at_bb_end[bb->index], node, live_through, REG_FREQ_FROM_BB (bb));
       FOR_EACH_EDGE (e, ei, bb->succs)
 	{
-	  bitmap_and (live_through,
-		      df_get_live_in (e->dest), df_get_live_out (bb));
+	  bitmap_and (live_through, DF_LIVE_SUBREG_IN (e->dest),
+		      DF_LIVE_SUBREG_OUT (bb));
 	  add_range_and_copies_from_move_list
 	    ((move_t) e->aux, node, live_through,
 	     REG_FREQ_FROM_EDGE_FREQ (EDGE_FREQUENCY (e)));
diff --git a/gcc/ira-lives.cc b/gcc/ira-lives.cc
index bc8493856a4..60e6be0b0ae 100644
--- a/gcc/ira-lives.cc
+++ b/gcc/ira-lives.cc
@@ -1194,13 +1194,14 @@ process_out_of_region_eh_regs (basic_block bb)
   if (! eh_p)
     return;
 
-  EXECUTE_IF_SET_IN_BITMAP (df_get_live_out (bb), FIRST_PSEUDO_REGISTER, i, bi)
+  EXECUTE_IF_SET_IN_BITMAP (DF_LIVE_SUBREG_OUT (bb), FIRST_PSEUDO_REGISTER, i,
+			    bi)
     {
       ira_allocno_t a = ira_curr_regno_allocno_map[i];
       for (int n = ALLOCNO_NUM_OBJECTS (a) - 1; n >= 0; n--)
 	{
 	  ira_object_t obj = ALLOCNO_OBJECT (a, n);
-	  for (int k = 0; ; k++)
+	  for (int k = 0;; k++)
 	    {
 	      unsigned int regno = EH_RETURN_DATA_REGNO (k);
 	      if (regno == INVALID_REGNUM)
@@ -1239,7 +1240,7 @@ process_bb_node_lives (ira_loop_tree_node_t loop_tree_node)
 	  high_pressure_start_point[ira_pressure_classes[i]] = -1;
 	}
       curr_bb_node = loop_tree_node;
-      reg_live_out = df_get_live_out (bb);
+      reg_live_out = DF_LIVE_SUBREG_OUT (bb);
       sparseset_clear (objects_live);
       REG_SET_TO_HARD_REG_SET (hard_regs_live, reg_live_out);
       hard_regs_live &= ~(eliminable_regset | ira_no_alloc_regs);
diff --git a/gcc/ira.cc b/gcc/ira.cc
index 2fa6e0e5c94..b9159d089c3 100644
--- a/gcc/ira.cc
+++ b/gcc/ira.cc
@@ -4719,8 +4719,8 @@ find_moveable_pseudos (void)
       bitmap_initialize (local, 0);
       bitmap_initialize (transp, 0);
       bitmap_initialize (moveable, 0);
-      bitmap_copy (live, df_get_live_out (bb));
-      bitmap_and_into (live, df_get_live_in (bb));
+      bitmap_copy (live, DF_LIVE_SUBREG_OUT (bb));
+      bitmap_and_into (live, DF_LIVE_SUBREG_IN (bb));
       bitmap_copy (transp, live);
       bitmap_clear (moveable);
       bitmap_clear (live);
@@ -5018,9 +5018,10 @@ interesting_dest_for_shprep_1 (rtx set, basic_block call_dom)
 {
   rtx src = SET_SRC (set);
   rtx dest = SET_DEST (set);
-  if (!REG_P (src) || !HARD_REGISTER_P (src)
-      || !REG_P (dest) || HARD_REGISTER_P (dest)
-      || (call_dom && !bitmap_bit_p (df_get_live_in (call_dom), REGNO (dest))))
+  if (!REG_P (src) || !HARD_REGISTER_P (src) || !REG_P (dest)
+      || HARD_REGISTER_P (dest)
+      || (call_dom
+	  && !bitmap_bit_p (DF_LIVE_SUBREG_IN (call_dom), REGNO (dest))))
     return NULL;
   return dest;
 }
@@ -5498,10 +5499,10 @@ allocate_initial_values (void)
 		  /* Update global register liveness information.  */
 		  FOR_EACH_BB_FN (bb, cfun)
 		    {
-		      if (REGNO_REG_SET_P (df_get_live_in (bb), regno))
-			SET_REGNO_REG_SET (df_get_live_in (bb), new_regno);
-		      if (REGNO_REG_SET_P (df_get_live_out (bb), regno))
-			SET_REGNO_REG_SET (df_get_live_out (bb), new_regno);
+		      if (REGNO_REG_SET_P (DF_LIVE_SUBREG_IN (bb), regno))
+			SET_REGNO_REG_SET (DF_LIVE_SUBREG_IN (bb), new_regno);
+		      if (REGNO_REG_SET_P (DF_LIVE_SUBREG_OUT (bb), regno))
+			SET_REGNO_REG_SET (DF_LIVE_SUBREG_OUT (bb), new_regno);
 		    }
 		}
 	    }
@@ -5663,6 +5664,7 @@ ira (FILE *f)
   if (optimize > 1)
     df_remove_problem (df_live);
   gcc_checking_assert (df_live == NULL);
+  df_live_subreg_add_problem ();
 
   if (flag_checking)
     df->changeable_flags |= DF_VERIFY_SCHEDULED;
diff --git a/gcc/reginfo.cc b/gcc/reginfo.cc
index d472a35946a..66b269fee41 100644
--- a/gcc/reginfo.cc
+++ b/gcc/reginfo.cc
@@ -485,6 +485,20 @@ init_reg_modes_target (void)
     }
 }
 
+/* For each machine mode, the number of blocks the mode occupies.  Used to
+   cache the result of get_nblocks_slow.  */
+int regmode_nblocks[MAX_MACHINE_MODE];
+
+/* Return the number of blocks MODE occupies.  One block equals the mode's
+   natural size.  */
+int
+get_nblocks_slow (machine_mode mode)
+{
+  poly_int64 mode_size = GET_MODE_SIZE (mode);
+  poly_int64 unit_size = REGMODE_NATURAL_SIZE (mode);
+  return exact_div (mode_size, unit_size).to_constant ();
+}
+
 /* Finish initializing the register sets and initialize the register modes.
    This function might be invoked more than once, if the target has support
    for changing register usage conventions on a per-function basis.
diff --git a/gcc/rtl.h b/gcc/rtl.h
index e4b6cc0dbb5..a7d37cfb764 100644
--- a/gcc/rtl.h
+++ b/gcc/rtl.h
@@ -4627,6 +4627,20 @@ struct rtx_comparison {
   machine_mode mode;
 };
 
+extern int regmode_nblocks[MAX_MACHINE_MODE];
+extern int get_nblocks_slow (machine_mode);
+/* Return the number of blocks MODE occupies.  One block equals the mode's
+   natural size, so nblocks satisfies:
+     (nblocks - 1) * natural_size < GET_MODE_SIZE (mode)
+       <= nblocks * natural_size.  */
+inline int
+get_nblocks (machine_mode mode)
+{
+  if (regmode_nblocks[mode] > 0)
+    return regmode_nblocks[mode];
+  return get_nblocks_slow (mode);
+}
+
 /* gtype-desc.cc.  */
 extern void gt_ggc_mx (rtx &);
 extern void gt_pch_nx (rtx &);
diff --git a/gcc/subreg-live-range.cc b/gcc/subreg-live-range.cc
new file mode 100644
index 00000000000..eb14f2ec1ea
--- /dev/null
+++ b/gcc/subreg-live-range.cc
@@ -0,0 +1,649 @@
+/* SUBREG live range tracking classes for DF & IRA & LRA.
+   Copyright (C) 2023 Free Software Foundation, Inc.
+   Contributed by Lehua Ding (lehua.ding@rivai.ai), RiVAI Technologies Ltd.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   <http://www.gnu.org/licenses/>.  */
+
+#include "subreg-live-range.h"
+#include "selftest.h"
+#include "print-rtl.h"
+
+/* class subreg_range */
+void
+subreg_range::dump (FILE *file) const
+{
+  fprintf (file, "[%d, %d)", start, end);
+}
+
+/* class subreg_ranges */
+bool
+subreg_ranges::add_range (int max, const subreg_range &new_range)
+{
+  subreg_range range = new_range;
+  if (full_p ())
+    return false;
+  else if (maybe_eq (max, 1))
+    {
+      gcc_assert (maybe_eq (range.start, 0) && maybe_eq (range.end, 1));
+      make_full ();
+      return true;
+    }
+
+  if (maybe_eq (this->max, 1))
+    change_max (max);
+
+  gcc_assert (maybe_eq (this->max, max));
+  gcc_assert (maybe_lt (range.start, range.end));
+
+  bool changed = empty_p ();
+  auto it = ranges.begin ();
+  while (it != ranges.end ())
+    {
+      const subreg_range &r = *it;
+      gcc_assert (maybe_lt (r.start, r.end));
+
+      /* The possible positional relationship of R and RANGE.
+	 1~5 mark R.start's possible positions relative to RANGE,
+	 A~G mark R.end's possible positions relative to RANGE.
+	 caseN shows, when R.start is at position N, which positions
+	 R.end can be at.
+
+		     RANGE.start     RANGE.end
+			  [               )
+			  |               |
+	R.start   1       2       3       4       5
+	R.end             |               |
+	  case1       A   B       C       D       E
+	  case2           |       C       D       E
+	  case3           |           F   D       E
+	  case4           |               |       E
+	  case5           |               |               G
+
+	*/
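+      /* Worked example (hypothetical values): merging RANGE = [2, 5) into
+	 the existing set {[0, 3), [6, 8)} widens RANGE to [0, 5) while
+	 erasing [0, 3), stops at [6, 8) since it lies beyond RANGE, and
+	 finally inserts [0, 5), giving {[0, 5), [6, 8)}.  */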
+
+      /* R.start at 1 position.   */
+      if (maybe_lt (r.start, range.start))
+	{
+	  /* R.end at A position. That means R and RANGE do not overlap.  */
+	  if (maybe_lt (r.end, range.start))
+	    it++;
+	  /* R.end at B/C position.  That means RANGE's left part overlaps R's
+	     right part.  Expand RANGE.start to R.start and remove R.  */
+	  else if (maybe_lt (r.end, range.end))
+	    {
+	      changed = true;
+	      range.start = r.start;
+	      it = ranges.erase (it);
+	    }
+	  /* R.end at D/E position.  That means R already contains RANGE,
+	     nothing to do.  */
+	  else
+	    return false;
+	}
+      /* R.start at 2 position.  */
+      else if (maybe_eq (r.start, range.start))
+	{
+	  /* R.end at C/D position. That means RANGE contains R, remove R and
+	     insert RANGE.  */
+	  if (maybe_lt (r.end, range.end))
+	    {
+	      changed = true;
+	      it = ranges.erase (it);
+	    }
+	  /* R.end at E position.  That means R already contains RANGE,
+	     nothing to do.  */
+	  else
+	    return false;
+	}
+      /* R.start at 3 position.  */
+      else if (maybe_gt (r.start, range.start) && maybe_lt (r.start, range.end))
+	{
+	  /* R.end at F/D position. That means RANGE contains R, just remove R
+	     and insert RANGE later.  */
+	  if (maybe_le (r.end, range.end))
+	    {
+	      changed = true;
+	      it = ranges.erase (it);
+	    }
+	  /* R.end at E position.  That means RANGE's right part overlaps R's
+	     left part.  Expand RANGE.end to R.end and remove R.  */
+	  else if (maybe_gt (r.end, range.end))
+	    {
+	      changed = true;
+	      range.end = r.end;
+	      it = ranges.erase (it);
+	      break;
+	    }
+	}
+      /* R.start at 4 position and R.end at E position. That means RANGE and R
+	 are adjacent and can be merged. */
+      else if (maybe_eq (r.start, range.end))
+	{
+	  changed = true;
+	  range.end = r.end;
+	  it = ranges.erase (it);
+	}
+      /* R.start at 5 position and R.end at G position. That means R and RANGE
+	 do not overlap.  */
+      else
+	break;
+    }
+  ranges.insert (range);
+  return changed;
+}
+
+bool
+subreg_ranges::remove_range (int max, const subreg_range &range)
+{
+  if (empty_p ())
+    return false;
+  else if (maybe_eq (max, 1))
+    {
+      gcc_assert (maybe_eq (range.start, 0) && maybe_eq (range.end, 1));
+      make_empty ();
+      return true;
+    }
+
+  if (maybe_eq (this->max, 1))
+    {
+      gcc_assert (full_p ());
+      change_max (max);
+    }
+  gcc_assert (maybe_eq (this->max, max));
+  gcc_assert (maybe_lt (range.start, range.end));
+
+  bool changed = false;
+  auto it = ranges.begin ();
+  std::set<subreg_range> new_ranges;
+  while (it != ranges.end ())
+    {
+      auto &r = *it;
+      gcc_assert (maybe_lt (r.start, r.end));
+
+      /* The possible positional relationship of R and RANGE.
+	 1~5 mark R.start's possible positions relative to RANGE,
+	 A~G mark R.end's possible positions relative to RANGE.
+	 caseN shows, when R.start is at position N, which positions
+	 R.end can be at.
+
+		     RANGE.start     RANGE.end
+			  [               )
+			  |               |
+	R.start   1       2       3       4       5
+	R.end             |               |
+	  case1       A   B       C       D       E
+	  case2           |       C       D       E
+	  case3           |           F   D       E
+	  case4           |               |       E
+	  case5           |               |               G
+
+	*/
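+      /* Worked example (hypothetical values): removing RANGE = [2, 5) from
+	 the existing set {[0, 8)} erases [0, 8) and re-inserts the two
+	 leftover pieces via add_range after the loop, giving
+	 {[0, 2), [5, 8)}.  */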
+
+      /* R.start at 1 position.  */
+      if (maybe_lt (r.start, range.start))
+	{
+	  /* R.end at A/B position. That means RANGE and R do not overlap,
+	     nothing to remove.  */
+	  if (maybe_le (r.end, range.start))
+	    it++;
+	  /* R.end at C/D position.  That means RANGE covers R's right part;
+	     shrink R.end to RANGE.start.  */
+	  else if (maybe_le (r.end, range.end))
+	    {
+	      changed = true;
+	      new_ranges.insert (subreg_range (r.start, range.start));
+	      it = ranges.erase (it);
+	    }
+	  /* R.end at E position.  That means RANGE lies in the middle of R;
+	     split R into two ranges, [R.start, RANGE.start) and
+	     [RANGE.end, R.end).  */
+	  else
+	    {
+	      changed = true;
+	      new_ranges.insert (subreg_range (r.start, range.start));
+	      new_ranges.insert (subreg_range (range.end, r.end));
+	      it = ranges.erase (it);
+	      break;
+	    }
+	}
+      /* R.start at 2 position.  */
+      else if (maybe_eq (r.start, range.start))
+	{
+	  /* R.end at C/D position. That means RANGE contains R, remove R.  */
+	  if (maybe_le (r.end, range.end))
+	    {
+	      changed = true;
+	      it = ranges.erase (it);
+	    }
+	  /* R.end at E position.  That means RANGE covers R's left part;
+	     shrink R.start to RANGE.end.  */
+	  else
+	    {
+	      changed = true;
+	      new_ranges.insert (subreg_range (range.end, r.end));
+	      it = ranges.erase (it);
+	      break;
+	    }
+	}
+      /* R.start at 3 position. */
+      else if (maybe_gt (r.start, range.start) && maybe_lt (r.start, range.end))
+	{
+	  /* R.end at F/D position. That means RANGE contains R, remove R.  */
+	  if (maybe_le (r.end, range.end))
+	    {
+	      changed = true;
+	      it = ranges.erase (it);
+	    }
+	  /* R.end at E position.  That means RANGE's right part overlaps R's
+	     left part; shrink R.start to RANGE.end.  */
+	  else
+	    {
+	      changed = true;
+	      new_ranges.insert (subreg_range (range.end, r.end));
+	      it = ranges.erase (it);
+	      break;
+	    }
+	}
+      /* R.start at 4/5 position. That means RANGE and R do not overlap.  */
+      else
+	break;
+    }
+  for (auto &r : new_ranges)
+    add_range (this->max, r);
+  return changed;
+}
+
+bool
+subreg_ranges::add_ranges (const subreg_ranges &sr)
+{
+  gcc_assert (maybe_eq (max, sr.max) || maybe_eq (max, 1)
+	      || maybe_eq (sr.max, 1));
+
+  if (full_p () || sr.empty_p ())
+    return false;
+  else if (sr.full_p ())
+    {
+      make_full ();
+      return true;
+    }
+
+  bool changed = false;
+  for (auto &r : sr.ranges)
+    {
+      changed |= add_range (sr.max, r);
+    }
+  return changed;
+}
+
+bool
+subreg_ranges::remove_ranges (const subreg_ranges &sr)
+{
+  if (empty_p () || sr.empty_p ())
+    return false;
+  else if (sr.full_p ())
+    {
+      make_empty ();
+      return true;
+    }
+
+  gcc_assert (maybe_eq (max, sr.max) || maybe_eq (max, 1)
+	      || maybe_eq (sr.max, 1));
+
+  bool changed = false;
+  for (auto &r : sr.ranges)
+    {
+      changed |= remove_range (sr.max, r);
+    }
+  return changed;
+}
+
+bool
+subreg_ranges::same_p (const subreg_ranges &sr) const
+{
+  if (maybe_eq (max, 1) || maybe_eq (sr.max, 1))
+    {
+      return (empty_p () && sr.empty_p ()) || (full_p () && sr.full_p ());
+    }
+  else if (maybe_eq (max, sr.max))
+    {
+      if (ranges.size () != sr.ranges.size ())
+	return false;
+      /* Make sure that the elements in each position are the same.  */
+      auto it1 = ranges.begin ();
+      auto it2 = sr.ranges.begin ();
+      while (it1 != ranges.end ())
+	{
+	  const subreg_range &r1 = *it1;
+	  const subreg_range &r2 = *it2;
+	  if (maybe_ne (r1.start, r2.start) || maybe_ne (r1.end, r2.end))
+	    return false;
+	  it1++;
+	  it2++;
+	}
+      return true;
+    }
+  else
+    gcc_unreachable ();
+}
+
+bool
+subreg_ranges::include_ranges_p (const subreg_ranges &sr) const
+{
+  gcc_assert (maybe_eq (max, sr.max) || maybe_eq (max, 1)
+	      || maybe_eq (sr.max, 1));
+  if (full_p ())
+    return true;
+  if (empty_p () && sr.empty_p ())
+    return true;
+  if (same_p (sr))
+    return true;
+
+  for (const auto &r : sr.ranges)
+    if (!include_range_p (sr.max, r))
+      return false;
+  return true;
+}
+
+bool
+subreg_ranges::include_range_p (int max, const subreg_range &range) const
+{
+  gcc_assert (maybe_eq (this->max, max));
+  for (const auto &r : ranges)
+    {
+      if (maybe_le (r.start, range.start) && maybe_ge (r.end, range.end))
+	{
+	  return true;
+	}
+      else if (maybe_ge (r.start, range.end))
+	return false;
+    }
+  return false;
+}
+
+void
+subreg_ranges::dump (FILE *file) const
+{
+  if (empty_p ())
+    {
+      fprintf (file, "empty");
+      return;
+    }
+  else if (full_p ())
+    {
+      fprintf (file, "full");
+      return;
+    }
+
+  fprintf (file, "patial(max:%d", max);
+  fprintf (file, " {");
+  for (auto &range : ranges)
+    {
+      fprintf (file, " ");
+      range.dump (file);
+    }
+  fprintf (file, " })");
+}
+
+/* class subregs_live */
+bool
+subregs_live::copy_lives (const subregs_live &sl)
+{
+  bool changed = false;
+  subregs_live temp;
+  for (auto &kv : sl.lives)
+    {
+      unsigned int regno = kv.first;
+      const subreg_ranges &sr = kv.second;
+      if (lives.count (regno) == 0 && !sr.empty_p ())
+	{
+	  changed = true;
+	  temp.add_ranges (regno, sr);
+	}
+      else if (lives.count (regno) != 0)
+	{
+	  changed |= !lives.at (regno).same_p (sr);
+	  temp.add_ranges (regno, sr);
+	}
+    }
+
+  for (auto &kv : lives)
+    {
+      unsigned int regno = kv.first;
+      subreg_ranges &sr = kv.second;
+      if (temp.lives.count (regno) == 0 && !sr.empty_p ())
+	changed = true;
+    }
+  lives = temp.lives;
+  return changed;
+}
+
+bool
+subregs_live::add_lives (const subregs_live &sl)
+{
+  bool changed = false;
+  for (auto &kv : sl.lives)
+    {
+      unsigned int regno = kv.first;
+      const subreg_ranges &sr = kv.second;
+      if (sr.empty_p ())
+	continue;
+
+      if (lives.count (regno) == 0)
+	{
+	  changed = true;
+	  lives.insert ({regno, sr});
+	}
+      else
+	{
+	  changed |= lives.at (regno).add_ranges (sr);
+	}
+    }
+  return changed;
+}
+
+bool
+subregs_live::remove_lives (const subregs_live &sl)
+{
+  bool changed = false;
+  for (auto &kv : sl.lives)
+    {
+      unsigned int regno = kv.first;
+      const subreg_ranges &sr = kv.second;
+      if (sr.empty_p ())
+	continue;
+
+      if (lives.count (regno) != 0)
+	{
+	  changed |= lives.at (regno).remove_ranges (sr);
+	  if (lives.at (regno).empty_p ())
+	    lives.erase (regno);
+	}
+    }
+  return changed;
+}
+
+void
+subregs_live::dump (FILE *file, const char *indent) const
+{
+  if (lives.empty ())
+    {
+      fprintf (file, "%sempty\n", indent);
+      return;
+    }
+  fprintf (file, "%s", indent);
+  for (auto &kv : lives)
+    {
+      const subreg_ranges &sr = kv.second;
+      if (sr.empty_p ())
+	continue;
+      fprintf (file, "%d ", kv.first);
+      if (!sr.full_p ())
+	{
+	  sr.dump (file);
+	  fprintf (file, "  ");
+	}
+    }
+  fprintf (file, "\n");
+}
+
+/* class live_point */
+void
+live_point::dump (FILE *file) const
+{
+  if (!use_reg.empty_p ())
+    {
+      fprintf (file, "use ");
+      use_reg.dump (file);
+      if (!def_reg.empty_p ())
+	{
+	  fprintf (file, ", def ");
+	  def_reg.dump (file);
+	}
+    }
+  else if (!def_reg.empty_p ())
+    {
+      fprintf (file, "def ");
+      def_reg.dump (file);
+    }
+  else
+    {
+      gcc_unreachable ();
+    }
+}
+
+/* class live_points */
+void
+live_points::dump (FILE *file) const
+{
+  fprintf (file, "%u :", id);
+  if (points.empty ())
+    {
+      fprintf (file, " empty");
+      return;
+    }
+  for (const auto &kv : points)
+    {
+      fprintf (file, " ");
+      kv.second.dump (file);
+      fprintf (file, " at point %u;", kv.first);
+    }
+}
+
+/* class subregs_live_points */
+void
+subregs_live_points::dump (FILE *file) const
+{
+  if (subreg_points.empty ())
+    {
+      fprintf (file, ";;     empty\n");
+      return;
+    }
+  for (const auto &kv : subreg_points)
+    {
+      fprintf (file, ";;     ");
+      kv.second.dump (file);
+      fprintf (file, "\n");
+    }
+}
+
+/* Define some useful debug functions.  */
+
+DEBUG_FUNCTION void
+debug (const subreg_range &r)
+{
+  r.dump (stderr);
+}
+
+DEBUG_FUNCTION void
+debug (const subreg_ranges &sr)
+{
+  sr.dump (stderr);
+}
+
+DEBUG_FUNCTION void
+debug (const subregs_live &l)
+{
+  l.dump (stderr, "");
+}
+
+DEBUG_FUNCTION void
+debug (const subregs_live *l)
+{
+  debug (*l);
+}
+
+DEBUG_FUNCTION void
+debug (const live_point &l)
+{
+  l.dump (stderr);
+}
+
+DEBUG_FUNCTION void
+debug (const live_points &ls)
+{
+  ls.dump (stderr);
+}
+
+DEBUG_FUNCTION void
+debug (const subregs_live_points &sls)
+{
+  sls.dump (stderr);
+}
+
+DEBUG_FUNCTION void
+debug (const subregs_live_points *sls)
+{
+  debug (*sls);
+}
+
+#if CHECKING_P
+
+namespace selftest {
+
+template <unsigned int N> class poly_int_tests
+{
+public:
+  static void run () {}
+};
+
+template <> class poly_int_tests<1>
+{
+public:
+  static void run ()
+  {
+    /* class subreg_range tests.  */
+    subreg_range r1 = subreg_range (1, 2);
+    subreg_range r2 = subreg_range (2, 3);
+    subreg_range r3 = subreg_range (2, 3);
+    ASSERT_FALSE (r1.same_p (r2));
+    ASSERT_TRUE (r2.same_p (r3));
+    ASSERT_TRUE (r1 < r2);
+    ASSERT_FALSE (r2 < r1);
+
+    /* class subreg_ranges tests.  */
+  }
+};
+
+void
+subreg_live_range_tests ()
+{
+  poly_int_tests<NUM_POLY_INT_COEFFS>::run ();
+}
+
+} // namespace selftest
+
+#endif /* CHECKING_P */
diff --git a/gcc/subreg-live-range.h b/gcc/subreg-live-range.h
new file mode 100644
index 00000000000..56931b53550
--- /dev/null
+++ b/gcc/subreg-live-range.h
@@ -0,0 +1,326 @@
+/* SUBREG live range tracking classes for DF & IRA & LRA.
+   Copyright (C) 2023 Free Software Foundation, Inc.
+   Contributed by Lehua Ding (lehua.ding@rivai.ai), RiVAI Technologies Ltd.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef GCC_SUBREG_LIVE_RANGE_H
+#define GCC_SUBREG_LIVE_RANGE_H
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include <set>
+#include <map>
+
+/* class subreg_range represents the byte range [start, end) of a reg.  */
+class subreg_range
+{
+public:
+  int start; /* Range start point.  */
+  int end;   /* Range end point.  */
+
+  subreg_range (int start, int end) : start (start), end (end)
+  {
+    gcc_assert (maybe_lt (start, end));
+  }
+
+  /* For sorting.  */
+  bool operator<(const subreg_range &r) const
+  {
+    if (maybe_le (end, r.start))
+      return true;
+    else if (maybe_ge (start, r.end))
+      return false;
+    else
+      /* Cannot sort overlapping ranges.  */
+      gcc_unreachable ();
+  }
+  /* Return true if R is the same as this range.  */
+  bool same_p (const subreg_range &r) const
+  {
+    return maybe_eq (start, r.start) && maybe_eq (end, r.end);
+  }
+
+  /* Return true if the range covers the whole [0, MAX) range.  */
+  bool full_p (int max) const
+  {
+    return maybe_eq (start, 0) && maybe_eq (end, max);
+  }
+
+  /* Debug methods.  */
+  void dump (FILE *file) const;
+};
+
+/* class subreg_ranges represents a set of disjoint, non-adjacent
+   subreg_range objects.  */
+class subreg_ranges
+{
+public:
+  /* The upper bound of the ranges.  For a hard register of unknown mode,
+     max is set to 1.  */
+  int max;
+  std::set<subreg_range> ranges;
+
+  subreg_ranges (int max) : max (max) { gcc_assert (maybe_ge (max, 1)); }
+
+  /* Modify ranges.  */
+  /* Return true if ranges changed.  */
+  bool add_range (int max, const subreg_range &range);
+  /* Return true if ranges changed.  */
+  bool remove_range (int max, const subreg_range &range);
+  /* Add SR, return true if ranges changed.  */
+  bool add_ranges (const subreg_ranges &sr);
+  /* Clear ranges of SR, return true if ranges changed.  */
+  bool remove_ranges (const subreg_ranges &sr);
+  /* Make range empty.  */
+  void make_empty () { ranges.clear (); }
+  /* Make range full.  */
+  void make_full ()
+  {
+    make_empty ();
+    ranges.insert (subreg_range (0, max));
+  }
+  /* Change max to MAX and adjust the ranges accordingly.  */
+  void change_max (int max)
+  {
+    gcc_assert (maybe_eq (this->max, 1));
+    this->max = max;
+    if (full_p ())
+      make_full ();
+  }
+
+  /* Predicates.  */
+  bool full_p () const
+  {
+    if (ranges.size () != 1)
+      return false;
+    const subreg_range &r = *ranges.begin ();
+    return maybe_eq (r.start, 0) && maybe_eq (r.end, max);
+  }
+  bool empty_p () const { return ranges.empty (); }
+  bool same_p (const subreg_ranges &sr) const;
+  bool same_p (int max, const subreg_range &range) const
+  {
+    subreg_ranges sr = subreg_ranges (max);
+    sr.add_range (max, range);
+    return same_p (sr);
+  }
+  bool include_ranges_p (const subreg_ranges &sr) const;
+  bool include_range_p (int max, const subreg_range &range) const;
+
+  /* Debug methods.  */
+  void dump (FILE *file) const;
+};
+
+/* class subregs_live records the live subreg_ranges of registers.  */
+class subregs_live
+{
+public:
+  /* The key is usually the register's regno.  */
+  std::map<unsigned int, subreg_ranges> lives;
+
+  /* Add/clear live range.  */
+  bool add_range (unsigned int regno, int max, const subreg_range &range)
+  {
+    if (lives.count (regno) == 0)
+      {
+	lives.insert ({regno, subreg_ranges (max)});
+      }
+    return lives.at (regno).add_range (max, range);
+  }
+  bool remove_range (unsigned int regno, int max, const subreg_range &range)
+  {
+    if (lives.count (regno) != 0)
+      {
+	bool changed = lives.at (regno).remove_range (max, range);
+	if (lives.at (regno).empty_p ())
+	  remove_live (regno);
+	return changed;
+      }
+    return false;
+  }
+  /* Add RANGES for REGNO, creating the entry if it does not yet exist.  */
+  void add_ranges (unsigned int regno, const subreg_ranges &ranges)
+  {
+    if (lives.count (regno) == 0)
+      lives.insert ({regno, ranges});
+    else
+      lives.at (regno).add_ranges (ranges);
+  }
+  bool copy_lives (const subregs_live &sl);
+  bool add_lives (const subregs_live &sl);
+  bool remove_lives (const subregs_live &sl);
+  void remove_live (unsigned int regno) { lives.erase (regno); }
+  /* Remove all register live ranges.  */
+  void clear () { lives.clear (); }
+  void clear (unsigned min_regno)
+  {
+    if (lives.lower_bound (min_regno) != lives.end ())
+      lives.erase (lives.lower_bound (min_regno), lives.end ());
+  }
+
+  /* Return true if regno's live range is full.  */
+  bool full_p (unsigned int regno) const
+  {
+    return lives.count (regno) != 0 && lives.at (regno).full_p ();
+  }
+  /* Return true if regno's live range is empty.  */
+  bool empty_p (unsigned int regno) const
+  {
+    return lives.count (regno) == 0 || lives.at (regno).empty_p ();
+  }
+  /* Return true if SL is the same as this.  */
+  bool same_p (const subregs_live &sl)
+  {
+    if (lives.size () != sl.lives.size ())
+      return false;
+    for (auto &kv : lives)
+      {
+	unsigned int regno = kv.first;
+	if (sl.empty_p (regno))
+	  return false;
+	const subreg_ranges &sr = kv.second;
+	if (!sr.same_p (sl.lives.at (regno)))
+	  return false;
+      }
+    return true;
+  }
+
+  /* Debug methods.  */
+  void dump (FILE *file, const char *indent = ";;     ") const;
+};
+
+class live_point
+{
+public:
+  int point;
+  /* Subreg ranges defined at the current point.  */
+  subreg_ranges def_reg;
+  /* Subreg ranges used at the current point.  */
+  subreg_ranges use_reg;
+
+  live_point (int max, const subreg_range &range, bool is_def)
+    : def_reg (max), use_reg (max)
+  {
+    add_range (max, range, is_def);
+  }
+  live_point (const subreg_ranges &sr, bool is_def)
+    : def_reg (sr.max), use_reg (sr.max)
+  {
+    add_ranges (sr, is_def);
+  }
+  live_point (int point, int max) : point (point), def_reg (max), use_reg (max)
+  {}
+
+  void add_range (int max, const subreg_range &r, bool is_def)
+  {
+    if (is_def)
+      def_reg.add_range (max, r);
+    else
+      use_reg.add_range (max, r);
+  }
+
+  void add_ranges (const subreg_ranges &sr, bool is_def)
+  {
+    if (is_def)
+      def_reg.add_ranges (sr);
+    else
+      use_reg.add_ranges (sr);
+  }
+
+  void dump (FILE *file) const;
+};
+
+class live_points
+{
+public:
+  int id;
+  int max;
+  std::map<int, live_point> points;
+
+  live_points (int id, int max) : id (id), max (max) {}
+
+  void add_point (int max, const subreg_range &range, bool is_def, int point)
+  {
+    gcc_assert (maybe_eq (this->max, max) || maybe_eq (this->max, 1)
+		|| maybe_eq (max, 1));
+    if (points.count (point) == 0)
+      {
+	points.insert ({point, {max, range, is_def}});
+      }
+    else
+      {
+	points.at (point).add_range (max, range, is_def);
+      }
+  }
+  void dump (FILE *file) const;
+};
+
+class subregs_live_points
+{
+public:
+  std::map<int, live_points> subreg_points;
+  std::map<int, subreg_ranges> subreg_live_ranges;
+
+  void add_point (int id, int max, const subreg_range &range, bool is_def,
+		  int point)
+  {
+    if (subreg_points.count (id) == 0)
+      subreg_points.insert ({id, live_points (id, max)});
+
+    subreg_points.at (id).add_point (max, range, is_def, point);
+
+    if (subreg_live_ranges.count (id) == 0)
+      subreg_live_ranges.insert ({id, subreg_ranges (max)});
+
+    if (is_def)
+      subreg_live_ranges.at (id).remove_range (max, range);
+    else
+      subreg_live_ranges.at (id).add_range (max, range);
+  }
+
+  void add_range (int id, int max, const subreg_range &range, bool is_def)
+  {
+    if (subreg_live_ranges.count (id) == 0)
+      subreg_live_ranges.insert ({id, subreg_ranges (max)});
+
+    if (is_def)
+      subreg_live_ranges.at (id).remove_range (max, range);
+    else
+      subreg_live_ranges.at (id).add_range (max, range);
+  }
+
+  bool full_live_p (int id)
+  {
+    return subreg_live_ranges.count (id) != 0
+	   && subreg_live_ranges.at (id).full_p ();
+  }
+
+  bool empty_live_p (int id)
+  {
+    return subreg_live_ranges.count (id) == 0
+	   || subreg_live_ranges.at (id).empty_p ();
+  }
+
+  void clear_live_ranges () { subreg_live_ranges.clear (); }
+
+  /* Debug methods.  */
+  void dump (FILE *file) const;
+};
+
+#endif /* GCC_SUBREG_LIVE_RANGE_H */
diff --git a/gcc/timevar.def b/gcc/timevar.def
index d21b08c030d..4b82b6a554b 100644
--- a/gcc/timevar.def
+++ b/gcc/timevar.def
@@ -121,6 +121,7 @@ DEFTIMEVAR (TV_DF_MD		     , "df multiple defs")
 DEFTIMEVAR (TV_DF_RD		     , "df reaching defs")
 DEFTIMEVAR (TV_DF_LR		     , "df live regs")
 DEFTIMEVAR (TV_DF_LIVE		     , "df live&initialized regs")
+DEFTIMEVAR (TV_DF_LIVE_SUBREG	     , "df live subregs")
 DEFTIMEVAR (TV_DF_MIR		     , "df must-initialized regs")
 DEFTIMEVAR (TV_DF_CHAIN		     , "df use-def / def-use chains")
 DEFTIMEVAR (TV_DF_WORD_LR	     , "df live reg subwords")
-- 
2.36.3


^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH 3/7] ira: Support subreg live range track
  2023-11-08  3:47 [PATCH 0/7] ira/lra: Support subreg coalesce Lehua Ding
  2023-11-08  3:47 ` [PATCH 1/7] ira: Refactor the handling of register conflicts to make it more general Lehua Ding
  2023-11-08  3:47 ` [PATCH 2/7] ira: Add live_subreg problem and apply to ira pass Lehua Ding
@ 2023-11-08  3:47 ` Lehua Ding
  2023-11-08  3:47 ` [PATCH 4/7] ira: Support subreg copy Lehua Ding
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 37+ messages in thread
From: Lehua Ding @ 2023-11-08  3:47 UTC (permalink / raw)
  To: gcc-patches; +Cc: vmakarov, richard.sandiford, juzhe.zhong, lehua.ding

This patch extends the register live ranges in ira to track the lifetime
of subregs, enabling more fine-grained tracking of the live ranges and
conflicts of the subreg parts of a pseudo.  Allocnos are now divided into
two categories: those with a single object, and those that contain
subreg objects.
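
A rough sketch (illustration only, not part of this patch) of how the
subreg_ranges class added earlier in this series behaves for a pseudo
whose mode spans two natural-size blocks:

```
#include "subreg-live-range.h"

/* Sketch only: track the liveness of a pseudo occupying blocks [0, 2).  */
static void
subreg_ranges_example ()
{
  subreg_ranges live (2);                     /* nothing live yet */
  live.add_range (2, subreg_range (0, 1));    /* low half becomes live */
  live.add_range (2, subreg_range (1, 2));    /* high half too, coalesced */
  gcc_assert (live.full_p ());
  live.remove_range (2, subreg_range (0, 1)); /* low half killed by a def */
  gcc_assert (!live.full_p () && !live.empty_p ());
}
```

The ira changes below rely on this kind of per-block information when
deciding whether an allocno needs separate subreg objects.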

gcc/ChangeLog:

	* ira-build.cc (init_object_start_and_nregs): Removed.
	(ira_create_object): Adjust.
	(find_object): New.
	(find_object_anyway): New.
	(ira_create_allocno): Removed regs_with_subreg.
	(ira_set_allocno_class): Adjust.
	(get_range): New.
	(ira_copy_allocno_objects): New.
	(merge_hard_reg_conflicts): Adjust.
	(create_cap_allocno): Adjust.
	(find_subreg_p): New.
	(add_subregs): New.
	(create_insn_allocnos): Adjust.
	(create_bb_allocnos): Adjust.
	(move_allocno_live_ranges): Adjust.
	(copy_allocno_live_ranges):  Adjust.
	(setup_min_max_allocno_live_range_point): Adjust.
	(init_regs_with_subreg): Removed.
	(ira_build): Removed.
	(ira_destroy): Removed.
	* ira-color.cc (INCLUDE_MAP): Use std::map.
	(setup_left_conflict_sizes_p): Adjust.
	(push_allocno_to_stack): Adjust.
	* ira-conflicts.cc (record_object_conflict): Adjust.
	(build_object_conflicts): Adjust.
	(build_conflicts): Adjust.
	(print_allocno_conflicts): Adjust.
	* ira-emit.cc (modify_move_list): Adjust.
	* ira-int.h (struct ira_object): Adjust.
	(struct ira_allocno): Adjust.
	(OBJECT_SIZE): New.
	(OBJECT_OFFSET): New.
	(OBJECT_SUBWORD): New.
	(find_object): New.
	(find_object_anyway): New.
	(ira_copy_allocno_objects): New.
	* ira-lives.cc (INCLUDE_VECTOR): Define to use std::vector.
	(set_subreg_conflict_hard_regs): New.
	(make_hard_regno_dead): Adjust.
	(make_object_live): Adjust.
	(update_allocno_pressure_excess_length): Adjust.
	(make_object_dead): Adjust.
	(mark_pseudo_regno_live): New.
	(add_subreg_point): New.
	(mark_pseudo_object_live): New.
	(mark_pseudo_regno_subword_live): Removed.
	(mark_pseudo_regno_subreg_live): New.
	(mark_pseudo_regno_subregs_live): New.
	(mark_pseudo_reg_live): New.
	(mark_pseudo_regno_dead): Removed.
	(mark_pseudo_object_dead): New.
	(mark_pseudo_regno_subword_dead): Removed.
	(mark_pseudo_regno_subreg_dead): New.
	(mark_pseudo_reg_dead): Adjust.
	(process_single_reg_class_operands): Adjust.
	(process_out_of_region_eh_regs): Adjust.
	(process_bb_node_lives): Adjust.
	(class subreg_live_item): New.
	(create_subregs_live_ranges): New.
	(ira_create_allocno_live_ranges): Adjust.
	* subreg-live-range.h: New fields.

---
 gcc/ira-build.cc        | 275 +++++++++++++--------
 gcc/ira-color.cc        |  68 ++++--
 gcc/ira-conflicts.cc    |  48 ++--
 gcc/ira-emit.cc         |   2 +-
 gcc/ira-int.h           |  21 +-
 gcc/ira-lives.cc        | 522 +++++++++++++++++++++++++++++-----------
 gcc/subreg-live-range.h |  16 ++
 7 files changed, 653 insertions(+), 299 deletions(-)

diff --git a/gcc/ira-build.cc b/gcc/ira-build.cc
index 7df98164503..5fb7a9f800f 100644
--- a/gcc/ira-build.cc
+++ b/gcc/ira-build.cc
@@ -29,10 +29,12 @@ along with GCC; see the file COPYING3.  If not see
 #include "insn-config.h"
 #include "regs.h"
 #include "memmodel.h"
+#include "tm_p.h"
 #include "ira.h"
 #include "ira-int.h"
 #include "sparseset.h"
 #include "cfgloop.h"
+#include "subreg-live-range.h"
 
 static ira_copy_t find_allocno_copy (ira_allocno_t, ira_allocno_t, rtx_insn *,
 				     ira_loop_tree_node_t);
@@ -440,49 +442,14 @@ initiate_allocnos (void)
   memset (ira_regno_allocno_map, 0, max_reg_num () * sizeof (ira_allocno_t));
 }
 
-/* Update OBJ's start and nregs field according A and OBJ info.  */
-static void
-init_object_start_and_nregs (ira_allocno_t a, ira_object_t obj)
-{
-  enum reg_class aclass = ALLOCNO_CLASS (a);
-  gcc_assert (aclass != NO_REGS);
-
-  machine_mode mode = ALLOCNO_MODE (a);
-  int nregs = ira_reg_class_max_nregs[aclass][mode];
-  if (ALLOCNO_TRACK_SUBREG_P (a))
-    {
-      poly_int64 end = OBJECT_OFFSET (obj) + OBJECT_SIZE (obj);
-      for (int i = 0; i < nregs; i += 1)
-	{
-	  poly_int64 right = ALLOCNO_UNIT_SIZE (a) * (i + 1);
-	  if (OBJECT_START (obj) < 0 && maybe_lt (OBJECT_OFFSET (obj), right))
-	    {
-	      OBJECT_START (obj) = i;
-	    }
-	  if (OBJECT_NREGS (obj) < 0 && maybe_le (end, right))
-	    {
-	      OBJECT_NREGS (obj) = i + 1 - OBJECT_START (obj);
-	      break;
-	    }
-	}
-      gcc_assert (OBJECT_START (obj) >= 0 && OBJECT_NREGS (obj) > 0);
-    }
-  else
-    {
-      OBJECT_START (obj) = 0;
-      OBJECT_NREGS (obj) = nregs;
-    }
-}
-
 /* Create and return an object corresponding to a new allocno A.  */
 static ira_object_t
-ira_create_object (ira_allocno_t a, int subword)
+ira_create_object (ira_allocno_t a, int start, int nregs)
 {
   enum reg_class aclass = ALLOCNO_CLASS (a);
   ira_object_t obj = object_pool.allocate ();
 
   OBJECT_ALLOCNO (obj) = a;
-  OBJECT_SUBWORD (obj) = subword;
   OBJECT_CONFLICT_ID (obj) = ira_objects_num;
   OBJECT_CONFLICT_VEC_P (obj) = false;
   OBJECT_CONFLICT_ARRAY (obj) = NULL;
@@ -494,19 +461,14 @@ ira_create_object (ira_allocno_t a, int subword)
   OBJECT_MIN (obj) = INT_MAX;
   OBJECT_MAX (obj) = -1;
   OBJECT_LIVE_RANGES (obj) = NULL;
-  OBJECT_SIZE (obj) = UNITS_PER_WORD;
-  OBJECT_OFFSET (obj) = subword * UNITS_PER_WORD;
-  OBJECT_START (obj) = -1;
-  OBJECT_NREGS (obj) = -1;
+  OBJECT_START (obj) = start;
+  OBJECT_NREGS (obj) = nregs;
 
   ira_object_id_map_vec.safe_push (obj);
   ira_object_id_map
     = ira_object_id_map_vec.address ();
   ira_objects_num = ira_object_id_map_vec.length ();
 
-  if (aclass != NO_REGS)
-    init_object_start_and_nregs (a, obj);
-
   a->objects.push_back (obj);
 
   return obj;
@@ -524,6 +486,52 @@ find_object (ira_allocno_t a, int start, int nregs)
   return NULL;
 }
 
+ira_object_t
+find_object (ira_allocno_t a, poly_int64 offset, poly_int64 size)
+{
+  enum reg_class aclass = ALLOCNO_CLASS (a);
+  machine_mode mode = ALLOCNO_MODE (a);
+  int nregs = ira_reg_class_max_nregs[aclass][mode];
+
+  if (!has_subreg_object_p (a)
+      || maybe_eq (GET_MODE_SIZE (ALLOCNO_MODE (a)), size))
+    return find_object (a, 0, nregs);
+
+  gcc_assert (maybe_lt (size, GET_MODE_SIZE (ALLOCNO_MODE (a)))
+	      && maybe_le (offset + size, GET_MODE_SIZE (ALLOCNO_MODE (a))));
+
+  int subreg_start = -1;
+  int subreg_nregs = -1;
+  for (int i = 0; i < nregs; i += 1)
+    {
+      poly_int64 right = ALLOCNO_UNIT_SIZE (a) * (i + 1);
+      if (subreg_start < 0 && maybe_lt (offset, right))
+	{
+	  subreg_start = i;
+	}
+      if (subreg_nregs < 0 && maybe_le (offset + size, right))
+	{
+	  subreg_nregs = i + 1 - subreg_start;
+	  break;
+	}
+    }
+  gcc_assert (subreg_start >= 0 && subreg_nregs > 0);
+  return find_object (a, subreg_start, subreg_nregs);
+}
+
+/* Return the object in allocno A that matches START and NREGS.  Create it
+   when not found.  */
+ira_object_t
+find_object_anyway (ira_allocno_t a, int start, int nregs)
+{
+  ira_object_t obj = find_object (a, start, nregs);
+  if (obj == NULL && ALLOCNO_TRACK_SUBREG_P (a))
+    obj = ira_create_object (a, start, nregs);
+
+  gcc_assert (obj != NULL);
+  return obj;
+}
+
 /* Create and return the allocno corresponding to REGNO in
    LOOP_TREE_NODE.  Add the allocno to the list of allocnos with the
    same regno if CAP_P is FALSE.  */
@@ -591,9 +599,6 @@ ira_create_allocno (int regno, bool cap_p,
   return a;
 }
 
-/* Record the regs referenced by subreg.  */
-static bitmap_head regs_with_subreg;
-
 /* Set up register class for A and update its conflict hard
    registers.  */
 void
@@ -614,8 +619,7 @@ ira_set_allocno_class (ira_allocno_t a, enum reg_class aclass)
   /* SET the unit_size of one register.  */
   machine_mode mode = ALLOCNO_MODE (a);
   int nregs = ira_reg_class_max_nregs[aclass][mode];
-  if (nregs == 2 && maybe_eq (GET_MODE_SIZE (mode), nregs * UNITS_PER_WORD)
-      && bitmap_bit_p (&regs_with_subreg, ALLOCNO_REGNO (a)))
+  if (nregs == 2 && maybe_eq (GET_MODE_SIZE (mode), nregs * UNITS_PER_WORD))
     {
       ALLOCNO_UNIT_SIZE (a) = UNITS_PER_WORD;
       ALLOCNO_TRACK_SUBREG_P (a) = true;
@@ -623,6 +627,39 @@ ira_set_allocno_class (ira_allocno_t a, enum reg_class aclass)
     }
 }
 
+/* Return the subreg range of rtx SUBREG.  */
+static subreg_range
+get_range (rtx subreg)
+{
+  gcc_assert (read_modify_subreg_p (subreg));
+  rtx reg = SUBREG_REG (subreg);
+  machine_mode reg_mode = GET_MODE (reg);
+
+  machine_mode subreg_mode = GET_MODE (subreg);
+  int nblocks = get_nblocks (reg_mode);
+  poly_int64 unit_size = REGMODE_NATURAL_SIZE (reg_mode);
+
+  poly_int64 offset = SUBREG_BYTE (subreg);
+  poly_int64 left = offset + GET_MODE_SIZE (subreg_mode);
+
+  int subreg_start = -1;
+  int subreg_nblocks = -1;
+  for (int i = 0; i < nblocks; i += 1)
+    {
+      poly_int64 right = unit_size * (i + 1);
+      if (subreg_start < 0 && maybe_lt (offset, right))
+	subreg_start = i;
+      if (subreg_nblocks < 0 && maybe_le (left, right))
+	{
+	  subreg_nblocks = i + 1 - subreg_start;
+	  break;
+	}
+    }
+  gcc_assert (subreg_start >= 0 && subreg_nblocks > 0);
+
+  return subreg_range (subreg_start, subreg_start + subreg_nblocks);
+}
+
 /* Determine the number of objects we should associate with allocno A
    and allocate them.  */
 void
@@ -630,15 +667,37 @@ ira_create_allocno_objects (ira_allocno_t a)
 {
   machine_mode mode = ALLOCNO_MODE (a);
   enum reg_class aclass = ALLOCNO_CLASS (a);
-  int n = ira_reg_class_max_nregs[aclass][mode];
-  int i;
+  int nregs = ira_reg_class_max_nregs[aclass][mode];
 
-  if (n != 2 || maybe_ne (GET_MODE_SIZE (mode), n * UNITS_PER_WORD)
-      || !bitmap_bit_p (&regs_with_subreg, ALLOCNO_REGNO (a)))
-    n = 1;
+  ira_create_object (a, 0, nregs);
 
-  for (i = 0; i < n; i++)
-    ira_create_object (a, i);
+  if (aclass == NO_REGS || !ALLOCNO_TRACK_SUBREG_P (a) || a->subregs.empty ())
+    return;
+
+  int nblocks = get_nblocks (ALLOCNO_MODE (a));
+  int times = nblocks / ALLOCNO_NREGS (a);
+  gcc_assert (times >= 1 && nblocks % ALLOCNO_NREGS (a) == 0);
+
+  for (const auto &range : a->subregs)
+    {
+      int start = range.start / times;
+      int end = CEIL (range.end, times);
+      if (find_object (a, start, end - start) != NULL)
+	continue;
+      ira_create_object (a, start, end - start);
+    }
+
+  a->subregs.clear ();
+}
+
+/* Copy the objects from FROM to TO.  */
+void
+ira_copy_allocno_objects (ira_allocno_t to, ira_allocno_t from)
+{
+  ira_allocno_object_iterator oi;
+  ira_object_t obj;
+  FOR_EACH_ALLOCNO_OBJECT (from, obj, oi)
+    ira_create_object (to, OBJECT_START (obj), OBJECT_NREGS (obj));
 }
 
 /* For each allocno, set ALLOCNO_NUM_OBJECTS and create the
@@ -662,11 +721,11 @@ merge_hard_reg_conflicts (ira_allocno_t from, ira_allocno_t to,
 			  bool total_only)
 {
   int i;
-  gcc_assert (ALLOCNO_NUM_OBJECTS (to) == ALLOCNO_NUM_OBJECTS (from));
-  for (i = 0; i < ALLOCNO_NUM_OBJECTS (to); i++)
+  for (i = 0; i < ALLOCNO_NUM_OBJECTS (from); i++)
     {
       ira_object_t from_obj = ALLOCNO_OBJECT (from, i);
-      ira_object_t to_obj = ALLOCNO_OBJECT (to, i);
+      ira_object_t to_obj = find_object_anyway (to, OBJECT_START (from_obj),
+						OBJECT_NREGS (from_obj));
 
       if (!total_only)
 	OBJECT_CONFLICT_HARD_REGS (to_obj)
@@ -960,7 +1019,7 @@ create_cap_allocno (ira_allocno_t a)
   ALLOCNO_WMODE (cap) = ALLOCNO_WMODE (a);
   aclass = ALLOCNO_CLASS (a);
   ira_set_allocno_class (cap, aclass);
-  ira_create_allocno_objects (cap);
+  ira_copy_allocno_objects (cap, a);
   ALLOCNO_CAP_MEMBER (cap) = a;
   ALLOCNO_CAP (a) = cap;
   ALLOCNO_CLASS_COST (cap) = ALLOCNO_CLASS_COST (a);
@@ -1902,6 +1961,26 @@ ira_traverse_loop_tree (bool bb_p, ira_loop_tree_node_t loop_node,
 /* The basic block currently being processed.  */
 static basic_block curr_bb;
 
+/* Return true if A's subregs already contains a range equal to R.  */
+static bool
+find_subreg_p (ira_allocno_t a, const subreg_range &r)
+{
+  for (const auto &item : a->subregs)
+    if (item.start == r.start && item.end == r.end)
+      return true;
+  return false;
+}
+
+/* Add the subreg ranges in SR (from DF_LIVE_SUBREG) to A's subregs.  */
+static void
+add_subregs (ira_allocno_t a, const subreg_ranges &sr)
+{
+  gcc_assert (get_nblocks (ALLOCNO_MODE (a)) == sr.max);
+  for (const subreg_range &r : sr.ranges)
+    if (!find_subreg_p (a, r))
+      a->subregs.push_back (r);
+}
+
 /* This recursive function creates allocnos corresponding to
    pseudo-registers containing in X.  True OUTPUT_P means that X is
    an lvalue.  OUTER corresponds to the parent expression of X.  */
@@ -1931,6 +2010,14 @@ create_insn_allocnos (rtx x, rtx outer, bool output_p)
 		}
 	    }
 
+	  /* Collect subreg reference.  */
+	  if (outer != NULL && read_modify_subreg_p (outer))
+	    {
+	      const subreg_range r = get_range (outer);
+	      if (!find_subreg_p (a, r))
+		a->subregs.push_back (r);
+	    }
+
 	  ALLOCNO_NREFS (a)++;
 	  ALLOCNO_FREQ (a) += REG_FREQ_FROM_BB (curr_bb);
 	  if (output_p)
@@ -1998,8 +2085,21 @@ create_bb_allocnos (ira_loop_tree_node_t bb_node)
 
   EXECUTE_IF_SET_IN_REG_SET (DF_LIVE_SUBREG_PARTIAL_IN (bb),
 			     FIRST_PSEUDO_REGISTER, i, bi)
-    if (ira_curr_regno_allocno_map[i] == NULL)
-      ira_create_allocno (i, false, ira_curr_loop_tree_node);
+    {
+      if (ira_curr_regno_allocno_map[i] == NULL)
+	ira_create_allocno (i, false, ira_curr_loop_tree_node);
+      add_subregs (ira_curr_regno_allocno_map[i],
+		   DF_LIVE_SUBREG_RANGE_IN (bb)->lives.at (i));
+    }
+
+  EXECUTE_IF_SET_IN_REG_SET (DF_LIVE_SUBREG_PARTIAL_OUT (bb),
+			     FIRST_PSEUDO_REGISTER, i, bi)
+    {
+      if (ira_curr_regno_allocno_map[i] == NULL)
+	ira_create_allocno (i, false, ira_curr_loop_tree_node);
+      add_subregs (ira_curr_regno_allocno_map[i],
+		   DF_LIVE_SUBREG_RANGE_OUT (bb)->lives.at (i));
+    }
 }
 
 /* Create allocnos corresponding to pseudo-registers living on edge E
@@ -2214,20 +2314,20 @@ move_allocno_live_ranges (ira_allocno_t from, ira_allocno_t to)
   int i;
   int n = ALLOCNO_NUM_OBJECTS (from);
 
-  gcc_assert (n == ALLOCNO_NUM_OBJECTS (to));
-
   for (i = 0; i < n; i++)
     {
       ira_object_t from_obj = ALLOCNO_OBJECT (from, i);
-      ira_object_t to_obj = ALLOCNO_OBJECT (to, i);
+      ira_object_t to_obj = find_object_anyway (to, OBJECT_START (from_obj),
+						OBJECT_NREGS (from_obj));
       live_range_t lr = OBJECT_LIVE_RANGES (from_obj);
 
       if (internal_flag_ira_verbose > 4 && ira_dump_file != NULL)
 	{
 	  fprintf (ira_dump_file,
-		   "      Moving ranges of a%dr%d to a%dr%d: ",
+		   "      Moving ranges of a%dr%d_obj%d to a%dr%d_obj%d: ",
 		   ALLOCNO_NUM (from), ALLOCNO_REGNO (from),
-		   ALLOCNO_NUM (to), ALLOCNO_REGNO (to));
+		   OBJECT_INDEX (from_obj), ALLOCNO_NUM (to),
+		   ALLOCNO_REGNO (to), OBJECT_INDEX (to_obj));
 	  ira_print_live_range_list (ira_dump_file, lr);
 	}
       change_object_in_range_list (lr, to_obj);
@@ -2243,12 +2343,11 @@ copy_allocno_live_ranges (ira_allocno_t from, ira_allocno_t to)
   int i;
   int n = ALLOCNO_NUM_OBJECTS (from);
 
-  gcc_assert (n == ALLOCNO_NUM_OBJECTS (to));
-
   for (i = 0; i < n; i++)
     {
       ira_object_t from_obj = ALLOCNO_OBJECT (from, i);
-      ira_object_t to_obj = ALLOCNO_OBJECT (to, i);
+      ira_object_t to_obj = find_object_anyway (to, OBJECT_START (from_obj),
+						OBJECT_NREGS (from_obj));
       live_range_t lr = OBJECT_LIVE_RANGES (from_obj);
 
       if (internal_flag_ira_verbose > 4 && ira_dump_file != NULL)
@@ -2860,15 +2959,17 @@ setup_min_max_allocno_live_range_point (void)
 		ira_assert (OBJECT_LIVE_RANGES (obj) == NULL);
 		OBJECT_MAX (obj) = 0;
 		OBJECT_MIN (obj) = 1;
-		continue;
 	      }
 	    ira_assert (ALLOCNO_CAP_MEMBER (a) == NULL);
 	    /* Accumulation of range info.  */
 	    if (ALLOCNO_CAP (a) != NULL)
 	      {
-		for (cap = ALLOCNO_CAP (a); cap != NULL; cap = ALLOCNO_CAP (cap))
+		for (cap = ALLOCNO_CAP (a); cap != NULL;
+		     cap = ALLOCNO_CAP (cap))
 		  {
-		    ira_object_t cap_obj = ALLOCNO_OBJECT (cap, j);
+		    ira_object_t cap_obj = find_object (cap, OBJECT_START (obj),
+							OBJECT_NREGS (obj));
+		    gcc_assert (cap_obj != NULL);
 		    if (OBJECT_MAX (cap_obj) < OBJECT_MAX (obj))
 		      OBJECT_MAX (cap_obj) = OBJECT_MAX (obj);
 		    if (OBJECT_MIN (cap_obj) > OBJECT_MIN (obj))
@@ -2879,7 +2980,9 @@ setup_min_max_allocno_live_range_point (void)
 	    if ((parent = ALLOCNO_LOOP_TREE_NODE (a)->parent) == NULL)
 	      continue;
 	    parent_a = parent->regno_allocno_map[i];
-	    parent_obj = ALLOCNO_OBJECT (parent_a, j);
+	    parent_obj
+	      = find_object (parent_a, OBJECT_START (obj), OBJECT_NREGS (obj));
+	    gcc_assert (parent_obj != NULL);
 	    if (OBJECT_MAX (parent_obj) < OBJECT_MAX (obj))
 	      OBJECT_MAX (parent_obj) = OBJECT_MAX (obj);
 	    if (OBJECT_MIN (parent_obj) > OBJECT_MIN (obj))
@@ -3538,30 +3641,6 @@ update_conflict_hard_reg_costs (void)
     }
 }
 
-/* Traverse all instructions to determine which ones have access through subreg.
- */
-static void
-init_regs_with_subreg ()
-{
-  bitmap_initialize (&regs_with_subreg, &reg_obstack);
-  basic_block bb;
-  rtx_insn *insn;
-  df_ref def, use;
-  FOR_ALL_BB_FN (bb, cfun)
-    FOR_BB_INSNS (bb, insn)
-      {
-	if (!NONDEBUG_INSN_P (insn))
-	  continue;
-	df_insn_info *insn_info = DF_INSN_INFO_GET (insn);
-	FOR_EACH_INSN_INFO_DEF (def, insn_info)
-	  if (DF_REF_FLAGS (def) & (DF_REF_PARTIAL | DF_REF_SUBREG))
-	    bitmap_set_bit (&regs_with_subreg, DF_REF_REGNO (def));
-	FOR_EACH_INSN_INFO_USE (use, insn_info)
-	  if (DF_REF_FLAGS (use) & (DF_REF_PARTIAL | DF_REF_SUBREG))
-	    bitmap_set_bit (&regs_with_subreg, DF_REF_REGNO (use));
-      }
-}
-
 /* Create a internal representation (IR) for IRA (allocnos, copies,
    loop tree nodes).  The function returns TRUE if we generate loop
    structure (besides nodes representing all function and the basic
@@ -3577,7 +3656,6 @@ ira_build (void)
   initiate_allocnos ();
   initiate_prefs ();
   initiate_copies ();
-  init_regs_with_subreg ();
   create_loop_tree_nodes ();
   form_loop_tree ();
   create_allocnos ();
@@ -3668,5 +3746,4 @@ ira_destroy (void)
   finish_allocnos ();
   finish_cost_vectors ();
   ira_finish_allocno_live_ranges ();
-  bitmap_clear (&regs_with_subreg);
 }
diff --git a/gcc/ira-color.cc b/gcc/ira-color.cc
index f1b96d1aee6..8aed25144b9 100644
--- a/gcc/ira-color.cc
+++ b/gcc/ira-color.cc
@@ -19,6 +19,7 @@ along with GCC; see the file COPYING3.  If not see
 <http://www.gnu.org/licenses/>.  */
 
 #include "config.h"
+#define INCLUDE_MAP
 #include "system.h"
 #include "coretypes.h"
 #include "backend.h"
@@ -852,18 +853,17 @@ setup_left_conflict_sizes_p (ira_allocno_t a)
   node_preorder_num = node->preorder_num;
   node_set = node->hard_regs->set;
   node_check_tick++;
+  /* Register blocks of each conflicting allocno already counted.  */
+  std::map<int, bitmap> allocno_conflict_regs;
   for (k = 0; k < nobj; k++)
     {
       ira_object_t obj = ALLOCNO_OBJECT (a, k);
       ira_object_t conflict_obj;
       ira_object_conflict_iterator oci;
-      
+
       FOR_EACH_OBJECT_CONFLICT (obj, conflict_obj, oci)
 	{
-	  int size;
- 	  ira_allocno_t conflict_a = OBJECT_ALLOCNO (conflict_obj);
-	  allocno_hard_regs_node_t conflict_node, temp_node;
-	  HARD_REG_SET conflict_node_set;
+	  ira_allocno_t conflict_a = OBJECT_ALLOCNO (conflict_obj);
 	  allocno_color_data_t conflict_data;
 
 	  conflict_data = ALLOCNO_COLOR_DATA (conflict_a);
@@ -872,6 +872,24 @@ setup_left_conflict_sizes_p (ira_allocno_t a)
 					     conflict_data
 					     ->profitable_hard_regs))
 	    continue;
+	  int num = ALLOCNO_NUM (conflict_a);
+	  if (allocno_conflict_regs.count (num) == 0)
+	    allocno_conflict_regs.insert ({num, ira_allocate_bitmap ()});
+	  bitmap_head temp;
+	  bitmap_initialize (&temp, &reg_obstack);
+	  bitmap_set_range (&temp, OBJECT_START (conflict_obj),
+			    OBJECT_NREGS (conflict_obj));
+	  bitmap_and_compl_into (&temp, allocno_conflict_regs.at (num));
+	  int size = bitmap_count_bits (&temp);
+	  bitmap_clear (&temp);
+	  if (size == 0)
+	    continue;
+
+	  bitmap_set_range (allocno_conflict_regs.at (num),
+			    OBJECT_START (conflict_obj),
+			    OBJECT_NREGS (conflict_obj));
+	  allocno_hard_regs_node_t conflict_node, temp_node;
+	  HARD_REG_SET conflict_node_set;
 	  conflict_node = conflict_data->hard_regs_node;
 	  conflict_node_set = conflict_node->hard_regs->set;
 	  if (hard_reg_set_subset_p (node_set, conflict_node_set))
@@ -886,14 +904,13 @@ setup_left_conflict_sizes_p (ira_allocno_t a)
 	      temp_node->check = node_check_tick;
 	      temp_node->conflict_size = 0;
 	    }
-	  size = (ira_reg_class_max_nregs
-		  [ALLOCNO_CLASS (conflict_a)][ALLOCNO_MODE (conflict_a)]);
-	  if (ALLOCNO_NUM_OBJECTS (conflict_a) > 1)
-	    /* We will deal with the subwords individually.  */
-	    size = 1;
 	  temp_node->conflict_size += size;
 	}
     }
+  /* Free the per-allocno conflict reg bitmaps.  */
+  for (auto &kv : allocno_conflict_regs)
+    ira_free_bitmap (kv.second);
+
   for (i = 0; i < data->hard_regs_subnodes_num; i++)
     {
       allocno_hard_regs_node_t temp_node;
@@ -2746,21 +2763,16 @@ push_allocno_to_stack (ira_allocno_t a)
 {
   enum reg_class aclass;
   allocno_color_data_t data, conflict_data;
-  int size, i, n = ALLOCNO_NUM_OBJECTS (a);
-    
+  int i, n = ALLOCNO_NUM_OBJECTS (a);
+
   data = ALLOCNO_COLOR_DATA (a);
   data->in_graph_p = false;
   allocno_stack_vec.safe_push (a);
   aclass = ALLOCNO_CLASS (a);
   if (aclass == NO_REGS)
     return;
-  size = ira_reg_class_max_nregs[aclass][ALLOCNO_MODE (a)];
-  if (n > 1)
-    {
-      /* We will deal with the subwords individually.  */
-      gcc_assert (size == ALLOCNO_NUM_OBJECTS (a));
-      size = 1;
-    }
+  /* Register blocks of each conflicting allocno already counted.  */
+  std::map<int, bitmap> allocno_conflict_regs;
   for (i = 0; i < n; i++)
     {
       ira_object_t obj = ALLOCNO_OBJECT (a, i);
@@ -2785,6 +2797,21 @@ push_allocno_to_stack (ira_allocno_t a)
 	    continue;
 	  ira_assert (bitmap_bit_p (coloring_allocno_bitmap,
 				    ALLOCNO_NUM (conflict_a)));
+
+	  int num = ALLOCNO_NUM (conflict_a);
+	  if (allocno_conflict_regs.count (num) == 0)
+	    allocno_conflict_regs.insert ({num, ira_allocate_bitmap ()});
+	  bitmap_head temp;
+	  bitmap_initialize (&temp, &reg_obstack);
+	  bitmap_set_range (&temp, OBJECT_START (obj), OBJECT_NREGS (obj));
+	  bitmap_and_compl_into (&temp, allocno_conflict_regs.at (num));
+	  int size = bitmap_count_bits (&temp);
+	  bitmap_clear (&temp);
+	  if (size == 0)
+	    continue;
+
+	  bitmap_set_range (allocno_conflict_regs.at (num), OBJECT_START (obj),
+			    OBJECT_NREGS (obj));
 	  if (update_left_conflict_sizes_p (conflict_a, a, size))
 	    {
 	      delete_allocno_from_bucket
@@ -2800,6 +2827,9 @@ push_allocno_to_stack (ira_allocno_t a)
 	  
 	}
     }
+
+  for (auto &kv : allocno_conflict_regs)
+    ira_free_bitmap (kv.second);
 }
 
 /* Put ALLOCNO onto the coloring stack and remove it from its bucket.
diff --git a/gcc/ira-conflicts.cc b/gcc/ira-conflicts.cc
index a4d93c8d734..0585ad10043 100644
--- a/gcc/ira-conflicts.cc
+++ b/gcc/ira-conflicts.cc
@@ -60,23 +60,8 @@ static IRA_INT_TYPE **conflicts;
 static void
 record_object_conflict (ira_object_t obj1, ira_object_t obj2)
 {
-  ira_allocno_t a1 = OBJECT_ALLOCNO (obj1);
-  ira_allocno_t a2 = OBJECT_ALLOCNO (obj2);
-  int w1 = OBJECT_SUBWORD (obj1);
-  int w2 = OBJECT_SUBWORD (obj2);
-  int id1, id2;
-
-  /* Canonicalize the conflict.  If two identically-numbered words
-     conflict, always record this as a conflict between words 0.  That
-     is the only information we need, and it is easier to test for if
-     it is collected in each allocno's lowest-order object.  */
-  if (w1 == w2 && w1 > 0)
-    {
-      obj1 = ALLOCNO_OBJECT (a1, 0);
-      obj2 = ALLOCNO_OBJECT (a2, 0);
-    }
-  id1 = OBJECT_CONFLICT_ID (obj1);
-  id2 = OBJECT_CONFLICT_ID (obj2);
+  int id1 = OBJECT_CONFLICT_ID (obj1);
+  int id2 = OBJECT_CONFLICT_ID (obj2);
 
   SET_MINMAX_SET_BIT (conflicts[id1], id2, OBJECT_MIN (obj1),
 		      OBJECT_MAX (obj1));
@@ -606,8 +591,8 @@ build_object_conflicts (ira_object_t obj)
   if (parent_a == NULL)
     return;
   ira_assert (ALLOCNO_CLASS (a) == ALLOCNO_CLASS (parent_a));
-  ira_assert (ALLOCNO_NUM_OBJECTS (a) == ALLOCNO_NUM_OBJECTS (parent_a));
-  parent_obj = ALLOCNO_OBJECT (parent_a, OBJECT_SUBWORD (obj));
+  parent_obj
+    = find_object_anyway (parent_a, OBJECT_START (obj), OBJECT_NREGS (obj));
   parent_num = OBJECT_CONFLICT_ID (parent_obj);
   parent_min = OBJECT_MIN (parent_obj);
   parent_max = OBJECT_MAX (parent_obj);
@@ -616,7 +601,6 @@ build_object_conflicts (ira_object_t obj)
     {
       ira_object_t another_obj = ira_object_id_map[i];
       ira_allocno_t another_a = OBJECT_ALLOCNO (another_obj);
-      int another_word = OBJECT_SUBWORD (another_obj);
 
       ira_assert (ira_reg_classes_intersect_p
 		  [ALLOCNO_CLASS (a)][ALLOCNO_CLASS (another_a)]);
@@ -627,11 +611,11 @@ build_object_conflicts (ira_object_t obj)
       ira_assert (ALLOCNO_NUM (another_parent_a) >= 0);
       ira_assert (ALLOCNO_CLASS (another_a)
 		  == ALLOCNO_CLASS (another_parent_a));
-      ira_assert (ALLOCNO_NUM_OBJECTS (another_a)
-		  == ALLOCNO_NUM_OBJECTS (another_parent_a));
       SET_MINMAX_SET_BIT (conflicts[parent_num],
-			  OBJECT_CONFLICT_ID (ALLOCNO_OBJECT (another_parent_a,
-							      another_word)),
+			  OBJECT_CONFLICT_ID (
+			    find_object_anyway (another_parent_a,
+						OBJECT_START (another_obj),
+						OBJECT_NREGS (another_obj))),
 			  parent_min, parent_max);
     }
 }
@@ -659,9 +643,10 @@ build_conflicts (void)
 	    build_object_conflicts (obj);
 	    for (cap = ALLOCNO_CAP (a); cap != NULL; cap = ALLOCNO_CAP (cap))
 	      {
-		ira_object_t cap_obj = ALLOCNO_OBJECT (cap, j);
-		gcc_assert (ALLOCNO_NUM_OBJECTS (cap) == ALLOCNO_NUM_OBJECTS (a));
-		build_object_conflicts (cap_obj);
+		  ira_object_t cap_obj
+		    = find_object_anyway (cap, OBJECT_START (obj),
+					  OBJECT_NREGS (obj));
+		  build_object_conflicts (cap_obj);
 	      }
 	  }
       }
@@ -736,7 +721,8 @@ print_allocno_conflicts (FILE * file, bool reg_p, ira_allocno_t a)
 	}
 
       if (n > 1)
-	fprintf (file, "\n;;   subobject %d:", i);
+	fprintf (file, "\n;;   subobject s%d,n%d,f%d:", OBJECT_START (obj),
+		 OBJECT_NREGS (obj), ALLOCNO_NREGS (a));
       FOR_EACH_OBJECT_CONFLICT (obj, conflict_obj, oci)
 	{
 	  ira_allocno_t conflict_a = OBJECT_ALLOCNO (conflict_obj);
@@ -746,8 +732,10 @@ print_allocno_conflicts (FILE * file, bool reg_p, ira_allocno_t a)
 	    {
 	      fprintf (file, " a%d(r%d", ALLOCNO_NUM (conflict_a),
 		       ALLOCNO_REGNO (conflict_a));
-	      if (ALLOCNO_NUM_OBJECTS (conflict_a) > 1)
-		fprintf (file, ",w%d", OBJECT_SUBWORD (conflict_obj));
+	      if (has_subreg_object_p (conflict_a))
+		  fprintf (file, ",s%d,n%d,f%d", OBJECT_START (conflict_obj),
+			   OBJECT_NREGS (conflict_obj),
+			   ALLOCNO_NREGS (conflict_a));
 	      if ((bb = ALLOCNO_LOOP_TREE_NODE (conflict_a)->bb) != NULL)
 		fprintf (file, ",b%d", bb->index);
 	      else
diff --git a/gcc/ira-emit.cc b/gcc/ira-emit.cc
index 84ed482e568..9dc7f3c655e 100644
--- a/gcc/ira-emit.cc
+++ b/gcc/ira-emit.cc
@@ -854,7 +854,7 @@ modify_move_list (move_t list)
 		ALLOCNO_MODE (new_allocno) = ALLOCNO_MODE (set_move->to);
 		ira_set_allocno_class (new_allocno,
 				       ALLOCNO_CLASS (set_move->to));
-		ira_create_allocno_objects (new_allocno);
+		ira_copy_allocno_objects (new_allocno, set_move->to);
 		ALLOCNO_ASSIGNED_P (new_allocno) = true;
 		ALLOCNO_HARD_REGNO (new_allocno) = -1;
 		ALLOCNO_EMIT_DATA (new_allocno)->reg
diff --git a/gcc/ira-int.h b/gcc/ira-int.h
index b6281d3df6d..b9e24328867 100644
--- a/gcc/ira-int.h
+++ b/gcc/ira-int.h
@@ -24,6 +24,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "recog.h"
 #include "function-abi.h"
 #include <vector>
+#include "subreg-live-range.h"
 
 /* To provide consistency in naming, all IRA external variables,
    functions, common typedefs start with prefix ira_.  */
@@ -223,7 +224,7 @@ extern int ira_max_point;
 extern live_range_t *ira_start_point_ranges, *ira_finish_point_ranges;
 
 /* A structure representing conflict information for an allocno
-   (or one of its subwords).  */
+   (or one of its subregs).  */
 struct ira_object
 {
   /* The allocno associated with this record.  */
@@ -237,17 +238,9 @@ struct ira_object
      ranges in the list are not intersected and ordered by decreasing
      their program points*.  */
   live_range_t live_ranges;
-  /* The subword within ALLOCNO which is represented by this object.
-     Zero means the lowest-order subword (or the entire allocno in case
-     it is not being tracked in subwords).  */
-  int subword;
   /* Reprensent OBJECT occupied [start, start + nregs) registers of it's
      ALLOCNO.  */
   int start, nregs;
-  /* Reprensent the size and offset of current object, use to track subreg
-     range, For full reg, the size is GET_MODE_SIZE (ALLOCNO_MODE (allocno)),
-     offset is 0.  */
-  poly_int64 size, offset;
   /* Allocated size of the conflicts array.  */
   unsigned int conflicts_array_size;
   /* A unique number for every instance of this structure, which is used
@@ -400,6 +393,9 @@ struct ira_allocno
      more than one such object in cases where the allocno represents a
      multi-hardreg pesudo.  */
   std::vector<ira_object_t> objects;
+  /* An array of structures describing the subreg ranges (start and end)
+     referenced by this allocno.  */
+  std::vector<subreg_range> subregs;
   /* Registers clobbered by intersected calls.  */
    HARD_REG_SET crossed_calls_clobbered_regs;
   /* Array of usage costs (accumulated and the one updated during
@@ -526,9 +522,6 @@ allocno_emit_reg (ira_allocno_t a)
 }
 
 #define OBJECT_ALLOCNO(O) ((O)->allocno)
-#define OBJECT_SIZE(O) ((O)->size)
-#define OBJECT_OFFSET(O) ((O)->offset)
-#define OBJECT_SUBWORD(O) ((O)->subword)
 #define OBJECT_CONFLICT_ARRAY(O) ((O)->conflicts_array)
 #define OBJECT_CONFLICT_VEC(O) ((ira_object_t *)(O)->conflicts_array)
 #define OBJECT_CONFLICT_BITVEC(O) ((IRA_INT_TYPE *)(O)->conflicts_array)
@@ -1062,6 +1055,10 @@ extern bool ira_build (void);
 extern void ira_destroy (void);
 extern ira_object_t
 find_object (ira_allocno_t, int, int);
+extern ira_object_t find_object (ira_allocno_t, poly_int64, poly_int64);
+ira_object_t
+find_object_anyway (ira_allocno_t a, int start, int nregs);
+extern void ira_copy_allocno_objects (ira_allocno_t, ira_allocno_t);
 
 /* ira-costs.cc */
 extern void ira_init_costs_once (void);
diff --git a/gcc/ira-lives.cc b/gcc/ira-lives.cc
index 60e6be0b0ae..e00898c0ccd 100644
--- a/gcc/ira-lives.cc
+++ b/gcc/ira-lives.cc
@@ -19,6 +19,7 @@ along with GCC; see the file COPYING3.  If not see
 <http://www.gnu.org/licenses/>.  */
 
 #include "config.h"
+#define INCLUDE_VECTOR
 #include "system.h"
 #include "coretypes.h"
 #include "backend.h"
@@ -35,6 +36,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "sparseset.h"
 #include "function-abi.h"
 #include "except.h"
+#include "subreg-live-range.h"
 
 /* The code in this file is similar to one in global but the code
    works on the allocno basis and creates live ranges instead of
@@ -91,6 +93,9 @@ static alternative_mask preferred_alternatives;
    we should not add a conflict with the copy's destination operand.  */
 static rtx ignore_reg_for_conflicts;
 
+/* Records the def/use points of registers that have subreg objects.  */
+static class subregs_live_points *subreg_live_points;
+
 /* Record hard register REGNO as now being live.  */
 static void
 make_hard_regno_live (int regno)
@@ -98,6 +103,33 @@ make_hard_regno_live (int regno)
   SET_HARD_REG_BIT (hard_regs_live, regno);
 }
 
+/* Add REGS to the conflict hard regs of A's live subreg objects.  */
+static void
+set_subreg_conflict_hard_regs (ira_allocno_t a, HARD_REG_SET regs)
+{
+  gcc_assert (has_subreg_object_p (a));
+
+  if (subreg_live_points->subreg_live_ranges.count (ALLOCNO_NUM (a)) == 0)
+    return;
+
+  for (const subreg_range &r :
+       subreg_live_points->subreg_live_ranges.at (ALLOCNO_NUM (a)).ranges)
+    {
+      ira_object_t obj = find_object_anyway (a, r.start, r.end - r.start);
+      OBJECT_CONFLICT_HARD_REGS (obj) |= regs;
+      OBJECT_TOTAL_CONFLICT_HARD_REGS (obj) |= regs;
+    }
+}
+
+static void
+set_subreg_conflict_hard_regs (ira_allocno_t a, unsigned int regno)
+{
+  HARD_REG_SET set;
+  CLEAR_HARD_REG_SET (set);
+  SET_HARD_REG_BIT (set, regno);
+  set_subreg_conflict_hard_regs (a, set);
+}
+
 /* Process the definition of hard register REGNO.  This updates
    hard_regs_live and hard reg conflict information for living allocnos.  */
 static void
@@ -113,8 +145,13 @@ make_hard_regno_dead (int regno)
 	     == (unsigned int) ALLOCNO_REGNO (OBJECT_ALLOCNO (obj)))
 	continue;
 
-      SET_HARD_REG_BIT (OBJECT_CONFLICT_HARD_REGS (obj), regno);
-      SET_HARD_REG_BIT (OBJECT_TOTAL_CONFLICT_HARD_REGS (obj), regno);
+      if (has_subreg_object_p (OBJECT_ALLOCNO (obj)))
+	set_subreg_conflict_hard_regs (OBJECT_ALLOCNO (obj), regno);
+      else
+	{
+	  SET_HARD_REG_BIT (OBJECT_CONFLICT_HARD_REGS (obj), regno);
+	  SET_HARD_REG_BIT (OBJECT_TOTAL_CONFLICT_HARD_REGS (obj), regno);
+	}
     }
   CLEAR_HARD_REG_BIT (hard_regs_live, regno);
 }
@@ -127,9 +164,29 @@ make_object_live (ira_object_t obj)
   sparseset_set_bit (objects_live, OBJECT_CONFLICT_ID (obj));
 
   live_range_t lr = OBJECT_LIVE_RANGES (obj);
-  if (lr == NULL
-      || (lr->finish != curr_point && lr->finish + 1 != curr_point))
-    ira_add_live_range_to_object (obj, curr_point, -1);
+  if (lr == NULL || (lr->finish != curr_point && lr->finish + 1 != curr_point))
+    {
+      ira_add_live_range_to_object (obj, curr_point, -1);
+      if (internal_flag_ira_verbose > 8 && ira_dump_file != NULL)
+	{
+	  fprintf (ira_dump_file,
+		   "     add new live_range for a%d(r%d): [%d...-1]\n",
+		   ALLOCNO_NUM (OBJECT_ALLOCNO (obj)),
+		   ALLOCNO_REGNO (OBJECT_ALLOCNO (obj)), curr_point);
+	}
+    }
+  else
+    {
+      if (internal_flag_ira_verbose > 8 && ira_dump_file != NULL)
+	{
+	  fprintf (
+	    ira_dump_file,
+	    "     use old live_range for a%d(r%d): [%d...%d], curr: %d\n",
+	    ALLOCNO_NUM (OBJECT_ALLOCNO (obj)),
+	    ALLOCNO_REGNO (OBJECT_ALLOCNO (obj)), lr->start, lr->finish,
+	    curr_point);
+	}
+    }
 }
 
 /* Update ALLOCNO_EXCESS_PRESSURE_POINTS_NUM for the allocno
@@ -140,7 +197,6 @@ update_allocno_pressure_excess_length (ira_object_t obj)
   ira_allocno_t a = OBJECT_ALLOCNO (obj);
   int start, i;
   enum reg_class aclass, pclass, cl;
-  live_range_t p;
 
   aclass = ALLOCNO_CLASS (a);
   pclass = ira_pressure_class_translate[aclass];
@@ -152,10 +208,18 @@ update_allocno_pressure_excess_length (ira_object_t obj)
 	continue;
       if (high_pressure_start_point[cl] < 0)
 	continue;
-      p = OBJECT_LIVE_RANGES (obj);
-      ira_assert (p != NULL);
-      start = (high_pressure_start_point[cl] > p->start
-	       ? high_pressure_start_point[cl] : p->start);
+      int start_point;
+      if (has_subreg_object_p (a))
+	start_point = subreg_live_points->get_start_point (ALLOCNO_NUM (a));
+      else
+	{
+	  live_range_t p = OBJECT_LIVE_RANGES (obj);
+	  ira_assert (p != NULL);
+	  start_point = p->start;
+	}
+      start = (high_pressure_start_point[cl] > start_point
+		 ? high_pressure_start_point[cl]
+		 : start_point);
       ALLOCNO_EXCESS_PRESSURE_POINTS_NUM (a) += curr_point - start + 1;
     }
 }
@@ -201,6 +265,14 @@ make_object_dead (ira_object_t obj)
     CLEAR_HARD_REG_BIT (OBJECT_TOTAL_CONFLICT_HARD_REGS (obj), regno);
 
   lr = OBJECT_LIVE_RANGES (obj);
+  if (internal_flag_ira_verbose > 8 && ira_dump_file != NULL)
+    {
+      fprintf (ira_dump_file,
+	       "     finish a live_range a%d(r%d): [%d...%d] => [%d...%d]\n",
+	       ALLOCNO_NUM (OBJECT_ALLOCNO (obj)),
+	       ALLOCNO_REGNO (OBJECT_ALLOCNO (obj)), lr->start, lr->finish,
+	       lr->start, curr_point);
+    }
   ira_assert (lr != NULL);
   lr->finish = curr_point;
   update_allocno_pressure_excess_length (obj);
@@ -295,77 +367,144 @@ pseudo_regno_single_word_and_live_p (int regno)
   return sparseset_bit_p (objects_live, OBJECT_CONFLICT_ID (obj));
 }
 
-/* Mark the pseudo register REGNO as live.  Update all information about
-   live ranges and register pressure.  */
+/* Record the point at which OBJ is defined (IS_DEF) or used.  */
 static void
-mark_pseudo_regno_live (int regno)
+add_subreg_point (ira_object_t obj, bool is_def, bool is_dec = true)
 {
-  ira_allocno_t a = ira_curr_regno_allocno_map[regno];
-  enum reg_class pclass;
-  int i, n, nregs;
-
-  if (a == NULL)
-    return;
+  ira_allocno_t a = OBJECT_ALLOCNO (obj);
+  if (is_def)
+    {
+      OBJECT_CONFLICT_HARD_REGS (obj) |= hard_regs_live;
+      OBJECT_TOTAL_CONFLICT_HARD_REGS (obj) |= hard_regs_live;
+      if (is_dec)
+	{
+	  enum reg_class pclass
+	    = ira_pressure_class_translate[ALLOCNO_CLASS (a)];
+	  dec_register_pressure (pclass, ALLOCNO_NREGS (a));
+	}
+      update_allocno_pressure_excess_length (obj);
+    }
+  else
+    {
+      enum reg_class pclass = ira_pressure_class_translate[ALLOCNO_CLASS (a)];
+      inc_register_pressure (pclass, ALLOCNO_NREGS (a));
+    }
 
-  /* Invalidate because it is referenced.  */
-  allocno_saved_at_call[ALLOCNO_NUM (a)] = 0;
+  subreg_range r = subreg_range (
+    {OBJECT_START (obj), OBJECT_START (obj) + OBJECT_NREGS (obj)});
+  subreg_live_points->add_point (ALLOCNO_NUM (a), ALLOCNO_NREGS (a), r, is_def,
+				 curr_point);
 
-  n = ALLOCNO_NUM_OBJECTS (a);
-  pclass = ira_pressure_class_translate[ALLOCNO_CLASS (a)];
-  nregs = ira_reg_class_max_nregs[ALLOCNO_CLASS (a)][ALLOCNO_MODE (a)];
-  if (n > 1)
+  if (internal_flag_ira_verbose > 8 && ira_dump_file != NULL)
     {
-      /* We track every subobject separately.  */
-      gcc_assert (nregs == n);
-      nregs = 1;
+      fprintf (ira_dump_file, "     %s a%d(r%d", is_def ? "def" : "use",
+	       ALLOCNO_NUM (a), ALLOCNO_REGNO (a));
+      if (ALLOCNO_CLASS (a) != NO_REGS
+	  && ALLOCNO_NREGS (a) != OBJECT_NREGS (obj))
+	fprintf (ira_dump_file, " [subreg: start %d, nregs %d]",
+		 OBJECT_START (obj), OBJECT_NREGS (obj));
+      else
+	fprintf (ira_dump_file, " [full: nregs %d]", OBJECT_NREGS (obj));
+      fprintf (ira_dump_file, ") at point %d\n", curr_point);
     }
 
-  for (i = 0; i < n; i++)
-    {
-      ira_object_t obj = ALLOCNO_OBJECT (a, i);
+  gcc_assert (has_subreg_object_p (a));
+  gcc_assert (subreg_live_points->subreg_live_ranges.count (ALLOCNO_NUM (a))
+	      != 0);
+
+  const subreg_ranges &sr
+    = subreg_live_points->subreg_live_ranges.at (ALLOCNO_NUM (a));
+  ira_object_t main_obj = find_object (a, 0, ALLOCNO_NREGS (a));
+  gcc_assert (main_obj != NULL);
+  if (sr.empty_p ()
+      && sparseset_bit_p (objects_live, OBJECT_CONFLICT_ID (main_obj)))
+    sparseset_clear_bit (objects_live, OBJECT_CONFLICT_ID (main_obj));
+  else if (!sr.empty_p ()
+	   && !sparseset_bit_p (objects_live, OBJECT_CONFLICT_ID (main_obj)))
+    sparseset_set_bit (objects_live, OBJECT_CONFLICT_ID (main_obj));
+}
 
+/* Mark the object OBJ as live.  */
+static void
+mark_pseudo_object_live (ira_allocno_t a, ira_object_t obj)
+{
+  /* Invalidate because it is referenced.  */
+  allocno_saved_at_call[ALLOCNO_NUM (a)] = 0;
+
+  if (has_subreg_object_p (a))
+    add_subreg_point (obj, false);
+  else
+    {
       if (sparseset_bit_p (objects_live, OBJECT_CONFLICT_ID (obj)))
-	continue;
+	return;
 
-      inc_register_pressure (pclass, nregs);
+      enum reg_class pclass = ira_pressure_class_translate[ALLOCNO_CLASS (a)];
+      inc_register_pressure (pclass, ALLOCNO_NREGS (a));
       make_object_live (obj);
     }
 }
 
+/* Mark the pseudo register REGNO as live.  Update all information about
+   live ranges and register pressure.  */
+static void
+mark_pseudo_regno_live (int regno)
+{
+  ira_allocno_t a = ira_curr_regno_allocno_map[regno];
+
+  if (a == NULL)
+    return;
+
+  int nregs = ira_reg_class_max_nregs[ALLOCNO_CLASS (a)][ALLOCNO_MODE (a)];
+  ira_object_t obj = find_object (a, 0, nregs);
+  gcc_assert (obj != NULL);
+
+  mark_pseudo_object_live (a, obj);
+}
+
 /* Like mark_pseudo_regno_live, but try to only mark one subword of
    the pseudo as live.  SUBWORD indicates which; a value of 0
    indicates the low part.  */
 static void
-mark_pseudo_regno_subword_live (int regno, int subword)
+mark_pseudo_regno_subreg_live (int regno, rtx subreg)
 {
   ira_allocno_t a = ira_curr_regno_allocno_map[regno];
-  int n;
-  enum reg_class pclass;
-  ira_object_t obj;
 
   if (a == NULL)
     return;
 
-  /* Invalidate because it is referenced.  */
-  allocno_saved_at_call[ALLOCNO_NUM (a)] = 0;
+  ira_object_t obj
+    = find_object (a, SUBREG_BYTE (subreg), GET_MODE_SIZE (GET_MODE (subreg)));
+  gcc_assert (obj != NULL);
+
+  mark_pseudo_object_live (a, obj);
+}
 
-  n = ALLOCNO_NUM_OBJECTS (a);
-  if (n == 1)
+/* Mark objects in subreg ranges SR as live.  Update all information about
+   live ranges and register pressure.  */
+static void
+mark_pseudo_regno_subregs_live (int regno, const subreg_ranges &sr)
+{
+  ira_allocno_t a = ira_curr_regno_allocno_map[regno];
+  if (a == NULL)
+    return;
+
+  if (!ALLOCNO_TRACK_SUBREG_P (a))
     {
       mark_pseudo_regno_live (regno);
       return;
     }
 
-  pclass = ira_pressure_class_translate[ALLOCNO_CLASS (a)];
-  gcc_assert
-    (n == ira_reg_class_max_nregs[ALLOCNO_CLASS (a)][ALLOCNO_MODE (a)]);
-  obj = ALLOCNO_OBJECT (a, subword);
-
-  if (sparseset_bit_p (objects_live, OBJECT_CONFLICT_ID (obj)))
-    return;
-
-  inc_register_pressure (pclass, 1);
-  make_object_live (obj);
+  int times = sr.max / ALLOCNO_NREGS (a);
+  gcc_assert (sr.max >= ALLOCNO_NREGS (a)
+	      && times * ALLOCNO_NREGS (a) == sr.max);
+  for (const subreg_range &range : sr.ranges)
+    {
+      int start = range.start / times;
+      int end = CEIL (range.end, times);
+      ira_object_t obj = find_object (a, start, end - start);
+      gcc_assert (obj != NULL);
+      mark_pseudo_object_live (a, obj);
+    }
 }
 
 /* Mark the register REG as live.  Store a 1 in hard_regs_live for
@@ -403,10 +542,7 @@ static void
 mark_pseudo_reg_live (rtx orig_reg, unsigned regno)
 {
   if (read_modify_subreg_p (orig_reg))
-    {
-      mark_pseudo_regno_subword_live (regno,
-				      subreg_lowpart_p (orig_reg) ? 0 : 1);
-    }
+    mark_pseudo_regno_subreg_live (regno, orig_reg);
   else
     mark_pseudo_regno_live (regno);
 }
@@ -427,72 +563,59 @@ mark_ref_live (df_ref ref)
     mark_hard_reg_live (reg);
 }
 
-/* Mark the pseudo register REGNO as dead.  Update all information about
-   live ranges and register pressure.  */
+/* Mark the object OBJ of allocno A as dead.  */
 static void
-mark_pseudo_regno_dead (int regno)
+mark_pseudo_object_dead (ira_allocno_t a, ira_object_t obj)
 {
-  ira_allocno_t a = ira_curr_regno_allocno_map[regno];
-  int n, i, nregs;
-  enum reg_class cl;
-
-  if (a == NULL)
-    return;
-
   /* Invalidate because it is referenced.  */
   allocno_saved_at_call[ALLOCNO_NUM (a)] = 0;
 
-  n = ALLOCNO_NUM_OBJECTS (a);
-  cl = ira_pressure_class_translate[ALLOCNO_CLASS (a)];
-  nregs = ira_reg_class_max_nregs[ALLOCNO_CLASS (a)][ALLOCNO_MODE (a)];
-  if (n > 1)
-    {
-      /* We track every subobject separately.  */
-      gcc_assert (nregs == n);
-      nregs = 1;
-    }
-  for (i = 0; i < n; i++)
+  if (has_subreg_object_p (a))
+    add_subreg_point (obj, true);
+  else
     {
-      ira_object_t obj = ALLOCNO_OBJECT (a, i);
       if (!sparseset_bit_p (objects_live, OBJECT_CONFLICT_ID (obj)))
-	continue;
+	return;
 
-      dec_register_pressure (cl, nregs);
+      enum reg_class cl = ira_pressure_class_translate[ALLOCNO_CLASS (a)];
+      dec_register_pressure (cl, ALLOCNO_NREGS (a));
       make_object_dead (obj);
     }
 }
 
-/* Like mark_pseudo_regno_dead, but called when we know that only part of the
-   register dies.  SUBWORD indicates which; a value of 0 indicates the low part.  */
+/* Mark the pseudo register REGNO as dead.  Update all information about
+   live ranges and register pressure.  */
 static void
-mark_pseudo_regno_subword_dead (int regno, int subword)
+mark_pseudo_regno_dead (int regno)
 {
   ira_allocno_t a = ira_curr_regno_allocno_map[regno];
-  int n;
-  enum reg_class cl;
-  ira_object_t obj;
 
   if (a == NULL)
     return;
 
-  /* Invalidate because it is referenced.  */
-  allocno_saved_at_call[ALLOCNO_NUM (a)] = 0;
+  int nregs = ira_reg_class_max_nregs[ALLOCNO_CLASS (a)][ALLOCNO_MODE (a)];
+  ira_object_t obj = find_object (a, 0, nregs);
+  gcc_assert (obj != NULL);
 
-  n = ALLOCNO_NUM_OBJECTS (a);
-  if (n == 1)
-    /* The allocno as a whole doesn't die in this case.  */
-    return;
+  mark_pseudo_object_dead (a, obj);
+}
 
-  cl = ira_pressure_class_translate[ALLOCNO_CLASS (a)];
-  gcc_assert
-    (n == ira_reg_class_max_nregs[ALLOCNO_CLASS (a)][ALLOCNO_MODE (a)]);
+/* Like mark_pseudo_regno_dead, but called when we know that only the part
+   of the register referenced by SUBREG dies.  Mark the corresponding
+   subreg object as dead.  */
+static void
+mark_pseudo_regno_subreg_dead (int regno, rtx subreg)
+{
+  ira_allocno_t a = ira_curr_regno_allocno_map[regno];
 
-  obj = ALLOCNO_OBJECT (a, subword);
-  if (!sparseset_bit_p (objects_live, OBJECT_CONFLICT_ID (obj)))
+  if (a == NULL)
     return;
 
-  dec_register_pressure (cl, 1);
-  make_object_dead (obj);
+  ira_object_t obj
+    = find_object (a, SUBREG_BYTE (subreg), GET_MODE_SIZE (GET_MODE (subreg)));
+  gcc_assert (obj != NULL);
+
+  mark_pseudo_object_dead (a, obj);
 }
 
 /* Process the definition of hard register REG.  This updates hard_regs_live
@@ -528,10 +651,7 @@ static void
 mark_pseudo_reg_dead (rtx orig_reg, unsigned regno)
 {
   if (read_modify_subreg_p (orig_reg))
-    {
-      mark_pseudo_regno_subword_dead (regno,
-				      subreg_lowpart_p (orig_reg) ? 0 : 1);
-    }
+    mark_pseudo_regno_subreg_dead (regno, orig_reg);
   else
     mark_pseudo_regno_dead (regno);
 }
@@ -1059,8 +1179,14 @@ process_single_reg_class_operands (bool in_p, int freq)
 	      /* We could increase costs of A instead of making it
 		 conflicting with the hard register.  But it works worse
 		 because it will be spilled in reload in anyway.  */
-	      OBJECT_CONFLICT_HARD_REGS (obj) |= reg_class_contents[cl];
-	      OBJECT_TOTAL_CONFLICT_HARD_REGS (obj) |= reg_class_contents[cl];
+	    if (has_subreg_object_p (a))
+	      set_subreg_conflict_hard_regs (OBJECT_ALLOCNO (obj),
+					     reg_class_contents[cl]);
+	    else
+	      {
+		OBJECT_CONFLICT_HARD_REGS (obj) |= reg_class_contents[cl];
+		OBJECT_TOTAL_CONFLICT_HARD_REGS (obj) |= reg_class_contents[cl];
+	      }
 	    }
 	}
     }
@@ -1198,17 +1324,15 @@ process_out_of_region_eh_regs (basic_block bb)
 			    bi)
     {
       ira_allocno_t a = ira_curr_regno_allocno_map[i];
-      for (int n = ALLOCNO_NUM_OBJECTS (a) - 1; n >= 0; n--)
+      ira_object_t obj = find_object (a, 0, ALLOCNO_NREGS (a));
+      for (int k = 0;; k++)
 	{
-	  ira_object_t obj = ALLOCNO_OBJECT (a, n);
-	  for (int k = 0;; k++)
-	    {
-	      unsigned int regno = EH_RETURN_DATA_REGNO (k);
-	      if (regno == INVALID_REGNUM)
-		break;
-	      SET_HARD_REG_BIT (OBJECT_CONFLICT_HARD_REGS (obj), regno);
-	      SET_HARD_REG_BIT (OBJECT_TOTAL_CONFLICT_HARD_REGS (obj), regno);
-	    }
+	  unsigned int regno = EH_RETURN_DATA_REGNO (k);
+	  if (regno == INVALID_REGNUM)
+	    break;
+
+	  SET_HARD_REG_BIT (OBJECT_CONFLICT_HARD_REGS (obj), regno);
+	  SET_HARD_REG_BIT (OBJECT_TOTAL_CONFLICT_HARD_REGS (obj), regno);
 	}
     }
 }
@@ -1234,6 +1358,10 @@ process_bb_node_lives (ira_loop_tree_node_t loop_tree_node)
   bb = loop_tree_node->bb;
   if (bb != NULL)
     {
+      if (internal_flag_ira_verbose > 2 && ira_dump_file != NULL)
+	fprintf (ira_dump_file, "\n   BB exit(l%d): point = %d\n",
+		 loop_tree_node->parent->loop_num, curr_point);
+
       for (i = 0; i < ira_pressure_classes_num; i++)
 	{
 	  curr_reg_pressure[ira_pressure_classes[i]] = 0;
@@ -1242,6 +1370,7 @@ process_bb_node_lives (ira_loop_tree_node_t loop_tree_node)
       curr_bb_node = loop_tree_node;
       reg_live_out = DF_LIVE_SUBREG_OUT (bb);
       sparseset_clear (objects_live);
+      subreg_live_points->clear_live_ranges ();
       REG_SET_TO_HARD_REG_SET (hard_regs_live, reg_live_out);
       hard_regs_live &= ~(eliminable_regset | ira_no_alloc_regs);
       for (i = 0; i < FIRST_PSEUDO_REGISTER; i++)
@@ -1265,9 +1394,17 @@ process_bb_node_lives (ira_loop_tree_node_t loop_tree_node)
 			    <= ira_class_hard_regs_num[cl]);
 	      }
 	  }
-      EXECUTE_IF_SET_IN_BITMAP (reg_live_out, FIRST_PSEUDO_REGISTER, j, bi)
+      EXECUTE_IF_SET_IN_BITMAP (DF_LIVE_SUBREG_FULL_OUT (bb),
+				FIRST_PSEUDO_REGISTER, j, bi)
 	mark_pseudo_regno_live (j);
 
+      EXECUTE_IF_SET_IN_BITMAP (DF_LIVE_SUBREG_PARTIAL_OUT (bb),
+				FIRST_PSEUDO_REGISTER, j, bi)
+	{
+	  mark_pseudo_regno_subregs_live (
+	    j, DF_LIVE_SUBREG_RANGE_OUT (bb)->lives.at (j));
+	}
+
 #ifdef EH_RETURN_DATA_REGNO
       process_out_of_region_eh_regs (bb);
 #endif
@@ -1381,27 +1518,33 @@ process_bb_node_lives (ira_loop_tree_node_t loop_tree_node)
 		      || (!targetm.setjmp_preserves_nonvolatile_regs_p ()
 			  && (find_reg_note (insn, REG_SETJMP, NULL_RTX)
 			      != NULL_RTX)))
+		  {
+		    if (has_subreg_object_p (a))
+		      {
+			HARD_REG_SET regs;
+			SET_HARD_REG_SET (regs);
+			set_subreg_conflict_hard_regs (a, regs);
+		      }
+		    else
+		      {
+			SET_HARD_REG_SET (OBJECT_CONFLICT_HARD_REGS (obj));
+			SET_HARD_REG_SET (
+			  OBJECT_TOTAL_CONFLICT_HARD_REGS (obj));
+		      }
+		  }
+		  if (can_throw_internal (insn))
 		    {
-		      SET_HARD_REG_SET (OBJECT_CONFLICT_HARD_REGS (obj));
-		      SET_HARD_REG_SET (OBJECT_TOTAL_CONFLICT_HARD_REGS (obj));
-		    }
-		  eh_region r;
-		  eh_landing_pad lp;
-		  rtx_code_label *landing_label;
-		  basic_block landing_bb;
-		  if (can_throw_internal (insn)
-		      && (r = get_eh_region_from_rtx (insn)) != NULL
-		      && (lp = gen_eh_landing_pad (r)) != NULL
-		      && (landing_label = lp->landing_pad) != NULL
-		      && (landing_bb = BLOCK_FOR_INSN (landing_label)) != NULL
-		      && (r->type != ERT_CLEANUP
-			  || bitmap_bit_p (df_get_live_in (landing_bb),
-					   ALLOCNO_REGNO (a))))
-		    {
-		      HARD_REG_SET new_conflict_regs
-			= callee_abi.mode_clobbers (ALLOCNO_MODE (a));
-		      OBJECT_CONFLICT_HARD_REGS (obj) |= new_conflict_regs;
-		      OBJECT_TOTAL_CONFLICT_HARD_REGS (obj) |= new_conflict_regs;
+		    if (has_subreg_object_p (a))
+		      set_subreg_conflict_hard_regs (a,
+						     callee_abi.mode_clobbers (
+						       ALLOCNO_MODE (a)));
+		    else
+		      {
+			OBJECT_CONFLICT_HARD_REGS (obj)
+			  |= callee_abi.mode_clobbers (ALLOCNO_MODE (a));
+			OBJECT_TOTAL_CONFLICT_HARD_REGS (obj)
+			  |= callee_abi.mode_clobbers (ALLOCNO_MODE (a));
+		      }
 		    }
 		  if (sparseset_bit_p (allocnos_processed, num))
 		    continue;
@@ -1443,7 +1586,14 @@ process_bb_node_lives (ira_loop_tree_node_t loop_tree_node)
 	  
 	  /* Mark each used value as live.  */
 	  FOR_EACH_INSN_USE (use, insn)
-	    mark_ref_live (use);
+	    {
+	      unsigned regno = DF_REF_REGNO (use);
+	      ira_allocno_t a = ira_curr_regno_allocno_map[regno];
+	      if (a && has_subreg_object_p (a)
+		  && DF_REF_FLAGS (use) & (DF_REF_READ_WRITE | DF_REF_SUBREG))
+		  continue;
+	      mark_ref_live (use);
+	    }
 
 	  process_single_reg_class_operands (true, freq);
 
@@ -1473,6 +1623,10 @@ process_bb_node_lives (ira_loop_tree_node_t loop_tree_node)
 	}
       ignore_reg_for_conflicts = NULL_RTX;
 
+      if (internal_flag_ira_verbose > 2 && ira_dump_file != NULL)
+	fprintf (ira_dump_file, "\n   BB head(l%d): point = %d\n",
+		 loop_tree_node->parent->loop_num, curr_point);
+
       if (bb_has_eh_pred (bb))
 	for (j = 0; ; ++j)
 	  {
@@ -1526,10 +1680,15 @@ process_bb_node_lives (ira_loop_tree_node_t loop_tree_node)
 	}
 
       EXECUTE_IF_SET_IN_SPARSESET (objects_live, i)
-	make_object_dead (ira_object_id_map[i]);
+      {
+	ira_object_t obj = ira_object_id_map[i];
+	if (has_subreg_object_p (OBJECT_ALLOCNO (obj)))
+	  add_subreg_point (obj, true, false);
+	else
+	  make_object_dead (obj);
+      }
 
       curr_point++;
-
     }
   /* Propagate register pressure to upper loop tree nodes.  */
   if (loop_tree_node != ira_loop_tree_root)
@@ -1730,6 +1889,86 @@ ira_debug_live_ranges (void)
   print_live_ranges (stderr);
 }
 
+class subreg_live_item
+{
+public:
+  subreg_ranges subreg;
+  int start, finish;
+};
+
+/* Create subreg live ranges from the objects' def/use point info.  */
+static void
+create_subregs_live_ranges ()
+{
+  for (const auto &subreg_point_it : subreg_live_points->subreg_points)
+    {
+      unsigned int allocno_num = subreg_point_it.first;
+      const class live_points &points = subreg_point_it.second;
+      ira_allocno_t a = ira_allocnos[allocno_num];
+      std::vector<subreg_live_item> temps;
+      gcc_assert (has_subreg_object_p (a));
+      for (const auto &point_it : points.points)
+	{
+	  int point = point_it.first;
+	  const live_point &regs = point_it.second;
+	  gcc_assert (temps.empty () || temps.back ().finish <= point);
+	  if (!regs.use_reg.empty_p ())
+	    {
+	      if (temps.empty ())
+		temps.push_back ({regs.use_reg, point, -1});
+	      else if (temps.back ().finish == -1)
+		{
+		  if (!temps.back ().subreg.same_p (regs.use_reg))
+		    {
+		      if (temps.back ().start == point)
+			temps.back ().subreg.add_ranges (regs.use_reg);
+		      else
+			{
+			  temps.back ().finish = point - 1;
+
+			  subreg_ranges temp = regs.use_reg;
+			  temp.add_ranges (temps.back ().subreg);
+			  temps.push_back ({temp, point, -1});
+			}
+		    }
+		}
+	      else if (temps.back ().subreg.same_p (regs.use_reg)
+		       && (temps.back ().finish == point
+			   || temps.back ().finish + 1 == point))
+		temps.back ().finish = -1;
+	      else
+		temps.push_back ({regs.use_reg, point, -1});
+	    }
+	  if (!regs.def_reg.empty_p ())
+	    {
+	      gcc_assert (!temps.empty ());
+	      if (regs.def_reg.include_ranges_p (temps.back ().subreg))
+		temps.back ().finish = point;
+	      else if (temps.back ().subreg.include_ranges_p (regs.def_reg))
+		{
+		  temps.back ().finish = point;
+
+		  subreg_ranges diff = temps.back ().subreg;
+		  diff.remove_ranges (regs.def_reg);
+		  temps.push_back ({diff, point + 1, -1});
+		}
+	      else
+		gcc_unreachable ();
+	    }
+	}
+      for (const subreg_live_item &item : temps)
+	for (const subreg_range &r : item.subreg.ranges)
+	  {
+	    ira_object_t obj = find_object_anyway (a, r.start, r.end - r.start);
+	    live_range_t lr = OBJECT_LIVE_RANGES (obj);
+	    if (lr != NULL && lr->finish + 1 == item.start)
+	      lr->finish = item.finish;
+	    else
+	      ira_add_live_range_to_object (obj, item.start, item.finish);
+	  }
+    }
+}
+
 /* The main entry function creates live ranges, set up
    CONFLICT_HARD_REGS and TOTAL_CONFLICT_HARD_REGS for objects, and
    calculate register pressure info.  */
@@ -1743,13 +1982,20 @@ ira_create_allocno_live_ranges (void)
   allocno_saved_at_call
     = (int *) ira_allocate (ira_allocnos_num * sizeof (int));
   memset (allocno_saved_at_call, 0, ira_allocnos_num * sizeof (int));
+  subreg_live_points = new subregs_live_points ();
   ira_traverse_loop_tree (true, ira_loop_tree_root, NULL,
 			  process_bb_node_lives);
   ira_max_point = curr_point;
+  create_subregs_live_ranges ();
   create_start_finish_chains ();
   if (internal_flag_ira_verbose > 2 && ira_dump_file != NULL)
-    print_live_ranges (ira_dump_file);
+    {
+      fprintf (ira_dump_file, ";; subreg live points:\n");
+      subreg_live_points->dump (ira_dump_file);
+      print_live_ranges (ira_dump_file);
+    }
   /* Clean up.  */
+  delete subreg_live_points;
   ira_free (allocno_saved_at_call);
   sparseset_free (objects_live);
   sparseset_free (allocnos_processed);
diff --git a/gcc/subreg-live-range.h b/gcc/subreg-live-range.h
index 56931b53550..bee97708a52 100644
--- a/gcc/subreg-live-range.h
+++ b/gcc/subreg-live-range.h
@@ -275,11 +275,20 @@ class subregs_live_points
 {
 public:
   std::map<int, live_points> subreg_points;
+  std::map<int, int> last_start_points;
   std::map<int, subreg_ranges> subreg_live_ranges;
 
   void add_point (int id, int max, const subreg_range &range, bool is_def,
 		  int point)
   {
+    if (!is_def && empty_live_p (id))
+      {
+	if (last_start_points.count (id) == 0)
+	  last_start_points.insert ({id, point});
+	else
+	  last_start_points.at (id) = point;
+      }
+
     if (subreg_points.count (id) == 0)
       subreg_points.insert ({id, live_points (id, max)});
 
@@ -317,6 +326,13 @@ public:
 	   || subreg_live_ranges.at (id).empty_p ();
   }
 
+  int get_start_point (int id)
+  {
+    int start_point = last_start_points.at (id);
+    gcc_assert (start_point != -1);
+    return start_point;
+  }
+
   void clear_live_ranges () { subreg_live_ranges.clear (); }
 
   /* Debug methods.  */
-- 
2.36.3


^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH 4/7] ira: Support subreg copy
  2023-11-08  3:47 [PATCH 0/7] ira/lra: Support subreg coalesce Lehua Ding
                   ` (2 preceding siblings ...)
  2023-11-08  3:47 ` [PATCH 3/7] ira: Support subreg live range track Lehua Ding
@ 2023-11-08  3:47 ` Lehua Ding
  2023-11-08  3:47 ` [PATCH 5/7] ira: Add all nregs >= 2 pseudos to tracke subreg list Lehua Ding
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 37+ messages in thread
From: Lehua Ding @ 2023-11-08  3:47 UTC (permalink / raw)
  To: gcc-patches; +Cc: vmakarov, richard.sandiford, juzhe.zhong, lehua.ding

This patch changes copies between allocnos into copies between objects,
that is, it allows partial copies between pseudo registers.
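
To illustrate the effect, below is a small standalone sketch (plain C++, not
the actual ira data structures; the pseudo numbers, struct layouts, and the
RTL-ish move in the comment are made up for illustration) of a copy recorded
between an object covering part of one pseudo and the single object of
another:

```
#include <cstdio>
#include <vector>

struct object { int regno; int start; int nregs; };  /* Register range.  */
struct copy { const object *first; const object *second; };

int
main ()
{
  /* Hypothetical pseudos: r100 occupies two hard registers, r101 one.  */
  std::vector<object> objs = {
    { 100, 0, 2 },  /* full r100 */
    { 100, 1, 1 },  /* subreg object: second register of r100 */
    { 101, 0, 1 },  /* full r101 */
  };

  /* A move like (set (reg:M1 101) (subreg:M1 (reg:M2 100) <offset>)) that
     reads only the second register of r100 now records a copy between
     objs[1] and objs[2], so coloring can bias r101 toward the hard
     register holding the second half of r100 and make the move a no-op.  */
  copy cp = { &objs[1], &objs[2] };
  std::printf ("copy: r%d[%d+%d] <-> r%d[%d+%d]\n",
               cp.first->regno, cp.first->start, cp.first->nregs,
               cp.second->regno, cp.second->start, cp.second->nregs);
  return 0;
}
```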

gcc/ChangeLog:

	* ira-build.cc (find_allocno_copy): Removed.
	(ira_create_object): Adjust.
	(find_object): New.
	(ira_create_copy): Adjust.
	(add_allocno_copy_to_list): Adjust.
	(swap_allocno_copy_ends_if_necessary): Adjust.
	(ira_add_allocno_copy): Adjust.
	(print_copy): Adjust.
	(print_allocno_copies): Adjust.
	(ira_flattening): Adjust.
	* ira-color.cc (INCLUDE_VECTOR): Define to use std::vector.
	(struct allocno_color_data): New fields.
	(struct allocno_hard_regs_subnode): More comments.
	(form_allocno_hard_regs_nodes_forest): More comments.
	(update_left_conflict_sizes_p): More comments.
	(struct update_cost_queue_elem): New field.
	(queue_update_cost): Adjust.
	(get_next_update_cost): Adjust.
	(update_costs_from_allocno): Adjust.
	(update_conflict_hard_regno_costs): Adjust.
	(assign_hard_reg): Adjust.
	(objects_conflict_by_live_ranges_p): New.
	(allocno_thread_conflict_p): Removed.
	(object_thread_conflict_p): New.
	(merge_threads): Adjust.
	(form_threads_from_copies): Adjust.
	(form_threads_from_bucket): Adjust.
	(form_threads_from_colorable_allocno): Adjust.
	(init_allocno_threads): Adjust.
	(add_allocno_to_bucket): Adjust.
	(delete_allocno_from_bucket): Adjust.
	(allocno_copy_cost_saving): Adjust.
	(color_allocnos): Adjust.
	(color_pass): Adjust.
	(update_curr_costs): Adjust.
	(coalesce_allocnos): Adjust.
	(ira_reuse_stack_slot): Adjust.
	(ira_initiate_assign): Adjust.
	(ira_finish_assign): Adjust.
	* ira-conflicts.cc (allocnos_conflict_for_copy_p): Removed.
	(REG_SUBREG_P): Adjust.
	(subreg_move_p): New.
	(regs_non_conflict_for_copy_p): New.
	(subreg_reg_align_and_times_p): New.
	(process_regs_for_copy): Adjust.
	(add_insn_allocno_copies): Adjust.
	(propagate_copies): Adjust.
	* ira-emit.cc (add_range_and_copies_from_move_list): Adjust.
	* ira-int.h (struct ira_object): New field.
	(OBJECT_INDEX): New macro.
	(struct ira_allocno_copy): Adjust fields.
	(ira_add_allocno_copy): Exported.
	(find_object): Exported.
	(subreg_move_p): Exported.
	* ira.cc (print_redundant_copies): Adjust.

---
 gcc/ira-build.cc     | 150 +++++++-----
 gcc/ira-color.cc     | 541 +++++++++++++++++++++++++++++++------------
 gcc/ira-conflicts.cc | 173 +++++++++++---
 gcc/ira-emit.cc      |  10 +-
 gcc/ira-int.h        |  13 +-
 gcc/ira.cc           |   5 +-
 6 files changed, 645 insertions(+), 247 deletions(-)

diff --git a/gcc/ira-build.cc b/gcc/ira-build.cc
index 5fb7a9f800f..1c47f81ce9d 100644
--- a/gcc/ira-build.cc
+++ b/gcc/ira-build.cc
@@ -36,9 +36,6 @@ along with GCC; see the file COPYING3.  If not see
 #include "cfgloop.h"
 #include "subreg-live-range.h"
 
-static ira_copy_t find_allocno_copy (ira_allocno_t, ira_allocno_t, rtx_insn *,
-				     ira_loop_tree_node_t);
-
 /* The root of the loop tree corresponding to the all function.  */
 ira_loop_tree_node_t ira_loop_tree_root;
 
@@ -463,6 +460,7 @@ ira_create_object (ira_allocno_t a, int start, int nregs)
   OBJECT_LIVE_RANGES (obj) = NULL;
   OBJECT_START (obj) = start;
   OBJECT_NREGS (obj) = nregs;
+  OBJECT_INDEX (obj) = ALLOCNO_NUM_OBJECTS (a);
 
   ira_object_id_map_vec.safe_push (obj);
   ira_object_id_map
@@ -519,6 +517,16 @@ find_object (ira_allocno_t a, poly_int64 offset, poly_int64 size)
   return find_object (a, subreg_start, subreg_nregs);
 }
 
+/* Return object in allocno A for REG.  */
+ira_object_t
+find_object (ira_allocno_t a, rtx reg)
+{
+  if (has_subreg_object_p (a) && read_modify_subreg_p (reg))
+    return find_object (a, SUBREG_BYTE (reg), GET_MODE_SIZE (GET_MODE (reg)));
+  else
+    return find_object (a, 0, ALLOCNO_NREGS (a));
+}
+
 /* Return the object in allocno A which match START & NREGS.  Create when not
    found.  */
 ira_object_t
@@ -1502,27 +1510,36 @@ initiate_copies (void)
 /* Return copy connecting A1 and A2 and originated from INSN of
    LOOP_TREE_NODE if any.  */
 static ira_copy_t
-find_allocno_copy (ira_allocno_t a1, ira_allocno_t a2, rtx_insn *insn,
+find_allocno_copy (ira_object_t obj1, ira_object_t obj2, rtx_insn *insn,
 		   ira_loop_tree_node_t loop_tree_node)
 {
   ira_copy_t cp, next_cp;
-  ira_allocno_t another_a;
+  ira_object_t another_obj;
 
+  ira_allocno_t a1 = OBJECT_ALLOCNO (obj1);
   for (cp = ALLOCNO_COPIES (a1); cp != NULL; cp = next_cp)
     {
-      if (cp->first == a1)
+      ira_allocno_t first_a = OBJECT_ALLOCNO (cp->first);
+      ira_allocno_t second_a = OBJECT_ALLOCNO (cp->second);
+      if (first_a == a1)
 	{
 	  next_cp = cp->next_first_allocno_copy;
-	  another_a = cp->second;
+	  if (cp->first == obj1)
+	    another_obj = cp->second;
+	  else
+	    continue;
 	}
-      else if (cp->second == a1)
+      else if (second_a == a1)
 	{
 	  next_cp = cp->next_second_allocno_copy;
-	  another_a = cp->first;
+	  if (cp->second == obj1)
+	    another_obj = cp->first;
+	  else
+	    continue;
 	}
       else
 	gcc_unreachable ();
-      if (another_a == a2 && cp->insn == insn
+      if (another_obj == obj2 && cp->insn == insn
 	  && cp->loop_tree_node == loop_tree_node)
 	return cp;
     }
@@ -1532,7 +1549,7 @@ find_allocno_copy (ira_allocno_t a1, ira_allocno_t a2, rtx_insn *insn,
 /* Create and return copy with given attributes LOOP_TREE_NODE, FIRST,
    SECOND, FREQ, CONSTRAINT_P, and INSN.  */
 ira_copy_t
-ira_create_copy (ira_allocno_t first, ira_allocno_t second, int freq,
+ira_create_copy (ira_object_t first, ira_object_t second, int freq,
 		 bool constraint_p, rtx_insn *insn,
 		 ira_loop_tree_node_t loop_tree_node)
 {
@@ -1556,28 +1573,29 @@ ira_create_copy (ira_allocno_t first, ira_allocno_t second, int freq,
 static void
 add_allocno_copy_to_list (ira_copy_t cp)
 {
-  ira_allocno_t first = cp->first, second = cp->second;
+  ira_object_t first = cp->first, second = cp->second;
+  ira_allocno_t a1 = OBJECT_ALLOCNO (first), a2 = OBJECT_ALLOCNO (second);
 
   cp->prev_first_allocno_copy = NULL;
   cp->prev_second_allocno_copy = NULL;
-  cp->next_first_allocno_copy = ALLOCNO_COPIES (first);
+  cp->next_first_allocno_copy = ALLOCNO_COPIES (a1);
   if (cp->next_first_allocno_copy != NULL)
     {
-      if (cp->next_first_allocno_copy->first == first)
+      if (OBJECT_ALLOCNO (cp->next_first_allocno_copy->first) == a1)
 	cp->next_first_allocno_copy->prev_first_allocno_copy = cp;
       else
 	cp->next_first_allocno_copy->prev_second_allocno_copy = cp;
     }
-  cp->next_second_allocno_copy = ALLOCNO_COPIES (second);
+  cp->next_second_allocno_copy = ALLOCNO_COPIES (a2);
   if (cp->next_second_allocno_copy != NULL)
     {
-      if (cp->next_second_allocno_copy->second == second)
+      if (OBJECT_ALLOCNO (cp->next_second_allocno_copy->second) == a2)
 	cp->next_second_allocno_copy->prev_second_allocno_copy = cp;
       else
 	cp->next_second_allocno_copy->prev_first_allocno_copy = cp;
     }
-  ALLOCNO_COPIES (first) = cp;
-  ALLOCNO_COPIES (second) = cp;
+  ALLOCNO_COPIES (a1) = cp;
+  ALLOCNO_COPIES (a2) = cp;
 }
 
 /* Make a copy CP a canonical copy where number of the
@@ -1585,7 +1603,8 @@ add_allocno_copy_to_list (ira_copy_t cp)
 static void
 swap_allocno_copy_ends_if_necessary (ira_copy_t cp)
 {
-  if (ALLOCNO_NUM (cp->first) <= ALLOCNO_NUM (cp->second))
+  if (ALLOCNO_NUM (OBJECT_ALLOCNO (cp->first))
+      <= ALLOCNO_NUM (OBJECT_ALLOCNO (cp->second)))
     return;
 
   std::swap (cp->first, cp->second);
@@ -1594,11 +1613,10 @@ swap_allocno_copy_ends_if_necessary (ira_copy_t cp)
 }
 
 /* Create (or update frequency if the copy already exists) and return
-   the copy of allocnos FIRST and SECOND with frequency FREQ
-   corresponding to move insn INSN (if any) and originated from
-   LOOP_TREE_NODE.  */
+   the copy of objects FIRST and SECOND with frequency FREQ corresponding to
+   move insn INSN (if any) and originated from LOOP_TREE_NODE.  */
 ira_copy_t
-ira_add_allocno_copy (ira_allocno_t first, ira_allocno_t second, int freq,
+ira_add_allocno_copy (ira_object_t first, ira_object_t second, int freq,
 		      bool constraint_p, rtx_insn *insn,
 		      ira_loop_tree_node_t loop_tree_node)
 {
@@ -1617,15 +1635,33 @@ ira_add_allocno_copy (ira_allocno_t first, ira_allocno_t second, int freq,
   return cp;
 }
 
+/* Create (or update frequency if the copy already exists) and return
+   the copy of allocnos FIRST and SECOND with frequency FREQ
+   corresponding to move insn INSN (if any) and originated from
+   LOOP_TREE_NODE.  */
+ira_copy_t
+ira_add_allocno_copy (ira_allocno_t first, ira_allocno_t second, int freq,
+		      bool constraint_p, rtx_insn *insn,
+		      ira_loop_tree_node_t loop_tree_node)
+{
+  ira_object_t obj1 = get_full_object (first);
+  ira_object_t obj2 = get_full_object (second);
+  gcc_assert (obj1 != NULL && obj2 != NULL);
+  return ira_add_allocno_copy (obj1, obj2, freq, constraint_p, insn,
+			       loop_tree_node);
+}
+
 /* Print info about copy CP into file F.  */
 static void
 print_copy (FILE *f, ira_copy_t cp)
 {
-  fprintf (f, "  cp%d:a%d(r%d)<->a%d(r%d)@%d:%s\n", cp->num,
-	   ALLOCNO_NUM (cp->first), ALLOCNO_REGNO (cp->first),
-	   ALLOCNO_NUM (cp->second), ALLOCNO_REGNO (cp->second), cp->freq,
-	   cp->insn != NULL
-	   ? "move" : cp->constraint_p ? "constraint" : "shuffle");
+  ira_allocno_t a1 = OBJECT_ALLOCNO (cp->first);
+  ira_allocno_t a2 = OBJECT_ALLOCNO (cp->second);
+  fprintf (f, "  cp%d:a%d(r%d)<->a%d(r%d)@%d:%s\n", cp->num, ALLOCNO_NUM (a1),
+	   ALLOCNO_REGNO (a1), ALLOCNO_NUM (a2), ALLOCNO_REGNO (a2), cp->freq,
+	   cp->insn != NULL   ? "move"
+	   : cp->constraint_p ? "constraint"
+			      : "shuffle");
 }
 
 DEBUG_FUNCTION void
@@ -1672,24 +1708,25 @@ ira_debug_copies (void)
 static void
 print_allocno_copies (FILE *f, ira_allocno_t a)
 {
-  ira_allocno_t another_a;
+  ira_object_t another_obj;
   ira_copy_t cp, next_cp;
 
   fprintf (f, " a%d(r%d):", ALLOCNO_NUM (a), ALLOCNO_REGNO (a));
   for (cp = ALLOCNO_COPIES (a); cp != NULL; cp = next_cp)
     {
-      if (cp->first == a)
+      if (OBJECT_ALLOCNO (cp->first) == a)
 	{
 	  next_cp = cp->next_first_allocno_copy;
-	  another_a = cp->second;
+	  another_obj = cp->second;
 	}
-      else if (cp->second == a)
+      else if (OBJECT_ALLOCNO (cp->second) == a)
 	{
 	  next_cp = cp->next_second_allocno_copy;
-	  another_a = cp->first;
+	  another_obj = cp->first;
 	}
       else
 	gcc_unreachable ();
+      ira_allocno_t another_a = OBJECT_ALLOCNO (another_obj);
       fprintf (f, " cp%d:a%d(r%d)@%d", cp->num,
 	       ALLOCNO_NUM (another_a), ALLOCNO_REGNO (another_a), cp->freq);
     }
@@ -3479,25 +3516,21 @@ ira_flattening (int max_regno_before_emit, int ira_max_point_before_emit)
      copies.  */
   FOR_EACH_COPY (cp, ci)
     {
-      if (ALLOCNO_CAP_MEMBER (cp->first) != NULL
-	  || ALLOCNO_CAP_MEMBER (cp->second) != NULL)
+      ira_allocno_t a1 = OBJECT_ALLOCNO (cp->first);
+      ira_allocno_t a2 = OBJECT_ALLOCNO (cp->second);
+      if (ALLOCNO_CAP_MEMBER (a1) != NULL || ALLOCNO_CAP_MEMBER (a2) != NULL)
 	{
 	  if (internal_flag_ira_verbose > 4 && ira_dump_file != NULL)
-	    fprintf
-	      (ira_dump_file, "      Remove cp%d:%c%dr%d-%c%dr%d\n",
-	       cp->num, ALLOCNO_CAP_MEMBER (cp->first) != NULL ? 'c' : 'a',
-	       ALLOCNO_NUM (cp->first),
-	       REGNO (allocno_emit_reg (cp->first)),
-	       ALLOCNO_CAP_MEMBER (cp->second) != NULL ? 'c' : 'a',
-	       ALLOCNO_NUM (cp->second),
-	       REGNO (allocno_emit_reg (cp->second)));
+	    fprintf (ira_dump_file, "      Remove cp%d:%c%dr%d-%c%dr%d\n",
+		     cp->num, ALLOCNO_CAP_MEMBER (a1) != NULL ? 'c' : 'a',
+		     ALLOCNO_NUM (a1), REGNO (allocno_emit_reg (a1)),
+		     ALLOCNO_CAP_MEMBER (a2) != NULL ? 'c' : 'a',
+		     ALLOCNO_NUM (a2), REGNO (allocno_emit_reg (a2)));
 	  cp->loop_tree_node = NULL;
 	  continue;
 	}
-      first
-	= regno_top_level_allocno_map[REGNO (allocno_emit_reg (cp->first))];
-      second
-	= regno_top_level_allocno_map[REGNO (allocno_emit_reg (cp->second))];
+      first = regno_top_level_allocno_map[REGNO (allocno_emit_reg (a1))];
+      second = regno_top_level_allocno_map[REGNO (allocno_emit_reg (a2))];
       node = cp->loop_tree_node;
       if (node == NULL)
 	keep_p = true; /* It copy generated in ira-emit.cc.  */
@@ -3505,8 +3538,8 @@ ira_flattening (int max_regno_before_emit, int ira_max_point_before_emit)
 	{
 	  /* Check that the copy was not propagated from level on
 	     which we will have different pseudos.  */
-	  node_first = node->regno_allocno_map[ALLOCNO_REGNO (cp->first)];
-	  node_second = node->regno_allocno_map[ALLOCNO_REGNO (cp->second)];
+	  node_first = node->regno_allocno_map[ALLOCNO_REGNO (a1)];
+	  node_second = node->regno_allocno_map[ALLOCNO_REGNO (a2)];
 	  keep_p = ((REGNO (allocno_emit_reg (first))
 		     == REGNO (allocno_emit_reg (node_first)))
 		     && (REGNO (allocno_emit_reg (second))
@@ -3515,18 +3548,18 @@ ira_flattening (int max_regno_before_emit, int ira_max_point_before_emit)
       if (keep_p)
 	{
 	  cp->loop_tree_node = ira_loop_tree_root;
-	  cp->first = first;
-	  cp->second = second;
+	  cp->first = find_object_anyway (first, OBJECT_START (cp->first),
+					  OBJECT_NREGS (cp->first));
+	  cp->second = find_object_anyway (second, OBJECT_START (cp->second),
+					   OBJECT_NREGS (cp->second));
 	}
       else
 	{
 	  cp->loop_tree_node = NULL;
 	  if (internal_flag_ira_verbose > 4 && ira_dump_file != NULL)
 	    fprintf (ira_dump_file, "      Remove cp%d:a%dr%d-a%dr%d\n",
-		     cp->num, ALLOCNO_NUM (cp->first),
-		     REGNO (allocno_emit_reg (cp->first)),
-		     ALLOCNO_NUM (cp->second),
-		     REGNO (allocno_emit_reg (cp->second)));
+		     cp->num, ALLOCNO_NUM (a1), REGNO (allocno_emit_reg (a1)),
+		     ALLOCNO_NUM (a2), REGNO (allocno_emit_reg (a2)));
 	}
     }
   /* Remove unnecessary allocnos on lower levels of the loop tree.  */
@@ -3562,9 +3595,10 @@ ira_flattening (int max_regno_before_emit, int ira_max_point_before_emit)
 	  finish_copy (cp);
 	  continue;
 	}
-      ira_assert
-	(ALLOCNO_LOOP_TREE_NODE (cp->first) == ira_loop_tree_root
-	 && ALLOCNO_LOOP_TREE_NODE (cp->second) == ira_loop_tree_root);
+      ira_assert (ALLOCNO_LOOP_TREE_NODE (OBJECT_ALLOCNO (cp->first))
+		    == ira_loop_tree_root
+		  && ALLOCNO_LOOP_TREE_NODE (OBJECT_ALLOCNO (cp->second))
+		       == ira_loop_tree_root);
       add_allocno_copy_to_list (cp);
       swap_allocno_copy_ends_if_necessary (cp);
     }
diff --git a/gcc/ira-color.cc b/gcc/ira-color.cc
index 8aed25144b9..099312bcdb3 100644
--- a/gcc/ira-color.cc
+++ b/gcc/ira-color.cc
@@ -20,6 +20,7 @@ along with GCC; see the file COPYING3.  If not see
 
 #include "config.h"
 #define INCLUDE_MAP
+#define INCLUDE_VECTOR
 #include "system.h"
 #include "coretypes.h"
 #include "backend.h"
@@ -150,11 +151,18 @@ struct allocno_color_data
   struct update_cost_record *update_cost_records;
   /* Threads.  We collect allocnos connected by copies into threads
      and try to assign hard regs to allocnos by threads.  */
-  /* Allocno representing all thread.  */
-  ira_allocno_t first_thread_allocno;
+  /* The head objects of the threads, one entry per object of the allocno.  */
+  ira_object_t *first_thread_objects;
   /* Allocnos in thread forms a cycle list through the following
      member.  */
-  ira_allocno_t next_thread_allocno;
+  ira_object_t *next_thread_objects;
+  /* The allocno shared by the whole thread.  */
+  ira_allocno_t first_thread_allocno;
+  /* The start offset relative to first_thread_allocno.  */
+  int first_thread_offset;
+  /* All allocnos belonging to the thread.  */
+  bitmap thread_allocnos;
+  /* The freq sum of all the thread's allocnos.  */
   /* All thread frequency.  Defined only for first thread allocno.  */
   int thread_freq;
   /* Sum of frequencies of hard register preferences of the allocno.  */
@@ -188,6 +196,9 @@ static bitmap coloring_allocno_bitmap;
    allocnos.  */
 static bitmap consideration_allocno_bitmap;
 
+/* Bitmap of allocnos which are not trivially colorable.  */
+static bitmap uncolorable_allocno_set;
+
 /* All allocnos sorted according their priorities.  */
 static ira_allocno_t *sorted_allocnos;
 
@@ -647,9 +658,13 @@ struct allocno_hard_regs_subnode
      Overall conflict size is
      left_conflict_subnodes_size
        + MIN (max_node_impact - left_conflict_subnodes_size,
-              left_conflict_size)
+	      left_conflict_size)
+     Use MIN here to ensure that the total conflict does not exceed
+     max_node_impact.
   */
+  /* The total conflict size of subnodes.  */
   short left_conflict_subnodes_size;
+  /* The maximum number of registers that the current node can use.  */
   short max_node_impact;
 };
 
@@ -758,6 +773,8 @@ form_allocno_hard_regs_nodes_forest (void)
       collect_allocno_hard_regs_cover (hard_regs_roots,
 				       allocno_data->profitable_hard_regs);
       allocno_hard_regs_node = NULL;
+      /* Find the ancestor node in the forest which covers all nodes.  The
+	 ancestor is the smallest superset of profitable_hard_regs.  */
       for (j = 0; hard_regs_node_vec.iterate (j, &node); j++)
 	allocno_hard_regs_node
 	  = (j == 0
@@ -990,6 +1007,8 @@ update_left_conflict_sizes_p (ira_allocno_t a,
 					removed_node->hard_regs->set));
   start = node_preorder_num * allocno_hard_regs_nodes_num;
   i = allocno_hard_regs_subnode_index[start + removed_node->preorder_num];
+  /* i < 0 means that removed_node is a parent of node, rather than node
+     being the parent of removed_node.  */
   if (i < 0)
     i = 0;
   subnodes = allocno_hard_regs_subnodes + data->hard_regs_subnodes_start;
@@ -999,6 +1018,7 @@ update_left_conflict_sizes_p (ira_allocno_t a,
 	      - subnodes[i].left_conflict_subnodes_size,
 	      subnodes[i].left_conflict_size));
   subnodes[i].left_conflict_size -= size;
+  /* Update all ancestors for subnode i.  */
   for (;;)
     {
       conflict_size
@@ -1242,6 +1262,9 @@ struct update_cost_queue_elem
      connecting this allocno to the one being allocated.  */
   int divisor;
 
+  /* Hard regno assigned to the current ALLOCNO.  */
+  int hard_regno;
+
   /* Allocno from which we started chaining costs of connected
      allocnos. */
   ira_allocno_t start;
@@ -1308,7 +1331,7 @@ start_update_cost (void)
 /* Add (ALLOCNO, START, FROM, DIVISOR) to the end of update_cost_queue, unless
    ALLOCNO is already in the queue, or has NO_REGS class.  */
 static inline void
-queue_update_cost (ira_allocno_t allocno, ira_allocno_t start,
+queue_update_cost (ira_allocno_t allocno, int hard_regno, ira_allocno_t start,
 		   ira_allocno_t from, int divisor)
 {
   struct update_cost_queue_elem *elem;
@@ -1317,6 +1340,7 @@ queue_update_cost (ira_allocno_t allocno, ira_allocno_t start,
   if (elem->check != update_cost_check
       && ALLOCNO_CLASS (allocno) != NO_REGS)
     {
+      elem->hard_regno = hard_regno;
       elem->check = update_cost_check;
       elem->start = start;
       elem->from = from;
@@ -1334,8 +1358,8 @@ queue_update_cost (ira_allocno_t allocno, ira_allocno_t start,
    false if the queue was empty, otherwise make (*ALLOCNO, *START,
    *FROM, *DIVISOR) describe the removed element.  */
 static inline bool
-get_next_update_cost (ira_allocno_t *allocno, ira_allocno_t *start,
-		      ira_allocno_t *from, int *divisor)
+get_next_update_cost (ira_allocno_t *allocno, int *hard_regno,
+		      ira_allocno_t *start, ira_allocno_t *from, int *divisor)
 {
   struct update_cost_queue_elem *elem;
 
@@ -1348,6 +1372,8 @@ get_next_update_cost (ira_allocno_t *allocno, ira_allocno_t *start,
   *from = elem->from;
   *divisor = elem->divisor;
   update_cost_queue = elem->next;
+  if (hard_regno != NULL)
+    *hard_regno = elem->hard_regno;
   return true;
 }
 
@@ -1449,31 +1475,41 @@ update_costs_from_allocno (ira_allocno_t allocno, int hard_regno,
   enum reg_class rclass, aclass;
   ira_allocno_t another_allocno, start = allocno, from = NULL;
   ira_copy_t cp, next_cp;
+  ira_object_t another_obj;
+  unsigned int obj_index1, obj_index2;
 
   rclass = REGNO_REG_CLASS (hard_regno);
   do
     {
+      gcc_assert (hard_regno >= 0);
       mode = ALLOCNO_MODE (allocno);
       ira_init_register_move_cost_if_necessary (mode);
       for (cp = ALLOCNO_COPIES (allocno); cp != NULL; cp = next_cp)
 	{
-	  if (cp->first == allocno)
+	  if (OBJECT_ALLOCNO (cp->first) == allocno)
 	    {
+	      obj_index1 = OBJECT_INDEX (cp->first);
+	      obj_index2 = OBJECT_INDEX (cp->second);
 	      next_cp = cp->next_first_allocno_copy;
-	      another_allocno = cp->second;
+	      another_obj = cp->second;
 	    }
-	  else if (cp->second == allocno)
+	  else if (OBJECT_ALLOCNO (cp->second) == allocno)
 	    {
+	      obj_index1 = OBJECT_INDEX (cp->second);
+	      obj_index2 = OBJECT_INDEX (cp->first);
 	      next_cp = cp->next_second_allocno_copy;
-	      another_allocno = cp->first;
+	      another_obj = cp->first;
 	    }
 	  else
 	    gcc_unreachable ();
 
+	  another_allocno = OBJECT_ALLOCNO (another_obj);
 	  if (another_allocno == from
 	      || (ALLOCNO_COLOR_DATA (another_allocno) != NULL
-		  && (ALLOCNO_COLOR_DATA (allocno)->first_thread_allocno
-		      != ALLOCNO_COLOR_DATA (another_allocno)->first_thread_allocno)))
+		  && (ALLOCNO_COLOR_DATA (allocno)
+			->first_thread_objects[obj_index1]
+		      != ALLOCNO_COLOR_DATA (another_allocno)
+			   ->first_thread_objects[obj_index2])))
 	    continue;
 
 	  aclass = ALLOCNO_CLASS (another_allocno);
@@ -1482,6 +1518,8 @@ update_costs_from_allocno (ira_allocno_t allocno, int hard_regno,
 	      || ALLOCNO_ASSIGNED_P (another_allocno))
 	    continue;
 
+	  ira_allocno_t first_allocno = OBJECT_ALLOCNO (cp->first);
+	  ira_allocno_t second_allocno = OBJECT_ALLOCNO (cp->second);
 	  /* If we have different modes use the smallest one.  It is
 	     a sub-register move.  It is hard to predict what LRA
 	     will reload (the pseudo or its sub-register) but LRA
@@ -1489,14 +1527,21 @@ update_costs_from_allocno (ira_allocno_t allocno, int hard_regno,
 	     register classes bigger modes might be invalid,
 	     e.g. DImode for AREG on x86.  For such cases the
 	     register move cost will be maximal.  */
-	  mode = narrower_subreg_mode (ALLOCNO_MODE (cp->first),
-				       ALLOCNO_MODE (cp->second));
+	  mode = narrower_subreg_mode (ALLOCNO_MODE (first_allocno),
+				       ALLOCNO_MODE (second_allocno));
 
 	  ira_init_register_move_cost_if_necessary (mode);
 
-	  cost = (cp->second == allocno
-		  ? ira_register_move_cost[mode][rclass][aclass]
-		  : ira_register_move_cost[mode][aclass][rclass]);
+	  cost = (second_allocno == allocno
+		    ? ira_register_move_cost[mode][rclass][aclass]
+		    : ira_register_move_cost[mode][aclass][rclass]);
+	  /* Adjust the hard regno of another_allocno for a subreg copy.  */
+	  int start_regno = hard_regno;
+	  if (cp->insn && subreg_move_p (cp->first, cp->second))
+	    {
+	      int diff = OBJECT_START (cp->first) - OBJECT_START (cp->second);
+	      start_regno += (first_allocno == allocno ? diff : -diff);
+	    }
 	  if (decr_p)
 	    cost = -cost;
 
@@ -1505,25 +1550,30 @@ update_costs_from_allocno (ira_allocno_t allocno, int hard_regno,
 
 	  if (internal_flag_ira_verbose > 5 && ira_dump_file != NULL)
 	    fprintf (ira_dump_file,
-		     "          a%dr%d (hr%d): update cost by %d, conflict cost by %d\n",
-		     ALLOCNO_NUM (another_allocno), ALLOCNO_REGNO (another_allocno),
-		     hard_regno, update_cost, update_conflict_cost);
+		     "          a%dr%d (hr%d): update cost by %d, conflict "
+		     "cost by %d\n",
+		     ALLOCNO_NUM (another_allocno),
+		     ALLOCNO_REGNO (another_allocno), start_regno, update_cost,
+		     update_conflict_cost);
 	  if (update_cost == 0)
 	    continue;
 
-	  if (! update_allocno_cost (another_allocno, hard_regno,
-				     update_cost, update_conflict_cost))
+	  if (start_regno < 0
+	      || (start_regno + ALLOCNO_NREGS (another_allocno))
+		   > FIRST_PSEUDO_REGISTER
+	      || !update_allocno_cost (another_allocno, start_regno,
+				       update_cost, update_conflict_cost))
 	    continue;
-	  queue_update_cost (another_allocno, start, allocno,
+	  queue_update_cost (another_allocno, start_regno, start, allocno,
 			     divisor * COST_HOP_DIVISOR);
 	  if (record_p && ALLOCNO_COLOR_DATA (another_allocno) != NULL)
 	    ALLOCNO_COLOR_DATA (another_allocno)->update_cost_records
-	      = get_update_cost_record (hard_regno, divisor,
-					ALLOCNO_COLOR_DATA (another_allocno)
-					->update_cost_records);
+	      = get_update_cost_record (
+		start_regno, divisor,
+		ALLOCNO_COLOR_DATA (another_allocno)->update_cost_records);
 	}
-    }
-  while (get_next_update_cost (&allocno, &start, &from, &divisor));
+  } while (
+    get_next_update_cost (&allocno, &hard_regno, &start, &from, &divisor));
 }
 
 /* Decrease preferred ALLOCNO hard register costs and costs of
@@ -1632,23 +1682,25 @@ update_conflict_hard_regno_costs (int *costs, enum reg_class aclass,
   enum reg_class another_aclass;
   ira_allocno_t allocno, another_allocno, start, from;
   ira_copy_t cp, next_cp;
+  ira_object_t another_obj;
 
-  while (get_next_update_cost (&allocno, &start, &from, &divisor))
+  while (get_next_update_cost (&allocno, NULL, &start, &from, &divisor))
     for (cp = ALLOCNO_COPIES (allocno); cp != NULL; cp = next_cp)
       {
-	if (cp->first == allocno)
+	if (OBJECT_ALLOCNO (cp->first) == allocno)
 	  {
 	    next_cp = cp->next_first_allocno_copy;
-	    another_allocno = cp->second;
+	    another_obj = cp->second;
 	  }
-	else if (cp->second == allocno)
+	else if (OBJECT_ALLOCNO (cp->second) == allocno)
 	  {
 	    next_cp = cp->next_second_allocno_copy;
-	    another_allocno = cp->first;
+	    another_obj = cp->first;
 	  }
 	else
 	  gcc_unreachable ();
 
+	another_allocno = OBJECT_ALLOCNO (another_obj);
 	another_aclass = ALLOCNO_CLASS (another_allocno);
 	if (another_allocno == from
 	    || ALLOCNO_ASSIGNED_P (another_allocno)
@@ -1696,7 +1748,8 @@ update_conflict_hard_regno_costs (int *costs, enum reg_class aclass,
 			   * COST_HOP_DIVISOR
 			   * COST_HOP_DIVISOR
 			   * COST_HOP_DIVISOR))
-	  queue_update_cost (another_allocno, start, from, divisor * COST_HOP_DIVISOR);
+	  queue_update_cost (another_allocno, -1, start, from,
+			     divisor * COST_HOP_DIVISOR);
       }
 }
 
@@ -2034,6 +2087,11 @@ assign_hard_reg (ira_allocno_t a, bool retry_p)
       FOR_EACH_OBJECT_CONFLICT (obj, conflict_obj, oci)
         {
 	  ira_allocno_t conflict_a = OBJECT_ALLOCNO (conflict_obj);
+
+	  if (ALLOCNO_COLOR_DATA (a)->first_thread_allocno
+	      == ALLOCNO_COLOR_DATA (conflict_a)->first_thread_allocno)
+	    continue;
+
 	  enum reg_class conflict_aclass;
 	  allocno_color_data_t data = ALLOCNO_COLOR_DATA (conflict_a);
 
@@ -2225,7 +2283,8 @@ assign_hard_reg (ira_allocno_t a, bool retry_p)
 		      continue;
 		    full_costs[j] -= conflict_costs[k];
 		  }
-	      queue_update_cost (conflict_a, conflict_a, NULL, COST_HOP_DIVISOR);
+	      queue_update_cost (conflict_a, -1, conflict_a, NULL,
+				 COST_HOP_DIVISOR);
 	    }
 	}
     }
@@ -2239,7 +2298,7 @@ assign_hard_reg (ira_allocno_t a, bool retry_p)
   if (! retry_p)
     {
       start_update_cost ();
-      queue_update_cost (a, a, NULL, COST_HOP_DIVISOR);
+      queue_update_cost (a, -1, a, NULL, COST_HOP_DIVISOR);
       update_conflict_hard_regno_costs (full_costs, aclass, false);
     }
   min_cost = min_full_cost = INT_MAX;
@@ -2264,17 +2323,17 @@ assign_hard_reg (ira_allocno_t a, bool retry_p)
       if (!HONOR_REG_ALLOC_ORDER)
 	{
 	  if ((saved_nregs = calculate_saved_nregs (hard_regno, mode)) != 0)
-	  /* We need to save/restore the hard register in
-	     epilogue/prologue.  Therefore we increase the cost.  */
-	  {
-	    rclass = REGNO_REG_CLASS (hard_regno);
-	    add_cost = ((ira_memory_move_cost[mode][rclass][0]
-		         + ira_memory_move_cost[mode][rclass][1])
+	    /* We need to save/restore the hard register in
+	       epilogue/prologue.  Therefore we increase the cost.  */
+	    {
+	      rclass = REGNO_REG_CLASS (hard_regno);
+	      add_cost = ((ira_memory_move_cost[mode][rclass][0]
+			   + ira_memory_move_cost[mode][rclass][1])
 		        * saved_nregs / hard_regno_nregs (hard_regno,
 							  mode) - 1);
-	    cost += add_cost;
-	    full_cost += add_cost;
-	  }
+	      cost += add_cost;
+	      full_cost += add_cost;
+	    }
 	}
       if (min_cost > cost)
 	min_cost = cost;
@@ -2393,54 +2452,173 @@ copy_freq_compare_func (const void *v1p, const void *v2p)
   return cp1->num - cp2->num;
 }
 
-\f
+/* Return true if object OBJ1 conflicts with OBJ2.  */
+static bool
+objects_conflict_by_live_ranges_p (ira_object_t obj1, ira_object_t obj2)
+{
+  rtx reg1, reg2;
+  ira_allocno_t a1 = OBJECT_ALLOCNO (obj1);
+  ira_allocno_t a2 = OBJECT_ALLOCNO (obj2);
+  if (a1 == a2)
+    return false;
+  reg1 = regno_reg_rtx[ALLOCNO_REGNO (a1)];
+  reg2 = regno_reg_rtx[ALLOCNO_REGNO (a2)];
+  if (reg1 != NULL && reg2 != NULL
+      && ORIGINAL_REGNO (reg1) == ORIGINAL_REGNO (reg2))
+    return false;
+
+  /* We don't keep live ranges for caps because they can be quite big.
+     Use ranges of non-cap allocno from which caps are created.  */
+  a1 = get_cap_member (a1);
+  a2 = get_cap_member (a2);
+
+  obj1 = find_object (a1, OBJECT_START (obj1), OBJECT_NREGS (obj1));
+  obj2 = find_object (a2, OBJECT_START (obj2), OBJECT_NREGS (obj2));
+  return ira_live_ranges_intersect_p (OBJECT_LIVE_RANGES (obj1),
+				      OBJECT_LIVE_RANGES (obj2));
+}
 
-/* Return true if any allocno from thread of A1 conflicts with any
-   allocno from thread A2.  */
+/* Return true if any object from thread of OBJ1 conflicts with any
+   object from thread OBJ2.  */
 static bool
-allocno_thread_conflict_p (ira_allocno_t a1, ira_allocno_t a2)
+object_thread_conflict_p (ira_object_t obj1, ira_object_t obj2)
 {
-  ira_allocno_t a, conflict_a;
+  ira_allocno_t a1 = OBJECT_ALLOCNO (obj1);
+  ira_allocno_t a2 = OBJECT_ALLOCNO (obj2);
+
+  gcc_assert (
+    obj1 != obj2
+    && ALLOCNO_COLOR_DATA (a1)->first_thread_objects[OBJECT_INDEX (obj1)]
+	 == obj1
+    && ALLOCNO_COLOR_DATA (a2)->first_thread_objects[OBJECT_INDEX (obj2)]
+	 == obj2);
+
+  ira_allocno_t first_thread_allocno1
+    = ALLOCNO_COLOR_DATA (a1)->first_thread_allocno;
+  ira_allocno_t first_thread_allocno2
+    = ALLOCNO_COLOR_DATA (a2)->first_thread_allocno;
+
+  int offset
+    = (ALLOCNO_COLOR_DATA (a1)->first_thread_offset + OBJECT_START (obj1))
+      - (ALLOCNO_COLOR_DATA (a2)->first_thread_offset + OBJECT_START (obj2));
+
+  /* Check the objects of the two threads pairwise for conflicts.  */
+  bitmap thread_allocnos1
+    = ALLOCNO_COLOR_DATA (first_thread_allocno1)->thread_allocnos;
+  bitmap thread_allocnos2
+    = ALLOCNO_COLOR_DATA (first_thread_allocno2)->thread_allocnos;
+  gcc_assert (!bitmap_empty_p (thread_allocnos1)
+	      && !bitmap_empty_p (thread_allocnos2));
+  std::vector<ira_object_t> thread_objects_2;
 
-  for (a = ALLOCNO_COLOR_DATA (a2)->next_thread_allocno;;
-       a = ALLOCNO_COLOR_DATA (a)->next_thread_allocno)
+  unsigned int i;
+  bitmap_iterator bi;
+  EXECUTE_IF_SET_IN_BITMAP (thread_allocnos2, 0, i, bi)
     {
-      for (conflict_a = ALLOCNO_COLOR_DATA (a1)->next_thread_allocno;;
-	   conflict_a = ALLOCNO_COLOR_DATA (conflict_a)->next_thread_allocno)
-	{
-	  if (allocnos_conflict_by_live_ranges_p (a, conflict_a))
-	    return true;
-	  if (conflict_a == a1)
-	    break;
-	}
-      if (a == a2)
-	break;
+      ira_allocno_object_iterator oi;
+      ira_object_t obj;
+      FOR_EACH_ALLOCNO_OBJECT (ira_allocnos[i], obj, oi)
+	thread_objects_2.push_back (obj);
+    }
+
+  EXECUTE_IF_SET_IN_BITMAP (thread_allocnos1, 0, i, bi)
+    {
+      ira_allocno_object_iterator oi;
+      ira_object_t obj;
+      ira_allocno_t a = ira_allocnos[i];
+      FOR_EACH_ALLOCNO_OBJECT (ira_allocnos[i], obj, oi)
+	for (ira_object_t other_obj : thread_objects_2)
+	  {
+	    int thread_start1 = ALLOCNO_COLOR_DATA (a)->first_thread_offset
+				+ OBJECT_START (obj);
+	    int thread_start2 = ALLOCNO_COLOR_DATA (OBJECT_ALLOCNO (other_obj))
+				  ->first_thread_offset
+				+ offset + OBJECT_START (other_obj);
+	    if (!(thread_start1 + OBJECT_NREGS (obj) <= thread_start2
+		  || thread_start2 + OBJECT_NREGS (other_obj) <= thread_start1)
+		&& objects_conflict_by_live_ranges_p (obj, other_obj))
+	      return true;
+	  }
     }
+
   return false;
 }
 
-/* Merge two threads given correspondingly by their first allocnos T1
-   and T2 (more accurately merging T2 into T1).  */
+/* Merge two threads given correspondingly by their first objects OBJ1
+   and OBJ2 (more accurately merging OBJ2 into OBJ1).  */
 static void
-merge_threads (ira_allocno_t t1, ira_allocno_t t2)
+merge_threads (ira_object_t obj1, ira_object_t obj2)
 {
-  ira_allocno_t a, next, last;
+  ira_allocno_t a1 = OBJECT_ALLOCNO (obj1);
+  ira_allocno_t a2 = OBJECT_ALLOCNO (obj2);
+
+  gcc_assert (
+    obj1 != obj2
+    && ALLOCNO_COLOR_DATA (a1)->first_thread_objects[OBJECT_INDEX (obj1)]
+	 == obj1
+    && ALLOCNO_COLOR_DATA (a2)->first_thread_objects[OBJECT_INDEX (obj2)]
+	 == obj2);
+
+  ira_allocno_t first_thread_allocno1
+    = ALLOCNO_COLOR_DATA (a1)->first_thread_allocno;
+  ira_allocno_t first_thread_allocno2
+    = ALLOCNO_COLOR_DATA (a2)->first_thread_allocno;
+
+  gcc_assert (first_thread_allocno1 != first_thread_allocno2);
 
-  gcc_assert (t1 != t2
-	      && ALLOCNO_COLOR_DATA (t1)->first_thread_allocno == t1
-	      && ALLOCNO_COLOR_DATA (t2)->first_thread_allocno == t2);
-  for (last = t2, a = ALLOCNO_COLOR_DATA (t2)->next_thread_allocno;;
-       a = ALLOCNO_COLOR_DATA (a)->next_thread_allocno)
+  int offset
+    = (ALLOCNO_COLOR_DATA (a1)->first_thread_offset + OBJECT_START (obj1))
+      - (ALLOCNO_COLOR_DATA (a2)->first_thread_offset + OBJECT_START (obj2));
+
+  /* Update first_thread_allocno and thread_allocnos info.  */
+  unsigned int i;
+  bitmap_iterator bi;
+  bitmap thread_allocnos2
+    = ALLOCNO_COLOR_DATA (first_thread_allocno2)->thread_allocnos;
+  bitmap thread_allocnos1
+    = ALLOCNO_COLOR_DATA (first_thread_allocno1)->thread_allocnos;
+  gcc_assert (!bitmap_empty_p (thread_allocnos1)
+	      && !bitmap_empty_p (thread_allocnos2));
+  EXECUTE_IF_SET_IN_BITMAP (thread_allocnos2, 0, i, bi)
+    {
+      ira_allocno_t a = ira_allocnos[i];
+      gcc_assert (ALLOCNO_COLOR_DATA (a)->first_thread_allocno
+		  == first_thread_allocno2);
+      /* Update the first_thread_allocno and first_thread_offset fields.  */
+      ALLOCNO_COLOR_DATA (a)->first_thread_allocno = first_thread_allocno1;
+      ALLOCNO_COLOR_DATA (a)->first_thread_offset += offset;
+      bitmap_set_bit (thread_allocnos1, i);
+    }
+  bitmap_clear (thread_allocnos2);
+  ira_free_bitmap (thread_allocnos2);
+  ALLOCNO_COLOR_DATA (first_thread_allocno2)->thread_allocnos = NULL;
+
+  ira_object_t last_obj = obj2;
+  for (ira_object_t next_obj
+       = ALLOCNO_COLOR_DATA (a2)->next_thread_objects[OBJECT_INDEX (obj2)];
+       ; next_obj = ALLOCNO_COLOR_DATA (OBJECT_ALLOCNO (next_obj))
+		      ->next_thread_objects[OBJECT_INDEX (next_obj)])
     {
-      ALLOCNO_COLOR_DATA (a)->first_thread_allocno = t1;
-      if (a == t2)
+      ira_allocno_t next_a = OBJECT_ALLOCNO (next_obj);
+      ALLOCNO_COLOR_DATA (next_a)->first_thread_objects[OBJECT_INDEX (next_obj)]
+	= obj1;
+      gcc_assert (ALLOCNO_COLOR_DATA (next_a)->first_thread_allocno
+		  == first_thread_allocno1);
+      gcc_assert (bitmap_bit_p (thread_allocnos1, ALLOCNO_NUM (next_a)));
+      if (next_obj == obj2)
 	break;
-      last = a;
+      last_obj = next_obj;
     }
-  next = ALLOCNO_COLOR_DATA (t1)->next_thread_allocno;
-  ALLOCNO_COLOR_DATA (t1)->next_thread_allocno = t2;
-  ALLOCNO_COLOR_DATA (last)->next_thread_allocno = next;
-  ALLOCNO_COLOR_DATA (t1)->thread_freq += ALLOCNO_COLOR_DATA (t2)->thread_freq;
+  /* Add OBJ2's thread chain to OBJ1's.  */
+  ira_object_t temp_obj
+    = ALLOCNO_COLOR_DATA (a1)->next_thread_objects[OBJECT_INDEX (obj1)];
+  ALLOCNO_COLOR_DATA (a1)->next_thread_objects[OBJECT_INDEX (obj1)] = obj2;
+  ALLOCNO_COLOR_DATA (OBJECT_ALLOCNO (last_obj))
+    ->next_thread_objects[OBJECT_INDEX (last_obj)]
+    = temp_obj;
+
+  ALLOCNO_COLOR_DATA (first_thread_allocno1)->thread_freq
+    += ALLOCNO_COLOR_DATA (first_thread_allocno2)->thread_freq;
 }
 
 /* Create threads by processing CP_NUM copies from sorted copies.  We
@@ -2448,7 +2626,6 @@ merge_threads (ira_allocno_t t1, ira_allocno_t t2)
 static void
 form_threads_from_copies (int cp_num)
 {
-  ira_allocno_t a, thread1, thread2;
   ira_copy_t cp;
 
   qsort (sorted_copies, cp_num, sizeof (ira_copy_t), copy_freq_compare_func);
@@ -2457,33 +2634,43 @@ form_threads_from_copies (int cp_num)
   for (int i = 0; i < cp_num; i++)
     {
       cp = sorted_copies[i];
-      thread1 = ALLOCNO_COLOR_DATA (cp->first)->first_thread_allocno;
-      thread2 = ALLOCNO_COLOR_DATA (cp->second)->first_thread_allocno;
-      if (thread1 == thread2)
+      ira_allocno_t first_a = OBJECT_ALLOCNO (cp->first);
+      ira_allocno_t second_a = OBJECT_ALLOCNO (cp->second);
+      ira_object_t thread1 = ALLOCNO_COLOR_DATA (first_a)
+			       ->first_thread_objects[OBJECT_INDEX (cp->first)];
+      ira_object_t thread2
+	= ALLOCNO_COLOR_DATA (second_a)
+	    ->first_thread_objects[OBJECT_INDEX (cp->second)];
+      if (thread1 == thread2
+	  || ALLOCNO_COLOR_DATA (first_a)->first_thread_allocno
+	       == ALLOCNO_COLOR_DATA (second_a)->first_thread_allocno)
 	continue;
-      if (! allocno_thread_conflict_p (thread1, thread2))
+      if (!object_thread_conflict_p (thread1, thread2))
 	{
 	  if (internal_flag_ira_verbose > 3 && ira_dump_file != NULL)
-	    fprintf
-		(ira_dump_file,
-		 "        Forming thread by copy %d:a%dr%d-a%dr%d (freq=%d):\n",
-		 cp->num, ALLOCNO_NUM (cp->first), ALLOCNO_REGNO (cp->first),
-		 ALLOCNO_NUM (cp->second), ALLOCNO_REGNO (cp->second),
-		 cp->freq);
+	    fprintf (
+	      ira_dump_file,
+	      "        Forming thread by copy %d:a%dr%d-a%dr%d (freq=%d):\n",
+	      cp->num, ALLOCNO_NUM (first_a), ALLOCNO_REGNO (first_a),
+	      ALLOCNO_NUM (second_a), ALLOCNO_REGNO (second_a), cp->freq);
 	  merge_threads (thread1, thread2);
 	  if (internal_flag_ira_verbose > 3 && ira_dump_file != NULL)
 	    {
-	      thread1 = ALLOCNO_COLOR_DATA (thread1)->first_thread_allocno;
-	      fprintf (ira_dump_file, "          Result (freq=%d): a%dr%d(%d)",
-		       ALLOCNO_COLOR_DATA (thread1)->thread_freq,
-		       ALLOCNO_NUM (thread1), ALLOCNO_REGNO (thread1),
-		       ALLOCNO_FREQ (thread1));
-	      for (a = ALLOCNO_COLOR_DATA (thread1)->next_thread_allocno;
-		   a != thread1;
-		   a = ALLOCNO_COLOR_DATA (a)->next_thread_allocno)
-		fprintf (ira_dump_file, " a%dr%d(%d)",
-			 ALLOCNO_NUM (a), ALLOCNO_REGNO (a),
-			 ALLOCNO_FREQ (a));
+	      ira_allocno_t a1 = OBJECT_ALLOCNO (thread1);
+	      ira_allocno_t first_thread_allocno
+		= ALLOCNO_COLOR_DATA (a1)->first_thread_allocno;
+	      fprintf (ira_dump_file, "          Result (freq=%d):",
+		       ALLOCNO_COLOR_DATA (first_thread_allocno)->thread_freq);
+	      unsigned int i;
+	      bitmap_iterator bi;
+	      EXECUTE_IF_SET_IN_BITMAP (
+		ALLOCNO_COLOR_DATA (first_thread_allocno)->thread_allocnos, 0,
+		i, bi)
+		{
+		  ira_allocno_t a = ira_allocnos[i];
+		  fprintf (ira_dump_file, " a%dr%d(%d)", ALLOCNO_NUM (a),
+			   ALLOCNO_REGNO (a), ALLOCNO_FREQ (a));
+		}
 	      fprintf (ira_dump_file, "\n");
 	    }
 	}
@@ -2503,13 +2690,27 @@ form_threads_from_bucket (ira_allocno_t bucket)
     {
       for (cp = ALLOCNO_COPIES (a); cp != NULL; cp = next_cp)
 	{
-	  if (cp->first == a)
+	  bool intersect_p = hard_reg_set_intersect_p (
+	    ALLOCNO_COLOR_DATA (OBJECT_ALLOCNO (cp->first))
+	      ->profitable_hard_regs,
+	    ALLOCNO_COLOR_DATA (OBJECT_ALLOCNO (cp->second))
+	      ->profitable_hard_regs);
+	  if (OBJECT_ALLOCNO (cp->first) == a)
 	    {
 	      next_cp = cp->next_first_allocno_copy;
+	      if (!intersect_p)
+		continue;
+	      sorted_copies[cp_num++] = cp;
+	    }
+	  else if (OBJECT_ALLOCNO (cp->second) == a)
+	    {
+	      next_cp = cp->next_second_allocno_copy;
+	      if (!intersect_p
+		  || !bitmap_bit_p (uncolorable_allocno_set,
+				    ALLOCNO_NUM (OBJECT_ALLOCNO (cp->first))))
+		continue;
 	      sorted_copies[cp_num++] = cp;
 	    }
-	  else if (cp->second == a)
-	    next_cp = cp->next_second_allocno_copy;
 	  else
 	    gcc_unreachable ();
 	}
@@ -2531,15 +2732,15 @@ form_threads_from_colorable_allocno (ira_allocno_t a)
 	     ALLOCNO_NUM (a), ALLOCNO_REGNO (a));
   for (cp = ALLOCNO_COPIES (a); cp != NULL; cp = next_cp)
     {
-      if (cp->first == a)
+      if (OBJECT_ALLOCNO (cp->first) == a)
 	{
 	  next_cp = cp->next_first_allocno_copy;
-	  another_a = cp->second;
+	  another_a = OBJECT_ALLOCNO (cp->second);
 	}
-      else if (cp->second == a)
+      else if (OBJECT_ALLOCNO (cp->second) == a)
 	{
 	  next_cp = cp->next_second_allocno_copy;
-	  another_a = cp->first;
+	  another_a = OBJECT_ALLOCNO (cp->first);
 	}
       else
 	gcc_unreachable ();
@@ -2564,8 +2765,16 @@ init_allocno_threads (void)
     {
       a = ira_allocnos[j];
       /* Set up initial thread data: */
-      ALLOCNO_COLOR_DATA (a)->first_thread_allocno
-	= ALLOCNO_COLOR_DATA (a)->next_thread_allocno = a;
+      for (int i = 0; i < ALLOCNO_NUM_OBJECTS (a); i += 1)
+	{
+	  ira_object_t obj = ALLOCNO_OBJECT (a, i);
+	  ALLOCNO_COLOR_DATA (a)->first_thread_objects[i]
+	    = ALLOCNO_COLOR_DATA (a)->next_thread_objects[i] = obj;
+	}
+      ALLOCNO_COLOR_DATA (a)->first_thread_allocno = a;
+      ALLOCNO_COLOR_DATA (a)->first_thread_offset = 0;
+      ALLOCNO_COLOR_DATA (a)->thread_allocnos = ira_allocate_bitmap ();
+      bitmap_set_bit (ALLOCNO_COLOR_DATA (a)->thread_allocnos, ALLOCNO_NUM (a));
       ALLOCNO_COLOR_DATA (a)->thread_freq = ALLOCNO_FREQ (a);
       ALLOCNO_COLOR_DATA (a)->hard_reg_prefs = 0;
       for (pref = ALLOCNO_PREFS (a); pref != NULL; pref = pref->next_pref)
@@ -2608,6 +2817,9 @@ add_allocno_to_bucket (ira_allocno_t a, ira_allocno_t *bucket_ptr)
   ira_allocno_t first_a;
   allocno_color_data_t data;
 
+  if (bucket_ptr == &uncolorable_allocno_bucket)
+    bitmap_set_bit (uncolorable_allocno_set, ALLOCNO_NUM (a));
+
   if (bucket_ptr == &uncolorable_allocno_bucket
       && ALLOCNO_CLASS (a) != NO_REGS)
     {
@@ -2734,6 +2946,9 @@ delete_allocno_from_bucket (ira_allocno_t allocno, ira_allocno_t *bucket_ptr)
 {
   ira_allocno_t prev_allocno, next_allocno;
 
+  if (bucket_ptr == &uncolorable_allocno_bucket)
+    bitmap_clear_bit (uncolorable_allocno_set, ALLOCNO_NUM (allocno));
+
   if (bucket_ptr == &uncolorable_allocno_bucket
       && ALLOCNO_CLASS (allocno) != NO_REGS)
     {
@@ -3227,16 +3442,23 @@ allocno_copy_cost_saving (ira_allocno_t allocno, int hard_regno)
     rclass = ALLOCNO_CLASS (allocno);
   for (cp = ALLOCNO_COPIES (allocno); cp != NULL; cp = next_cp)
     {
-      if (cp->first == allocno)
+      if (OBJECT_ALLOCNO (cp->first) == allocno)
 	{
 	  next_cp = cp->next_first_allocno_copy;
-	  if (ALLOCNO_HARD_REGNO (cp->second) != hard_regno)
+	  ira_allocno_t another_a = OBJECT_ALLOCNO (cp->second);
+	  if (ALLOCNO_HARD_REGNO (another_a) > -1
+	      && hard_regno + OBJECT_START (cp->first)
+		   != ALLOCNO_HARD_REGNO (another_a)
+			+ OBJECT_START (cp->second))
 	    continue;
 	}
-      else if (cp->second == allocno)
+      else if (OBJECT_ALLOCNO (cp->second) == allocno)
 	{
 	  next_cp = cp->next_second_allocno_copy;
-	  if (ALLOCNO_HARD_REGNO (cp->first) != hard_regno)
+	  ira_allocno_t another_a = OBJECT_ALLOCNO (cp->first);
+	  if (ALLOCNO_HARD_REGNO (another_a) > -1
+	      && hard_regno + OBJECT_START (cp->second)
+		   != ALLOCNO_HARD_REGNO (another_a) + OBJECT_START (cp->first))
 	    continue;
 	}
       else
@@ -3643,6 +3865,7 @@ color_allocnos (void)
       /* Put the allocnos into the corresponding buckets.  */
       colorable_allocno_bucket = NULL;
       uncolorable_allocno_bucket = NULL;
+      bitmap_clear (uncolorable_allocno_set);
       EXECUTE_IF_SET_IN_BITMAP (coloring_allocno_bitmap, 0, i, bi)
 	{
 	  a = ira_allocnos[i];
@@ -3740,10 +3963,12 @@ color_pass (ira_loop_tree_node_t loop_tree_node)
   bitmap_copy (coloring_allocno_bitmap, loop_tree_node->all_allocnos);
   bitmap_copy (consideration_allocno_bitmap, coloring_allocno_bitmap);
   n = 0;
+  size_t obj_n = 0;
   EXECUTE_IF_SET_IN_BITMAP (consideration_allocno_bitmap, 0, j, bi)
     {
       a = ira_allocnos[j];
       n++;
+      obj_n += ALLOCNO_NUM_OBJECTS (a);
       if (! ALLOCNO_ASSIGNED_P (a))
 	continue;
       bitmap_clear_bit (coloring_allocno_bitmap, ALLOCNO_NUM (a));
@@ -3752,20 +3977,29 @@ color_pass (ira_loop_tree_node_t loop_tree_node)
     = (allocno_color_data_t) ira_allocate (sizeof (struct allocno_color_data)
 					   * n);
   memset (allocno_color_data, 0, sizeof (struct allocno_color_data) * n);
+  ira_object_t *thread_objects
+    = (ira_object_t *) ira_allocate (sizeof (ira_object_t *) * obj_n * 2);
+  memset (thread_objects, 0, sizeof (ira_object_t *) * obj_n * 2);
   curr_allocno_process = 0;
   n = 0;
+  size_t obj_offset = 0;
   EXECUTE_IF_SET_IN_BITMAP (consideration_allocno_bitmap, 0, j, bi)
     {
       a = ira_allocnos[j];
       ALLOCNO_ADD_DATA (a) = allocno_color_data + n;
+      ALLOCNO_COLOR_DATA (a)->first_thread_objects
+	= thread_objects + obj_offset;
+      obj_offset += ALLOCNO_NUM_OBJECTS (a);
+      ALLOCNO_COLOR_DATA (a)->next_thread_objects = thread_objects + obj_offset;
+      obj_offset += ALLOCNO_NUM_OBJECTS (a);
       n++;
     }
+  gcc_assert (obj_n * 2 == obj_offset);
   init_allocno_threads ();
   /* Color all mentioned allocnos including transparent ones.  */
   color_allocnos ();
   /* Process caps.  They are processed just once.  */
-  if (flag_ira_region == IRA_REGION_MIXED
-      || flag_ira_region == IRA_REGION_ALL)
+  if (flag_ira_region == IRA_REGION_MIXED || flag_ira_region == IRA_REGION_ALL)
     EXECUTE_IF_SET_IN_BITMAP (loop_tree_node->all_allocnos, 0, j, bi)
       {
 	a = ira_allocnos[j];
@@ -3881,12 +4115,22 @@ color_pass (ira_loop_tree_node_t loop_tree_node)
 	    }
 	}
     }
-  ira_free (allocno_color_data);
   EXECUTE_IF_SET_IN_BITMAP (consideration_allocno_bitmap, 0, j, bi)
     {
       a = ira_allocnos[j];
+      gcc_assert (a != NULL);
+      ALLOCNO_COLOR_DATA (a)->first_thread_objects = NULL;
+      ALLOCNO_COLOR_DATA (a)->next_thread_objects = NULL;
+      if (ALLOCNO_COLOR_DATA (a)->thread_allocnos != NULL)
+	{
+	  bitmap_clear (ALLOCNO_COLOR_DATA (a)->thread_allocnos);
+	  ira_free_bitmap (ALLOCNO_COLOR_DATA (a)->thread_allocnos);
+	  ALLOCNO_COLOR_DATA (a)->thread_allocnos = NULL;
+	}
       ALLOCNO_ADD_DATA (a) = NULL;
     }
+  ira_free (allocno_color_data);
+  ira_free (thread_objects);
 }
 
 /* Initialize the common data for coloring and calls functions to do
@@ -4080,15 +4324,17 @@ update_curr_costs (ira_allocno_t a)
   ira_init_register_move_cost_if_necessary (mode);
   for (cp = ALLOCNO_COPIES (a); cp != NULL; cp = next_cp)
     {
-      if (cp->first == a)
+      ira_allocno_t first_a = OBJECT_ALLOCNO (cp->first);
+      ira_allocno_t second_a = OBJECT_ALLOCNO (cp->second);
+      if (first_a == a)
 	{
 	  next_cp = cp->next_first_allocno_copy;
-	  another_a = cp->second;
+	  another_a = second_a;
 	}
-      else if (cp->second == a)
+      else if (second_a == a)
 	{
 	  next_cp = cp->next_second_allocno_copy;
-	  another_a = cp->first;
+	  another_a = first_a;
 	}
       else
 	gcc_unreachable ();
@@ -4100,9 +4346,8 @@ update_curr_costs (ira_allocno_t a)
       i = ira_class_hard_reg_index[aclass][hard_regno];
       if (i < 0)
 	continue;
-      cost = (cp->first == a
-	      ? ira_register_move_cost[mode][rclass][aclass]
-	      : ira_register_move_cost[mode][aclass][rclass]);
+      cost = (first_a == a ? ira_register_move_cost[mode][rclass][aclass]
+			   : ira_register_move_cost[mode][aclass][rclass]);
       ira_allocate_and_set_or_copy_costs
 	(&ALLOCNO_UPDATED_HARD_REG_COSTS (a), aclass, ALLOCNO_CLASS_COST (a),
 	 ALLOCNO_HARD_REG_COSTS (a));
@@ -4349,21 +4594,23 @@ coalesce_allocnos (void)
 	continue;
       for (cp = ALLOCNO_COPIES (a); cp != NULL; cp = next_cp)
 	{
-	  if (cp->first == a)
+	  ira_allocno_t first_a = OBJECT_ALLOCNO (cp->first);
+	  ira_allocno_t second_a = OBJECT_ALLOCNO (cp->second);
+	  if (first_a == a)
 	    {
 	      next_cp = cp->next_first_allocno_copy;
-	      regno = ALLOCNO_REGNO (cp->second);
+	      regno = ALLOCNO_REGNO (second_a);
 	      /* For priority coloring we coalesce allocnos only with
 		 the same allocno class not with intersected allocno
 		 classes as it were possible.  It is done for
 		 simplicity.  */
 	      if ((cp->insn != NULL || cp->constraint_p)
-		  && ALLOCNO_ASSIGNED_P (cp->second)
-		  && ALLOCNO_HARD_REGNO (cp->second) < 0
-		  && ! ira_equiv_no_lvalue_p (regno))
+		  && ALLOCNO_ASSIGNED_P (second_a)
+		  && ALLOCNO_HARD_REGNO (second_a) < 0
+		  && !ira_equiv_no_lvalue_p (regno))
 		sorted_copies[cp_num++] = cp;
 	    }
-	  else if (cp->second == a)
+	  else if (second_a == a)
 	    next_cp = cp->next_second_allocno_copy;
 	  else
 	    gcc_unreachable ();
@@ -4376,17 +4623,18 @@ coalesce_allocnos (void)
       for (i = 0; i < cp_num; i++)
 	{
 	  cp = sorted_copies[i];
-	  if (! coalesced_allocno_conflict_p (cp->first, cp->second))
+	  ira_allocno_t first_a = OBJECT_ALLOCNO (cp->first);
+	  ira_allocno_t second_a = OBJECT_ALLOCNO (cp->second);
+	  if (!coalesced_allocno_conflict_p (first_a, second_a))
 	    {
 	      allocno_coalesced_p = true;
 	      if (internal_flag_ira_verbose > 3 && ira_dump_file != NULL)
-		fprintf
-		  (ira_dump_file,
-		   "      Coalescing copy %d:a%dr%d-a%dr%d (freq=%d)\n",
-		   cp->num, ALLOCNO_NUM (cp->first), ALLOCNO_REGNO (cp->first),
-		   ALLOCNO_NUM (cp->second), ALLOCNO_REGNO (cp->second),
-		   cp->freq);
-	      merge_allocnos (cp->first, cp->second);
+		fprintf (ira_dump_file,
+			 "      Coalescing copy %d:a%dr%d-a%dr%d (freq=%d)\n",
+			 cp->num, ALLOCNO_NUM (first_a),
+			 ALLOCNO_REGNO (first_a), ALLOCNO_NUM (second_a),
+			 ALLOCNO_REGNO (second_a), cp->freq);
+	      merge_allocnos (first_a, second_a);
 	      i++;
 	      break;
 	    }
@@ -4395,8 +4643,11 @@ coalesce_allocnos (void)
       for (n = 0; i < cp_num; i++)
 	{
 	  cp = sorted_copies[i];
-	  if (allocno_coalesce_data[ALLOCNO_NUM (cp->first)].first
-	      != allocno_coalesce_data[ALLOCNO_NUM (cp->second)].first)
+	  if (allocno_coalesce_data[ALLOCNO_NUM (OBJECT_ALLOCNO (cp->first))]
+		.first
+	      != allocno_coalesce_data[ALLOCNO_NUM (
+					 OBJECT_ALLOCNO (cp->second))]
+		   .first)
 	    sorted_copies[n++] = cp;
 	}
       cp_num = n;
@@ -5070,15 +5321,15 @@ ira_reuse_stack_slot (int regno, poly_uint64 inherent_size,
 	       cp != NULL;
 	       cp = next_cp)
 	    {
-	      if (cp->first == allocno)
+	      if (OBJECT_ALLOCNO (cp->first) == allocno)
 		{
 		  next_cp = cp->next_first_allocno_copy;
-		  another_allocno = cp->second;
+		  another_allocno = OBJECT_ALLOCNO (cp->second);
 		}
-	      else if (cp->second == allocno)
+	      else if (OBJECT_ALLOCNO (cp->second) == allocno)
 		{
 		  next_cp = cp->next_second_allocno_copy;
-		  another_allocno = cp->first;
+		  another_allocno = OBJECT_ALLOCNO (cp->first);
 		}
 	      else
 		gcc_unreachable ();
@@ -5274,6 +5525,7 @@ ira_initiate_assign (void)
     = (ira_allocno_t *) ira_allocate (sizeof (ira_allocno_t)
 				      * ira_allocnos_num);
   consideration_allocno_bitmap = ira_allocate_bitmap ();
+  uncolorable_allocno_set = ira_allocate_bitmap ();
   initiate_cost_update ();
   allocno_priorities = (int *) ira_allocate (sizeof (int) * ira_allocnos_num);
   sorted_copies = (ira_copy_t *) ira_allocate (ira_copies_num
@@ -5286,6 +5538,7 @@ ira_finish_assign (void)
 {
   ira_free (sorted_allocnos);
   ira_free_bitmap (consideration_allocno_bitmap);
+  ira_free_bitmap (uncolorable_allocno_set);
   finish_cost_update ();
   ira_free (allocno_priorities);
   ira_free (sorted_copies);
diff --git a/gcc/ira-conflicts.cc b/gcc/ira-conflicts.cc
index 0585ad10043..7aeed7202ce 100644
--- a/gcc/ira-conflicts.cc
+++ b/gcc/ira-conflicts.cc
@@ -173,25 +173,115 @@ build_conflict_bit_table (void)
   sparseset_free (objects_live);
   return true;
 }
-\f
-/* Return true iff allocnos A1 and A2 cannot be allocated to the same
-   register due to conflicts.  */
 
-static bool
-allocnos_conflict_for_copy_p (ira_allocno_t a1, ira_allocno_t a2)
+/* Check that X is REG or SUBREG of REG.  */
+#define REG_SUBREG_P(x)                                                        \
+  (REG_P (x) || (GET_CODE (x) == SUBREG && REG_P (SUBREG_REG (x))))
+
+/* Return true if a move between OBJ1 and OBJ2 is a subreg move.  */
+bool
+subreg_move_p (ira_object_t obj1, ira_object_t obj2)
 {
-  /* Due to the fact that we canonicalize conflicts (see
-     record_object_conflict), we only need to test for conflicts of
-     the lowest order words.  */
-  ira_object_t obj1 = ALLOCNO_OBJECT (a1, 0);
-  ira_object_t obj2 = ALLOCNO_OBJECT (a2, 0);
+  ira_allocno_t a1 = OBJECT_ALLOCNO (obj1);
+  ira_allocno_t a2 = OBJECT_ALLOCNO (obj2);
+  return ALLOCNO_CLASS (a1) != NO_REGS && ALLOCNO_CLASS (a2) != NO_REGS
+	 && (ALLOCNO_TRACK_SUBREG_P (a1) || ALLOCNO_TRACK_SUBREG_P (a2))
+	 && OBJECT_NREGS (obj1) == OBJECT_NREGS (obj2)
+	 && (OBJECT_NREGS (obj1) != ALLOCNO_NREGS (a1)
+	     || OBJECT_NREGS (obj2) != ALLOCNO_NREGS (a2));
+}
 
-  return OBJECTS_CONFLICT_P (obj1, obj2);
+/* Return true if ORIG_DEST_REG and ORIG_SRC_REG form a subreg move.  */
+bool
+subreg_move_p (rtx orig_dest_reg, rtx orig_src_reg)
+{
+  gcc_assert (REG_SUBREG_P (orig_dest_reg) && REG_SUBREG_P (orig_src_reg));
+  rtx reg1
+    = SUBREG_P (orig_dest_reg) ? SUBREG_REG (orig_dest_reg) : orig_dest_reg;
+  rtx reg2 = SUBREG_P (orig_src_reg) ? SUBREG_REG (orig_src_reg) : orig_src_reg;
+  if (HARD_REGISTER_P (reg1) || HARD_REGISTER_P (reg2))
+    return false;
+  ira_allocno_t a1 = ira_curr_regno_allocno_map[REGNO (reg1)];
+  ira_allocno_t a2 = ira_curr_regno_allocno_map[REGNO (reg2)];
+  ira_object_t obj1 = find_object (a1, orig_dest_reg);
+  ira_object_t obj2 = find_object (a2, orig_src_reg);
+  return subreg_move_p (obj1, obj2);
 }
 
-/* Check that X is REG or SUBREG of REG.  */
-#define REG_SUBREG_P(x)							\
-   (REG_P (x) || (GET_CODE (x) == SUBREG && REG_P (SUBREG_REG (x))))
+/* Return true if OBJ1 and OBJ2 can be allocated to the same register.  */
+static bool
+regs_non_conflict_for_copy_p (ira_object_t obj1, ira_object_t obj2,
+			      bool is_move, bool offset_equal)
+{
+  ira_allocno_t a1 = OBJECT_ALLOCNO (obj1);
+  ira_allocno_t a2 = OBJECT_ALLOCNO (obj2);
+  if (is_move && subreg_move_p (obj1, obj2))
+    {
+      if (OBJECTS_CONFLICT_P (obj1, obj2))
+	return false;
+      /* Assume a1 is allocated to hard register `OBJECT_START (obj2)` and a2
+	 to `OBJECT_START (obj1)`, so that both objects end up in the same
+	 hard register `OBJECT_START (obj1) + OBJECT_START (obj2)`.  */
+      int start_regno1 = OBJECT_START (obj2);
+      int start_regno2 = OBJECT_START (obj1);
+
+      ira_object_t obj_a, obj_b;
+      ira_allocno_object_iterator oi_a, oi_b;
+      FOR_EACH_ALLOCNO_OBJECT (a1, obj_a, oi_a)
+	FOR_EACH_ALLOCNO_OBJECT (a2, obj_b, oi_b)
+	  /* If there is a conflict between a1 and a2 that prevents this
+	     allocation, then obj1 and obj2 cannot form a copy.  */
+	  if (OBJECTS_CONFLICT_P (obj_a, obj_b)
+	      && !(start_regno1 + OBJECT_START (obj_a) + OBJECT_NREGS (obj_a)
+		     <= (start_regno2 + OBJECT_START (obj_b))
+		   || start_regno2 + OBJECT_START (obj_b) + OBJECT_NREGS (obj_b)
+			<= (start_regno1 + OBJECT_START (obj_a))))
+	      return false;
+
+      return true;
+    }
+  else
+    {
+      /* For the normal case, make sure full_obj1 and full_obj2 can be
+	 allocated to the same register.  */
+      ira_object_t full_obj1 = find_object (a1, 0, ALLOCNO_NREGS (a1));
+      ira_object_t full_obj2 = find_object (a2, 0, ALLOCNO_NREGS (a2));
+      return !OBJECTS_CONFLICT_P (full_obj1, full_obj2) && offset_equal;
+    }
+}
+
+/* Return true if ORIG_REG's offset is aligned to ALLOCNO_UNIT_SIZE (A) and its
+   size is a multiple of ALLOCNO_UNIT_SIZE (A).  Used to forbid creating a copy
+   for the RTL below, which contains a subreg move (from
+   testsuite/gcc.dg/vect/vect-simd-20.c on AArch64).  Suppose both pseudos are
+   allocated to the fourth register, that is, pseudo 127 is allocated to w4 and
+   pseudo 149 is allocated to x4 and x5.  Then the third instruction becomes a
+   no-op and can be deleted without affecting the result of pseudo 149.  But
+   when the second instruction is executed, the upper 32 bits of x4 are set to
+   0 (the behavior of the add instruction); that is, the result of pseudo 149
+   is modified and its bits 32~63 are cleared, which is not the desired result.
+
+     (set (reg:SI 127)
+	  (subreg:SI (reg:TI 149) 0))
+     ...
+     (set (reg:SI 127)
+	  (plus:SI (reg:SI 127)
+		   (reg:SI 180)))
+     ...
+     (set (zero_extract:DI (subreg:DI (reg:TI 149) 0)
+			   (const_int 32 [0x20])
+			   (const_int 0 [0]))
+	  (subreg:DI (reg:SI 127) 0))  */
+static bool
+subreg_reg_align_and_times_p (ira_allocno_t a, rtx orig_reg)
+{
+  if (!has_subreg_object_p (a) || !SUBREG_P (orig_reg))
+    return true;
+
+  return multiple_p (SUBREG_BYTE (orig_reg), ALLOCNO_UNIT_SIZE (a))
+	 && multiple_p (GET_MODE_SIZE (GET_MODE (orig_reg)),
+			ALLOCNO_UNIT_SIZE (a));
+}
 
 /* Return X if X is a REG, otherwise it should be SUBREG of REG and
    the function returns the reg in this case.  *OFFSET will be set to
@@ -237,8 +327,9 @@ get_freq_for_shuffle_copy (int freq)
    SINGLE_INPUT_OP_HAS_CSTR_P is only meaningful when constraint_p
    is true, see function ira_get_dup_out_num for its meaning.  */
 static bool
-process_regs_for_copy (rtx reg1, rtx reg2, bool constraint_p, rtx_insn *insn,
-		       int freq, bool single_input_op_has_cstr_p = true)
+process_regs_for_copy (rtx orig_reg1, rtx orig_reg2, bool constraint_p,
+		       rtx_insn *insn, int freq,
+		       bool single_input_op_has_cstr_p = true)
 {
   int allocno_preferenced_hard_regno, index, offset1, offset2;
   int cost, conflict_cost, move_cost;
@@ -248,10 +339,10 @@ process_regs_for_copy (rtx reg1, rtx reg2, bool constraint_p, rtx_insn *insn,
   machine_mode mode;
   ira_copy_t cp;
 
-  gcc_assert (REG_SUBREG_P (reg1) && REG_SUBREG_P (reg2));
-  only_regs_p = REG_P (reg1) && REG_P (reg2);
-  reg1 = go_through_subreg (reg1, &offset1);
-  reg2 = go_through_subreg (reg2, &offset2);
+  gcc_assert (REG_SUBREG_P (orig_reg1) && REG_SUBREG_P (orig_reg2));
+  only_regs_p = REG_P (orig_reg1) && REG_P (orig_reg2);
+  rtx reg1 = go_through_subreg (orig_reg1, &offset1);
+  rtx reg2 = go_through_subreg (orig_reg2, &offset2);
   /* Set up hard regno preferenced by allocno.  If allocno gets the
      hard regno the copy (or potential move) insn will be removed.  */
   if (HARD_REGISTER_P (reg1))
@@ -270,13 +361,17 @@ process_regs_for_copy (rtx reg1, rtx reg2, bool constraint_p, rtx_insn *insn,
     {
       ira_allocno_t a1 = ira_curr_regno_allocno_map[REGNO (reg1)];
       ira_allocno_t a2 = ira_curr_regno_allocno_map[REGNO (reg2)];
+      ira_object_t obj1 = find_object (a1, orig_reg1);
+      ira_object_t obj2 = find_object (a2, orig_reg2);
 
-      if (!allocnos_conflict_for_copy_p (a1, a2)
-	  && offset1 == offset2
+      if (subreg_reg_align_and_times_p (a1, orig_reg1)
+	  && subreg_reg_align_and_times_p (a2, orig_reg2)
+	  && regs_non_conflict_for_copy_p (obj1, obj2, insn != NULL,
+					   offset1 == offset2)
 	  && ordered_p (GET_MODE_PRECISION (ALLOCNO_MODE (a1)),
 			GET_MODE_PRECISION (ALLOCNO_MODE (a2))))
 	{
-	  cp = ira_add_allocno_copy (a1, a2, freq, constraint_p, insn,
+	  cp = ira_add_allocno_copy (obj1, obj2, freq, constraint_p, insn,
 				     ira_curr_loop_tree_node);
 	  bitmap_set_bit (ira_curr_loop_tree_node->local_copies, cp->num);
 	  return true;
@@ -438,16 +533,15 @@ add_insn_allocno_copies (rtx_insn *insn)
   freq = REG_FREQ_FROM_BB (BLOCK_FOR_INSN (insn));
   if (freq == 0)
     freq = 1;
-  if ((set = single_set (insn)) != NULL_RTX
-      && REG_SUBREG_P (SET_DEST (set)) && REG_SUBREG_P (SET_SRC (set))
-      && ! side_effects_p (set)
-      && find_reg_note (insn, REG_DEAD,
-			REG_P (SET_SRC (set))
-			? SET_SRC (set)
-			: SUBREG_REG (SET_SRC (set))) != NULL_RTX)
+  if ((set = single_set (insn)) != NULL_RTX && REG_SUBREG_P (SET_DEST (set))
+      && REG_SUBREG_P (SET_SRC (set)) && !side_effects_p (set)
+      && (find_reg_note (insn, REG_DEAD,
+			 REG_P (SET_SRC (set)) ? SET_SRC (set)
+					       : SUBREG_REG (SET_SRC (set)))
+	    != NULL_RTX
+	  || subreg_move_p (SET_DEST (set), SET_SRC (set))))
     {
-      process_regs_for_copy (SET_SRC (set), SET_DEST (set),
-			     false, insn, freq);
+      process_regs_for_copy (SET_SRC (set), SET_DEST (set), false, insn, freq);
       return;
     }
   /* Fast check of possibility of constraint or shuffle copies.  If
@@ -521,16 +615,23 @@ propagate_copies (void)
 
   FOR_EACH_COPY (cp, ci)
     {
-      a1 = cp->first;
-      a2 = cp->second;
+      a1 = OBJECT_ALLOCNO (cp->first);
+      a2 = OBJECT_ALLOCNO (cp->second);
       if (ALLOCNO_LOOP_TREE_NODE (a1) == ira_loop_tree_root)
 	continue;
       ira_assert ((ALLOCNO_LOOP_TREE_NODE (a2) != ira_loop_tree_root));
       parent_a1 = ira_parent_or_cap_allocno (a1);
       parent_a2 = ira_parent_or_cap_allocno (a2);
+      ira_object_t parent_obj1
+	= find_object_anyway (parent_a1, OBJECT_START (cp->first),
+			      OBJECT_NREGS (cp->first));
+      ira_object_t parent_obj2
+	= find_object_anyway (parent_a2, OBJECT_START (cp->second),
+			      OBJECT_NREGS (cp->second));
       ira_assert (parent_a1 != NULL && parent_a2 != NULL);
-      if (! allocnos_conflict_for_copy_p (parent_a1, parent_a2))
-	ira_add_allocno_copy (parent_a1, parent_a2, cp->freq,
+      if (regs_non_conflict_for_copy_p (parent_obj1, parent_obj2,
+					cp->insn != NULL, true))
+	ira_add_allocno_copy (parent_obj1, parent_obj2, cp->freq,
 			      cp->constraint_p, cp->insn, cp->loop_tree_node);
     }
 }
diff --git a/gcc/ira-emit.cc b/gcc/ira-emit.cc
index 9dc7f3c655e..30ff46980f5 100644
--- a/gcc/ira-emit.cc
+++ b/gcc/ira-emit.cc
@@ -1129,11 +1129,11 @@ add_range_and_copies_from_move_list (move_t list, ira_loop_tree_node_t node,
       update_costs (to, false, freq);
       cp = ira_add_allocno_copy (from, to, freq, false, move->insn, NULL);
       if (internal_flag_ira_verbose > 2 && ira_dump_file != NULL)
-	fprintf (ira_dump_file, "    Adding cp%d:a%dr%d-a%dr%d\n",
-		 cp->num, ALLOCNO_NUM (cp->first),
-		 REGNO (allocno_emit_reg (cp->first)),
-		 ALLOCNO_NUM (cp->second),
-		 REGNO (allocno_emit_reg (cp->second)));
+	fprintf (ira_dump_file, "    Adding cp%d:a%dr%d-a%dr%d\n", cp->num,
+		 ALLOCNO_NUM (OBJECT_ALLOCNO (cp->first)),
+		 REGNO (allocno_emit_reg (OBJECT_ALLOCNO (cp->first))),
+		 ALLOCNO_NUM (OBJECT_ALLOCNO (cp->second)),
+		 REGNO (allocno_emit_reg (OBJECT_ALLOCNO (cp->second))));
 
       nr = ALLOCNO_NUM_OBJECTS (from);
       for (i = 0; i < nr; i++)
diff --git a/gcc/ira-int.h b/gcc/ira-int.h
index b9e24328867..963e533e448 100644
--- a/gcc/ira-int.h
+++ b/gcc/ira-int.h
@@ -229,6 +229,8 @@ struct ira_object
 {
   /* The allocno associated with this record.  */
   ira_allocno_t allocno;
+  /* Index in the allocno->objects array.  */
+  unsigned int index;
   /* Vector of accumulated conflicting conflict_redords with NULL end
      marker (if OBJECT_CONFLICT_VEC_P is true) or conflict bit vector
      otherwise.  */
@@ -522,6 +524,7 @@ allocno_emit_reg (ira_allocno_t a)
 }
 
 #define OBJECT_ALLOCNO(O) ((O)->allocno)
+#define OBJECT_INDEX(O) ((O)->index)
 #define OBJECT_CONFLICT_ARRAY(O) ((O)->conflicts_array)
 #define OBJECT_CONFLICT_VEC(O) ((ira_object_t *)(O)->conflicts_array)
 #define OBJECT_CONFLICT_BITVEC(O) ((IRA_INT_TYPE *)(O)->conflicts_array)
@@ -591,9 +594,9 @@ struct ira_allocno_copy
 {
   /* The unique order number of the copy node starting with 0.  */
   int num;
-  /* Allocnos connected by the copy.  The first allocno should have
+  /* Objects connected by the copy.  The first object should have
      smaller order number than the second one.  */
-  ira_allocno_t first, second;
+  ira_object_t first, second;
   /* Execution frequency of the copy.  */
   int freq;
   bool constraint_p;
@@ -1043,6 +1046,9 @@ extern void ira_remove_allocno_prefs (ira_allocno_t);
 extern ira_copy_t ira_create_copy (ira_allocno_t, ira_allocno_t,
 				   int, bool, rtx_insn *,
 				   ira_loop_tree_node_t);
+extern ira_copy_t
+ira_add_allocno_copy (ira_object_t, ira_object_t, int, bool, rtx_insn *,
+		      ira_loop_tree_node_t);
 extern ira_copy_t ira_add_allocno_copy (ira_allocno_t, ira_allocno_t, int,
 					bool, rtx_insn *,
 					ira_loop_tree_node_t);
@@ -1056,6 +1062,7 @@ extern void ira_destroy (void);
 extern ira_object_t
 find_object (ira_allocno_t, int, int);
 extern ira_object_t find_object (ira_allocno_t, poly_int64, poly_int64);
+extern ira_object_t find_object (ira_allocno_t, rtx);
 ira_object_t
 find_object_anyway (ira_allocno_t a, int start, int nregs);
 extern void ira_copy_allocno_objects (ira_allocno_t, ira_allocno_t);
@@ -1084,6 +1091,8 @@ extern void ira_implicitly_set_insn_hard_regs (HARD_REG_SET *,
 /* ira-conflicts.cc */
 extern void ira_debug_conflicts (bool);
 extern void ira_build_conflicts (void);
+extern bool subreg_move_p (ira_object_t, ira_object_t);
+extern bool subreg_move_p (rtx, rtx);
 
 /* ira-color.cc */
 extern ira_allocno_t ira_soft_conflict (ira_allocno_t, ira_allocno_t);
diff --git a/gcc/ira.cc b/gcc/ira.cc
index b9159d089c3..739ef28af6e 100644
--- a/gcc/ira.cc
+++ b/gcc/ira.cc
@@ -2853,14 +2853,15 @@ print_redundant_copies (void)
       if (hard_regno >= 0)
 	continue;
       for (cp = ALLOCNO_COPIES (a); cp != NULL; cp = next_cp)
-	if (cp->first == a)
+	if (OBJECT_ALLOCNO (cp->first) == a)
 	  next_cp = cp->next_first_allocno_copy;
 	else
 	  {
 	    next_cp = cp->next_second_allocno_copy;
 	    if (internal_flag_ira_verbose > 4 && ira_dump_file != NULL
 		&& cp->insn != NULL_RTX
-		&& ALLOCNO_HARD_REGNO (cp->first) == hard_regno)
+		&& ALLOCNO_HARD_REGNO (OBJECT_ALLOCNO (cp->first))
+		     == hard_regno)
 	      fprintf (ira_dump_file,
 		       "        Redundant move from %d(freq %d):%d\n",
 		       INSN_UID (cp->insn), cp->freq, hard_regno);
-- 
2.36.3


^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH 5/7] ira: Add all nregs >= 2 pseudos to track subreg list
  2023-11-08  3:47 [PATCH 0/7] ira/lra: Support subreg coalesce Lehua Ding
                   ` (3 preceding siblings ...)
  2023-11-08  3:47 ` [PATCH 4/7] ira: Support subreg copy Lehua Ding
@ 2023-11-08  3:47 ` Lehua Ding
  2023-11-08  3:47 ` [PATCH 6/7] lra: Apply live_subreg df_problem to lra pass Lehua Ding
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 37+ messages in thread
From: Lehua Ding @ 2023-11-08  3:47 UTC (permalink / raw)
  To: gcc-patches; +Cc: vmakarov, richard.sandiford, juzhe.zhong, lehua.ding

This patch completely relaxes the restriction so that all eligible subregs
are tracked.
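
For illustration, the unit size computed by the new get_reg_unit_size below
is just the mode size divided by the number of hard registers the mode needs.
A minimal standalone sketch of that arithmetic, with hypothetical values that
are not taken from any target or from the patch:

```
/* Illustrative sketch only: how ALLOCNO_UNIT_SIZE falls out of the
   mode layout.  All numbers are assumptions for the example.  */
#include <assert.h>

int
main ()
{
  int block_size = 16; /* assumed REGMODE_NATURAL_SIZE of the mode */
  int nblocks = 4;     /* assumed number of natural-size blocks */
  int nregs = 2;       /* assumed number of hard regs the mode needs */
  assert (nblocks % nregs == 0);
  int unit_size = block_size * (nblocks / nregs); /* bytes per hard reg */
  assert (unit_size == 32);
  return 0;
}
```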

gcc/ChangeLog:

	* ira-build.cc (get_reg_unit_size): New.
	(has_same_nregs): New.
	(ira_set_allocno_class): Relax.

---
 gcc/ira-build.cc | 41 ++++++++++++++++++++++++++++++++++++-----
 1 file changed, 36 insertions(+), 5 deletions(-)

diff --git a/gcc/ira-build.cc b/gcc/ira-build.cc
index 1c47f81ce9d..379f877ca67 100644
--- a/gcc/ira-build.cc
+++ b/gcc/ira-build.cc
@@ -607,6 +607,37 @@ ira_create_allocno (int regno, bool cap_p,
   return a;
 }
 
+/* Return the size of a single register of allocno A.  */
+static poly_int64
+get_reg_unit_size (ira_allocno_t a)
+{
+  enum reg_class aclass = ALLOCNO_CLASS (a);
+  gcc_assert (aclass != NO_REGS);
+  machine_mode mode = ALLOCNO_MODE (a);
+  int nregs = ALLOCNO_NREGS (a);
+  poly_int64 block_size = REGMODE_NATURAL_SIZE (mode);
+  int nblocks = get_nblocks (mode);
+  gcc_assert (nblocks % nregs == 0);
+  return block_size * (nblocks / nregs);
+}
+
+/* Return true if TARGET_CLASS_MAX_NREGS and TARGET_HARD_REGNO_NREGS give the
+   same result for allocno A.  Note that some targets do not implement these
+   two hooks consistently; such cases have to be tracked down one by one.  For
+   example, for V3x1DI mode on AArch64, TARGET_CLASS_MAX_NREGS returns 2 but
+   TARGET_HARD_REGNO_NREGS returns 3.  They disagree and need to be fixed in
+   the AArch64 hooks.  */
+static bool
+has_same_nregs (ira_allocno_t a)
+{
+  for (int i = 0; i < FIRST_PSEUDO_REGISTER; i++)
+    if (REGNO_REG_CLASS (i) != NO_REGS
+	&& reg_class_subset_p (REGNO_REG_CLASS (i), ALLOCNO_CLASS (a))
+	&& ALLOCNO_NREGS (a) != hard_regno_nregs (i, ALLOCNO_MODE (a)))
+      return false;
+  return true;
+}
+
 /* Set up register class for A and update its conflict hard
    registers.  */
 void
@@ -624,12 +655,12 @@ ira_set_allocno_class (ira_allocno_t a, enum reg_class aclass)
 
   if (aclass == NO_REGS)
     return;
-  /* SET the unit_size of one register.  */
-  machine_mode mode = ALLOCNO_MODE (a);
-  int nregs = ira_reg_class_max_nregs[aclass][mode];
-  if (nregs == 2 && maybe_eq (GET_MODE_SIZE (mode), nregs * UNITS_PER_WORD))
+  gcc_assert (!ALLOCNO_TRACK_SUBREG_P (a));
+  /* Set the unit size and track_subreg_p flag for pseudos that need to occupy
+     multiple hard regs.  */
+  if (ALLOCNO_NREGS (a) > 1 && has_same_nregs (a))
     {
-      ALLOCNO_UNIT_SIZE (a) = UNITS_PER_WORD;
+      ALLOCNO_UNIT_SIZE (a) = get_reg_unit_size (a);
       ALLOCNO_TRACK_SUBREG_P (a) = true;
       return;
     }
-- 
2.36.3


^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH 6/7] lra: Apply live_subreg df_problem to lra pass
  2023-11-08  3:47 [PATCH 0/7] ira/lra: Support subreg coalesce Lehua Ding
                   ` (4 preceding siblings ...)
  2023-11-08  3:47 ` [PATCH 5/7] ira: Add all nregs >= 2 pseudos to track subreg list Lehua Ding
@ 2023-11-08  3:47 ` Lehua Ding
  2023-11-08  3:47 ` [PATCH 7/7] lra: Support subreg live range track and conflict detect Lehua Ding
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 37+ messages in thread
From: Lehua Ding @ 2023-11-08  3:47 UTC (permalink / raw)
  To: gcc-patches; +Cc: vmakarov, richard.sandiford, juzhe.zhong, lehua.ding

This patch changes the LRA passes to use the new live_subreg data instead of
the old live data.
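
For orientation, my reading of the live_subreg problem is that each basic
block now carries three pseudo live sets: ALL, FULL and PARTIAL, where ALL is
the union of FULL and PARTIAL, no pseudo is in both FULL and PARTIAL at once,
and PARTIAL additionally carries per-regno subreg ranges.  The sketch below
spells out that invariant; it is illustrative only (the helper name is made
up, and it assumes GCC's bitmap API and reg_obstack are in scope), not part
of the patch:

```
/* Illustrative only: the invariant the three per-BB live sets appear
   to maintain.  ALL = FULL | PARTIAL and FULL & PARTIAL is empty.  */
static void
check_live_subreg_sets (bitmap all, bitmap full, bitmap partial)
{
  bitmap_head tmp;
  bitmap_initialize (&tmp, &reg_obstack);
  gcc_assert (!bitmap_intersect_p (full, partial));
  bitmap_ior (&tmp, full, partial);
  gcc_assert (bitmap_equal_p (&tmp, all));
  bitmap_clear (&tmp);
}
```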

gcc/ChangeLog:

	* lra-coalesce.cc (update_live_info): Update.
	(lra_coalesce): Update.
	* lra-constraints.cc (update_ebb_live_info): Update.
	(get_live_on_other_edges): Update.
	(inherit_in_ebb): Update.
	(lra_inheritance): Update.
	(fix_bb_live_info): Update.
	(remove_inheritance_pseudos): Update.
	* lra-lives.cc (make_hard_regno_live): Update.
	(make_hard_regno_dead): Update.
	(mark_regno_live): Update.
	(mark_regno_dead): Update.
	(class bb_data_pseudos): Update.
	(live_trans_fun): Update.
	(live_con_fun_0): Update.
	(live_con_fun_n): Update.
	(initiate_live_solver): Update.
	(finish_live_solver): Update.
	(process_bb_lives): Update.
	(lra_create_live_ranges_1): Update.
	* lra-remat.cc (dump_candidates_and_remat_bb_data): Update.
	(calculate_livein_cands): Update.
	(do_remat): Update.
	* lra-spills.cc (spill_pseudos): Update.

---
 gcc/lra-coalesce.cc    |  20 ++-
 gcc/lra-constraints.cc |  93 ++++++++++---
 gcc/lra-lives.cc       | 308 ++++++++++++++++++++++++++++++++---------
 gcc/lra-remat.cc       |  13 +-
 gcc/lra-spills.cc      |  22 ++-
 5 files changed, 354 insertions(+), 102 deletions(-)

diff --git a/gcc/lra-coalesce.cc b/gcc/lra-coalesce.cc
index 04a5bbd714b..abfc54f1cc2 100644
--- a/gcc/lra-coalesce.cc
+++ b/gcc/lra-coalesce.cc
@@ -188,19 +188,25 @@ static bitmap_head used_pseudos_bitmap;
 /* Set up USED_PSEUDOS_BITMAP, and update LR_BITMAP (a BB live info
    bitmap).  */
 static void
-update_live_info (bitmap lr_bitmap)
+update_live_info (bitmap all, bitmap full, bitmap partial)
 {
   unsigned int j;
   bitmap_iterator bi;
 
   bitmap_clear (&used_pseudos_bitmap);
-  EXECUTE_IF_AND_IN_BITMAP (&coalesced_pseudos_bitmap, lr_bitmap,
+  EXECUTE_IF_AND_IN_BITMAP (&coalesced_pseudos_bitmap, all,
 			    FIRST_PSEUDO_REGISTER, j, bi)
     bitmap_set_bit (&used_pseudos_bitmap, first_coalesced_pseudo[j]);
   if (! bitmap_empty_p (&used_pseudos_bitmap))
     {
-      bitmap_and_compl_into (lr_bitmap, &coalesced_pseudos_bitmap);
-      bitmap_ior_into (lr_bitmap, &used_pseudos_bitmap);
+      bitmap_and_compl_into (all, &coalesced_pseudos_bitmap);
+      bitmap_ior_into (all, &used_pseudos_bitmap);
+
+      bitmap_and_compl_into (full, &coalesced_pseudos_bitmap);
+      bitmap_ior_and_compl_into (full, &used_pseudos_bitmap, partial);
+
+      bitmap_and_compl_into (partial, &coalesced_pseudos_bitmap);
+      bitmap_ior_and_compl_into (partial, &used_pseudos_bitmap, full);
     }
 }
 
@@ -303,8 +309,10 @@ lra_coalesce (void)
   bitmap_initialize (&used_pseudos_bitmap, &reg_obstack);
   FOR_EACH_BB_FN (bb, cfun)
     {
-      update_live_info (df_get_live_in (bb));
-      update_live_info (df_get_live_out (bb));
+      update_live_info (DF_LIVE_SUBREG_IN (bb), DF_LIVE_SUBREG_FULL_IN (bb),
+			DF_LIVE_SUBREG_PARTIAL_IN (bb));
+      update_live_info (DF_LIVE_SUBREG_OUT (bb), DF_LIVE_SUBREG_FULL_OUT (bb),
+			DF_LIVE_SUBREG_PARTIAL_OUT (bb));
       FOR_BB_INSNS_SAFE (bb, insn, next)
 	if (INSN_P (insn)
 	    && bitmap_bit_p (&involved_insns_bitmap, INSN_UID (insn)))
diff --git a/gcc/lra-constraints.cc b/gcc/lra-constraints.cc
index 0607c8be7cb..c3ad846b97b 100644
--- a/gcc/lra-constraints.cc
+++ b/gcc/lra-constraints.cc
@@ -6571,34 +6571,75 @@ update_ebb_live_info (rtx_insn *head, rtx_insn *tail)
 	{
 	  if (prev_bb != NULL)
 	    {
-	      /* Update df_get_live_in (prev_bb):  */
+	      /* Update subreg live (prev_bb):  */
+	      bitmap subreg_all_in = DF_LIVE_SUBREG_IN (prev_bb);
+	      bitmap subreg_full_in = DF_LIVE_SUBREG_FULL_IN (prev_bb);
+	      bitmap subreg_partial_in = DF_LIVE_SUBREG_PARTIAL_IN (prev_bb);
+	      subregs_live *range_in = DF_LIVE_SUBREG_RANGE_IN (prev_bb);
 	      EXECUTE_IF_SET_IN_BITMAP (&check_only_regs, 0, j, bi)
 		if (bitmap_bit_p (&live_regs, j))
-		  bitmap_set_bit (df_get_live_in (prev_bb), j);
-		else
-		  bitmap_clear_bit (df_get_live_in (prev_bb), j);
+		  {
+		    bitmap_set_bit (subreg_all_in, j);
+		    bitmap_set_bit (subreg_full_in, j);
+		    if (bitmap_bit_p (subreg_partial_in, j))
+		      {
+			bitmap_clear_bit (subreg_partial_in, j);
+			range_in->remove_live (j);
+		      }
+		  }
+		else if (bitmap_bit_p (subreg_all_in, j))
+		  {
+		    bitmap_clear_bit (subreg_all_in, j);
+		    bitmap_clear_bit (subreg_full_in, j);
+		    if (bitmap_bit_p (subreg_partial_in, j))
+		      {
+			bitmap_clear_bit (subreg_partial_in, j);
+			range_in->remove_live (j);
+		      }
+		  }
 	    }
+	  bitmap subreg_all_out = DF_LIVE_SUBREG_OUT (curr_bb);
 	  if (curr_bb != last_bb)
 	    {
-	      /* Update df_get_live_out (curr_bb):  */
+	      /* Update subreg live (curr_bb):  */
+	      bitmap subreg_all_out = DF_LIVE_SUBREG_OUT (curr_bb);
+	      bitmap subreg_full_out = DF_LIVE_SUBREG_FULL_OUT (curr_bb);
+	      bitmap subreg_partial_out = DF_LIVE_SUBREG_PARTIAL_OUT (curr_bb);
+	      subregs_live *range_out = DF_LIVE_SUBREG_RANGE_OUT (curr_bb);
 	      EXECUTE_IF_SET_IN_BITMAP (&check_only_regs, 0, j, bi)
 		{
 		  live_p = bitmap_bit_p (&live_regs, j);
 		  if (! live_p)
 		    FOR_EACH_EDGE (e, ei, curr_bb->succs)
-		      if (bitmap_bit_p (df_get_live_in (e->dest), j))
+		      if (bitmap_bit_p (DF_LIVE_SUBREG_IN (e->dest), j))
 			{
 			  live_p = true;
 			  break;
 			}
 		  if (live_p)
-		    bitmap_set_bit (df_get_live_out (curr_bb), j);
-		  else
-		    bitmap_clear_bit (df_get_live_out (curr_bb), j);
+		    {
+		      bitmap_set_bit (subreg_all_out, j);
+		      bitmap_set_bit (subreg_full_out, j);
+		      if (bitmap_bit_p (subreg_partial_out, j))
+			{
+			  bitmap_clear_bit (subreg_partial_out, j);
+			  range_out->remove_live (j);
+			}
+		    }
+		  else if (bitmap_bit_p (subreg_all_out, j))
+		    {
+		      bitmap_clear_bit (subreg_all_out, j);
+		      bitmap_clear_bit (subreg_full_out, j);
+		      if (bitmap_bit_p (subreg_partial_out, j))
+			{
+			  bitmap_clear_bit (subreg_partial_out, j);
+			  range_out->remove_live (j);
+			}
+		    }
 		}
 	    }
 	  prev_bb = curr_bb;
-	  bitmap_and (&live_regs, &check_only_regs, df_get_live_out (curr_bb));
+	  bitmap_and (&live_regs, &check_only_regs, subreg_all_out);
 	}
       if (! NONDEBUG_INSN_P (curr_insn))
 	continue;
@@ -6715,7 +6756,7 @@ get_live_on_other_edges (basic_block from, basic_block to, bitmap res)
   bitmap_clear (res);
   FOR_EACH_EDGE (e, ei, from->succs)
     if (e->dest != to)
-      bitmap_ior_into (res, df_get_live_in (e->dest));
+      bitmap_ior_into (res, DF_LIVE_SUBREG_IN (e->dest));
   last = get_last_insertion_point (from);
   if (! JUMP_P (last))
     return;
@@ -6787,7 +6828,7 @@ inherit_in_ebb (rtx_insn *head, rtx_insn *tail)
 	{
 	  /* We are at the end of BB.  Add qualified living
 	     pseudos for potential splitting.  */
-	  to_process = df_get_live_out (curr_bb);
+	  to_process = DF_LIVE_SUBREG_OUT (curr_bb);
 	  if (last_processed_bb != NULL)
 	    {
 	      /* We are somewhere in the middle of EBB.	 */
@@ -7159,7 +7200,7 @@ inherit_in_ebb (rtx_insn *head, rtx_insn *tail)
 	{
 	  /* We reached the beginning of the current block -- do
 	     rest of spliting in the current BB.  */
-	  to_process = df_get_live_in (curr_bb);
+	  to_process = DF_LIVE_SUBREG_IN (curr_bb);
 	  if (BLOCK_FOR_INSN (head) != curr_bb)
 	    {
 	      /* We are somewhere in the middle of EBB.	 */
@@ -7236,7 +7277,7 @@ lra_inheritance (void)
 	fprintf (lra_dump_file, "EBB");
       /* Form a EBB starting with BB.  */
       bitmap_clear (&ebb_global_regs);
-      bitmap_ior_into (&ebb_global_regs, df_get_live_in (bb));
+      bitmap_ior_into (&ebb_global_regs, DF_LIVE_SUBREG_IN (bb));
       for (;;)
 	{
 	  if (lra_dump_file != NULL)
@@ -7252,7 +7293,7 @@ lra_inheritance (void)
 	    break;
 	  bb = bb->next_bb;
 	}
-      bitmap_ior_into (&ebb_global_regs, df_get_live_out (bb));
+      bitmap_ior_into (&ebb_global_regs, DF_LIVE_SUBREG_OUT (bb));
       if (lra_dump_file != NULL)
 	fprintf (lra_dump_file, "\n");
       if (inherit_in_ebb (BB_HEAD (start_bb), BB_END (bb)))
@@ -7281,15 +7322,23 @@ int lra_undo_inheritance_iter;
 /* Fix BB live info LIVE after removing pseudos created on pass doing
    inheritance/split which are REMOVED_PSEUDOS.	 */
 static void
-fix_bb_live_info (bitmap live, bitmap removed_pseudos)
+fix_bb_live_info (bitmap all, bitmap full, bitmap partial,
+		  bitmap removed_pseudos)
 {
   unsigned int regno;
   bitmap_iterator bi;
 
   EXECUTE_IF_SET_IN_BITMAP (removed_pseudos, 0, regno, bi)
-    if (bitmap_clear_bit (live, regno)
-	&& REG_P (lra_reg_info[regno].restore_rtx))
-      bitmap_set_bit (live, REGNO (lra_reg_info[regno].restore_rtx));
+    {
+      if (bitmap_clear_bit (all, regno)
+	  && REG_P (lra_reg_info[regno].restore_rtx))
+	{
+	  bitmap_set_bit (all, REGNO (lra_reg_info[regno].restore_rtx));
+	  bitmap_clear_bit (full, regno);
+	  bitmap_set_bit (full, REGNO (lra_reg_info[regno].restore_rtx));
+	  gcc_assert (!bitmap_bit_p (partial, regno));
+	}
+    }
 }
 
 /* Return regno of the (subreg of) REG. Otherwise, return a negative
@@ -7355,8 +7404,10 @@ remove_inheritance_pseudos (bitmap remove_pseudos)
      constraint pass.  */
   FOR_EACH_BB_FN (bb, cfun)
     {
-      fix_bb_live_info (df_get_live_in (bb), remove_pseudos);
-      fix_bb_live_info (df_get_live_out (bb), remove_pseudos);
+      fix_bb_live_info (DF_LIVE_SUBREG_IN (bb), DF_LIVE_SUBREG_FULL_IN (bb),
+			DF_LIVE_SUBREG_PARTIAL_IN (bb), remove_pseudos);
+      fix_bb_live_info (DF_LIVE_SUBREG_OUT (bb), DF_LIVE_SUBREG_FULL_OUT (bb),
+			DF_LIVE_SUBREG_PARTIAL_OUT (bb), remove_pseudos);
       FOR_BB_INSNS_REVERSE (bb, curr_insn)
 	{
 	  if (! INSN_P (curr_insn))
diff --git a/gcc/lra-lives.cc b/gcc/lra-lives.cc
index f60e564da82..477b82786cf 100644
--- a/gcc/lra-lives.cc
+++ b/gcc/lra-lives.cc
@@ -287,7 +287,7 @@ make_hard_regno_live (int regno)
   SET_HARD_REG_BIT (hard_regs_live, regno);
   sparseset_set_bit (start_living, regno);
   if (fixed_regs[regno] || TEST_HARD_REG_BIT (hard_regs_spilled_into, regno))
-    bitmap_set_bit (bb_gen_pseudos, regno);
+    bitmap_set_bit (&curr_bb_info->full_use, regno);
 }
 
 /* Process the definition of hard register REGNO.  This updates
@@ -310,8 +310,8 @@ make_hard_regno_dead (int regno)
   sparseset_set_bit (start_dying, regno);
   if (fixed_regs[regno] || TEST_HARD_REG_BIT (hard_regs_spilled_into, regno))
     {
-      bitmap_clear_bit (bb_gen_pseudos, regno);
-      bitmap_set_bit (bb_killed_pseudos, regno);
+      bitmap_clear_bit (&curr_bb_info->full_use, regno);
+      bitmap_set_bit (&curr_bb_info->full_def, regno);
     }
 }
 
@@ -355,7 +355,9 @@ mark_regno_live (int regno, machine_mode mode)
   else
     {
       mark_pseudo_live (regno);
-      bitmap_set_bit (bb_gen_pseudos, regno);
+      bitmap_set_bit (&curr_bb_info->full_use, regno);
+      gcc_assert (!bitmap_bit_p (&curr_bb_info->partial_use, regno));
+      gcc_assert (!bitmap_bit_p (&curr_bb_info->partial_def, regno));
     }
 }
 
@@ -375,8 +377,10 @@ mark_regno_dead (int regno, machine_mode mode)
   else
     {
       mark_pseudo_dead (regno);
-      bitmap_clear_bit (bb_gen_pseudos, regno);
-      bitmap_set_bit (bb_killed_pseudos, regno);
+      bitmap_clear_bit (&curr_bb_info->full_use, regno);
+      bitmap_set_bit (&curr_bb_info->full_def, regno);
+      gcc_assert (!bitmap_bit_p (&curr_bb_info->partial_use, regno));
+      gcc_assert (!bitmap_bit_p (&curr_bb_info->partial_def, regno));
     }
 }
 
@@ -387,23 +391,6 @@ mark_regno_dead (int regno, machine_mode mode)
    border.  That might be a consequence of some global transformations
    in LRA, e.g. PIC pseudo reuse or rematerialization.  */
 
-/* Structure describing local BB data used for pseudo
-   live-analysis.  */
-class bb_data_pseudos
-{
-public:
-  /* Basic block about which the below data are.  */
-  basic_block bb;
-  bitmap_head killed_pseudos; /* pseudos killed in the BB.  */
-  bitmap_head gen_pseudos; /* pseudos generated in the BB.  */
-};
-
-/* Array for all BB data.  Indexed by the corresponding BB index.  */
-typedef class bb_data_pseudos *bb_data_t;
-
-/* All basic block data are referred through the following array.  */
-static bb_data_t bb_data;
-
 /* Two small functions for access to the bb data.  */
 static inline bb_data_t
 get_bb_data (basic_block bb)
@@ -430,13 +417,93 @@ static bool
 live_trans_fun (int bb_index)
 {
   basic_block bb = get_bb_data_by_index (bb_index)->bb;
-  bitmap bb_liveout = df_get_live_out (bb);
-  bitmap bb_livein = df_get_live_in (bb);
+  bitmap full_out = DF_LIVE_SUBREG_FULL_OUT (bb);
+  bitmap full_in = DF_LIVE_SUBREG_FULL_IN (bb);
+  bitmap partial_out = DF_LIVE_SUBREG_PARTIAL_OUT (bb);
+  bitmap partial_in = DF_LIVE_SUBREG_PARTIAL_IN (bb);
+  subregs_live *range_out = DF_LIVE_SUBREG_RANGE_OUT (bb);
+  subregs_live *range_in = DF_LIVE_SUBREG_RANGE_IN (bb);
   bb_data_t bb_info = get_bb_data (bb);
 
-  bitmap_and_compl (&temp_bitmap, bb_liveout, &all_hard_regs_bitmap);
-  return bitmap_ior_and_compl (bb_livein, &bb_info->gen_pseudos,
-			       &temp_bitmap, &bb_info->killed_pseudos);
+  if (!has_subreg_live_p)
+    {
+      bitmap_and_compl (&temp_bitmap, full_out, &all_hard_regs_bitmap);
+      return bitmap_ior_and_compl (full_in, &bb_info->full_use, &temp_bitmap,
+				   &bb_info->full_def);
+    }
+
+  /* There is subreg liveness that needs to be tracked.  */
+  unsigned int regno;
+  bitmap_iterator bi;
+  bool changed = false;
+  bitmap_head temp_full_out;
+  bitmap_head temp_partial_out;
+  bitmap_head temp_partial_be_full_out;
+  bitmap_head all_def;
+  subregs_live temp_range_out;
+  bitmap_initialize (&temp_full_out, &reg_obstack);
+  bitmap_initialize (&temp_partial_out, &reg_obstack);
+  bitmap_initialize (&temp_partial_be_full_out, &reg_obstack);
+  bitmap_initialize (&all_def, &reg_obstack);
+
+  bitmap_and_compl (&temp_full_out, full_out, &all_hard_regs_bitmap);
+
+  bitmap_ior (&all_def, &bb_info->full_def, &bb_info->partial_def);
+
+  bitmap_and (&temp_partial_out, &temp_full_out, &bb_info->partial_def);
+  EXECUTE_IF_SET_IN_BITMAP (&temp_partial_out, FIRST_PSEUDO_REGISTER, regno, bi)
+    {
+      subreg_ranges temp (bb_info->range_def->lives.at (regno).max);
+      temp.make_full ();
+      temp.remove_ranges (bb_info->range_def->lives.at (regno));
+      temp_range_out.add_ranges (regno, temp);
+    }
+  bitmap_ior_and_compl_into (&temp_partial_out, partial_out, &all_def);
+  EXECUTE_IF_AND_COMPL_IN_BITMAP (partial_out, &all_def, FIRST_PSEUDO_REGISTER,
+				  regno, bi)
+    {
+      temp_range_out.add_ranges (regno, range_out->lives.at (regno));
+    }
+  EXECUTE_IF_AND_IN_BITMAP (partial_out, &bb_info->partial_def, 0, regno, bi)
+    {
+      subreg_ranges temp = range_out->lives.at (regno);
+      temp.remove_ranges (bb_info->range_def->lives.at (regno));
+      if (!temp.empty_p ())
+	{
+	  bitmap_set_bit (&temp_partial_out, regno);
+	  temp_range_out.add_ranges (regno, temp);
+	}
+    }
+
+  temp_range_out.add_lives (*bb_info->range_use);
+  EXECUTE_IF_AND_IN_BITMAP (&temp_partial_out, &bb_info->partial_use, 0, regno,
+			    bi)
+    {
+      subreg_ranges temp = temp_range_out.lives.at (regno);
+      temp.add_ranges (bb_info->range_use->lives.at (regno));
+      if (temp.full_p ())
+	{
+	  bitmap_set_bit (&temp_partial_be_full_out, regno);
+	  temp_range_out.remove_live (regno);
+	}
+    }
+
+  bitmap_ior_and_compl_into (&temp_partial_be_full_out, &temp_full_out,
+			     &all_def);
+  changed
+    |= bitmap_ior (full_in, &temp_partial_be_full_out, &bb_info->full_use);
+
+  bitmap_ior_into (&temp_partial_out, &bb_info->partial_use);
+  changed |= bitmap_and_compl (partial_in, &temp_partial_out,
+			       &temp_partial_be_full_out);
+  changed |= range_in->copy_lives (temp_range_out);
+
+  bitmap_clear (&temp_full_out);
+  bitmap_clear (&temp_partial_out);
+  bitmap_clear (&temp_partial_be_full_out);
+  bitmap_clear (&all_def);
+
+  return changed;
 }
 
 /* The confluence function used by the DF equation solver to set up
@@ -444,7 +511,9 @@ live_trans_fun (int bb_index)
 static void
 live_con_fun_0 (basic_block bb)
 {
-  bitmap_and_into (df_get_live_out (bb), &all_hard_regs_bitmap);
+  bitmap_and_into (DF_LIVE_SUBREG_OUT (bb), &all_hard_regs_bitmap);
+  bitmap_and_into (DF_LIVE_SUBREG_FULL_OUT (bb), &all_hard_regs_bitmap);
+  bitmap_and_into (DF_LIVE_SUBREG_PARTIAL_OUT (bb), &all_hard_regs_bitmap);
 }
 
 /* The confluence function used by the DF equation solver to propagate
@@ -456,13 +525,77 @@ live_con_fun_0 (basic_block bb)
 static bool
 live_con_fun_n (edge e)
 {
-  basic_block bb = e->src;
-  basic_block dest = e->dest;
-  bitmap bb_liveout = df_get_live_out (bb);
-  bitmap dest_livein = df_get_live_in (dest);
+  class df_live_subreg_bb_info *src_bb_info
+    = df_live_subreg_get_bb_info (e->src->index);
+  class df_live_subreg_bb_info *dest_bb_info
+    = df_live_subreg_get_bb_info (e->dest->index);
 
-  return bitmap_ior_and_compl_into (bb_liveout,
-				    dest_livein, &all_hard_regs_bitmap);
+  if (!has_subreg_live_p)
+    {
+      return bitmap_ior_and_compl_into (&src_bb_info->full_out,
+					&dest_bb_info->full_in,
+					&all_hard_regs_bitmap);
+    }
+
+  /* Subreg liveness needs to be tracked.  Calculation:
+       temp_full: the regnos that become fully live, i.e. either
+	 1. partial in one of out/in but full in the other, or
+	 2. partial in both out and in, and the merged range is full.
+       temp_range: the subreg ranges of the regnos that stay partially live.
+       src_bb_info->partial_out
+	 = (src_bb_info->partial_out | dest_bb_info->partial_in) & ~temp_full
+       src_bb_info->range_out = copy (temp_range)
+       src_bb_info->full_out |= dest_bb_info->full_in | temp_full
+       */
+  subregs_live temp_range;
+  temp_range.add_lives (*src_bb_info->range_out);
+  temp_range.add_lives (*dest_bb_info->range_in);
+
+  bitmap_head temp_partial_all;
+  bitmap_initialize (&temp_partial_all, &bitmap_default_obstack);
+  bitmap_ior (&temp_partial_all, &src_bb_info->partial_out,
+	      &dest_bb_info->partial_in);
+
+  bitmap_head temp_full;
+  bitmap_initialize (&temp_full, &bitmap_default_obstack);
+
+  /* Collect the regnos that become fully live after merging
+     src_bb_info->partial_out and dest_bb_info->partial_in.  */
+  unsigned int regno;
+  bitmap_iterator bi;
+  EXECUTE_IF_SET_IN_BITMAP (&temp_partial_all, FIRST_PSEUDO_REGISTER, regno, bi)
+    {
+      if (bitmap_bit_p (&src_bb_info->full_out, regno)
+	  || bitmap_bit_p (&dest_bb_info->full_in, regno))
+	{
+	  bitmap_set_bit (&temp_full, regno);
+	  temp_range.remove_live (regno);
+	  continue;
+	}
+      else if (!bitmap_bit_p (&src_bb_info->partial_out, regno)
+	       || !bitmap_bit_p (&dest_bb_info->partial_in, regno))
+	continue;
+
+      subreg_ranges temp = src_bb_info->range_out->lives.at (regno);
+      temp.add_ranges (dest_bb_info->range_in->lives.at (regno));
+      if (temp.full_p ())
+	{
+	  bitmap_set_bit (&temp_full, regno);
+	  temp_range.remove_live (regno);
+	}
+    }
+
+  /* Calculating src_bb_info->partial_out and src_bb_info->range_out.  */
+  bool changed = bitmap_and_compl (&src_bb_info->partial_out, &temp_partial_all,
+				   &temp_full);
+  changed |= src_bb_info->range_out->copy_lives (temp_range);
+
+  /* Calculating src_bb_info->full_out.  */
+  bitmap_ior_and_compl_into (&temp_full, &dest_bb_info->full_in,
+			     &all_hard_regs_bitmap);
+  changed |= bitmap_ior_into (&src_bb_info->full_out, &temp_full);
+
+  return changed;
 }
 
 /* Indexes of all function blocks.  */
@@ -483,8 +616,12 @@ initiate_live_solver (void)
     {
       bb_data_t bb_info = get_bb_data (bb);
       bb_info->bb = bb;
-      bitmap_initialize (&bb_info->killed_pseudos, &reg_obstack);
-      bitmap_initialize (&bb_info->gen_pseudos, &reg_obstack);
+      bitmap_initialize (&bb_info->full_def, &reg_obstack);
+      bitmap_initialize (&bb_info->partial_def, &reg_obstack);
+      bitmap_initialize (&bb_info->full_use, &reg_obstack);
+      bitmap_initialize (&bb_info->partial_use, &reg_obstack);
+      bb_info->range_def = new subregs_live ();
+      bb_info->range_use = new subregs_live ();
       bitmap_set_bit (&all_blocks, bb->index);
     }
 }
@@ -499,8 +636,12 @@ finish_live_solver (void)
   FOR_ALL_BB_FN (bb, cfun)
     {
       bb_data_t bb_info = get_bb_data (bb);
-      bitmap_clear (&bb_info->killed_pseudos);
-      bitmap_clear (&bb_info->gen_pseudos);
+      bitmap_clear (&bb_info->full_def);
+      bitmap_clear (&bb_info->partial_def);
+      bitmap_clear (&bb_info->full_use);
+      bitmap_clear (&bb_info->partial_use);
+      delete bb_info->range_def;
+      delete bb_info->range_use;
     }
   free (bb_data);
   bitmap_clear (&all_hard_regs_bitmap);
@@ -663,7 +804,9 @@ process_bb_lives (basic_block bb, int &curr_point, bool dead_insn_p)
   /* Only has a meaningful value once we've seen a call.  */
   function_abi last_call_abi = default_function_abi;
 
-  reg_live_out = df_get_live_out (bb);
+  reg_live_out = DF_LIVE_SUBREG_OUT (bb);
+  bitmap reg_live_partial_out = DF_LIVE_SUBREG_PARTIAL_OUT (bb);
+  subregs_live *range_out = DF_LIVE_SUBREG_RANGE_OUT (bb);
   sparseset_clear (pseudos_live);
   sparseset_clear (pseudos_live_through_calls);
   sparseset_clear (pseudos_live_through_setjumps);
@@ -675,10 +818,13 @@ process_bb_lives (basic_block bb, int &curr_point, bool dead_insn_p)
       mark_pseudo_live (j);
     }
 
-  bb_gen_pseudos = &get_bb_data (bb)->gen_pseudos;
-  bb_killed_pseudos = &get_bb_data (bb)->killed_pseudos;
-  bitmap_clear (bb_gen_pseudos);
-  bitmap_clear (bb_killed_pseudos);
+  curr_bb_info = get_bb_data (bb);
+  bitmap_clear (&curr_bb_info->full_use);
+  bitmap_clear (&curr_bb_info->partial_use);
+  bitmap_clear (&curr_bb_info->full_def);
+  bitmap_clear (&curr_bb_info->partial_def);
+  curr_bb_info->range_use->clear ();
+  curr_bb_info->range_def->clear ();
   freq = REG_FREQ_FROM_BB (bb);
 
   if (lra_dump_file != NULL)
@@ -1101,16 +1247,16 @@ process_bb_lives (basic_block bb, int &curr_point, bool dead_insn_p)
   bool live_change_p = false;
   /* Check if bb border live info was changed.  */
   unsigned int live_pseudos_num = 0;
-  EXECUTE_IF_SET_IN_BITMAP (df_get_live_in (bb),
-			    FIRST_PSEUDO_REGISTER, j, bi)
+  EXECUTE_IF_SET_IN_BITMAP (DF_LIVE_SUBREG_IN (bb), FIRST_PSEUDO_REGISTER, j,
+			    bi)
     {
       live_pseudos_num++;
-      if (! sparseset_bit_p (pseudos_live, j))
+      if (!sparseset_bit_p (pseudos_live, j))
 	{
 	  live_change_p = true;
 	  if (lra_dump_file != NULL)
-	    fprintf (lra_dump_file,
-		     "  r%d is removed as live at bb%d start\n", j, bb->index);
+	    fprintf (lra_dump_file, "  r%d is removed as live at bb%d start\n",
+		     j, bb->index);
 	  break;
 	}
     }
@@ -1120,9 +1266,9 @@ process_bb_lives (basic_block bb, int &curr_point, bool dead_insn_p)
       live_change_p = true;
       if (lra_dump_file != NULL)
 	EXECUTE_IF_SET_IN_SPARSESET (pseudos_live, j)
-	  if (! bitmap_bit_p (df_get_live_in (bb), j))
-	    fprintf (lra_dump_file,
-		     "  r%d is added to live at bb%d start\n", j, bb->index);
+      if (!bitmap_bit_p (DF_LIVE_SUBREG_IN (bb), j))
+	fprintf (lra_dump_file, "  r%d is added to live at bb%d start\n", j,
+		 bb->index);
     }
   /* See if we'll need an increment at the end of this basic block.
      An increment is needed if the PSEUDOS_LIVE set is not empty,
@@ -1135,8 +1281,9 @@ process_bb_lives (basic_block bb, int &curr_point, bool dead_insn_p)
       mark_pseudo_dead (i);
     }
 
-  EXECUTE_IF_SET_IN_BITMAP (df_get_live_in (bb), FIRST_PSEUDO_REGISTER, j, bi)
-    {
+    EXECUTE_IF_SET_IN_BITMAP (DF_LIVE_SUBREG_IN (bb), FIRST_PSEUDO_REGISTER, j,
+			      bi)
+      {
       if (sparseset_cardinality (pseudos_live_through_calls) == 0)
 	break;
       if (sparseset_bit_p (pseudos_live_through_calls, j))
@@ -1151,7 +1298,7 @@ process_bb_lives (basic_block bb, int &curr_point, bool dead_insn_p)
       if (!TEST_HARD_REG_BIT (hard_regs_spilled_into, i))
 	continue;
 
-      if (bitmap_bit_p (df_get_live_in (bb), i))
+      if (bitmap_bit_p (DF_LIVE_SUBREG_IN (bb), i))
 	continue;
 
       live_change_p = true;
@@ -1159,7 +1306,8 @@ process_bb_lives (basic_block bb, int &curr_point, bool dead_insn_p)
 	fprintf (lra_dump_file,
 		 "  hard reg r%d is added to live at bb%d start\n", i,
 		 bb->index);
-      bitmap_set_bit (df_get_live_in (bb), i);
+      bitmap_set_bit (DF_LIVE_SUBREG_IN (bb), i);
+      bitmap_set_bit (DF_LIVE_SUBREG_FULL_IN (bb), i);
     }
 
   if (need_curr_point_incr)
@@ -1425,10 +1573,24 @@ lra_create_live_ranges_1 (bool all_p, bool dead_insn_p)
 	 disappear, e.g. pseudos with used equivalences.  */
       FOR_EACH_BB_FN (bb, cfun)
 	{
-	  bitmap_clear_range (df_get_live_in (bb), FIRST_PSEUDO_REGISTER,
+	  bitmap_clear_range (DF_LIVE_SUBREG_IN (bb), FIRST_PSEUDO_REGISTER,
+			      max_regno - FIRST_PSEUDO_REGISTER);
+	  bitmap_clear_range (DF_LIVE_SUBREG_FULL_IN (bb),
+			      FIRST_PSEUDO_REGISTER,
+			      max_regno - FIRST_PSEUDO_REGISTER);
+	  bitmap_clear_range (DF_LIVE_SUBREG_PARTIAL_IN (bb),
+			      FIRST_PSEUDO_REGISTER,
 			      max_regno - FIRST_PSEUDO_REGISTER);
-	  bitmap_clear_range (df_get_live_out (bb), FIRST_PSEUDO_REGISTER,
+	  bitmap_clear_range (DF_LIVE_SUBREG_OUT (bb), FIRST_PSEUDO_REGISTER,
 			      max_regno - FIRST_PSEUDO_REGISTER);
+	  bitmap_clear_range (DF_LIVE_SUBREG_FULL_OUT (bb),
+			      FIRST_PSEUDO_REGISTER,
+			      max_regno - FIRST_PSEUDO_REGISTER);
+	  bitmap_clear_range (DF_LIVE_SUBREG_PARTIAL_OUT (bb),
+			      FIRST_PSEUDO_REGISTER,
+			      max_regno - FIRST_PSEUDO_REGISTER);
+	  DF_LIVE_SUBREG_RANGE_IN (bb)->clear ();
+	  DF_LIVE_SUBREG_RANGE_OUT (bb)->clear ();
 	}
       /* As we did not change CFG since LRA start we can use
 	 DF-infrastructure solver to solve live data flow problem.  */
@@ -1441,6 +1603,8 @@ lra_create_live_ranges_1 (bool all_p, bool dead_insn_p)
 	(DF_BACKWARD, NULL, live_con_fun_0, live_con_fun_n,
 	 live_trans_fun, &all_blocks,
 	 df_get_postorder (DF_BACKWARD), df_get_n_blocks (DF_BACKWARD));
+      df_live_subreg_finalize (&all_blocks);
+
       if (lra_dump_file != NULL)
 	{
 	  fprintf (lra_dump_file,
@@ -1449,16 +1613,28 @@ lra_create_live_ranges_1 (bool all_p, bool dead_insn_p)
 	  FOR_EACH_BB_FN (bb, cfun)
 	    {
 	      bb_data_t bb_info = get_bb_data (bb);
-	      bitmap bb_livein = df_get_live_in (bb);
-	      bitmap bb_liveout = df_get_live_out (bb);
 
 	      fprintf (lra_dump_file, "\nBB %d:\n", bb->index);
-	      lra_dump_bitmap_with_title ("  gen:",
-					  &bb_info->gen_pseudos, bb->index);
-	      lra_dump_bitmap_with_title ("  killed:",
-					  &bb_info->killed_pseudos, bb->index);
-	      lra_dump_bitmap_with_title ("  livein:", bb_livein, bb->index);
-	      lra_dump_bitmap_with_title ("  liveout:", bb_liveout, bb->index);
+	      lra_dump_bitmap_with_title ("  full use", &bb_info->full_use,
+					  bb->index);
+	      lra_dump_bitmap_with_title ("  partial use",
+					  &bb_info->partial_use, bb->index);
+	      lra_dump_bitmap_with_title ("  full def", &bb_info->full_def,
+					  bb->index);
+	      lra_dump_bitmap_with_title ("  partial def",
+					  &bb_info->partial_def, bb->index);
+	      lra_dump_bitmap_with_title ("  live in full",
+					  DF_LIVE_SUBREG_FULL_IN (bb),
+					  bb->index);
+	      lra_dump_bitmap_with_title ("  live in partial",
+					  DF_LIVE_SUBREG_PARTIAL_IN (bb),
+					  bb->index);
+	      lra_dump_bitmap_with_title ("  live out full",
+					  DF_LIVE_SUBREG_FULL_OUT (bb),
+					  bb->index);
+	      lra_dump_bitmap_with_title ("  live out partial",
+					  DF_LIVE_SUBREG_PARTIAL_OUT (bb),
+					  bb->index);
 	    }
 	}
     }
diff --git a/gcc/lra-remat.cc b/gcc/lra-remat.cc
index 681dcf36331..26d3da07b00 100644
--- a/gcc/lra-remat.cc
+++ b/gcc/lra-remat.cc
@@ -556,11 +556,11 @@ dump_candidates_and_remat_bb_data (void)
       fprintf (lra_dump_file, "\nBB %d:\n", bb->index);
       /* Livein */
       fprintf (lra_dump_file, "  register live in:");
-      dump_regset (df_get_live_in (bb), lra_dump_file);
+      dump_regset (DF_LIVE_SUBREG_IN (bb), lra_dump_file);
       putc ('\n', lra_dump_file);
       /* Liveout */
       fprintf (lra_dump_file, "  register live out:");
-      dump_regset (df_get_live_out (bb), lra_dump_file);
+      dump_regset (DF_LIVE_SUBREG_OUT (bb), lra_dump_file);
       putc ('\n', lra_dump_file);
       /* Changed/dead regs: */
       fprintf (lra_dump_file, "  changed regs:");
@@ -727,7 +727,7 @@ calculate_livein_cands (void)
 
   FOR_EACH_BB_FN (bb, cfun)
     {
-      bitmap livein_regs = df_get_live_in (bb);
+      bitmap livein_regs = DF_LIVE_SUBREG_IN (bb);
       bitmap livein_cands = &get_remat_bb_data (bb)->livein_cands;
       for (unsigned int i = 0; i < cands_num; i++)
 	{
@@ -1064,11 +1064,10 @@ do_remat (void)
   FOR_EACH_BB_FN (bb, cfun)
     {
       CLEAR_HARD_REG_SET (live_hard_regs);
-      EXECUTE_IF_SET_IN_BITMAP (df_get_live_in (bb), 0, regno, bi)
+      EXECUTE_IF_SET_IN_BITMAP (DF_LIVE_SUBREG_IN (bb), 0, regno, bi)
 	{
-	  int hard_regno = regno < FIRST_PSEUDO_REGISTER
-			   ? regno
-			   : reg_renumber[regno];
+	  int hard_regno
+	    = regno < FIRST_PSEUDO_REGISTER ? regno : reg_renumber[regno];
 	  if (hard_regno >= 0)
 	    SET_HARD_REG_BIT (live_hard_regs, hard_regno);
 	}
diff --git a/gcc/lra-spills.cc b/gcc/lra-spills.cc
index a663a1931e3..d38a2ffe2a7 100644
--- a/gcc/lra-spills.cc
+++ b/gcc/lra-spills.cc
@@ -566,8 +566,26 @@ spill_pseudos (void)
 			 "Debug insn #%u is reset because it referenced "
 			 "removed pseudo\n", INSN_UID (insn));
 	    }
-	  bitmap_and_compl_into (df_get_live_in (bb), spilled_pseudos);
-	  bitmap_and_compl_into (df_get_live_out (bb), spilled_pseudos);
+	  unsigned int regno;
+	  bitmap_iterator bi;
+
+	  bitmap_and_compl_into (DF_LIVE_SUBREG_IN (bb), spilled_pseudos);
+	  bitmap_and_compl_into (DF_LIVE_SUBREG_FULL_IN (bb), spilled_pseudos);
+	  bitmap partial_in = DF_LIVE_SUBREG_PARTIAL_IN (bb);
+	  subregs_live *range_in = DF_LIVE_SUBREG_RANGE_IN (bb);
+	  EXECUTE_IF_AND_IN_BITMAP (partial_in, spilled_pseudos,
+				    FIRST_PSEUDO_REGISTER, regno, bi)
+	    range_in->remove_live (regno);
+	  bitmap_and_compl_into (partial_in, spilled_pseudos);
+
+	  bitmap_and_compl_into (DF_LIVE_SUBREG_OUT (bb), spilled_pseudos);
+	  bitmap_and_compl_into (DF_LIVE_SUBREG_FULL_OUT (bb), spilled_pseudos);
+	  bitmap partial_out = DF_LIVE_SUBREG_PARTIAL_OUT (bb);
+	  subregs_live *range_out = DF_LIVE_SUBREG_RANGE_OUT (bb);
+	  EXECUTE_IF_AND_IN_BITMAP (partial_out, spilled_pseudos,
+				    FIRST_PSEUDO_REGISTER, regno, bi)
+	    range_out->remove_live (regno);
+	  bitmap_and_compl_into (partial_out, spilled_pseudos);
 	}
     }
 }
-- 
2.36.3


^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH 7/7] lra: Support subreg live range track and conflict detect
  2023-11-08  3:47 [PATCH 0/7] ira/lra: Support subreg coalesce Lehua Ding
                   ` (5 preceding siblings ...)
  2023-11-08  3:47 ` [PATCH 6/7] lra: Apply live_subreg df_problem to lra pass Lehua Ding
@ 2023-11-08  3:47 ` Lehua Ding
  2023-11-08  3:55 ` [PATCH 0/7] ira/lra: Support subreg coalesce juzhe.zhong
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 37+ messages in thread
From: Lehua Ding @ 2023-11-08  3:47 UTC (permalink / raw)
  To: gcc-patches; +Cc: vmakarov, richard.sandiford, juzhe.zhong, lehua.ding

This patch implements tracking of subreg live ranges and adjusts conflict
detection accordingly.
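
Both the IRA side and the new LRA conflict code ultimately rely on an
interval-overlap test over hard-register offsets: two chunks conflict only if
the half-open ranges they occupy overlap.  A standalone sketch of that test
(hypothetical helper, not code from the patch):

```
/* Two register chunks [START1, START1 + N1) and [START2, START2 + N2),
   measured in hard-register units, conflict iff they overlap.  */
#include <assert.h>

static bool
chunks_overlap_p (int start1, int n1, int start2, int n2)
{
  return !(start1 + n1 <= start2 || start2 + n2 <= start1);
}

int
main ()
{
  /* The two halves of a 2-register pseudo do not overlap each other,
     so a copy of one half can share hard registers with the other.  */
  assert (!chunks_overlap_p (0, 1, 1, 1));
  /* But the whole 2-register pseudo overlaps its second half.  */
  assert (chunks_overlap_p (0, 2, 1, 1));
  return 0;
}
```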

gcc/ChangeLog:

	* ira-build.cc (print_copy): Adjust print.
	(setup_pseudos_has_subreg_object): New.
	(ira_build): collect subreg object allocno.
	* lra-assigns.cc (set_offset_conflicts): New.
	(setup_live_pseudos_and_spill_after_risky_transforms): Adjust.
	(lra_assign): Adjust.
	* lra-constraints.cc (process_alt_operands): Relax.
	* lra-int.h (GCC_LRA_INT_H): New include.
	(struct lra_live_range): New field subreg.
	(struct lra_insn_reg): New fields.
	(get_range_hard_regs):  Exported.
	(get_nregs): New.
	(has_subreg_object_p): New.
	* lra-lives.cc (INCLUDE_VECTOR): New.
	(lra_live_range_pool): New.
	(create_live_range): Adjust.
	(lra_merge_live_ranges): Adjust.
	(update_pseudo_point): Adjust.
	(class bb_data_pseudos): New.
	(mark_regno_live): Adjust.
	(mark_regno_dead): Adjust.
	(process_bb_lives): Adjust.
	(remove_some_program_points_and_update_live_ranges): Adjust.
	(lra_print_live_range_list): Adjust print.
	(class subreg_live_item): New class.
	(create_subregs_live_ranges): New.
	(lra_create_live_ranges_1): Add subreg live ranges.
	* lra.cc (get_range_blocks): New.
	(get_range_hard_regs): New.
	(new_insn_reg): Adjust.
	(collect_non_operand_hard_regs): Adjust.
	(initialize_lra_reg_info_element): Adjust.
	(reg_same_range_p): New.
	(add_regs_to_insn_regno_info): Adjust.
	* subreg-live-range.h: New constructor.

---
 gcc/ira-build.cc        |  40 ++++-
 gcc/lra-assigns.cc      | 111 ++++++++++--
 gcc/lra-constraints.cc  |  18 +-
 gcc/lra-int.h           |  33 ++++
 gcc/lra-lives.cc        | 361 ++++++++++++++++++++++++++++++++++------
 gcc/lra.cc              | 139 ++++++++++++++--
 gcc/subreg-live-range.h |   1 +
 7 files changed, 614 insertions(+), 89 deletions(-)

diff --git a/gcc/ira-build.cc b/gcc/ira-build.cc
index 379f877ca67..cba38d5fecb 100644
--- a/gcc/ira-build.cc
+++ b/gcc/ira-build.cc
@@ -95,6 +95,9 @@ int ira_copies_num;
    basic block.  */
 static int last_basic_block_before_change;
 
+/* Record the pseudos that have subreg objects.  Used by the LRA pass.  */
+bitmap_head pseudos_has_subreg_object;
+
 /* Initialize some members in loop tree node NODE.  Use LOOP_NUM for
    the member loop_num.  */
 static void
@@ -1688,8 +1691,13 @@ print_copy (FILE *f, ira_copy_t cp)
 {
   ira_allocno_t a1 = OBJECT_ALLOCNO (cp->first);
   ira_allocno_t a2 = OBJECT_ALLOCNO (cp->second);
-  fprintf (f, "  cp%d:a%d(r%d)<->a%d(r%d)@%d:%s\n", cp->num, ALLOCNO_NUM (a1),
-	   ALLOCNO_REGNO (a1), ALLOCNO_NUM (a2), ALLOCNO_REGNO (a2), cp->freq,
+  fprintf (f, "  cp%d:a%d(r%d", cp->num, ALLOCNO_NUM (a1), ALLOCNO_REGNO (a1));
+  if (ALLOCNO_NREGS (a1) != OBJECT_NREGS (cp->first))
+    fprintf (f, "_obj%d", OBJECT_INDEX (cp->first));
+  fprintf (f, ")<->a%d(r%d", ALLOCNO_NUM (a2), ALLOCNO_REGNO (a2));
+  if (ALLOCNO_NREGS (a2) != OBJECT_NREGS (cp->second))
+    fprintf (f, "_obj%d", OBJECT_INDEX (cp->second));
+  fprintf (f, ")@%d:%s\n", cp->freq,
 	   cp->insn != NULL   ? "move"
 	   : cp->constraint_p ? "constraint"
 			      : "shuffle");
@@ -3706,6 +3714,33 @@ update_conflict_hard_reg_costs (void)
     }
 }
 
+/* Set up pseudos_has_subreg_object.  */
+static void
+setup_pseudos_has_subreg_object ()
+{
+  bitmap_initialize (&pseudos_has_subreg_object, &reg_obstack);
+  ira_allocno_t a;
+  ira_allocno_iterator ai;
+  FOR_EACH_ALLOCNO (a, ai)
+    if (has_subreg_object_p (a))
+      {
+	bitmap_set_bit (&pseudos_has_subreg_object, ALLOCNO_REGNO (a));
+	if (ira_dump_file != NULL)
+	  {
+	    fprintf (ira_dump_file,
+		     "  a%d(r%d, nregs: %d) has subreg objects:\n",
+		     ALLOCNO_NUM (a), ALLOCNO_REGNO (a), ALLOCNO_NREGS (a));
+	    ira_allocno_object_iterator oi;
+	    ira_object_t obj;
+	    FOR_EACH_ALLOCNO_OBJECT (a, obj, oi)
+	      fprintf (ira_dump_file, "    object %d: start: %d, nregs: %d\n",
+		       OBJECT_INDEX (obj), OBJECT_START (obj),
+		       OBJECT_NREGS (obj));
+	    fprintf (ira_dump_file, "\n");
+	  }
+      }
+}
+
 /* Create a internal representation (IR) for IRA (allocnos, copies,
    loop tree nodes).  The function returns TRUE if we generate loop
    structure (besides nodes representing all function and the basic
@@ -3726,6 +3761,7 @@ ira_build (void)
   create_allocnos ();
   ira_costs ();
   create_allocno_objects ();
+  setup_pseudos_has_subreg_object ();
   ira_create_allocno_live_ranges ();
   remove_unnecessary_regions (false);
   ira_compress_allocno_live_ranges ();
diff --git a/gcc/lra-assigns.cc b/gcc/lra-assigns.cc
index d2ebcfd5056..6588a740162 100644
--- a/gcc/lra-assigns.cc
+++ b/gcc/lra-assigns.cc
@@ -1131,6 +1131,52 @@ assign_hard_regno (int hard_regno, int regno)
 /* Array used for sorting different pseudos.  */
 static int *sorted_pseudos;
 
+/* The detailed conflicting offsets when two live ranges conflict.  Used to
+   record partial conflicts.  */
+static bitmap_head live_range_conflicts;
+
+/* Set the conflicting offsets of the two registers REGNO1 and REGNO2.  Use
+   the regno with the bigger nregs as the base.  */
+static void
+set_offset_conflicts (int regno1, int regno2)
+{
+  gcc_assert (reg_renumber[regno1] >= 0 && reg_renumber[regno2] >= 0);
+  int nregs1 = get_nregs (regno1);
+  int nregs2 = get_nregs (regno2);
+  if (nregs1 < nregs2)
+    {
+      std::swap (nregs1, nregs2);
+      std::swap (regno1, regno2);
+    }
+
+  lra_live_range_t r1 = lra_reg_info[regno1].live_ranges;
+  lra_live_range_t r2 = lra_reg_info[regno2].live_ranges;
+  int total = nregs1;
+
+  bitmap_clear (&live_range_conflicts);
+  while (r1 != NULL && r2 != NULL)
+    {
+      if (r1->start > r2->finish)
+	r1 = r1->next;
+      else if (r2->start > r1->finish)
+	r2 = r2->next;
+      else
+	{
+	  for (const subreg_range &range1 : r1->subreg.ranges)
+	    for (const subreg_range &range2 : r2->subreg.ranges)
+	      /* Record all overlapping offsets.  */
+	      for (int i = range1.start - (range2.end - range2.start) + 1;
+		   i < range1.end; i++)
+		if (i >= 0 && i < total)
+		  bitmap_set_bit (&live_range_conflicts, i);
+	  if (r1->finish < r2->finish)
+	    r1 = r1->next;
+	  else
+	    r2 = r2->next;
+	}
+    }
+}
+
 /* The constraints pass is allowed to create equivalences between
    pseudos that make the current allocation "incorrect" (in the sense
    that pseudos are assigned to hard registers from their own conflict
@@ -1226,19 +1272,56 @@ setup_live_pseudos_and_spill_after_risky_transforms (bitmap
 	       the same hard register.	*/
 	    || hard_regno != reg_renumber[conflict_regno])
 	  {
-	    int conflict_hard_regno = reg_renumber[conflict_regno];
-	    
-	    biggest_mode = lra_reg_info[conflict_regno].biggest_mode;
-	    biggest_nregs = hard_regno_nregs (conflict_hard_regno,
-					      biggest_mode);
-	    nregs_diff
-	      = (biggest_nregs
-		 - hard_regno_nregs (conflict_hard_regno,
-				     PSEUDO_REGNO_MODE (conflict_regno)));
-	    add_to_hard_reg_set (&conflict_set,
-				 biggest_mode,
-				 conflict_hard_regno
-				 - (WORDS_BIG_ENDIAN ? nregs_diff : 0));
+	  if (hard_regno >= 0 && reg_renumber[conflict_regno] >= 0
+	      && (has_subreg_object_p (regno)
+		  || has_subreg_object_p (conflict_regno)))
+	    {
+	      int nregs1 = get_nregs (regno);
+	      int nregs2 = get_nregs (conflict_regno);
+	      /* Quick check for no overlap at all between them.  */
+	      if (hard_regno + nregs1 <= reg_renumber[conflict_regno]
+		  || reg_renumber[conflict_regno] + nregs2 <= hard_regno)
+		continue;
+
+	      /* Check whether the overlap is OK when they partially overlap.  */
+	      set_offset_conflicts (regno, conflict_regno);
+	      if (nregs1 >= nregs2)
+		EXECUTE_IF_SET_IN_BITMAP (&live_range_conflicts, 0, k, bi)
+		  {
+		    int start_regno
+		      = WORDS_BIG_ENDIAN
+			  ? reg_renumber[conflict_regno] + nregs2 + k - nregs1
+			  : reg_renumber[conflict_regno] - k;
+		    if (start_regno >= 0 && hard_regno == start_regno)
+		      SET_HARD_REG_BIT (conflict_set, start_regno);
+		  }
+	      else
+		EXECUTE_IF_SET_IN_BITMAP (&live_range_conflicts, 0, k, bi)
+		  {
+		    int start_regno
+		      = WORDS_BIG_ENDIAN
+			  ? reg_renumber[conflict_regno] + nregs2 - k - nregs1
+			  : reg_renumber[conflict_regno] + k;
+		    if (start_regno < FIRST_PSEUDO_REGISTER
+			&& hard_regno == start_regno)
+		      SET_HARD_REG_BIT (conflict_set, start_regno);
+		  }
+	    }
+	  else
+	    {
+	      int conflict_hard_regno = reg_renumber[conflict_regno];
+
+	      biggest_mode = lra_reg_info[conflict_regno].biggest_mode;
+	      biggest_nregs
+		= hard_regno_nregs (conflict_hard_regno, biggest_mode);
+	      nregs_diff
+		= (biggest_nregs
+		   - hard_regno_nregs (conflict_hard_regno,
+				       PSEUDO_REGNO_MODE (conflict_regno)));
+	      add_to_hard_reg_set (&conflict_set, biggest_mode,
+				   conflict_hard_regno
+				     - (WORDS_BIG_ENDIAN ? nregs_diff : 0));
+	    }
 	  }
       if (! overlaps_hard_reg_set_p (conflict_set, mode, hard_regno))
 	{
@@ -1637,7 +1720,9 @@ lra_assign (bool &fails_p)
   init_regno_assign_info ();
   bitmap_initialize (&all_spilled_pseudos, &reg_obstack);
   create_live_range_start_chains ();
+  bitmap_initialize (&live_range_conflicts, &reg_obstack);
   setup_live_pseudos_and_spill_after_risky_transforms (&all_spilled_pseudos);
+  bitmap_clear (&live_range_conflicts);
   if (! lra_hard_reg_split_p && ! lra_asm_error_p && flag_checking)
     /* Check correctness of allocation but only when there are no hard reg
        splits and asm errors as in the case of errors explicit insns involving
diff --git a/gcc/lra-constraints.cc b/gcc/lra-constraints.cc
index c3ad846b97b..912d0c3feec 100644
--- a/gcc/lra-constraints.cc
+++ b/gcc/lra-constraints.cc
@@ -2363,13 +2363,19 @@ process_alt_operands (int only_alternative)
 		      {
 			/* We should reject matching of an early
 			   clobber operand if the matching operand is
-			   not dying in the insn.  */
-			if (!TEST_BIT (curr_static_id->operand[m]
-				       .early_clobber_alts, nalt)
+			   not dying in the insn.  But a subreg of a pseudo
+			   whose subreg liveness is tracked in IRA gets no
+			   REG_DEAD note; in that case we consider the
+			   matching OK.  */
+			if (!TEST_BIT (
+			      curr_static_id->operand[m].early_clobber_alts,
+			      nalt)
 			    || operand_reg[nop] == NULL_RTX
-			    || (find_regno_note (curr_insn, REG_DEAD,
-						 REGNO (op))
-				|| REGNO (op) == REGNO (operand_reg[m])))
+			    || find_regno_note (curr_insn, REG_DEAD, REGNO (op))
+			    || (read_modify_subreg_p (
+				  *curr_id->operand_loc[nop])
+				&& has_subreg_object_p (REGNO (op)))
+			    || REGNO (op) == REGNO (operand_reg[m]))
 			  match_p = true;
 		      }
 		    if (match_p)
diff --git a/gcc/lra-int.h b/gcc/lra-int.h
index d0752c2ae50..5a97bd61475 100644
--- a/gcc/lra-int.h
+++ b/gcc/lra-int.h
@@ -21,6 +21,9 @@ along with GCC; see the file COPYING3.	If not see
 #ifndef GCC_LRA_INT_H
 #define GCC_LRA_INT_H
 
+#include "lra.h"
+#include "subreg-live-range.h"
+
 #define lra_assert(c) gcc_checking_assert (c)
 
 /* The parameter used to prevent infinite reloading for an insn.  Each
@@ -46,6 +49,8 @@ struct lra_live_range
   lra_live_range_t next;
   /* Pointer to structures with the same start.	 */
   lra_live_range_t start_next;
+  /* Subreg ranges that are live throughout this range.  */
+  subreg_ranges subreg;
 };
 
 typedef struct lra_copy *lra_copy_t;
@@ -108,6 +113,8 @@ public:
   /* The biggest size mode in which each pseudo reg is referred in
      whole function (possibly via subreg).  */
   machine_mode biggest_mode;
+  /* The mode of the reg itself (as opposed to biggest_mode).  */
+  machine_mode reg_mode;
   /* Live ranges of the pseudo.	 */
   lra_live_range_t live_ranges;
   /* This member is set up in lra-lives.cc for subsequent
@@ -159,6 +166,12 @@ struct lra_insn_reg
   unsigned int subreg_p : 1;
   /* The corresponding regno of the register.  */
   int regno;
+  /* The start and end of the current reference, in block units; remember
+     that the use/def can be a normal subreg.  */
+  int start, end;
+  /* The start and end of the current reference, in hard-reg units; remember
+     that the use/def can be a normal subreg.  */
+  int start_reg, end_reg;
   /* Next reg info of the same insn.  */
   struct lra_insn_reg *next;
 };
@@ -330,6 +343,8 @@ extern struct lra_insn_reg *lra_get_insn_regs (int);
 extern void lra_free_copies (void);
 extern void lra_create_copy (int, int, int);
 extern lra_copy_t lra_get_copy (int);
+extern subreg_range
+get_range_hard_regs (int regno, const subreg_range &r);
 
 extern int lra_new_regno_start;
 extern int lra_constraint_new_regno_start;
@@ -531,4 +546,22 @@ lra_assign_reg_val (int from, int to)
   lra_reg_info[to].offset = lra_reg_info[from].offset;
 }
 
+/* Return the number of hard regs of REGNO.  */
+inline int
+get_nregs (int regno)
+{
+  enum reg_class aclass = lra_get_allocno_class (regno);
+  gcc_assert (aclass != NO_REGS);
+  int nregs = ira_reg_class_max_nregs[aclass][lra_reg_info[regno].reg_mode];
+  return nregs;
+}
+
+extern bitmap_head pseudos_has_subreg_object;
+/* Return true if pseudo REGNO has subreg live range.  */
+inline bool
+has_subreg_object_p (int regno)
+{
+  return bitmap_bit_p (&pseudos_has_subreg_object, regno);
+}
+
 #endif /* GCC_LRA_INT_H */
diff --git a/gcc/lra-lives.cc b/gcc/lra-lives.cc
index 477b82786cf..814b3960541 100644
--- a/gcc/lra-lives.cc
+++ b/gcc/lra-lives.cc
@@ -26,6 +26,7 @@ along with GCC; see the file COPYING3.	If not see
    stack memory slots to spilled pseudos.  */
 
 #include "config.h"
+#define INCLUDE_VECTOR
 #include "system.h"
 #include "coretypes.h"
 #include "backend.h"
@@ -44,6 +45,7 @@ along with GCC; see the file COPYING3.	If not see
 #include "lra-int.h"
 #include "target.h"
 #include "function-abi.h"
+#include "subreg-live-range.h"
 
 /* Program points are enumerated by numbers from range
    0..LRA_LIVE_MAX_POINT-1.  There are approximately two times more
@@ -97,6 +99,9 @@ static bitmap_head temp_bitmap;
 /* Pool for pseudo live ranges.	 */
 static object_allocator<lra_live_range> lra_live_range_pool ("live ranges");
 
+/* Stores the def/use points of has_subreg_object_p registers.  */
+static class subregs_live_points *live_points;
+
 /* Free live range list LR.  */
 static void
 free_live_range_list (lra_live_range_t lr)
@@ -113,16 +118,26 @@ free_live_range_list (lra_live_range_t lr)
 
 /* Create and return pseudo live range with given attributes.  */
 static lra_live_range_t
-create_live_range (int regno, int start, int finish, lra_live_range_t next)
+create_live_range (int regno, const subreg_ranges &sr, int start, int finish,
+		   lra_live_range_t next)
 {
   lra_live_range_t p = lra_live_range_pool.allocate ();
   p->regno = regno;
   p->start = start;
   p->finish = finish;
   p->next = next;
+  p->subreg = sr;
   return p;
 }
 
+static lra_live_range_t
+create_live_range (int regno, int start, int finish, lra_live_range_t next)
+{
+  subreg_ranges sr = subreg_ranges (1);
+  sr.add_range (1, subreg_range (0, 1));
+  return create_live_range (regno, sr, start, finish, next);
+}
+
 /* Copy live range R and return the result.  */
 static lra_live_range_t
 copy_live_range (lra_live_range_t r)
@@ -164,7 +179,8 @@ lra_merge_live_ranges (lra_live_range_t r1, lra_live_range_t r2)
       if (r1->start < r2->start)
 	std::swap (r1, r2);
 
-      if (r1->start == r2->finish + 1)
+      if (r1->start == r2->finish + 1
+	  && (r1->regno != r2->regno || r1->subreg.same_p (r2->subreg)))
 	{
 	  /* Joint ranges: merge r1 and r2 into r1.  */
 	  r1->start = r2->start;
@@ -174,7 +190,8 @@ lra_merge_live_ranges (lra_live_range_t r1, lra_live_range_t r2)
 	}
       else
 	{
-	  gcc_assert (r2->finish + 1 < r1->start);
+	  gcc_assert (r2->finish + 1 < r1->start
+		      || !r1->subreg.same_p (r2->subreg));
 	  /* Add r1 to the result.  */
 	  if (first == NULL)
 	    first = last = r1;
@@ -237,6 +254,10 @@ sparseset_contains_pseudos_p (sparseset a)
   return false;
 }
 
+static void
+update_pseudo_point (int regno, const subreg_range &range, int point,
+		     enum point_type type);
+
 /* Mark pseudo REGNO as living or dying at program point POINT, depending on
    whether TYPE is a definition or a use.  If this is the first reference to
    REGNO that we've encountered, then create a new live range for it.  */
@@ -249,31 +270,100 @@ update_pseudo_point (int regno, int point, enum point_type type)
   /* Don't compute points for hard registers.  */
   if (HARD_REGISTER_NUM_P (regno))
     return;
+  if (!complete_info_p && lra_get_regno_hard_regno (regno) >= 0)
+    return;
 
-  if (complete_info_p || lra_get_regno_hard_regno (regno) < 0)
+  if (has_subreg_object_p (regno))
     {
-      if (type == DEF_POINT)
-	{
-	  if (sparseset_bit_p (pseudos_live, regno))
-	    {
-	      p = lra_reg_info[regno].live_ranges;
-	      lra_assert (p != NULL);
-	      p->finish = point;
-	    }
-	}
-      else /* USE_POINT */
+      update_pseudo_point (regno, subreg_range (0, get_nregs (regno)), point,
+			   type);
+      return;
+    }
+
+  if (type == DEF_POINT)
+    {
+      if (sparseset_bit_p (pseudos_live, regno))
 	{
-	  if (!sparseset_bit_p (pseudos_live, regno)
-	      && ((p = lra_reg_info[regno].live_ranges) == NULL
-		  || (p->finish != point && p->finish + 1 != point)))
-	    lra_reg_info[regno].live_ranges
-	      = create_live_range (regno, point, -1, p);
+	  p = lra_reg_info[regno].live_ranges;
+	  lra_assert (p != NULL);
+	  p->finish = point;
 	}
     }
+  else /* USE_POINT */
+    {
+      if (!sparseset_bit_p (pseudos_live, regno)
+	  && ((p = lra_reg_info[regno].live_ranges) == NULL
+	      || (p->finish != point && p->finish + 1 != point)))
+	lra_reg_info[regno].live_ranges
+	  = create_live_range (regno, point, -1, p);
+    }
 }
 
-/* The corresponding bitmaps of BB currently being processed.  */
-static bitmap bb_killed_pseudos, bb_gen_pseudos;
+/* Like the above update_pseudo_point but for a has_subreg_object_p REGNO.  */
+static void
+update_pseudo_point (int regno, const subreg_range &range, int point,
+		     enum point_type type)
+{
+  /* Don't compute points for hard registers.  */
+  if (HARD_REGISTER_NUM_P (regno))
+    return;
+
+  if (!complete_info_p && lra_get_regno_hard_regno (regno) >= 0)
+    {
+      if (has_subreg_object_p (regno))
+	live_points->add_range (regno, get_nregs (regno), range,
+				type == DEF_POINT);
+      return;
+    }
+
+  if (!has_subreg_object_p (regno))
+    {
+      update_pseudo_point (regno, point, type);
+      return;
+    }
+
+  if (lra_dump_file != NULL)
+    {
+      fprintf (lra_dump_file, "       %s r%d",
+	       type == DEF_POINT ? "def" : "use", regno);
+      fprintf (lra_dump_file, "[subreg: start %d, nregs: %d]", range.start,
+	       range.end - range.start);
+      fprintf (lra_dump_file, " at point %d\n", point);
+    }
+
+  live_points->add_point (regno, get_nregs (regno), range, type == DEF_POINT,
+			  point);
+}
+
+/* Update each range in SR.  */
+static void
+update_pseudo_point (int regno, const subreg_ranges sr, int point,
+		     enum point_type type)
+{
+  for (const subreg_range &range : sr.ranges)
+    update_pseudo_point (regno, range, point, type);
+}
+
+/* Structure describing local BB data used for pseudo
+   live-analysis.  */
+class bb_data_pseudos : public basic_block_subreg_live_info
+{
+public:
+  /* Basic block about which the below data are.  */
+  basic_block bb;
+};
+
+/* Array for all BB data.  Indexed by the corresponding BB index.  */
+typedef class bb_data_pseudos *bb_data_t;
+
+/* All basic block data are referred through the following array.  */
+static bb_data_t bb_data;
+
+/* The corresponding basic block info of BB currently being processed.  */
+static bb_data_t curr_bb_info;
+
+/* True if the current function has a subreg reference that needs to be tracked.  */
+static bool has_subreg_live_p;
 
 /* Record hard register REGNO as now being live.  It updates
    living hard regs and START_LIVING.  */
@@ -336,12 +426,18 @@ mark_pseudo_dead (int regno)
   if (!sparseset_bit_p (pseudos_live, regno))
     return;
 
+  /* Just return if REGNO still has a partial subreg live because of a subreg access.  */
+  if (has_subreg_object_p (regno) && !live_points->empty_live_p (regno))
+    return;
+
   sparseset_clear_bit (pseudos_live, regno);
   sparseset_set_bit (start_dying, regno);
 }
 
+static void
+mark_regno_live (int regno, const subreg_range &range, machine_mode mode);
 /* Mark register REGNO (pseudo or hard register) in MODE as being live
-   and update BB_GEN_PSEUDOS.  */
+   and update CURR_BB_INFO.  */
 static void
 mark_regno_live (int regno, machine_mode mode)
 {
@@ -352,6 +448,11 @@ mark_regno_live (int regno, machine_mode mode)
       for (last = end_hard_regno (mode, regno); regno < last; regno++)
 	make_hard_regno_live (regno);
     }
+  else if (has_subreg_object_p (regno))
+    {
+      machine_mode mode = lra_reg_info[regno].reg_mode;
+      mark_regno_live (regno, subreg_range (0, get_nregs (regno)), mode);
+    }
   else
     {
       mark_pseudo_live (regno);
@@ -361,9 +462,26 @@ mark_regno_live (int regno, machine_mode mode)
     }
 }
 
+/* Like the above mark_regno_live but for a has_subreg_object_p REGNO.  */
+static void
+mark_regno_live (int regno, const subreg_range &range, machine_mode mode)
+{
+  if (HARD_REGISTER_NUM_P (regno) || !has_subreg_object_p (regno))
+    mark_regno_live (regno, mode);
+  else
+    {
+      mark_pseudo_live (regno);
+      machine_mode mode = lra_reg_info[regno].reg_mode;
+      if (!range.full_p (get_nregs (regno)))
+	has_subreg_live_p = true;
+      add_subreg_range (curr_bb_info, regno, mode, range, false);
+    }
+}
 
+static void
+mark_regno_dead (int regno, const subreg_range &range, machine_mode mode);
 /* Mark register REGNO (pseudo or hard register) in MODE as being dead
-   and update BB_GEN_PSEUDOS and BB_KILLED_PSEUDOS.  */
+   and update CURR_BB_INFO.  */
 static void
 mark_regno_dead (int regno, machine_mode mode)
 {
@@ -374,6 +492,12 @@ mark_regno_dead (int regno, machine_mode mode)
       for (last = end_hard_regno (mode, regno); regno < last; regno++)
 	make_hard_regno_dead (regno);
     }
+  else if (has_subreg_object_p (regno))
+    {
+      machine_mode mode = lra_reg_info[regno].reg_mode;
+      subreg_range range = subreg_range (0, get_nregs (regno));
+      mark_regno_dead (regno, range, mode);
+    }
   else
     {
       mark_pseudo_dead (regno);
@@ -384,7 +508,22 @@ mark_regno_dead (int regno, machine_mode mode)
     }
 }
 
-\f
+/* Like the above mark_regno_dead but for a has_subreg_object_p REGNO.  */
+static void
+mark_regno_dead (int regno, const subreg_range &range, machine_mode mode)
+{
+  if (HARD_REGISTER_NUM_P (regno) || !has_subreg_object_p (regno))
+    mark_regno_dead (regno, mode);
+  else
+    {
+      mark_pseudo_dead (regno);
+      machine_mode mode = lra_reg_info[regno].reg_mode;
+      if (!range.full_p (get_nregs (regno)))
+	has_subreg_live_p = true;
+      remove_subreg_range (curr_bb_info, regno, mode, range);
+      add_subreg_range (curr_bb_info, regno, mode, range, true);
+    }
+}
 
 /* This page contains code for making global live analysis of pseudos.
    The code works only when pseudo live info is changed on a BB
@@ -814,7 +953,12 @@ process_bb_lives (basic_block bb, int &curr_point, bool dead_insn_p)
   hard_regs_live &= ~eliminable_regset;
   EXECUTE_IF_SET_IN_BITMAP (reg_live_out, FIRST_PSEUDO_REGISTER, j, bi)
     {
-      update_pseudo_point (j, curr_point, USE_POINT);
+      if (bitmap_bit_p (reg_live_partial_out, j) && has_subreg_object_p (j))
+	for (const subreg_range &r : range_out->lives.at (j).ranges)
+	  update_pseudo_point (j, get_range_hard_regs (j, r), curr_point,
+			       USE_POINT);
+      else
+	update_pseudo_point (j, curr_point, USE_POINT);
       mark_pseudo_live (j);
     }
 
@@ -1007,8 +1151,11 @@ process_bb_lives (basic_block bb, int &curr_point, bool dead_insn_p)
       for (reg = curr_id->regs; reg != NULL; reg = reg->next)
 	if (reg->type != OP_IN)
 	  {
-	    update_pseudo_point (reg->regno, curr_point, USE_POINT);
-	    mark_regno_live (reg->regno, reg->biggest_mode);
+	    const subreg_range &range = subreg_range (reg->start, reg->end);
+	    update_pseudo_point (reg->regno,
+				 get_range_hard_regs (reg->regno, range),
+				 curr_point, USE_POINT);
+	    mark_regno_live (reg->regno, range, reg->biggest_mode);
 	    /* ??? Should be a no-op for unused registers.  */
 	    check_pseudos_live_through_calls (reg->regno, last_call_abi);
 	  }
@@ -1029,17 +1176,20 @@ process_bb_lives (basic_block bb, int &curr_point, bool dead_insn_p)
 
       /* See which defined values die here.  */
       for (reg = curr_id->regs; reg != NULL; reg = reg->next)
-	if (reg->type != OP_IN
-	    && ! reg_early_clobber_p (reg, n_alt) && ! reg->subreg_p)
+	if (reg->type != OP_IN && !reg_early_clobber_p (reg, n_alt)
+	    && (!reg->subreg_p || has_subreg_object_p (reg->regno)))
 	  {
+	    const subreg_range &range = subreg_range (reg->start, reg->end);
 	    if (reg->type == OP_OUT)
-	      update_pseudo_point (reg->regno, curr_point, DEF_POINT);
-	    mark_regno_dead (reg->regno, reg->biggest_mode);
+	      update_pseudo_point (reg->regno,
+				   get_range_hard_regs (reg->regno, range),
+				   curr_point, DEF_POINT);
+	    mark_regno_dead (reg->regno, range, reg->biggest_mode);
 	  }
 
       for (reg = curr_static_id->hard_regs; reg != NULL; reg = reg->next)
-	if (reg->type != OP_IN
-	    && ! reg_early_clobber_p (reg, n_alt) && ! reg->subreg_p)
+	if (reg->type != OP_IN && !reg_early_clobber_p (reg, n_alt)
+	    && !reg->subreg_p)
 	  make_hard_regno_dead (reg->regno);
 
       if (curr_id->arg_hard_regs != NULL)
@@ -1070,7 +1220,7 @@ process_bb_lives (basic_block bb, int &curr_point, bool dead_insn_p)
 
       /* Increment the current program point if we must.  */
       if (sparseset_contains_pseudos_p (unused_set)
-	  || sparseset_contains_pseudos_p (start_dying))
+	  || sparseset_contains_pseudos_p (start_dying) || has_subreg_live_p)
 	next_program_point (curr_point, freq);
 
       /* If we removed the source reg from a simple register copy from the
@@ -1091,9 +1241,12 @@ process_bb_lives (basic_block bb, int &curr_point, bool dead_insn_p)
       for (reg = curr_id->regs; reg != NULL; reg = reg->next)
 	if (reg->type != OP_OUT)
 	  {
+	    const subreg_range &range = subreg_range (reg->start, reg->end);
 	    if (reg->type == OP_IN)
-	      update_pseudo_point (reg->regno, curr_point, USE_POINT);
-	    mark_regno_live (reg->regno, reg->biggest_mode);
+	      update_pseudo_point (reg->regno,
+				   get_range_hard_regs (reg->regno, range),
+				   curr_point, USE_POINT);
+	    mark_regno_live (reg->regno, range, reg->biggest_mode);
 	    check_pseudos_live_through_calls (reg->regno, last_call_abi);
 	  }
 
@@ -1113,22 +1266,25 @@ process_bb_lives (basic_block bb, int &curr_point, bool dead_insn_p)
 
       /* Mark early clobber outputs dead.  */
       for (reg = curr_id->regs; reg != NULL; reg = reg->next)
-	if (reg->type != OP_IN
-	    && reg_early_clobber_p (reg, n_alt) && ! reg->subreg_p)
+	if (reg->type != OP_IN && reg_early_clobber_p (reg, n_alt)
+	    && (!reg->subreg_p || has_subreg_object_p (reg->regno)))
 	  {
+	    const subreg_range &range = subreg_range (reg->start, reg->end);
 	    if (reg->type == OP_OUT)
-	      update_pseudo_point (reg->regno, curr_point, DEF_POINT);
-	    mark_regno_dead (reg->regno, reg->biggest_mode);
+	      update_pseudo_point (reg->regno,
+				   get_range_hard_regs (reg->regno, range),
+				   curr_point, DEF_POINT);
+	    mark_regno_dead (reg->regno, range, reg->biggest_mode);
 
 	    /* We're done processing inputs, so make sure early clobber
 	       operands that are both inputs and outputs are still live.  */
 	    if (reg->type == OP_INOUT)
-	      mark_regno_live (reg->regno, reg->biggest_mode);
+	      mark_regno_live (reg->regno, range, reg->biggest_mode);
 	  }
 
       for (reg = curr_static_id->hard_regs; reg != NULL; reg = reg->next)
-	if (reg->type != OP_IN
-	    && reg_early_clobber_p (reg, n_alt) && ! reg->subreg_p)
+	if (reg->type != OP_IN && reg_early_clobber_p (reg, n_alt)
+	    && !reg->subreg_p)
 	  {
 	    struct lra_insn_reg *reg2;
 
@@ -1144,7 +1300,7 @@ process_bb_lives (basic_block bb, int &curr_point, bool dead_insn_p)
 
       /* Increment the current program point if we must.  */
       if (sparseset_contains_pseudos_p (dead_set)
-	  || sparseset_contains_pseudos_p (start_dying))
+	  || sparseset_contains_pseudos_p (start_dying) || has_subreg_live_p)
 	next_program_point (curr_point, freq);
 
       /* Update notes.	*/
@@ -1277,13 +1433,17 @@ process_bb_lives (basic_block bb, int &curr_point, bool dead_insn_p)
 
   EXECUTE_IF_SET_IN_SPARSESET (pseudos_live, i)
     {
-      update_pseudo_point (i, curr_point, DEF_POINT);
+      if (has_subreg_object_p (i))
+	update_pseudo_point (i, live_points->subreg_live_ranges.at (i),
+			     curr_point, DEF_POINT);
+      else
+	update_pseudo_point (i, curr_point, DEF_POINT);
       mark_pseudo_dead (i);
     }
 
-    EXECUTE_IF_SET_IN_BITMAP (DF_LIVE_SUBREG_IN (bb), FIRST_PSEUDO_REGISTER, j,
-			      bi)
-      {
+  EXECUTE_IF_SET_IN_BITMAP (DF_LIVE_SUBREG_IN (bb), FIRST_PSEUDO_REGISTER, j,
+			    bi)
+    {
       if (sparseset_cardinality (pseudos_live_through_calls) == 0)
 	break;
       if (sparseset_bit_p (pseudos_live_through_calls, j))
@@ -1384,7 +1544,8 @@ remove_some_program_points_and_update_live_ranges (void)
 	      next_r = r->next;
 	      r->start = map[r->start];
 	      r->finish = map[r->finish];
-	      if (prev_r == NULL || prev_r->start > r->finish + 1)
+	      if (prev_r == NULL || prev_r->start > r->finish + 1
+		  || !prev_r->subreg.same_p (r->subreg))
 		{
 		  prev_r = r;
 		  continue;
@@ -1402,8 +1563,18 @@ remove_some_program_points_and_update_live_ranges (void)
 void
 lra_print_live_range_list (FILE *f, lra_live_range_t r)
 {
-  for (; r != NULL; r = r->next)
-    fprintf (f, " [%d..%d]", r->start, r->finish);
+  if (r != NULL && has_subreg_object_p (r->regno))
+    {
+      for (; r != NULL; r = r->next)
+	{
+	  fprintf (f, " [%d..%d]{", r->start, r->finish);
+	  r->subreg.dump (f);
+	  fprintf (f, "}");
+	}
+    }
+  else
+    for (; r != NULL; r = r->next)
+      fprintf (f, " [%d..%d]", r->start, r->finish);
   fprintf (f, "\n");
 }
 
@@ -1476,7 +1647,84 @@ compress_live_ranges (void)
     }
 }
 
-\f
+/* Used to temporarily record subreg live ranges in the
+   create_subregs_live_ranges function.  */
+class subreg_live_item
+{
+public:
+  subreg_ranges subreg;
+  int start, finish;
+};
+
+/* Create subreg live ranges from the objects' def/use point info.  */
+static void
+create_subregs_live_ranges ()
+{
+  for (const auto &subreg_point_it : live_points->subreg_points)
+    {
+      unsigned int regno = subreg_point_it.first;
+      const class live_points &points = subreg_point_it.second;
+      class lra_reg *reg_info = &lra_reg_info[regno];
+      std::vector<subreg_live_item> temps;
+      gcc_assert (has_subreg_object_p (regno));
+      for (const auto &point_it : points.points)
+	{
+	  int point = point_it.first;
+	  const live_point &regs = point_it.second;
+	  gcc_assert (temps.empty () || temps.back ().finish <= point);
+	  if (!regs.use_reg.empty_p ())
+	    {
+	      if (temps.empty ())
+		temps.push_back ({regs.use_reg, point, -1});
+	      else if (temps.back ().finish == -1)
+		{
+		  if (!temps.back ().subreg.same_p (regs.use_reg))
+		    {
+		      if (temps.back ().start == point)
+			temps.back ().subreg.add_ranges (regs.use_reg);
+		      else
+			{
+			  temps.back ().finish = point - 1;
+
+			  subreg_ranges temp = regs.use_reg;
+			  temp.add_ranges (temps.back ().subreg);
+			  temps.push_back ({temp, point, -1});
+			}
+		    }
+		}
+	      else if (temps.back ().subreg.same_p (regs.use_reg)
+		       && (temps.back ().finish == point
+			   || temps.back ().finish + 1 == point))
+		temps.back ().finish = -1;
+	      else
+		temps.push_back ({regs.use_reg, point, -1});
+	    }
+	  if (!regs.def_reg.empty_p ())
+	    {
+	      gcc_assert (!temps.empty ());
+	      if (regs.def_reg.include_ranges_p (temps.back ().subreg))
+		temps.back ().finish = point;
+	      else if (temps.back ().subreg.include_ranges_p (regs.def_reg))
+		{
+		  temps.back ().finish = point;
+
+		  subreg_ranges diff = temps.back ().subreg;
+		  diff.remove_ranges (regs.def_reg);
+		  temps.push_back ({diff, point + 1, -1});
+		}
+	      else
+		gcc_unreachable ();
+	    }
+	}
+
+      gcc_assert (reg_info->live_ranges == NULL);
+
+      for (const subreg_live_item &item : temps)
+	reg_info->live_ranges
+	  = create_live_range (regno, item.subreg, item.start, item.finish,
+			       reg_info->live_ranges);
+    }
+}
 
 /* The number of the current live range pass.  */
 int lra_live_range_iter;
@@ -1557,6 +1805,8 @@ lra_create_live_ranges_1 (bool all_p, bool dead_insn_p)
   int n = inverted_rev_post_order_compute (cfun, rpo);
   lra_assert (n == n_basic_blocks_for_fn (cfun));
   bb_live_change_p = false;
+  has_subreg_live_p = false;
+  live_points = new subregs_live_points ();
   for (i = 0; i < n; ++i)
     {
       bb = BASIC_BLOCK_FOR_FN (cfun, rpo[i]);
@@ -1639,9 +1889,14 @@ lra_create_live_ranges_1 (bool all_p, bool dead_insn_p)
 	}
     }
   lra_live_max_point = curr_point;
+  create_subregs_live_ranges ();
   if (lra_dump_file != NULL)
-    print_live_ranges (lra_dump_file);
+    {
+      live_points->dump (lra_dump_file);
+      print_live_ranges (lra_dump_file);
+    }
   /* Clean up.	*/
+  delete live_points;
   sparseset_free (unused_set);
   sparseset_free (dead_set);
   sparseset_free (start_dying);
diff --git a/gcc/lra.cc b/gcc/lra.cc
index bcc00ff7d6b..47d378b371e 100644
--- a/gcc/lra.cc
+++ b/gcc/lra.cc
@@ -566,6 +566,54 @@ lra_asm_insn_error (rtx_insn *insn)
 /* Pools for insn reg info.  */
 object_allocator<lra_insn_reg> lra_insn_reg_pool ("insn regs");
 
+/* Return the block range of REGNO referenced at OFFSET with SIZE bytes.  */
+static subreg_range
+get_range_blocks (int regno, bool subreg_p, machine_mode reg_mode,
+		  poly_int64 offset, poly_int64 size)
+{
+  gcc_assert (has_subreg_object_p (regno));
+  int nblocks = get_nblocks (reg_mode);
+  if (!subreg_p)
+    return subreg_range (0, nblocks);
+
+  poly_int64 unit_size = REGMODE_NATURAL_SIZE (reg_mode);
+  poly_int64 left = offset + size;
+
+  int subreg_start = -1;
+  int subreg_nregs = -1;
+  for (int i = 0; i < nblocks; i += 1)
+    {
+      poly_int64 right = unit_size * (i + 1);
+      if (subreg_start < 0 && maybe_lt (offset, right))
+	subreg_start = i;
+      if (subreg_nregs < 0 && maybe_le (left, right))
+	{
+	  subreg_nregs = i + 1 - subreg_start;
+	  break;
+	}
+    }
+  gcc_assert (subreg_start >= 0 && subreg_nregs > 0);
+  return subreg_range (subreg_start, subreg_start + subreg_nregs);
+}
+
+/* Return the hard reg range of REGNO corresponding to block range R.  */
+subreg_range
+get_range_hard_regs (int regno, const subreg_range &r)
+{
+  if (!has_subreg_object_p (regno))
+    return subreg_range (0, 1);
+  enum reg_class aclass = lra_get_allocno_class (regno);
+  gcc_assert (aclass != NO_REGS);
+  int nregs = ira_reg_class_max_nregs[aclass][lra_reg_info[regno].reg_mode];
+  int nblocks = get_nblocks (lra_reg_info[regno].reg_mode);
+  int times = nblocks / nregs;
+  gcc_assert (nblocks >= nregs && times * nregs == nblocks);
+  int start = r.start / times;
+  int end = CEIL (r.end, times);
+
+  return subreg_range (start, end);
+}
+
 /* Create LRA insn related info about a reference to REGNO in INSN
    with TYPE (in/out/inout), biggest reference mode MODE, flag that it
    is reference through subreg (SUBREG_P), and reference to the next
@@ -573,21 +621,49 @@ object_allocator<lra_insn_reg> lra_insn_reg_pool ("insn regs");
    alternatives in which it can be early clobbered are given by
    EARLY_CLOBBER_ALTS.  */
 static struct lra_insn_reg *
-new_insn_reg (rtx_insn *insn, int regno, enum op_type type,
-	      machine_mode mode, bool subreg_p,
-	      alternative_mask early_clobber_alts,
+new_insn_reg (rtx_insn *insn, int regno, enum op_type type, poly_int64 size,
+	      poly_int64 offset, machine_mode mode, machine_mode reg_mode,
+	      bool subreg_p, alternative_mask early_clobber_alts,
 	      struct lra_insn_reg *next)
 {
   lra_insn_reg *ir = lra_insn_reg_pool.allocate ();
   ir->type = type;
   ir->biggest_mode = mode;
-  if (NONDEBUG_INSN_P (insn)
-      && partial_subreg_p (lra_reg_info[regno].biggest_mode, mode))
-    lra_reg_info[regno].biggest_mode = mode;
+  if (NONDEBUG_INSN_P (insn))
+    {
+      if (partial_subreg_p (lra_reg_info[regno].biggest_mode, mode))
+	{
+	  lra_reg_info[regno].biggest_mode = mode;
+	}
+
+      if (regno >= FIRST_PSEUDO_REGISTER)
+	{
+	  if (lra_reg_info[regno].reg_mode == VOIDmode)
+	    lra_reg_info[regno].reg_mode = reg_mode;
+	  else
+	    gcc_assert (maybe_eq (GET_MODE_SIZE (lra_reg_info[regno].reg_mode),
+				  GET_MODE_SIZE (reg_mode)));
+	}
+    }
   ir->subreg_p = subreg_p;
   ir->early_clobber_alts = early_clobber_alts;
   ir->regno = regno;
   ir->next = next;
+  if (has_subreg_object_p (regno))
+    {
+      const subreg_range &r
+	= get_range_blocks (regno, subreg_p, reg_mode, offset, size);
+      ir->start = r.start;
+      ir->end = r.end;
+      const subreg_range &r_hard = get_range_hard_regs (regno, r);
+      ir->start_reg = r_hard.start;
+      ir->end_reg = r_hard.end;
+    }
+  else
+    {
+      ir->start = 0;
+      ir->end = 1;
+    }
   return ir;
 }
 
@@ -887,11 +963,18 @@ collect_non_operand_hard_regs (rtx_insn *insn, rtx *x,
       return list;
   mode = GET_MODE (op);
   subreg_p = false;
+  poly_int64 size = GET_MODE_SIZE (mode);
+  poly_int64 offset = 0;
   if (code == SUBREG)
     {
       mode = wider_subreg_mode (op);
       if (read_modify_subreg_p (op))
-	subreg_p = true;
+	{
+	  offset = SUBREG_BYTE (op);
+	  subreg_p = true;
+	}
+      else
+	size = GET_MODE_SIZE (GET_MODE (SUBREG_REG (op)));
       op = SUBREG_REG (op);
       code = GET_CODE (op);
     }
@@ -925,7 +1008,8 @@ collect_non_operand_hard_regs (rtx_insn *insn, rtx *x,
 		   && ! (FIRST_STACK_REG <= regno
 			 && regno <= LAST_STACK_REG));
 #endif
-	      list = new_insn_reg (data->insn, regno, type, mode, subreg_p,
+	      list = new_insn_reg (data->insn, regno, type, size, offset, mode,
+				   GET_MODE (op), subreg_p,
 				   early_clobber ? ALL_ALTERNATIVES : 0, list);
 	    }
 	}
@@ -1354,6 +1438,7 @@ initialize_lra_reg_info_element (int i)
   lra_reg_info[i].preferred_hard_regno_profit1 = 0;
   lra_reg_info[i].preferred_hard_regno_profit2 = 0;
   lra_reg_info[i].biggest_mode = VOIDmode;
+  lra_reg_info[i].reg_mode = VOIDmode;
   lra_reg_info[i].live_ranges = NULL;
   lra_reg_info[i].nrefs = lra_reg_info[i].freq = 0;
   lra_reg_info[i].last_reload = 0;
@@ -1459,7 +1544,21 @@ lra_get_copy (int n)
   return copy_vec[n];
 }
 
-\f
+/* Return true if REG occupies the same blocks as the OFFSET + SIZE subreg.  */
+static bool
+reg_same_range_p (lra_insn_reg *reg, poly_int64 offset, poly_int64 size,
+		  bool subreg_p)
+{
+  if (has_subreg_object_p (reg->regno))
+    {
+      const subreg_range &r
+	= get_range_blocks (reg->regno, subreg_p,
+			    lra_reg_info[reg->regno].reg_mode, offset, size);
+      return r.start == reg->start && r.end == reg->end;
+    }
+  else
+    return true;
+}
 
 /* This page contains code dealing with info about registers in
    insns.  */
@@ -1483,11 +1582,18 @@ add_regs_to_insn_regno_info (lra_insn_recog_data_t data, rtx x,
   code = GET_CODE (x);
   mode = GET_MODE (x);
   subreg_p = false;
+  poly_int64 size = GET_MODE_SIZE (mode);
+  poly_int64 offset = 0;
   if (GET_CODE (x) == SUBREG)
     {
       mode = wider_subreg_mode (x);
       if (read_modify_subreg_p (x))
-	subreg_p = true;
+	{
+	  offset = SUBREG_BYTE (x);
+	  subreg_p = true;
+	}
+      else
+	size = GET_MODE_SIZE (GET_MODE (SUBREG_REG (x)));
       x = SUBREG_REG (x);
       code = GET_CODE (x);
     }
@@ -1499,7 +1605,8 @@ add_regs_to_insn_regno_info (lra_insn_recog_data_t data, rtx x,
       expand_reg_info ();
       if (bitmap_set_bit (&lra_reg_info[regno].insn_bitmap, INSN_UID (insn)))
 	{
-	  data->regs = new_insn_reg (data->insn, regno, type, mode, subreg_p,
+	  data->regs = new_insn_reg (data->insn, regno, type, size, offset,
+				     mode, GET_MODE (x), subreg_p,
 				     early_clobber_alts, data->regs);
 	  return;
 	}
@@ -1508,12 +1615,14 @@ add_regs_to_insn_regno_info (lra_insn_recog_data_t data, rtx x,
 	  for (curr = data->regs; curr != NULL; curr = curr->next)
 	    if (curr->regno == regno)
 	      {
-		if (curr->subreg_p != subreg_p || curr->biggest_mode != mode)
+		if (!reg_same_range_p (curr, offset, size, subreg_p)
+		    || curr->biggest_mode != mode)
 		  /* The info cannot be integrated into the found
 		     structure.  */
-		  data->regs = new_insn_reg (data->insn, regno, type, mode,
-					     subreg_p, early_clobber_alts,
-					     data->regs);
+		  data->regs
+		    = new_insn_reg (data->insn, regno, type, size, offset, mode,
+				    GET_MODE (x), subreg_p, early_clobber_alts,
+				    data->regs);
 		else
 		  {
 		    if (curr->type != type)
diff --git a/gcc/subreg-live-range.h b/gcc/subreg-live-range.h
index bee97708a52..b703b9642f2 100644
--- a/gcc/subreg-live-range.h
+++ b/gcc/subreg-live-range.h
@@ -76,6 +76,7 @@ public:
   int max;
   std::set<subreg_range> ranges;
 
+  subreg_ranges () : max (1) {}
   subreg_ranges (int max) : max (max) { gcc_assert (maybe_ge (max, 1)); }
 
   /* Modify ranges.  */
-- 
2.36.3


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 0/7]  ira/lra: Support subreg coalesce
  2023-11-08  3:47 [PATCH 0/7] ira/lra: Support subreg coalesce Lehua Ding
                   ` (6 preceding siblings ...)
  2023-11-08  3:47 ` [PATCH 7/7] lra: Support subreg live range track and conflict detect Lehua Ding
@ 2023-11-08  3:55 ` juzhe.zhong
  2023-11-10  9:29   ` Lehua Ding
  2023-11-08  9:40 ` Richard Sandiford
                   ` (2 subsequent siblings)
  10 siblings, 1 reply; 37+ messages in thread
From: juzhe.zhong @ 2023-11-08  3:55 UTC (permalink / raw)
  To: 丁乐华, gcc-patches
  Cc: vmakarov, richard.sandiford, 丁乐华

[-- Attachment #1: Type: text/plain, Size: 7091 bytes --]

Thanks Lehua.

I appreciate the huge amount of work that went into supporting subreg liveness tracking.

One nit comment: I think you should mention the following PRs:

106694
89967
106146
99161 

No need to send a V2 now. You can send the V2 after Richard and Vlad have reviewed.



juzhe.zhong@rivai.ai
 
From: Lehua Ding
Date: 2023-11-08 11:47
To: gcc-patches
CC: vmakarov; richard.sandiford; juzhe.zhong; lehua.ding
Subject: [PATCH 0/7] ira/lra: Support subreg coalesce
Hi,
 
These patchs try to support subreg coalesce feature in
register allocation passes (ira and lra).
 
Let's consider a RISC-V program (https://godbolt.org/z/ec51d91aT):
 
```
#include <riscv_vector.h>
 
void
foo (int32_t *in, int32_t *out, size_t m)
{
  vint32m2_t result = __riscv_vle32_v_i32m2 (in, 32);
  vint32m1_t v0 = __riscv_vget_v_i32m2_i32m1 (result, 0);
  vint32m1_t v1 = __riscv_vget_v_i32m2_i32m1 (result, 1);
  for (size_t i = 0; i < m; i++)
    {
      v0 = __riscv_vadd_vv_i32m1(v0, v0, 4);
      v1 = __riscv_vmul_vv_i32m1(v1, v1, 4);
    }
  *(vint32m1_t*)(out+4*0) = v0;
  *(vint32m1_t*)(out+4*1) = v1;
}
```
 
Before these patchs:
 
```
foo:
li a5,32
vsetvli zero,a5,e32,m2,ta,ma
vle32.v v4,0(a0)
vmv1r.v v2,v4
vmv1r.v v1,v5
beq a2,zero,.L2
li a5,0
vsetivli zero,4,e32,m1,ta,ma
.L3:
addi a5,a5,1
vadd.vv v2,v2,v2
vmul.vv v1,v1,v1
bne a2,a5,.L3
.L2:
vs1r.v v2,0(a1)
addi a1,a1,16
vs1r.v v1,0(a1)
ret
```
 
After these patchs:
 
```
foo:
li a5,32
vsetvli zero,a5,e32,m2,ta,ma
vle32.v v2,0(a0)
beq a2,zero,.L2
li a5,0
vsetivli zero,4,e32,m1,ta,ma
.L3:
addi a5,a5,1
vadd.vv v2,v2,v2
vmul.vv v3,v3,v3
bne a2,a5,.L3
.L2:
vs1r.v v2,0(a1)
addi a1,a1,16
vs1r.v v3,0(a1)
ret
```
 
As you can see, the two redundant vmv1r.v instructions were removed.
The reason for the two redundant vmv1r.v instructions is because
the current ira pass is being conservative in calculating the live
range of pseduo registers that occupy multil hardregs. As in the
following two RTL instructions. Where r134 occupies two physical
registers and r135 and r136 occupy one physical register.
At insn 12 point, ira considers the entire r134 pseudo register
to be live, so r135 is in conflict with r134, as shown in the ira
dump info. Then when the physical registers are allocated, r135 and
r134 are allocated first because they are inside the loop body and
have higher priority. This makes it difficult to assign r136 to
overlap with r134, i.e., to assign r136 to hr100, thus eliminating
the need for the vmv1r.v instruction. Thus two vmv1r.v instructions
appear.
 
If we refine the live information of r134 to the case of each subreg,
we can remove this conflict. We can then create copies of the set
with subreg reference, thus increasing the priority of the r134 allocation,
which allow registers with bigger alignment requirements to prioritize
the allocation of physical registers. In RVV, pseudo registers occupying
two physical registers need to be time-2 aligned.
 
```
(insn 11 10 12 2 (set (reg/v:RVVM1SI 135 [ v0 ])
        (subreg:RVVM1SI (reg/v:RVVM2SI 134 [ result ]) 0)) "/app/example.c":7:19 998 {*movrvvm1si_whole}
     (nil))
(insn 12 11 13 2 (set (reg/v:RVVM1SI 136 [ v1 ])
        (subreg:RVVM1SI (reg/v:RVVM2SI 134 [ result ]) [16, 16])) "/app/example.c":8:19 998 {*movrvvm1si_whole}
     (expr_list:REG_DEAD (reg/v:RVVM2SI 134 [ result ])
        (nil)))
```
 
ira dump:
 
;; a1(r136,l0) conflicts: a3(r135,l0)
;;     total conflict hard regs:
;;     conflict hard regs:
;; a3(r135,l0) conflicts: a1(r136,l0) a6(r134,l0)
;;     total conflict hard regs:
;;     conflict hard regs:
;; a6(r134,l0) conflicts: a3(r135,l0)
;;     total conflict hard regs:
;;     conflict hard regs:
;;
;; ...
      Popping a1(r135,l0)  --         assign reg 97
      Popping a3(r136,l0)  --         assign reg 98
      Popping a4(r137,l0)  --         assign reg 15
      Popping a5(r140,l0)  --         assign reg 12
      Popping a10(r145,l0)  --         assign reg 12
      Popping a2(r139,l0)  --         assign reg 11
      Popping a9(r144,l0)  --         assign reg 11
      Popping a0(r142,l0)  --         assign reg 11
      Popping a6(r134,l0)  --         assign reg 100
      Popping a7(r143,l0)  --         assign reg 10
      Popping a8(r141,l0)  --         assign reg 15
 
The AArch64 SVE has the same problem. Consider the following
code (https://godbolt.org/z/MYrK7Ghaj):
 
```
#include <arm_sve.h>
 
int bar (svbool_t pg, int64_t* base, int n, int64_t *in1, int64_t *in2, int64_t*out)
{
  svint64x4_t result = svld4_s64 (pg, base);
  svint64_t v0 = svget4_s64(result, 0);
  svint64_t v1 = svget4_s64(result, 1);
  svint64_t v2 = svget4_s64(result, 2);
  svint64_t v3 = svget4_s64(result, 3);
 
  for (int i = 0; i < n; i += 1)
    {
        svint64_t v18 = svld1_s64(pg, in1);
        svint64_t v19 = svld1_s64(pg, in2);
        v0 = svmad_s64_z(pg, v0, v18, v19);
        v1 = svmad_s64_z(pg, v1, v18, v19);
        v2 = svmad_s64_z(pg, v2, v18, v19);
        v3 = svmad_s64_z(pg, v3, v18, v19);
    }
  svst1_s64(pg, out+0,v0);
  svst1_s64(pg, out+1,v1);
  svst1_s64(pg, out+2,v2);
  svst1_s64(pg, out+3,v3);
}
```
 
Before these patchs:
 
```
bar:
ld4d {z4.d - z7.d}, p0/z, [x0]
mov z26.d, z4.d
mov z27.d, z5.d
mov z28.d, z6.d
mov z29.d, z7.d
cmp w1, 0
...
```
 
After these patchs:
 
```
bar:
ld4d {z28.d - z31.d}, p0/z, [x0]
cmp w1, 0
...
```
 
Lehua Ding (7):
  ira: Refactor the handling of register conflicts to make it more
    general
  ira: Add live_subreg problem and apply to ira pass
  ira: Support subreg live range track
  ira: Support subreg copy
  ira: Add all nregs >= 2 pseudos to tracke subreg list
  lra: Apply live_subreg df_problem to lra pass
  lra: Support subreg live range track and conflict detect
 
gcc/Makefile.in          |   1 +
gcc/df-problems.cc       | 889 ++++++++++++++++++++++++++++++++++++++-
gcc/df.h                 |  93 +++-
gcc/hard-reg-set.h       |  33 ++
gcc/ira-build.cc         | 458 ++++++++++++++++----
gcc/ira-color.cc         | 851 ++++++++++++++++++++++++++-----------
gcc/ira-conflicts.cc     | 221 +++++++---
gcc/ira-emit.cc          |  24 +-
gcc/ira-int.h            |  67 ++-
gcc/ira-lives.cc         | 527 +++++++++++++++++------
gcc/ira.cc               |  77 ++--
gcc/lra-assigns.cc       | 111 ++++-
gcc/lra-coalesce.cc      |  20 +-
gcc/lra-constraints.cc   | 111 +++--
gcc/lra-int.h            |  33 ++
gcc/lra-lives.cc         | 661 ++++++++++++++++++++++++-----
gcc/lra-remat.cc         |  13 +-
gcc/lra-spills.cc        |  22 +-
gcc/lra.cc               | 139 +++++-
gcc/reginfo.cc           |  14 +
gcc/rtl.h                |  14 +
gcc/subreg-live-range.cc | 649 ++++++++++++++++++++++++++++
gcc/subreg-live-range.h  | 343 +++++++++++++++
gcc/timevar.def          |   1 +
24 files changed, 4564 insertions(+), 808 deletions(-)
create mode 100644 gcc/subreg-live-range.cc
create mode 100644 gcc/subreg-live-range.h
 
-- 
2.36.3
 

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 1/7] ira: Refactor the handling of register conflicts to make it more general
  2023-11-08  3:47 ` [PATCH 1/7] ira: Refactor the handling of register conflicts to make it more general Lehua Ding
@ 2023-11-08  7:57   ` Richard Biener
  2023-11-08  8:34     ` Lehua Ding
  0 siblings, 1 reply; 37+ messages in thread
From: Richard Biener @ 2023-11-08  7:57 UTC (permalink / raw)
  To: Lehua Ding; +Cc: gcc-patches, vmakarov, richard.sandiford, juzhe.zhong

On Wed, Nov 8, 2023 at 4:48 AM Lehua Ding <lehua.ding@rivai.ai> wrote:
>
> This patch does not make any functional changes. It mainly refactors two parts:
>
> 1. The ira_allocno's objects field is expanded to a scalable array, and multi-word
>    pseudo registers are split and tracked only when necessary.
> 2. Since the objects array has been expanded, there will be more subreg objects
>    that pass through later, rather than the previous fixed two. Therefore, it
>    is necessary to modify the detection of whether two objects conflict, and
>    the check method is to pull back the registers occupied by the object to
>    the first register of the allocno for judgment.
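
To make point 2 concrete, here is a minimal sketch of that overlap test under these assumptions: the helper is hypothetical (not taken from the patch) and it ignores REG_WORDS_BIG_ENDIAN. Each object's occupied hard regs are expressed relative to its allocno's first assigned register, and two objects conflict only if the resulting intervals overlap.

```
/* Illustrative only: START1/NREGS1 and START2/NREGS2 describe the objects
   inside their allocnos; HARD_REGNO1/HARD_REGNO2 are the allocnos' first
   assigned hard registers.  */
static inline bool
objects_hard_regs_overlap_p (int hard_regno1, int start1, int nregs1,
			     int hard_regno2, int start2, int nregs2)
{
  int lo1 = hard_regno1 + start1, hi1 = lo1 + nregs1;
  int lo2 = hard_regno2 + start2, hi2 = lo2 + nregs2;
  /* Half-open intervals [lo, hi) overlap iff each starts before the
     other ends.  */
  return lo1 < hi2 && lo2 < hi1;
}
```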

Did you profile this before/after?  RA performance is critical ...

> gcc/ChangeLog:
>
>         * hard-reg-set.h (struct HARD_REG_SET): Add operator>>.
>         * ira-build.cc (init_object_start_and_nregs): New func.
>         (find_object): Ditto.
>         (ira_create_allocno): Adjust.
>         (ira_set_allocno_class): Set subreg info.
>         (ira_create_allocno_objects): Adjust.
>         (init_regs_with_subreg): Collect access in subreg.
>         (ira_build): Call init_regs_with_subreg
>         (ira_destroy): Clear regs_with_subreg
>         * ira-color.cc (setup_profitable_hard_regs): Adjust.
>         (get_conflict_and_start_profitable_regs): Adjust.
>         (check_hard_reg_p): Adjust.
>         (assign_hard_reg): Adjust.
>         (improve_allocation): Adjust.
>         * ira-int.h (struct ira_object): Adjust fields.
>         (struct ira_allocno): Adjust objects filed.
>         (ALLOCNO_NUM_OBJECTS): Adjust.
>         (ALLOCNO_UNIT_SIZE): New.
>         (ALLOCNO_TRACK_SUBREG_P): New.
>         (ALLOCNO_NREGS): New.
>         (OBJECT_SIZE): New.
>         (OBJECT_OFFSET): New.
>         (OBJECT_START): New.
>         (OBJECT_NREGS): New.
>         (find_object): New.
>         (has_subreg_object_p): New.
>         (get_full_object): New.
>         * ira.cc (check_allocation): Adjust.
>
> ---
>  gcc/hard-reg-set.h |  33 +++++++
>  gcc/ira-build.cc   | 106 +++++++++++++++++++-
>  gcc/ira-color.cc   | 234 ++++++++++++++++++++++++++++++---------------
>  gcc/ira-int.h      |  45 ++++++++-
>  gcc/ira.cc         |  52 ++++------
>  5 files changed, 349 insertions(+), 121 deletions(-)
>
> diff --git a/gcc/hard-reg-set.h b/gcc/hard-reg-set.h
> index b0bb9bce074..760eadba186 100644
> --- a/gcc/hard-reg-set.h
> +++ b/gcc/hard-reg-set.h
> @@ -113,6 +113,39 @@ struct HARD_REG_SET
>      return !operator== (other);
>    }
>
> +  HARD_REG_SET
> +  operator>> (unsigned int shift_amount) const

This is quite a costly operation; why do we need it instead
of keeping an "offset" for set queries?
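
A minimal sketch of the offset-based query, assuming the usual TEST_HARD_REG_BIT / FIRST_PSEUDO_REGISTER macros and the const_hard_reg_set typedef from hard-reg-set.h; the helper name is made up and is not part of the patch:

```
/* Instead of materializing SET >> OFFSET and then testing bit REGNO, keep
   OFFSET and test the original set at the shifted index; this avoids
   copying the whole HARD_REG_SET on every query.  */
static inline bool
conflict_bit_set_p (const_hard_reg_set conflict_regs, int regno, int offset)
{
  int bit = regno + offset;
  /* Bits outside the hard register file can never conflict.  */
  if (bit < 0 || bit >= FIRST_PSEUDO_REGISTER)
    return false;
  return TEST_HARD_REG_BIT (conflict_regs, bit);
}
```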

> +  {
> +    if (shift_amount == 0)
> +      return *this;
> +
> +    HARD_REG_SET res;
> +    unsigned int total_bits = sizeof (HARD_REG_ELT_TYPE) * 8;
> +    if (shift_amount >= total_bits)
> +      {
> +       unsigned int n_elt = shift_amount % total_bits;
> +       shift_amount -= n_elt * total_bits;
> +       for (unsigned int i = 0; i < ARRAY_SIZE (elts) - n_elt - 1; i += 1)
> +         res.elts[i] = elts[i + n_elt];
> +       /* clear upper n_elt elements.  */
> +       for (unsigned int i = 0; i < n_elt; i += 1)
> +         res.elts[ARRAY_SIZE (elts) - 1 - i] = 0;
> +      }
> +
> +    if (shift_amount > 0)
> +      {
> +       /* The left bits of an element be shifted.  */
> +       HARD_REG_ELT_TYPE left = 0;
> +       /* Total bits of an element.  */
> +       for (int i = ARRAY_SIZE (elts); i >= 0; --i)
> +         {
> +           res.elts[i] = (elts[i] >> shift_amount) | left;
> +           left = elts[i] << (total_bits - shift_amount);
> +         }
> +      }
> +    return res;
> +  }
> +
>    HARD_REG_ELT_TYPE elts[HARD_REG_SET_LONGS];
>  };
>  typedef const HARD_REG_SET &const_hard_reg_set;
> diff --git a/gcc/ira-build.cc b/gcc/ira-build.cc
> index 93e46033170..07aba27c1c9 100644
> --- a/gcc/ira-build.cc
> +++ b/gcc/ira-build.cc
> @@ -440,6 +440,40 @@ initiate_allocnos (void)
>    memset (ira_regno_allocno_map, 0, max_reg_num () * sizeof (ira_allocno_t));
>  }
>
> +/* Update OBJ's start and nregs field according A and OBJ info.  */
> +static void
> +init_object_start_and_nregs (ira_allocno_t a, ira_object_t obj)
> +{
> +  enum reg_class aclass = ALLOCNO_CLASS (a);
> +  gcc_assert (aclass != NO_REGS);
> +
> +  machine_mode mode = ALLOCNO_MODE (a);
> +  int nregs = ira_reg_class_max_nregs[aclass][mode];
> +  if (ALLOCNO_TRACK_SUBREG_P (a))
> +    {
> +      poly_int64 end = OBJECT_OFFSET (obj) + OBJECT_SIZE (obj);
> +      for (int i = 0; i < nregs; i += 1)
> +       {
> +         poly_int64 right = ALLOCNO_UNIT_SIZE (a) * (i + 1);
> +         if (OBJECT_START (obj) < 0 && maybe_lt (OBJECT_OFFSET (obj), right))
> +           {
> +             OBJECT_START (obj) = i;
> +           }
> +         if (OBJECT_NREGS (obj) < 0 && maybe_le (end, right))
> +           {
> +             OBJECT_NREGS (obj) = i + 1 - OBJECT_START (obj);
> +             break;
> +           }
> +       }
> +      gcc_assert (OBJECT_START (obj) >= 0 && OBJECT_NREGS (obj) > 0);
> +    }
> +  else
> +    {
> +      OBJECT_START (obj) = 0;
> +      OBJECT_NREGS (obj) = nregs;
> +    }
> +}
> +
>  /* Create and return an object corresponding to a new allocno A.  */
>  static ira_object_t
>  ira_create_object (ira_allocno_t a, int subword)
> @@ -460,15 +494,36 @@ ira_create_object (ira_allocno_t a, int subword)
>    OBJECT_MIN (obj) = INT_MAX;
>    OBJECT_MAX (obj) = -1;
>    OBJECT_LIVE_RANGES (obj) = NULL;
> +  OBJECT_SIZE (obj) = UNITS_PER_WORD;
> +  OBJECT_OFFSET (obj) = subword * UNITS_PER_WORD;
> +  OBJECT_START (obj) = -1;
> +  OBJECT_NREGS (obj) = -1;
>
>    ira_object_id_map_vec.safe_push (obj);
>    ira_object_id_map
>      = ira_object_id_map_vec.address ();
>    ira_objects_num = ira_object_id_map_vec.length ();
>
> +  if (aclass != NO_REGS)
> +    init_object_start_and_nregs (a, obj);
> +
> +  a->objects.push_back (obj);
> +
>    return obj;
>  }
>
> +/* Return the object in allocno A which match START & NREGS.  */
> +ira_object_t
> +find_object (ira_allocno_t a, int start, int nregs)
> +{
> +  for (ira_object_t obj : a->objects)

linear search?  really?
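
For illustration, one possible alternative would be to key each allocno's objects by (start, nregs) so lookups do not rescan the vector; the names below are hypothetical and assume the patch's OBJECT_START/OBJECT_NREGS accessors:

```
#include <map>
#include <utility>

/* Built once per allocno after its objects are created.  */
typedef std::map<std::pair<int, int>, ira_object_t> object_index;

static object_index
build_object_index (ira_allocno_t a)
{
  object_index index;
  for (ira_object_t obj : a->objects)
    index[{OBJECT_START (obj), OBJECT_NREGS (obj)}] = obj;
  return index;
}

static ira_object_t
find_object_indexed (const object_index &index, int start, int nregs)
{
  auto it = index.find ({start, nregs});
  return it == index.end () ? NULL : it->second;
}
```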

> +    {
> +      if (OBJECT_START (obj) == start && OBJECT_NREGS (obj) == nregs)
> +       return obj;
> +    }
> +  return NULL;
> +}
> +
>  /* Create and return the allocno corresponding to REGNO in
>     LOOP_TREE_NODE.  Add the allocno to the list of allocnos with the
>     same regno if CAP_P is FALSE.  */
> @@ -525,7 +580,8 @@ ira_create_allocno (int regno, bool cap_p,
>    ALLOCNO_MEMORY_COST (a) = 0;
>    ALLOCNO_UPDATED_MEMORY_COST (a) = 0;
>    ALLOCNO_EXCESS_PRESSURE_POINTS_NUM (a) = 0;
> -  ALLOCNO_NUM_OBJECTS (a) = 0;
> +  ALLOCNO_UNIT_SIZE (a) = 0;
> +  ALLOCNO_TRACK_SUBREG_P (a) = false;
>
>    ALLOCNO_ADD_DATA (a) = NULL;
>    allocno_vec.safe_push (a);
> @@ -535,6 +591,9 @@ ira_create_allocno (int regno, bool cap_p,
>    return a;
>  }
>
> +/* Record the regs referenced by subreg.  */
> +static bitmap_head regs_with_subreg;
> +
>  /* Set up register class for A and update its conflict hard
>     registers.  */
>  void
> @@ -549,6 +608,19 @@ ira_set_allocno_class (ira_allocno_t a, enum reg_class aclass)
>        OBJECT_CONFLICT_HARD_REGS (obj) |= ~reg_class_contents[aclass];
>        OBJECT_TOTAL_CONFLICT_HARD_REGS (obj) |= ~reg_class_contents[aclass];
>      }
> +
> +  if (aclass == NO_REGS)
> +    return;
> +  /* SET the unit_size of one register.  */
> +  machine_mode mode = ALLOCNO_MODE (a);
> +  int nregs = ira_reg_class_max_nregs[aclass][mode];
> +  if (nregs == 2 && maybe_eq (GET_MODE_SIZE (mode), nregs * UNITS_PER_WORD)
> +      && bitmap_bit_p (&regs_with_subreg, ALLOCNO_REGNO (a)))
> +    {
> +      ALLOCNO_UNIT_SIZE (a) = UNITS_PER_WORD;
> +      ALLOCNO_TRACK_SUBREG_P (a) = true;
> +      return;
> +    }
>  }
>
>  /* Determine the number of objects we should associate with allocno A
> @@ -561,12 +633,12 @@ ira_create_allocno_objects (ira_allocno_t a)
>    int n = ira_reg_class_max_nregs[aclass][mode];
>    int i;
>
> -  if (n != 2 || maybe_ne (GET_MODE_SIZE (mode), n * UNITS_PER_WORD))
> +  if (n != 2 || maybe_ne (GET_MODE_SIZE (mode), n * UNITS_PER_WORD)
> +      || !bitmap_bit_p (&regs_with_subreg, ALLOCNO_REGNO (a)))
>      n = 1;
>
> -  ALLOCNO_NUM_OBJECTS (a) = n;
>    for (i = 0; i < n; i++)
> -    ALLOCNO_OBJECT (a, i) = ira_create_object (a, i);
> +    ira_create_object (a, i);
>  }
>
>  /* For each allocno, set ALLOCNO_NUM_OBJECTS and create the
> @@ -3460,6 +3532,30 @@ update_conflict_hard_reg_costs (void)
>      }
>  }
>
> +/* Traverse all instructions to determine which ones have access through subreg.
> + */
> +static void
> +init_regs_with_subreg ()
> +{
> +  bitmap_initialize (&regs_with_subreg, &reg_obstack);
> +  basic_block bb;
> +  rtx_insn *insn;
> +  df_ref def, use;
> +  FOR_ALL_BB_FN (bb, cfun)
> +    FOR_BB_INSNS (bb, insn)
> +      {
> +       if (!NONDEBUG_INSN_P (insn))
> +         continue;
> +       df_insn_info *insn_info = DF_INSN_INFO_GET (insn);
> +       FOR_EACH_INSN_INFO_DEF (def, insn_info)
> +         if (DF_REF_FLAGS (def) & (DF_REF_PARTIAL | DF_REF_SUBREG))
> +           bitmap_set_bit (&regs_with_subreg, DF_REF_REGNO (def));
> +       FOR_EACH_INSN_INFO_USE (use, insn_info)
> +         if (DF_REF_FLAGS (use) & (DF_REF_PARTIAL | DF_REF_SUBREG))
> +           bitmap_set_bit (&regs_with_subreg, DF_REF_REGNO (use));
> +      }
> +}
> +
>  /* Create a internal representation (IR) for IRA (allocnos, copies,
>     loop tree nodes).  The function returns TRUE if we generate loop
>     structure (besides nodes representing all function and the basic
> @@ -3475,6 +3571,7 @@ ira_build (void)
>    initiate_allocnos ();
>    initiate_prefs ();
>    initiate_copies ();
> +  init_regs_with_subreg ();
>    create_loop_tree_nodes ();
>    form_loop_tree ();
>    create_allocnos ();
> @@ -3565,4 +3662,5 @@ ira_destroy (void)
>    finish_allocnos ();
>    finish_cost_vectors ();
>    ira_finish_allocno_live_ranges ();
> +  bitmap_clear (&regs_with_subreg);
>  }
> diff --git a/gcc/ira-color.cc b/gcc/ira-color.cc
> index f2e8ea34152..6af8318e5f5 100644
> --- a/gcc/ira-color.cc
> +++ b/gcc/ira-color.cc
> @@ -1031,7 +1031,7 @@ static void
>  setup_profitable_hard_regs (void)
>  {
>    unsigned int i;
> -  int j, k, nobj, hard_regno, nregs, class_size;
> +  int j, k, nobj, hard_regno, class_size;
>    ira_allocno_t a;
>    bitmap_iterator bi;
>    enum reg_class aclass;
> @@ -1076,7 +1076,6 @@ setup_profitable_hard_regs (void)
>           || (hard_regno = ALLOCNO_HARD_REGNO (a)) < 0)
>         continue;
>        mode = ALLOCNO_MODE (a);
> -      nregs = hard_regno_nregs (hard_regno, mode);
>        nobj = ALLOCNO_NUM_OBJECTS (a);
>        for (k = 0; k < nobj; k++)
>         {
> @@ -1088,24 +1087,39 @@ setup_profitable_hard_regs (void)
>             {
>               ira_allocno_t conflict_a = OBJECT_ALLOCNO (conflict_obj);
>
> -             /* We can process the conflict allocno repeatedly with
> -                the same result.  */
> -             if (nregs == nobj && nregs > 1)
> +             if (!has_subreg_object_p (a))
>                 {
> -                 int num = OBJECT_SUBWORD (conflict_obj);
> -
> -                 if (REG_WORDS_BIG_ENDIAN)
> -                   CLEAR_HARD_REG_BIT
> -                     (ALLOCNO_COLOR_DATA (conflict_a)->profitable_hard_regs,
> -                      hard_regno + nobj - num - 1);
> -                 else
> -                   CLEAR_HARD_REG_BIT
> -                     (ALLOCNO_COLOR_DATA (conflict_a)->profitable_hard_regs,
> -                      hard_regno + num);
> +                 ALLOCNO_COLOR_DATA (conflict_a)->profitable_hard_regs
> +                   &= ~ira_reg_mode_hard_regset[hard_regno][mode];
> +                 continue;
> +               }
> +
> +             /* Clear all hard regs occupied by obj.  */
> +             if (REG_WORDS_BIG_ENDIAN)
> +               {
> +                 int start_regno
> +                   = hard_regno + ALLOCNO_NREGS (a) - 1 - OBJECT_START (obj);
> +                 for (int i = 0; i < OBJECT_NREGS (obj); i += 1)
> +                   {
> +                     int regno = start_regno - i;
> +                     if (regno >= 0 && regno < FIRST_PSEUDO_REGISTER)
> +                       CLEAR_HARD_REG_BIT (
> +                         ALLOCNO_COLOR_DATA (conflict_a)->profitable_hard_regs,
> +                         regno);
> +                   }
>                 }
>               else
> -               ALLOCNO_COLOR_DATA (conflict_a)->profitable_hard_regs
> -                 &= ~ira_reg_mode_hard_regset[hard_regno][mode];
> +               {
> +                 int start_regno = hard_regno + OBJECT_START (obj);
> +                 for (int i = 0; i < OBJECT_NREGS (obj); i += 1)
> +                   {
> +                     int regno = start_regno + i;
> +                     if (regno >= 0 && regno < FIRST_PSEUDO_REGISTER)
> +                       CLEAR_HARD_REG_BIT (
> +                         ALLOCNO_COLOR_DATA (conflict_a)->profitable_hard_regs,
> +                         regno);
> +                   }
> +               }
>             }
>         }
>      }
> @@ -1677,18 +1691,25 @@ update_conflict_hard_regno_costs (int *costs, enum reg_class aclass,
>     aligned.  */
>  static inline void
>  get_conflict_and_start_profitable_regs (ira_allocno_t a, bool retry_p,
> -                                       HARD_REG_SET *conflict_regs,
> +                                       HARD_REG_SET *start_conflict_regs,
>                                         HARD_REG_SET *start_profitable_regs)
>  {
>    int i, nwords;
>    ira_object_t obj;
>
>    nwords = ALLOCNO_NUM_OBJECTS (a);
> -  for (i = 0; i < nwords; i++)
> -    {
> -      obj = ALLOCNO_OBJECT (a, i);
> -      conflict_regs[i] = OBJECT_TOTAL_CONFLICT_HARD_REGS (obj);
> -    }
> +  CLEAR_HARD_REG_SET (*start_conflict_regs);
> +  if (has_subreg_object_p (a))
> +    for (i = 0; i < nwords; i++)
> +      {
> +       obj = ALLOCNO_OBJECT (a, i);
> +       for (int j = 0; j < OBJECT_NREGS (obj); j += 1)
> +         *start_conflict_regs |= OBJECT_TOTAL_CONFLICT_HARD_REGS (obj)
> +                                 >> (OBJECT_START (obj) + j);
> +      }
> +  else
> +    *start_conflict_regs
> +      = OBJECT_TOTAL_CONFLICT_HARD_REGS (get_full_object (a));
>    if (retry_p)
>      *start_profitable_regs
>        = (reg_class_contents[ALLOCNO_CLASS (a)]
> @@ -1702,9 +1723,9 @@ get_conflict_and_start_profitable_regs (ira_allocno_t a, bool retry_p,
>     PROFITABLE_REGS and whose objects have CONFLICT_REGS.  */
>  static inline bool
>  check_hard_reg_p (ira_allocno_t a, int hard_regno,
> -                 HARD_REG_SET *conflict_regs, HARD_REG_SET profitable_regs)
> +                 HARD_REG_SET start_conflict_regs,
> +                 HARD_REG_SET profitable_regs)
>  {
> -  int j, nwords, nregs;
>    enum reg_class aclass;
>    machine_mode mode;
>
> @@ -1716,28 +1737,17 @@ check_hard_reg_p (ira_allocno_t a, int hard_regno,
>    /* Checking only profitable hard regs.  */
>    if (! TEST_HARD_REG_BIT (profitable_regs, hard_regno))
>      return false;
> -  nregs = hard_regno_nregs (hard_regno, mode);
> -  nwords = ALLOCNO_NUM_OBJECTS (a);
> -  for (j = 0; j < nregs; j++)
> +
> +  if (has_subreg_object_p (a))
> +    return !TEST_HARD_REG_BIT (start_conflict_regs, hard_regno);
> +  else
>      {
> -      int k;
> -      int set_to_test_start = 0, set_to_test_end = nwords;
> -
> -      if (nregs == nwords)
> -       {
> -         if (REG_WORDS_BIG_ENDIAN)
> -           set_to_test_start = nwords - j - 1;
> -         else
> -           set_to_test_start = j;
> -         set_to_test_end = set_to_test_start + 1;
> -       }
> -      for (k = set_to_test_start; k < set_to_test_end; k++)
> -       if (TEST_HARD_REG_BIT (conflict_regs[k], hard_regno + j))
> -         break;
> -      if (k != set_to_test_end)
> -       break;
> +      int nregs = hard_regno_nregs (hard_regno, mode);
> +      for (int i = 0; i < nregs; i += 1)
> +       if (TEST_HARD_REG_BIT (start_conflict_regs, hard_regno + i))
> +         return false;
> +      return true;
>      }
> -  return j == nregs;
>  }
>
>  /* Return number of registers needed to be saved and restored at
> @@ -1945,7 +1955,7 @@ spill_soft_conflicts (ira_allocno_t a, bitmap allocnos_to_spill,
>  static bool
>  assign_hard_reg (ira_allocno_t a, bool retry_p)
>  {
> -  HARD_REG_SET conflicting_regs[2], profitable_hard_regs;
> +  HARD_REG_SET start_conflicting_regs, profitable_hard_regs;
>    int i, j, hard_regno, best_hard_regno, class_size;
>    int cost, mem_cost, min_cost, full_cost, min_full_cost, nwords, word;
>    int *a_costs;
> @@ -1962,8 +1972,7 @@ assign_hard_reg (ira_allocno_t a, bool retry_p)
>    HARD_REG_SET soft_conflict_regs = {};
>
>    ira_assert (! ALLOCNO_ASSIGNED_P (a));
> -  get_conflict_and_start_profitable_regs (a, retry_p,
> -                                         conflicting_regs,
> +  get_conflict_and_start_profitable_regs (a, retry_p, &start_conflicting_regs,
>                                           &profitable_hard_regs);
>    aclass = ALLOCNO_CLASS (a);
>    class_size = ira_class_hard_regs_num[aclass];
> @@ -2041,7 +2050,6 @@ assign_hard_reg (ira_allocno_t a, bool retry_p)
>                       (hard_regno, ALLOCNO_MODE (conflict_a),
>                        reg_class_contents[aclass])))
>                 {
> -                 int n_objects = ALLOCNO_NUM_OBJECTS (conflict_a);
>                   int conflict_nregs;
>
>                   mode = ALLOCNO_MODE (conflict_a);
> @@ -2076,24 +2084,95 @@ assign_hard_reg (ira_allocno_t a, bool retry_p)
>                             note_conflict (r);
>                         }
>                     }
> +                 else if (has_subreg_object_p (a))
> +                   {
> +                     /* Set start_conflicting_regs if that cause obj and
> +                        conflict_obj overlap. the overlap position:
> +                                          +--------------+
> +                                          | conflict_obj |
> +                                          +--------------+
> +
> +                              +-----------+              +-----------+
> +                              |   obj     |     ...      |   obj     |
> +                              +-----------+              +-----------+
> +
> +                       Point: A                  B       C
> +
> +                       the hard regs from A to C point will cause overlap.
> +                       For REG_WORDS_BIG_ENDIAN:
> +                          A = hard_regno + ALLOCNO_NREGS (conflict_a) - 1
> +                              - OBJECT_START (conflict_obj)
> +                              - OBJECT_NREGS (obj) + 1
> +                          C = A + OBJECT_NREGS (obj)
> +                              + OBJECT_NREGS (conflict_obj) - 2
> +                       For !REG_WORDS_BIG_ENDIAN:
> +                          A = hard_regno + OBJECT_START (conflict_obj)
> +                              - OBJECT_NREGS (obj) + 1
> +                          C = A + OBJECT_NREGS (obj)
> +                              + OBJECT_NREGS (conflict_obj) - 2
> +                        */
> +                     int start_regno;
> +                     int conflict_allocno_nregs, conflict_object_nregs,
> +                       conflict_object_start;
> +                     if (has_subreg_object_p (conflict_a))
> +                       {
> +                         conflict_allocno_nregs = ALLOCNO_NREGS (conflict_a);
> +                         conflict_object_nregs = OBJECT_NREGS (conflict_obj);
> +                         conflict_object_start = OBJECT_START (conflict_obj);
> +                       }
> +                     else
> +                       {
> +                         conflict_allocno_nregs = conflict_object_nregs
> +                           = hard_regno_nregs (hard_regno, mode);
> +                         conflict_object_start = 0;
> +                       }
> +                     if (REG_WORDS_BIG_ENDIAN)
> +                       {
> +                         int A = hard_regno + conflict_allocno_nregs - 1
> +                                 - conflict_object_start - OBJECT_NREGS (obj)
> +                                 + 1;
> +                         start_regno = A + OBJECT_NREGS (obj) - 1
> +                                       + OBJECT_START (obj) - ALLOCNO_NREGS (a)
> +                                       + 1;
> +                       }
> +                     else
> +                       {
> +                         int A = hard_regno + conflict_object_start
> +                                 - OBJECT_NREGS (obj) + 1;
> +                         start_regno = A - OBJECT_START (obj);
> +                       }
> +
> +                     for (int i = 0;
> +                          i <= OBJECT_NREGS (obj) + conflict_object_nregs - 2;
> +                          i += 1)
> +                       {
> +                         int regno = start_regno + i;
> +                         if (regno >= 0 && regno < FIRST_PSEUDO_REGISTER)
> +                           SET_HARD_REG_BIT (start_conflicting_regs, regno);
> +                       }
> +                     if (hard_reg_set_subset_p (profitable_hard_regs,
> +                                                start_conflicting_regs))
> +                       goto fail;
> +                   }
>                   else
>                     {
> -                     if (conflict_nregs == n_objects && conflict_nregs > 1)
> +                     if (has_subreg_object_p (conflict_a))
>                         {
> -                         int num = OBJECT_SUBWORD (conflict_obj);
> -
> -                         if (REG_WORDS_BIG_ENDIAN)
> -                           SET_HARD_REG_BIT (conflicting_regs[word],
> -                                             hard_regno + n_objects - num - 1);
> -                         else
> -                           SET_HARD_REG_BIT (conflicting_regs[word],
> -                                             hard_regno + num);
> +                         int start_hard_regno
> +                           = REG_WORDS_BIG_ENDIAN
> +                               ? hard_regno + ALLOCNO_NREGS (conflict_a)
> +                                   - OBJECT_START (conflict_obj)
> +                               : hard_regno + OBJECT_START (conflict_obj);
> +                         for (int i = 0; i < OBJECT_NREGS (conflict_obj);
> +                              i += 1)
> +                           SET_HARD_REG_BIT (start_conflicting_regs,
> +                                             start_hard_regno + i);
>                         }
>                       else
> -                       conflicting_regs[word]
> +                       start_conflicting_regs
>                           |= ira_reg_mode_hard_regset[hard_regno][mode];
>                       if (hard_reg_set_subset_p (profitable_hard_regs,
> -                                                conflicting_regs[word]))
> +                                                start_conflicting_regs))
>                         goto fail;
>                     }
>                 }
> @@ -2160,8 +2239,8 @@ assign_hard_reg (ira_allocno_t a, bool retry_p)
>           && FIRST_STACK_REG <= hard_regno && hard_regno <= LAST_STACK_REG)
>         continue;
>  #endif
> -      if (! check_hard_reg_p (a, hard_regno,
> -                             conflicting_regs, profitable_hard_regs))
> +      if (!check_hard_reg_p (a, hard_regno, start_conflicting_regs,
> +                            profitable_hard_regs))
>         continue;
>        cost = costs[i];
>        full_cost = full_costs[i];
> @@ -3154,7 +3233,7 @@ improve_allocation (void)
>    machine_mode mode;
>    int *allocno_costs;
>    int costs[FIRST_PSEUDO_REGISTER];
> -  HARD_REG_SET conflicting_regs[2], profitable_hard_regs;
> +  HARD_REG_SET start_conflicting_regs, profitable_hard_regs;
>    ira_allocno_t a;
>    bitmap_iterator bi;
>    int saved_nregs;
> @@ -3193,7 +3272,7 @@ improve_allocation (void)
>                      - allocno_copy_cost_saving (a, hregno));
>        try_p = false;
>        get_conflict_and_start_profitable_regs (a, false,
> -                                             conflicting_regs,
> +                                             &start_conflicting_regs,
>                                               &profitable_hard_regs);
>        class_size = ira_class_hard_regs_num[aclass];
>        mode = ALLOCNO_MODE (a);
> @@ -3202,8 +3281,8 @@ improve_allocation (void)
>        for (j = 0; j < class_size; j++)
>         {
>           hregno = ira_class_hard_regs[aclass][j];
> -         if (! check_hard_reg_p (a, hregno,
> -                                 conflicting_regs, profitable_hard_regs))
> +         if (!check_hard_reg_p (a, hregno, start_conflicting_regs,
> +                                profitable_hard_regs))
>             continue;
>           ira_assert (ira_class_hard_reg_index[aclass][hregno] == j);
>           k = allocno_costs == NULL ? 0 : j;
> @@ -3287,16 +3366,15 @@ improve_allocation (void)
>                 }
>               conflict_nregs = hard_regno_nregs (conflict_hregno,
>                                                  ALLOCNO_MODE (conflict_a));
> -             auto note_conflict = [&](int r)
> -               {
> -                 if (check_hard_reg_p (a, r,
> -                                       conflicting_regs, profitable_hard_regs))
> -                   {
> -                     if (spill_a)
> -                       SET_HARD_REG_BIT (soft_conflict_regs, r);
> -                     costs[r] += spill_cost;
> -                   }
> -               };
> +             auto note_conflict = [&] (int r) {
> +               if (check_hard_reg_p (a, r, start_conflicting_regs,
> +                                     profitable_hard_regs))
> +                 {
> +                   if (spill_a)
> +                     SET_HARD_REG_BIT (soft_conflict_regs, r);
> +                   costs[r] += spill_cost;
> +                 }
> +             };
>               for (r = conflict_hregno;
>                    r >= 0 && (int) end_hard_regno (mode, r) > conflict_hregno;
>                    r--)
> @@ -3314,8 +3392,8 @@ improve_allocation (void)
>        for (j = 0; j < class_size; j++)
>         {
>           hregno = ira_class_hard_regs[aclass][j];
> -         if (check_hard_reg_p (a, hregno,
> -                               conflicting_regs, profitable_hard_regs)
> +         if (check_hard_reg_p (a, hregno, start_conflicting_regs,
> +                               profitable_hard_regs)
>               && min_cost > costs[hregno])
>             {
>               best = hregno;
> diff --git a/gcc/ira-int.h b/gcc/ira-int.h
> index 0685e1f4e8d..b6281d3df6d 100644
> --- a/gcc/ira-int.h
> +++ b/gcc/ira-int.h
> @@ -23,6 +23,7 @@ along with GCC; see the file COPYING3.  If not see
>
>  #include "recog.h"
>  #include "function-abi.h"
> +#include <vector>
>
>  /* To provide consistency in naming, all IRA external variables,
>     functions, common typedefs start with prefix ira_.  */
> @@ -240,6 +241,13 @@ struct ira_object
>       Zero means the lowest-order subword (or the entire allocno in case
>       it is not being tracked in subwords).  */
>    int subword;
> +  /* Represent that OBJECT occupies [start, start + nregs) registers of its
> +     ALLOCNO.  */
> +  int start, nregs;
> +  /* Represent the size and offset of the current object, used to track the
> +     subreg range.  For a full reg, the size is
> +     GET_MODE_SIZE (ALLOCNO_MODE (allocno)) and the offset is 0.  */
> +  poly_int64 size, offset;
>    /* Allocated size of the conflicts array.  */
>    unsigned int conflicts_array_size;
>    /* A unique number for every instance of this structure, which is used
> @@ -295,6 +303,11 @@ struct ira_allocno
>       reload (at this point pseudo-register has only one allocno) which
>       did not get stack slot yet.  */
>    signed int hard_regno : 16;
> +  /* Unit size of one register allocated for the allocno.  Only used to
> +     compute the start and nregs of the subregs being tracked.  */
> +  poly_int64 unit_size;
> +  /* Flag that indicates subreg live ranges need to be tracked for the
> +     allocno.  */
> +  bool track_subreg_p;
>    /* A bitmask of the ABIs used by calls that occur while the allocno
>       is live.  */
>    unsigned int crossed_calls_abis : NUM_ABI_IDS;
> @@ -353,8 +366,6 @@ struct ira_allocno
>       register class living at the point than number of hard-registers
>       of the class available for the allocation.  */
>    int excess_pressure_points_num;
> -  /* The number of objects tracked in the following array.  */
> -  int num_objects;
>    /* Accumulated frequency of calls which given allocno
>       intersects.  */
>    int call_freq;
> @@ -387,8 +398,8 @@ struct ira_allocno
>    /* An array of structures describing conflict information and live
>       ranges for each object associated with the allocno.  There may be
>       more than one such object in cases where the allocno represents a
> -     multi-word register.  */
> -  ira_object_t objects[2];
> +     multi-hardreg pseudo.  */
> +  std::vector<ira_object_t> objects;
>    /* Registers clobbered by intersected calls.  */
>     HARD_REG_SET crossed_calls_clobbered_regs;
>    /* Array of usage costs (accumulated and the one updated during
> @@ -468,8 +479,12 @@ struct ira_allocno
>  #define ALLOCNO_EXCESS_PRESSURE_POINTS_NUM(A) \
>    ((A)->excess_pressure_points_num)
>  #define ALLOCNO_OBJECT(A,N) ((A)->objects[N])
> -#define ALLOCNO_NUM_OBJECTS(A) ((A)->num_objects)
> +#define ALLOCNO_NUM_OBJECTS(A) ((int) (A)->objects.size ())
>  #define ALLOCNO_ADD_DATA(A) ((A)->add_data)
> +#define ALLOCNO_UNIT_SIZE(A) ((A)->unit_size)
> +#define ALLOCNO_TRACK_SUBREG_P(A) ((A)->track_subreg_p)
> +#define ALLOCNO_NREGS(A)                                                       \
> +  (ira_reg_class_max_nregs[ALLOCNO_CLASS (A)][ALLOCNO_MODE (A)])
>
>  /* Typedef for pointer to the subsequent structure.  */
>  typedef struct ira_emit_data *ira_emit_data_t;
> @@ -511,6 +526,8 @@ allocno_emit_reg (ira_allocno_t a)
>  }
>
>  #define OBJECT_ALLOCNO(O) ((O)->allocno)
> +#define OBJECT_SIZE(O) ((O)->size)
> +#define OBJECT_OFFSET(O) ((O)->offset)
>  #define OBJECT_SUBWORD(O) ((O)->subword)
>  #define OBJECT_CONFLICT_ARRAY(O) ((O)->conflicts_array)
>  #define OBJECT_CONFLICT_VEC(O) ((ira_object_t *)(O)->conflicts_array)
> @@ -524,6 +541,8 @@ allocno_emit_reg (ira_allocno_t a)
>  #define OBJECT_MAX(O) ((O)->max)
>  #define OBJECT_CONFLICT_ID(O) ((O)->id)
>  #define OBJECT_LIVE_RANGES(O) ((O)->live_ranges)
> +#define OBJECT_START(O) ((O)->start)
> +#define OBJECT_NREGS(O) ((O)->nregs)
>
>  /* Map regno -> allocnos with given regno (see comments for
>     allocno member `next_regno_allocno').  */
> @@ -1041,6 +1060,8 @@ extern void ira_free_cost_vector (int *, reg_class_t);
>  extern void ira_flattening (int, int);
>  extern bool ira_build (void);
>  extern void ira_destroy (void);
> +extern ira_object_t
> +find_object (ira_allocno_t, int, int);
>
>  /* ira-costs.cc */
>  extern void ira_init_costs_once (void);
> @@ -1708,4 +1729,18 @@ ira_caller_save_loop_spill_p (ira_allocno_t a, ira_allocno_t subloop_a,
>    return call_cost && call_cost >= spill_cost;
>  }
>
> +/* Return true if allocno A has a subreg object.  */
> +inline bool
> +has_subreg_object_p (ira_allocno_t a)
> +{
> +  return ALLOCNO_NUM_OBJECTS (a) > 1;
> +}
> +
> +/* Return the full object of allocno A.  */
> +inline ira_object_t
> +get_full_object (ira_allocno_t a)
> +{
> +  return find_object (a, 0, ALLOCNO_NREGS (a));
> +}
> +
>  #endif /* GCC_IRA_INT_H */
> diff --git a/gcc/ira.cc b/gcc/ira.cc
> index d7530f01380..2fa6e0e5c94 100644
> --- a/gcc/ira.cc
> +++ b/gcc/ira.cc
> @@ -2623,7 +2623,7 @@ static void
>  check_allocation (void)
>  {
>    ira_allocno_t a;
> -  int hard_regno, nregs, conflict_nregs;
> +  int hard_regno;
>    ira_allocno_iterator ai;
>
>    FOR_EACH_ALLOCNO (a, ai)
> @@ -2634,28 +2634,18 @@ check_allocation (void)
>        if (ALLOCNO_CAP_MEMBER (a) != NULL
>           || (hard_regno = ALLOCNO_HARD_REGNO (a)) < 0)
>         continue;
> -      nregs = hard_regno_nregs (hard_regno, ALLOCNO_MODE (a));
> -      if (nregs == 1)
> -       /* We allocated a single hard register.  */
> -       n = 1;
> -      else if (n > 1)
> -       /* We allocated multiple hard registers, and we will test
> -          conflicts in a granularity of single hard regs.  */
> -       nregs = 1;
>
>        for (i = 0; i < n; i++)
>         {
>           ira_object_t obj = ALLOCNO_OBJECT (a, i);
>           ira_object_t conflict_obj;
>           ira_object_conflict_iterator oci;
> -         int this_regno = hard_regno;
> -         if (n > 1)
> -           {
> -             if (REG_WORDS_BIG_ENDIAN)
> -               this_regno += n - i - 1;
> -             else
> -               this_regno += i;
> -           }
> +         int this_regno;
> +         if (REG_WORDS_BIG_ENDIAN)
> +           this_regno = hard_regno + ALLOCNO_NREGS (a) - 1 - OBJECT_START (obj)
> +                        - OBJECT_NREGS (obj) + 1;
> +         else
> +           this_regno = hard_regno + OBJECT_START (obj);
>           FOR_EACH_OBJECT_CONFLICT (obj, conflict_obj, oci)
>             {
>               ira_allocno_t conflict_a = OBJECT_ALLOCNO (conflict_obj);
> @@ -2665,24 +2655,18 @@ check_allocation (void)
>               if (ira_soft_conflict (a, conflict_a))
>                 continue;
>
> -             conflict_nregs = hard_regno_nregs (conflict_hard_regno,
> -                                                ALLOCNO_MODE (conflict_a));
> -
> -             if (ALLOCNO_NUM_OBJECTS (conflict_a) > 1
> -                 && conflict_nregs == ALLOCNO_NUM_OBJECTS (conflict_a))
> -               {
> -                 if (REG_WORDS_BIG_ENDIAN)
> -                   conflict_hard_regno += (ALLOCNO_NUM_OBJECTS (conflict_a)
> -                                           - OBJECT_SUBWORD (conflict_obj) - 1);
> -                 else
> -                   conflict_hard_regno += OBJECT_SUBWORD (conflict_obj);
> -                 conflict_nregs = 1;
> -               }
> +             if (REG_WORDS_BIG_ENDIAN)
> +               conflict_hard_regno = conflict_hard_regno
> +                                     + ALLOCNO_NREGS (conflict_a) - 1
> +                                     - OBJECT_START (conflict_obj)
> +                                     - OBJECT_NREGS (conflict_obj) + 1;
> +             else
> +               conflict_hard_regno
> +                 = conflict_hard_regno + OBJECT_START (conflict_obj);
>
> -             if ((conflict_hard_regno <= this_regno
> -                && this_regno < conflict_hard_regno + conflict_nregs)
> -               || (this_regno <= conflict_hard_regno
> -                   && conflict_hard_regno < this_regno + nregs))
> +             if (!(this_regno + OBJECT_NREGS (obj) <= conflict_hard_regno
> +                   || conflict_hard_regno + OBJECT_NREGS (conflict_obj)
> +                        <= this_regno))
>                 {
>                   fprintf (stderr, "bad allocation for %d and %d\n",
>                            ALLOCNO_REGNO (a), ALLOCNO_REGNO (conflict_a));
> --
> 2.36.3
>

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 1/7] ira: Refactor the handling of register conflicts to make it more general
  2023-11-08  7:57   ` Richard Biener
@ 2023-11-08  8:34     ` Lehua Ding
  0 siblings, 0 replies; 37+ messages in thread
From: Lehua Ding @ 2023-11-08  8:34 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches, vmakarov, richard.sandiford, juzhe.zhong

Hi Richard,

Thanks for taking the time to review the code.

On 2023/11/8 15:57, Richard Biener wrote:
> On Wed, Nov 8, 2023 at 4:48 AM Lehua Ding <lehua.ding@rivai.ai> wrote:
>>
>> This patch does not make any functional changes. It mainly refactor two parts:
>>
>> 1. The ira_allocno's objects field is expanded to an scalable array, and multi-word
>>     pseduo registers are split and tracked only when necessary.
>> 2. Since the objects array has been expanded, there will be more subreg objects
>>     that pass through later, rather than the previous fixed two. Therefore, it
>>     is necessary to modify the detection of whether two objects conflict, and
>>     the check method is to pull back the registers occupied by the object to
>>     the first register of the allocno for judgment.
> 
> Did you profile this before/after?  RA performance is critical ...

Based on the data I ran earlier, the performance changes on SPEC2017 
were very slight. I'll run it again and give you the data. Based on my 
expectations, the impact on existing performance should be minimal, 
except for examples like the ones I posted.

>> diff --git a/gcc/hard-reg-set.h b/gcc/hard-reg-set.h
>> index b0bb9bce074..760eadba186 100644
>> --- a/gcc/hard-reg-set.h
>> +++ b/gcc/hard-reg-set.h
>> @@ -113,6 +113,39 @@ struct HARD_REG_SET
>>       return !operator== (other);
>>     }
>>
>> +  HARD_REG_SET
>> +  operator>> (unsigned int shift_amount) const
> 
> This is a quite costly operation, why do we need it instead
> of keeping an "offset" for set queries?

Because there are logic operations after the shift. For a multi-hardreg 
pseudo register, each part records the physical registers it conflicts 
with, and the offsets of the different parts differ, so we need to 
normalize these conflicts so that they are all expressed against the 
first single reg of the pseudo register. That is to say, we first 
convert each part's conflicts to conflicts against the first single reg, 
and then collect all the conflicting registers (by an OR operation), 
like this:

*start_conflict_regs |= OBJECT_TOTAL_CONFLICT_HARD_REGS (obj) >> 
(OBJECT_START (obj) + j)
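
As a minimal standalone sketch of that normalization (the types below are 
invented for illustration, not the real HARD_REG_SET or ira_object, and the 
`j` offset is ignored):

```
/* Sketch: reg_set stands in for HARD_REG_SET, object for ira_object.
   Each piece of the pseudo records conflicts at its own hard-reg offset;
   shifting right by the piece's start expresses them relative to the
   pseudo's first hard register, and OR-ing accumulates them.  */
#include <cstdint>
#include <cstdio>
#include <vector>

struct reg_set { std::uint64_t bits; };

static reg_set operator>> (reg_set s, unsigned n) { return { s.bits >> n }; }
static reg_set &operator|= (reg_set &a, reg_set b) { a.bits |= b.bits; return a; }

struct object { unsigned start; reg_set conflicts; };

int
main ()
{
  /* A pseudo occupying two hard regs: piece 0 conflicts with hard reg 2,
     piece 1 conflicts with hard reg 3.  */
  std::vector<object> objs = { { 0, { 0x4 } }, { 1, { 0x8 } } };

  reg_set start_conflict_regs = { 0 };
  for (const object &obj : objs)
    start_conflict_regs |= obj.conflicts >> obj.start;

  /* Both conflicts now sit on the same bit, relative to the first reg.  */
  std::printf ("0x%llx\n", (unsigned long long) start_conflict_regs.bits);
  return 0;
}
```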

>> +/* Return the object in allocno A which match START & NREGS.  */
>> +ira_object_t
>> +find_object (ira_allocno_t a, int start, int nregs)
>> +{
>> +  for (ira_object_t obj : a->objects)
> 
> linear search?  really?

I was thinking about the fact that most allocnos have only one object, 
and most of the others don't have more than 10, so I chose the easiest 
way to find them. Thanks for the heads-up; it's really not very good 
here, and I'll see if there's a faster way.
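
For instance, one faster shape might be a fast path for the common 
single-object case (a hypothetical sketch with simplified types, not the 
patch's actual implementation):

```
/* Hypothetical sketch only: most allocnos have a single object covering
   the whole pseudo, so handle that case before scanning the (usually
   tiny) object vector.  */
#include <vector>

struct object { int start, nregs; };

struct allocno
{
  int nregs;                       /* hard regs needed by the whole pseudo  */
  std::vector<object *> objects;
};

object *
find_object (allocno *a, int start, int nregs)
{
  /* Fast path: a single object is always the full-pseudo object.  */
  if (a->objects.size () == 1)
    return (start == 0 && nregs == a->nregs) ? a->objects[0] : nullptr;

  for (object *obj : a->objects)
    if (obj->start == start && obj->nregs == nregs)
      return obj;
  return nullptr;
}
```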

-- 
Best,
Lehua (RiVAI)
lehua.ding@rivai.ai


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 0/7]  ira/lra: Support subreg coalesce
  2023-11-08  3:47 [PATCH 0/7] ira/lra: Support subreg coalesce Lehua Ding
                   ` (7 preceding siblings ...)
  2023-11-08  3:55 ` [PATCH 0/7] ira/lra: Support subreg coalesce juzhe.zhong
@ 2023-11-08  9:40 ` Richard Sandiford
  2023-11-08 19:13   ` Jeff Law
  2023-11-10  9:26   ` Lehua Ding
  2023-11-08 16:56 ` Dimitar Dimitrov
  2023-11-09 20:24 ` Vladimir Makarov
  10 siblings, 2 replies; 37+ messages in thread
From: Richard Sandiford @ 2023-11-08  9:40 UTC (permalink / raw)
  To: Lehua Ding; +Cc: gcc-patches, vmakarov, juzhe.zhong

Lehua Ding <lehua.ding@rivai.ai> writes:
> Hi,
>
> These patchs try to support subreg coalesce feature in
> register allocation passes (ira and lra).

Thanks a lot for the series.  This is definitely something we've
needed for a while.

I probably won't be able to look at it in detail for a couple of weeks
(and the real review should come from Vlad anyway), but one initial
comment:

Tracking subreg liveness will sometimes expose dead code that
wasn't obvious without it.  PR89606 has an example of this.
There the dead code was introduced by init-regs, and there's a
debate about (a) whether init-regs should still be run and (b) if it
should still be run, whether it should use subreg liveness tracking too.

But I think such dead code is possible even without init-regs.
So for the purpose of this series, I think the init-regs behaviour
in that PR creates a helpful example.

I agree with Richi of course that compile-time is a concern.
The patch seems to add quite a bit of new data to ira_allocno,
but perhaps that's OK.  ira_object + ira_allocno is already quite big.

However:

@@ -387,8 +398,8 @@ struct ira_allocno
   /* An array of structures describing conflict information and live
      ranges for each object associated with the allocno.  There may be
      more than one such object in cases where the allocno represents a
-     multi-word register.  */
-  ira_object_t objects[2];
+     multi-hardreg pesudo.  */
+  std::vector<ira_object_t> objects;
   /* Registers clobbered by intersected calls.  */
    HARD_REG_SET crossed_calls_clobbered_regs;
   /* Array of usage costs (accumulated and the one updated during

adds an extra level of indirection (and separate extra storage) for
every allocno, not just multi-hardreg ones.  It'd be worth optimising
the data structures' representation of single-hardreg pseudos even if
that slows down the multi-hardreg code, since single-hardreg pseudos are
so much more common.  And the different single-hardreg and multi-hardreg
representations could be hidden behind accessors, to make life easier
for consumers.  (Of course, performance of the accessors is also then
an issue. :))
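
For instance, something along these lines (purely illustrative, with 
invented names, not a concrete proposal for the actual ira_allocno layout):

```
/* Keep the common single-object case inline and only allocate a side
   array for multi-hardreg pseudos, with accessors hiding the difference
   from consumers.  */
struct ira_object;

struct allocno_objects
{
  ira_object *single;    /* used when num == 1 (the common case)  */
  ira_object **many;     /* heap-allocated array, used when num > 1  */
  int num;
};

inline int
allocno_num_objects (const allocno_objects &o)
{
  return o.num;
}

inline ira_object *
allocno_object (const allocno_objects &o, int i)
{
  return o.num == 1 ? o.single : o.many[i];
}
```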

Richard

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 0/7]  ira/lra: Support subreg coalesce
  2023-11-08  3:47 [PATCH 0/7] ira/lra: Support subreg coalesce Lehua Ding
                   ` (8 preceding siblings ...)
  2023-11-08  9:40 ` Richard Sandiford
@ 2023-11-08 16:56 ` Dimitar Dimitrov
  2023-11-10  8:46   ` Lehua Ding
  2023-11-12 10:08   ` Lehua Ding
  2023-11-09 20:24 ` Vladimir Makarov
  10 siblings, 2 replies; 37+ messages in thread
From: Dimitar Dimitrov @ 2023-11-08 16:56 UTC (permalink / raw)
  To: Lehua Ding; +Cc: gcc-patches, vmakarov, richard.sandiford, juzhe.zhong

On Wed, Nov 08, 2023 at 11:47:33AM +0800, Lehua Ding wrote:
> Hi,
> 
> These patchs try to support subreg coalesce feature in
> register allocation passes (ira and lra).

Hi Lehua,

This patch set breaks the build for at least three embedded targets. See
below.

For avr the GCC build fails with:
/mnt/nvme/dinux/local-workspace/gcc/gcc/ira-lives.cc:149:39: error: call of overloaded ‘set_subreg_conflict_hard_regs(ira_allocno*&, int&)’ is ambiguous
  149 |         set_subreg_conflict_hard_regs (OBJECT_ALLOCNO (obj), regno);


For arm-none-eabi the newlib build fails with:
/mnt/nvme/dinux/local-workspace/newlib/newlib/libm/math/e_jn.c:279:1: internal compiler error: Floating point exception
  279 | }
      | ^
0x1176e0f crash_signal
        /mnt/nvme/dinux/local-workspace/gcc/gcc/toplev.cc:316
0xf6008d get_range_hard_regs(int, subreg_range const&)
        /mnt/nvme/dinux/local-workspace/gcc/gcc/lra.cc:609
0xf6008d get_range_hard_regs(int, subreg_range const&)
        /mnt/nvme/dinux/local-workspace/gcc/gcc/lra.cc:601
0xf60312 new_insn_reg
        /mnt/nvme/dinux/local-workspace/gcc/gcc/lra.cc:658
0xf6064d add_regs_to_insn_regno_info
        /mnt/nvme/dinux/local-workspace/gcc/gcc/lra.cc:1623
0xf62909 lra_update_insn_regno_info(rtx_insn*)
        /mnt/nvme/dinux/local-workspace/gcc/gcc/lra.cc:1769
0xf62e46 lra_update_insn_regno_info(rtx_insn*)
        /mnt/nvme/dinux/local-workspace/gcc/gcc/lra.cc:1762
0xf62e46 lra_push_insn_1
        /mnt/nvme/dinux/local-workspace/gcc/gcc/lra.cc:1919
0xf62f2d lra_push_insn(rtx_insn*)
        /mnt/nvme/dinux/local-workspace/gcc/gcc/lra.cc:1927
0xf62f2d push_insns
        /mnt/nvme/dinux/local-workspace/gcc/gcc/lra.cc:1970
0xf63302 push_insns
        /mnt/nvme/dinux/local-workspace/gcc/gcc/lra.cc:1966
0xf63302 lra(_IO_FILE*)
        /mnt/nvme/dinux/local-workspace/gcc/gcc/lra.cc:2511
0xf0e399 do_reload 
        /mnt/nvme/dinux/local-workspace/gcc/gcc/ira.cc:5960
0xf0e399 execute
        /mnt/nvme/dinux/local-workspace/gcc/gcc/ira.cc:6148


For pru-elf the GCC build fails with:
/mnt/nvme/dinux/local-workspace/gcc/libgcc/unwind-dw2-fde.c: In function 'linear_search_fdes':
/mnt/nvme/dinux/local-workspace/gcc/libgcc/unwind-dw2-fde.c:1035:1: internal compiler error: Floating point exception
 1035 | }
      | ^
0x1694f2e crash_signal
        /mnt/nvme/dinux/local-workspace/gcc/gcc/toplev.cc:316
0x1313178 get_range_hard_regs(int, subreg_range const&)
        /mnt/nvme/dinux/local-workspace/gcc/gcc/lra.cc:609
0x131343a new_insn_reg
        /mnt/nvme/dinux/local-workspace/gcc/gcc/lra.cc:658
0x13174f0 add_regs_to_insn_regno_info
        /mnt/nvme/dinux/local-workspace/gcc/gcc/lra.cc:1608
0x1318479 lra_update_insn_regno_info(rtx_insn*)
        /mnt/nvme/dinux/local-workspace/gcc/gcc/lra.cc:1769
0x13196ab lra_push_insn_1
        /mnt/nvme/dinux/local-workspace/gcc/gcc/lra.cc:1919
0x13196de lra_push_insn(rtx_insn*)
        /mnt/nvme/dinux/local-workspace/gcc/gcc/lra.cc:1927
0x13197da push_insns
        /mnt/nvme/dinux/local-workspace/gcc/gcc/lra.cc:1970
0x131b6dc lra(_IO_FILE*)
        /mnt/nvme/dinux/local-workspace/gcc/gcc/lra.cc:2511
0x129f237 do_reload
        /mnt/nvme/dinux/local-workspace/gcc/gcc/ira.cc:5960
0x129f6c6 execute
        /mnt/nvme/dinux/local-workspace/gcc/gcc/ira.cc:6148


The divide by zero error above is interesting. I'm not sure why ira_reg_class_max_nregs[] yields 0 for the pseudo register 168 in the following rtx:
(debug_insn 168 167 169 19 (var_location:SI encoding (reg/v:SI 168 [ encoding ])) -1
     (nil))

Regards,
Dimitar

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 0/7] ira/lra: Support subreg coalesce
  2023-11-08  9:40 ` Richard Sandiford
@ 2023-11-08 19:13   ` Jeff Law
  2023-11-10  9:43     ` Lehua Ding
  2023-11-11 15:33     ` Richard Sandiford
  2023-11-10  9:26   ` Lehua Ding
  1 sibling, 2 replies; 37+ messages in thread
From: Jeff Law @ 2023-11-08 19:13 UTC (permalink / raw)
  To: Lehua Ding, gcc-patches, vmakarov, juzhe.zhong, richard.sandiford



On 11/8/23 02:40, Richard Sandiford wrote:
> Lehua Ding <lehua.ding@rivai.ai> writes:
>> Hi,
>>
>> These patchs try to support subreg coalesce feature in
>> register allocation passes (ira and lra).
> 
> Thanks a lot for the series.  This is definitely something we've
> needed for a while.
> 
> I probably won't be able to look at it in detail for a couple of weeks
> (and the real review should come from Vlad anyway), but one initial
> comment:
Absolutely agreed on the above.

The other thing to ponder.  Jivan and I have been banging on Joern's 
sub-object tracking bits for a totally different problem in the RISC-V 
space.  But there may be some overlap.

Essentially Joern's code tracks liveness for a few chunks in registers: 
bits 0..7, bits 8..15, bits 16..31 and bits 32..63.  This includes 
propagating liveness from the destination through to the sources.  So 
for example if we have

(set (reg:SI dest) (plus:SI (srcreg1:SI) (srcreg2:SI)))

If we had previously determined that only bits 0..15 were live in DEST, 
then we'll propagate that into the source registers.

The goal is to ultimately transform something like

(set (dest:mode) (any_extend:mode (reg:narrower_mode)))

into

(set (dest:mode) (subreg:mode (reg:narrower_mode)))

Where the latter typically will get simplified and propagated away.
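
A rough sketch of the chunk tracking idea (not Joern's actual code; the 
types and names below are invented for illustration):

```
/* Liveness per register is a 4-bit mask over the chunks 0..7, 8..15,
   16..31 and 32..63.  For a simple op like
   (set (reg dest) (plus (reg src1) (reg src2))), the chunks live in DEST
   become live in the sources during a backwards walk.  */
#include <cstdint>
#include <map>

enum : std::uint8_t
{
  CHUNK_0_7   = 1 << 0,
  CHUNK_8_15  = 1 << 1,
  CHUNK_16_31 = 1 << 2,
  CHUNK_32_63 = 1 << 3,
};

typedef std::map<int, std::uint8_t> liveness;   /* regno -> live-chunk mask  */

static void
propagate_plus (liveness &live, int dest, int src1, int src2)
{
  std::uint8_t need = live[dest];   /* chunks still needed from DEST  */
  live[dest] = 0;                   /* DEST is defined here  */
  live[src1] |= need;               /* sources must provide the same chunks  */
  live[src2] |= need;
}
```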


Joern's code is a bit of a mess, but Jivan and I are slowly untangling 
it from a correctness standpoint.  It'll also need the usual cleanups.

Anyway, point being I think it'll be worth looking at Lehua's bits and 
Joern's bits to see if there's anything that can and should be shared. 
Given I'm getting fairly familiar with Joern's bits, that likely falls 
to me.

Jeff

> 
> Tracking subreg liveness will sometimes expose dead code that
> wasn't obvious without it.  PR89606 has an example of this.
> There the dead code was introduced by init-regs, and there's a
> debate about (a) whether init-regs should still be run and (b) if it
> should still be run, whether it should use subreg liveness tracking too.
> 
> But I think such dead code is possible even without init-regs.
> So for the purpose of this series, I think the init-regs behaviour
> in that PR creates a helpful example.
> 
> I agree with Richi of course that compile-time is a concern.
> The patch seems to add quite a bit of new data to ira_allocno,
> but perhaps that's OK.  ira_object + ira_allocno is already quite big.
> 
> However:
> 
> @@ -387,8 +398,8 @@ struct ira_allocno
>     /* An array of structures describing conflict information and live
>        ranges for each object associated with the allocno.  There may be
>        more than one such object in cases where the allocno represents a
> -     multi-word register.  */
> -  ira_object_t objects[2];
> +     multi-hardreg pesudo.  */
> +  std::vector<ira_object_t> objects;
>     /* Registers clobbered by intersected calls.  */
>      HARD_REG_SET crossed_calls_clobbered_regs;
>     /* Array of usage costs (accumulated and the one updated during
> 
> adds an extra level of indirection (and separate extra storage) for
> every allocno, not just multi-hardreg ones.  It'd be worth optimising
> the data structures' representation of single-hardreg pseudos even if
> that slows down the multi-hardreg code, since single-hardreg pseudos are
> so much more common.  And the different single-hardreg and multi-hardreg
> representations could be hidden behind accessors, to make life easier
> for consumers.  (Of course, performance of the accessors is also then
> an issue. :))
> 
> Richard

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 0/7] ira/lra: Support subreg coalesce
  2023-11-08  3:47 [PATCH 0/7] ira/lra: Support subreg coalesce Lehua Ding
                   ` (9 preceding siblings ...)
  2023-11-08 16:56 ` Dimitar Dimitrov
@ 2023-11-09 20:24 ` Vladimir Makarov
  2023-11-10  7:59   ` Richard Biener
  2023-11-12 12:01   ` Lehua Ding
  10 siblings, 2 replies; 37+ messages in thread
From: Vladimir Makarov @ 2023-11-09 20:24 UTC (permalink / raw)
  To: Lehua Ding, gcc-patches; +Cc: richard.sandiford, juzhe.zhong


On 11/7/23 22:47, Lehua Ding wrote:
>
> Lehua Ding (7):
>    ira: Refactor the handling of register conflicts to make it more
>      general
>    ira: Add live_subreg problem and apply to ira pass
>    ira: Support subreg live range track
>    ira: Support subreg copy
>    ira: Add all nregs >= 2 pseudos to tracke subreg list
>    lra: Apply live_subreg df_problem to lra pass
>    lra: Support subreg live range track and conflict detect
>
Thank you very much for addressing subreg RA.  It is a big piece of work.  I 
wanted to address this a long time ago but had no time to do it myself.

I tried to evaluate your patches on x86-64 (i7-9700k) release mode GCC.  
I used -O3 for SPEC2017 compilation.

Here are the results:

               baseline vs baseline(+patches)
specint2017:  8.51 vs 8.58 (+0.8%)
specfp2017:   21.1 vs 21.1 (+0%)
compile time: 2426.41s vs 2580.58s (+6.4%)

Spec2017 average code size change: -0.07%

Improving specint by 0.8% is impressive for me.

Unfortunately, it is achieved by decreasing compilation speed by 6.4% 
(although on a smaller benchmark I saw only a 3% slowdown). I don't know 
how yet, but we should mitigate this speed degradation.  Maybe we can find 
a hot spot in the new code (although I think it is not the linear search 
pointed out by Richard Biener, as the object vectors most probably contain 
1-2 elements) and that spot can be improved, or we could enable this only 
for -O3/fast, or the code could be made function or target dependent.

I also find that GCC consumes more memory with the patches. Maybe that can 
be improved too (although I am not sure about this).

I'll start to review the patches next week.  I don't expect to find 
anything serious enough to reject the patches, but again we should work 
on mitigating the compilation speed problem.  We can file a new PR for 
this and resolve the problem during the release cycle.



^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 0/7] ira/lra: Support subreg coalesce
  2023-11-09 20:24 ` Vladimir Makarov
@ 2023-11-10  7:59   ` Richard Biener
  2023-11-12 12:01   ` Lehua Ding
  1 sibling, 0 replies; 37+ messages in thread
From: Richard Biener @ 2023-11-10  7:59 UTC (permalink / raw)
  To: Vladimir Makarov; +Cc: Lehua Ding, gcc-patches, richard.sandiford, juzhe.zhong

On Thu, Nov 9, 2023 at 9:25 PM Vladimir Makarov <vmakarov@redhat.com> wrote:
>
>
> On 11/7/23 22:47, Lehua Ding wrote:
> >
> > Lehua Ding (7):
> >    ira: Refactor the handling of register conflicts to make it more
> >      general
> >    ira: Add live_subreg problem and apply to ira pass
> >    ira: Support subreg live range track
> >    ira: Support subreg copy
> >    ira: Add all nregs >= 2 pseudos to tracke subreg list
> >    lra: Apply live_subreg df_problem to lra pass
> >    lra: Support subreg live range track and conflict detect
> >
> Thank you very much for addressing subreg RA.  It is a big work.  I
> wanted to address this long time ago but have no time to do this by myself.
>
> I tried to evaluate your patches on x86-64 (i7-9700k) release mode GCC.
> I used -O3 for SPEC2017 compilation.
>
> Here are the results:
>
>                 baseline baseline(+patches)
> specint2017:  8.51 vs 8.58 (+0.8%)
> specfp2017:   21.1 vs 21.1 (+0%)
> compile time: 2426.41s vs 2580.58s (+6.4%)
>
> Spec2017 average code size change: -0.07%
>
> Improving specint by 0.8% is impressive for me.
>
> Unfortunately, it is achieved by decreasing compilation speed by 6.4%
> (although on smaller benchmark I saw only 3% slowdown). I don't know how
> but we should mitigate this speed degradation.  May be we can find a hot
> spot in the new code (but I think it is not a linear search pointed by
> Richard Biener as the object vectors most probably contain 1-2 elements)
> and this code spot can be improved, or we could use this only for
> -O3/fast, or the code can be function or target dependent.
>
> I also find GCC consumes more memory with the patches. May be it can be
> improved too (although I am not sure about this).

Note I think it's important that this can be disabled by default for -O1,
which we recommend when you feed GCC large machine-generated code; that is
also where I guess you'll find the effect is far worse.

That includes disabling the memory-usage side effect, which I guess might
be hard given that you grow generic data structures.

> I'll start to review the patches on the next week.  I don't expect that
> I'll find something serious to reject the patches but again we should
> work on mitigation of the compilation speed problem.  We can fill a new
> PR for this and resolve the problem during the release cycle.
>
>

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 0/7] ira/lra: Support subreg coalesce
  2023-11-08 16:56 ` Dimitar Dimitrov
@ 2023-11-10  8:46   ` Lehua Ding
  2023-11-10  8:53     ` Lehua Ding
  2023-11-12 10:08   ` Lehua Ding
  1 sibling, 1 reply; 37+ messages in thread
From: Lehua Ding @ 2023-11-10  8:46 UTC (permalink / raw)
  To: Dimitar Dimitrov; +Cc: gcc-patches, vmakarov, richard.sandiford, juzhe.zhong

Hi Dimitar,

Thanks for the tests.

> This patch set breaks the build for at least three embedded targets. See
> below.
> 
> For avr the GCC build fails with:
> /mnt/nvme/dinux/local-workspace/gcc/gcc/ira-lives.cc:149:39: error: call of overloaded ‘set_subreg_conflict_hard_regs(ira_allocno*&, int&)’ is ambiguous
>    149 |         set_subreg_conflict_hard_regs (OBJECT_ALLOCNO (obj), regno);

I think it's because `HARD_REG_SET` and `unsigned int` are the same type 
on the avr target (i.e. there are no more than 32 registers on avr), so 
the two function prototypes below conflict. I'll adjust it.

static void
set_subreg_conflict_hard_regs (ira_allocno_t a, HARD_REG_SET regs)

static void
set_subreg_conflict_hard_regs (ira_allocno_t a, unsigned int regno)

> For arm-none-eabi the newlib build fails with:
> /mnt/nvme/dinux/local-workspace/newlib/newlib/libm/math/e_jn.c:279:1: internal compiler error: Floating point exception
>    279 | }
>        | ^
> 0x1176e0f crash_signal
>          /mnt/nvme/dinux/local-workspace/gcc/gcc/toplev.cc:316
> 0xf6008d get_range_hard_regs(int, subreg_range const&)
>          /mnt/nvme/dinux/local-workspace/gcc/gcc/lra.cc:609
> 0xf6008d get_range_hard_regs(int, subreg_range const&)
>          /mnt/nvme/dinux/local-workspace/gcc/gcc/lra.cc:601
> 0xf60312 new_insn_reg
>          /mnt/nvme/dinux/local-workspace/gcc/gcc/lra.cc:658
> 0xf6064d add_regs_to_insn_regno_info
>          /mnt/nvme/dinux/local-workspace/gcc/gcc/lra.cc:1623
> 0xf62909 lra_update_insn_regno_info(rtx_insn*)
>          /mnt/nvme/dinux/local-workspace/gcc/gcc/lra.cc:1769
> 0xf62e46 lra_update_insn_regno_info(rtx_insn*)
>          /mnt/nvme/dinux/local-workspace/gcc/gcc/lra.cc:1762
> 0xf62e46 lra_push_insn_1
>          /mnt/nvme/dinux/local-workspace/gcc/gcc/lra.cc:1919
> 0xf62f2d lra_push_insn(rtx_insn*)
>          /mnt/nvme/dinux/local-workspace/gcc/gcc/lra.cc:1927
> 0xf62f2d push_insns
>          /mnt/nvme/dinux/local-workspace/gcc/gcc/lra.cc:1970
> 0xf63302 push_insns
>          /mnt/nvme/dinux/local-workspace/gcc/gcc/lra.cc:1966
> 0xf63302 lra(_IO_FILE*)
>          /mnt/nvme/dinux/local-workspace/gcc/gcc/lra.cc:2511
> 0xf0e399 do_reload
>          /mnt/nvme/dinux/local-workspace/gcc/gcc/ira.cc:5960
> 0xf0e399 execute
>          /mnt/nvme/dinux/local-workspace/gcc/gcc/ira.cc:6148
> 
> The divide by zero error above is interesting. I'm not sure why ira_reg_class_max_nregs[] yields 0 for the pseudo register 168 in the following rtx:
> (debug_insn 168 167 169 19 (var_location:SI encoding (reg/v:SI 168 [ encoding ])) -1
>       (nil))

I just cross-compiled an arm-none-eabi compiler and didn't encounter 
this error. Can you give me a little more config info about the build? 
For example, flags_for_target, etc. Thanks again.

-- 
Best,
Lehua (RiVAI)
lehua.ding@rivai.ai


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 0/7] ira/lra: Support subreg coalesce
  2023-11-10  8:46   ` Lehua Ding
@ 2023-11-10  8:53     ` Lehua Ding
  2023-11-10 16:00       ` Dimitar Dimitrov
  0 siblings, 1 reply; 37+ messages in thread
From: Lehua Ding @ 2023-11-10  8:53 UTC (permalink / raw)
  To: Dimitar Dimitrov; +Cc: gcc-patches, vmakarov, richard.sandiford, juzhe.zhong

>> The divide by zero error above is interesting. I'm not sure why 
>> ira_reg_class_max_nregs[] yields 0 for the pseudo register 168 in the 
>> following rtx:
>> (debug_insn 168 167 169 19 (var_location:SI encoding (reg/v:SI 168 [ 
>> encoding ])) -1
>>       (nil))
> 
> I just cross compiled an arm-none-eabi compiler and didn't encounter 
> this error, can you give me a little more config info about build? For 
> example, flags_for_target, etc. Thanks again.
> 

I forgot: please also provide the version information of the newlib code.

-- 
Best,
Lehua (RiVAI)
lehua.ding@rivai.ai


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 0/7] ira/lra: Support subreg coalesce
  2023-11-08  9:40 ` Richard Sandiford
  2023-11-08 19:13   ` Jeff Law
@ 2023-11-10  9:26   ` Lehua Ding
  2023-11-10 10:16     ` Richard Sandiford
  1 sibling, 1 reply; 37+ messages in thread
From: Lehua Ding @ 2023-11-10  9:26 UTC (permalink / raw)
  To: gcc-patches, vmakarov, juzhe.zhong, richard.sandiford

Hi Richard,

On 2023/11/8 17:40, Richard Sandiford wrote:
> Tracking subreg liveness will sometimes expose dead code that
> wasn't obvious without it.  PR89606 has an example of this.
> There the dead code was introduced by init-regs, and there's a
> debate about (a) whether init-regs should still be run and (b) if it
> should still be run, whether it should use subreg liveness tracking too.
> 
> But I think such dead code is possible even without init-regs.
> So for the purpose of this series, I think the init-regs behaviour
> in that PR creates a helpful example.

Yes, I think init-regs should be enhanced to reduce unnecessary 
initialization. My previous internal patches did this in a separate 
patch. Maybe I should split the live_subreg problem out of the second 
patch and not couple it with these patches. That way it can be reviewed 
separately.

> I agree with Richi of course that compile-time is a concern.
> The patch seems to add quite a bit of new data to ira_allocno,
> but perhaps that's OK.  ira_object + ira_allocno is already quite big.
> 
> However:
> 
> @@ -387,8 +398,8 @@ struct ira_allocno
>     /* An array of structures describing conflict information and live
>        ranges for each object associated with the allocno.  There may be
>        more than one such object in cases where the allocno represents a
> -     multi-word register.  */
> -  ira_object_t objects[2];
> +     multi-hardreg pesudo.  */
> +  std::vector<ira_object_t> objects;
>     /* Registers clobbered by intersected calls.  */
>      HARD_REG_SET crossed_calls_clobbered_regs;
>     /* Array of usage costs (accumulated and the one updated during
> 
> adds an extra level of indirection (and separate extra storage) for
> every allocno, not just multi-hardreg ones.  It'd be worth optimising
> the data structures' representation of single-hardreg pseudos even if
> that slows down the multi-hardreg code, since single-hardreg pseudos are
> so much more common.  And the different single-hardreg and multi-hardreg
> representations could be hidden behind accessors, to make life easier
> for consumers.  (Of course, performance of the accessors is also then
> an issue. :))

Okay, I'll try. Thank you so much.

-- 
Best,
Lehua (RiVAI)
lehua.ding@rivai.ai


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 0/7] ira/lra: Support subreg coalesce
  2023-11-08  3:55 ` [PATCH 0/7] ira/lra: Support subreg coalesce juzhe.zhong
@ 2023-11-10  9:29   ` Lehua Ding
  0 siblings, 0 replies; 37+ messages in thread
From: Lehua Ding @ 2023-11-10  9:29 UTC (permalink / raw)
  To: juzhe.zhong, gcc-patches; +Cc: vmakarov, richard.sandiford



On 2023/11/8 11:55, juzhe.zhong@rivai.ai wrote:
> Thanks Lehua.
> 
> Appreciate for supporting subreg liveness tracking with tons of work.
> 
> A nit comments, I think you should mention these following PRs:
> 
> 106694
> 89967
> 106146
> 99161
> 
> No need send V2 now. You can send V2 after Richard and Vlad reviewed.
Okay, thanks :)

-- 
Best,
Lehua (RiVAI)
lehua.ding@rivai.ai


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 0/7] ira/lra: Support subreg coalesce
  2023-11-08 19:13   ` Jeff Law
@ 2023-11-10  9:43     ` Lehua Ding
  2023-11-11 15:33     ` Richard Sandiford
  1 sibling, 0 replies; 37+ messages in thread
From: Lehua Ding @ 2023-11-10  9:43 UTC (permalink / raw)
  To: Jeff Law, gcc-patches, vmakarov, juzhe.zhong, richard.sandiford

Hi Jeff,

On 2023/11/9 3:13, Jeff Law wrote:
> The other thing to ponder.  Jivan and I have been banging on Joern's 
> sub-object tracking bits for a totally different problem in the RISC-V 
> space.  But there may be some overlap.
> 
> Essentially Joern's code tracks liveness for a few chunks in registers. 
> bits 0..7, bits 8..15, bits 16..31 and bits 32..63.  This includes 
> propagating liveness from the destination through to the sources.  SO 
> for example if we have
> 
> (set (reg:SI dest) (plus:SI (srcreg1:SI) (srcreg2:SI)))
> 
> If we had previously determined that only bits 0..15 were live in DEST, 
> then we'll propagate that into the source registers.
> 
> The goal is to ultimately transform something like
> 
> (set (dest:mode) (any_extend:mode (reg:narrower_mode)))
> 
> into
> 
> (set (dest:mode) (subreg:mode (reg:narrower_mode)))
> 
> Where the latter typically will get simplified and propagated away.
> 
> 
> Joern's code is a bit of a mess, but Jivan and I are slowly untangling 
> it from a correctness standpoint.  It'll also need the usual cleanups.
> 
> Anyway, point being I think it'll be worth looking at Lehua's bits and 
> Joern's bits to see if there's anything that can and should be shared. 
> Given I'm getting fairly familiar with Joern's bits, that likely falls 
> to me.

Maybe the subreg live range tracking classes (in patch 2) could be 
shared, including the range UNION, DIFF, and other operations, which 
should be similar. I'll see whether to extract a separate patch so that 
this part can be reviewed on its own. What do you think?
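
As a loose illustration of the kind of range operations I mean (invented 
types here, not the classes from patch 2):

```
/* Live ranges as half-open [start, end) program-point intervals, with a
   union that merges overlapping or touching pieces.  */
#include <algorithm>
#include <vector>

struct live_range { int start, end; };   /* half-open [start, end)  */

static std::vector<live_range>
range_union (std::vector<live_range> a, const std::vector<live_range> &b)
{
  a.insert (a.end (), b.begin (), b.end ());
  std::sort (a.begin (), a.end (),
             [] (const live_range &x, const live_range &y)
             { return x.start < y.start; });

  std::vector<live_range> merged;
  for (const live_range &r : a)
    {
      if (!merged.empty () && r.start <= merged.back ().end)
        merged.back ().end = std::max (merged.back ().end, r.end);
      else
        merged.push_back (r);
    }
  return merged;
}
```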

-- 
Best,
Lehua (RiVAI)
lehua.ding@rivai.ai


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 0/7] ira/lra: Support subreg coalesce
  2023-11-10  9:26   ` Lehua Ding
@ 2023-11-10 10:16     ` Richard Sandiford
  2023-11-10 10:30       ` Lehua Ding
  0 siblings, 1 reply; 37+ messages in thread
From: Richard Sandiford @ 2023-11-10 10:16 UTC (permalink / raw)
  To: Lehua Ding; +Cc: gcc-patches, vmakarov, juzhe.zhong

Lehua Ding <lehua.ding@rivai.ai> writes:
> Hi Richard,
>
> On 2023/11/8 17:40, Richard Sandiford wrote:
>> Tracking subreg liveness will sometimes expose dead code that
>> wasn't obvious without it.  PR89606 has an example of this.
>> There the dead code was introduced by init-regs, and there's a
>> debate about (a) whether init-regs should still be run and (b) if it
>> should still be run, whether it should use subreg liveness tracking too.
>> 
>> But I think such dead code is possible even without init-regs.
>> So for the purpose of this series, I think the init-regs behaviour
>> in that PR creates a helpful example.
>
> Yes, I think the init-regs should be enhanced to reduce unnecessary 
> initialization. My previous internal patchs did this in a separate 
> patch. Maybe I should split the live_subreg problem out of the second 
> patch and not couple it with these patches. That way it can be reviewed 
> separately.

But my point was that this kind of dead code is possible even without
init-regs.  So I think we should have something that removes the dead
code.  And we can try it on that PR (without changing init-regs).

Thanks,
Richard

>
>> I agree with Richi of course that compile-time is a concern.
>> The patch seems to add quite a bit of new data to ira_allocno,
>> but perhaps that's OK.  ira_object + ira_allocno is already quite big.
>> 
>> However:
>> 
>> @@ -387,8 +398,8 @@ struct ira_allocno
>>     /* An array of structures describing conflict information and live
>>        ranges for each object associated with the allocno.  There may be
>>        more than one such object in cases where the allocno represents a
>> -     multi-word register.  */
>> -  ira_object_t objects[2];
>> +     multi-hardreg pesudo.  */
>> +  std::vector<ira_object_t> objects;
>>     /* Registers clobbered by intersected calls.  */
>>      HARD_REG_SET crossed_calls_clobbered_regs;
>>     /* Array of usage costs (accumulated and the one updated during
>> 
>> adds an extra level of indirection (and separate extra storage) for
>> every allocno, not just multi-hardreg ones.  It'd be worth optimising
>> the data structures' representation of single-hardreg pseudos even if
>> that slows down the multi-hardreg code, since single-hardreg pseudos are
>> so much more common.  And the different single-hardreg and multi-hardreg
>> representations could be hidden behind accessors, to make life easier
>> for consumers.  (Of course, performance of the accessors is also then
>> an issue. :))
>
> Okay, I'll try. Thank you so much.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 0/7] ira/lra: Support subreg coalesce
  2023-11-10 10:16     ` Richard Sandiford
@ 2023-11-10 10:30       ` Lehua Ding
  2023-11-10 10:39         ` Richard Sandiford
  0 siblings, 1 reply; 37+ messages in thread
From: Lehua Ding @ 2023-11-10 10:30 UTC (permalink / raw)
  To: gcc-patches, vmakarov, juzhe.zhong, richard.sandiford

On 2023/11/10 18:16, Richard Sandiford wrote:
> Lehua Ding <lehua.ding@rivai.ai> writes:
>> Hi Richard,
>>
>> On 2023/11/8 17:40, Richard Sandiford wrote:
>>> Tracking subreg liveness will sometimes expose dead code that
>>> wasn't obvious without it.  PR89606 has an example of this.
>>> There the dead code was introduced by init-regs, and there's a
>>> debate about (a) whether init-regs should still be run and (b) if it
>>> should still be run, whether it should use subreg liveness tracking too.
>>>
>>> But I think such dead code is possible even without init-regs.
>>> So for the purpose of this series, I think the init-regs behaviour
>>> in that PR creates a helpful example.
>>
>> Yes, I think the init-regs should be enhanced to reduce unnecessary
>> initialization. My previous internal patchs did this in a separate
>> patch. Maybe I should split the live_subreg problem out of the second
>> patch and not couple it with these patches. That way it can be reviewed
>> separately.
> 
> But my point was that this kind of dead code is possible even without
> init-regs.  So I think we should have something that removes the dead
> code.  And we can try it on that PR (without changing init-regs).

Got it, so we should add a fast dead-code removal job after the init-regs pass.

-- 
Best,
Lehua (RiVAI)
lehua.ding@rivai.ai


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 0/7] ira/lra: Support subreg coalesce
  2023-11-10 10:30       ` Lehua Ding
@ 2023-11-10 10:39         ` Richard Sandiford
  2023-11-10 14:28           ` Jeff Law
  0 siblings, 1 reply; 37+ messages in thread
From: Richard Sandiford @ 2023-11-10 10:39 UTC (permalink / raw)
  To: Lehua Ding; +Cc: gcc-patches, vmakarov, juzhe.zhong

Lehua Ding <lehua.ding@rivai.ai> writes:
> On 2023/11/10 18:16, Richard Sandiford wrote:
>> Lehua Ding <lehua.ding@rivai.ai> writes:
>>> Hi Richard,
>>>
>>> On 2023/11/8 17:40, Richard Sandiford wrote:
>>>> Tracking subreg liveness will sometimes expose dead code that
>>>> wasn't obvious without it.  PR89606 has an example of this.
>>>> There the dead code was introduced by init-regs, and there's a
>>>> debate about (a) whether init-regs should still be run and (b) if it
>>>> should still be run, whether it should use subreg liveness tracking too.
>>>>
>>>> But I think such dead code is possible even without init-regs.
>>>> So for the purpose of this series, I think the init-regs behaviour
>>>> in that PR creates a helpful example.
>>>
>>> Yes, I think the init-regs should be enhanced to reduce unnecessary
>>> initialization. My previous internal patchs did this in a separate
>>> patch. Maybe I should split the live_subreg problem out of the second
>>> patch and not couple it with these patches. That way it can be reviewed
>>> separately.
>> 
>> But my point was that this kind of dead code is possible even without
>> init-regs.  So I think we should have something that removes the dead
>> code.  And we can try it on that PR (without changing init-regs).
>
> Got it, so we should add a fast remove dead code job after init-regs pass.

I'm just not sure how fast it would be, given that it needs the subreg
liveness info.  Could it be done during RA itself, during one of the existing
instruction walks?  E.g. if IRA sees a dead instruction, it could remove it
rather than recording conflict information for it.

Thanks,
Richard

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 0/7] ira/lra: Support subreg coalesce
  2023-11-10 10:39         ` Richard Sandiford
@ 2023-11-10 14:28           ` Jeff Law
  0 siblings, 0 replies; 37+ messages in thread
From: Jeff Law @ 2023-11-10 14:28 UTC (permalink / raw)
  To: Lehua Ding, gcc-patches, vmakarov, juzhe.zhong, richard.sandiford



On 11/10/23 03:39, Richard Sandiford wrote:
> Lehua Ding <lehua.ding@rivai.ai> writes:
>> On 2023/11/10 18:16, Richard Sandiford wrote:
>>> Lehua Ding <lehua.ding@rivai.ai> writes:
>>>> Hi Richard,
>>>>
>>>> On 2023/11/8 17:40, Richard Sandiford wrote:
>>>>> Tracking subreg liveness will sometimes expose dead code that
>>>>> wasn't obvious without it.  PR89606 has an example of this.
>>>>> There the dead code was introduced by init-regs, and there's a
>>>>> debate about (a) whether init-regs should still be run and (b) if it
>>>>> should still be run, whether it should use subreg liveness tracking too.
>>>>>
>>>>> But I think such dead code is possible even without init-regs.
>>>>> So for the purpose of this series, I think the init-regs behaviour
>>>>> in that PR creates a helpful example.
>>>>
>>>> Yes, I think the init-regs should be enhanced to reduce unnecessary
>>>> initialization. My previous internal patchs did this in a separate
>>>> patch. Maybe I should split the live_subreg problem out of the second
>>>> patch and not couple it with these patches. That way it can be reviewed
>>>> separately.
>>>
>>> But my point was that this kind of dead code is possible even without
>>> init-regs.  So I think we should have something that removes the dead
>>> code.  And we can try it on that PR (without changing init-regs).
>>
>> Got it, so we should add a fast remove dead code job after init-regs pass.
> 
> I'm just not sure how fast it would be, given that it needs the subreg
> liveness info.  Could it be done during RA itself, during one of the existing
> instruction walks?  E.g. if IRA sees a dead instruction, it could remove it
> rather than recording conflict information for it.
> 
Yea, it's a real concern.  I haven't done the analysis yet, but I have a 
  sense that Joern's ext-dce work which Jivan and I are working on 
(which does sub-object liveness tracking) is having a compile-time 
impact as well.

Jeff

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 0/7] ira/lra: Support subreg coalesce
  2023-11-10  8:53     ` Lehua Ding
@ 2023-11-10 16:00       ` Dimitar Dimitrov
  2023-11-12  6:06         ` Lehua Ding
  0 siblings, 1 reply; 37+ messages in thread
From: Dimitar Dimitrov @ 2023-11-10 16:00 UTC (permalink / raw)
  To: Lehua Ding; +Cc: gcc-patches, vmakarov, richard.sandiford, juzhe.zhong

On Fri, Nov 10, 2023 at 04:53:57PM +0800, Lehua Ding wrote:
> > > The divide by zero error above is interesting. I'm not sure why
> > > ira_reg_class_max_nregs[] yields 0 for the pseudo register 168 in
> > > the following rtx:
> > > (debug_insn 168 167 169 19 (var_location:SI encoding (reg/v:SI 168 [
> > > encoding ])) -1
> > >       (nil))
> > 
> > I just cross compiled an arm-none-eabi compiler and didn't encounter
> > this error, can you give me a little more config info about build? For
> > example, flags_for_target, etc. Thanks again.
> > 
> 
> Forgot, please also provide the version information of newlib code.
> 

These are the GIT commit hashes which I tested:
  gcc 39d81b667373b0033f44702a4b532a4618dde9ff
  binutils c96ceed9dce7617f270aa4742645706e535f74b7
  newlib 39f734a857e2692224715b03b99fc7bd83e94a0f

This is the script I'm using to build arm-none-eabi:
   https://github.com/dinuxbg/gnupru/blob/master/testing/manual-build-arm.sh
The build steps and config parameters are easily seen there.

Note that the Linaro CI is also detecting issues. It hits ICEs when
building libgcc:
  https://patchwork.sourceware.org/project/gcc/patch/20231108034740.834590-8-lehua.ding@rivai.ai/

Regards,
Dimitar


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 0/7] ira/lra: Support subreg coalesce
  2023-11-08 19:13   ` Jeff Law
  2023-11-10  9:43     ` Lehua Ding
@ 2023-11-11 15:33     ` Richard Sandiford
  2023-11-11 17:46       ` Jeff Law
  2023-11-12  1:16       ` 钟居哲
  1 sibling, 2 replies; 37+ messages in thread
From: Richard Sandiford @ 2023-11-11 15:33 UTC (permalink / raw)
  To: Jeff Law; +Cc: Lehua Ding, gcc-patches, vmakarov, juzhe.zhong

Jeff Law <jeffreyalaw@gmail.com> writes:
> On 11/8/23 02:40, Richard Sandiford wrote:
>> Lehua Ding <lehua.ding@rivai.ai> writes:
>>> Hi,
>>>
>>> These patchs try to support subreg coalesce feature in
>>> register allocation passes (ira and lra).
>> 
>> Thanks a lot for the series.  This is definitely something we've
>> needed for a while.
>> 
>> I probably won't be able to look at it in detail for a couple of weeks
>> (and the real review should come from Vlad anyway), but one initial
>> comment:
> Absolutely agreed on the above.
>
> The other thing to ponder.  Jivan and I have been banging on Joern's 
> sub-object tracking bits for a totally different problem in the RISC-V 
> space.  But there may be some overlap.
>
> Essentially Joern's code tracks liveness for a few chunks in registers. 
> bits 0..7, bits 8..15, bits 16..31 and bits 32..63.  This includes 
> propagating liveness from the destination through to the sources.  SO 
> for example if we have
>
> (set (reg:SI dest) (plus:SI (srcreg1:SI) (srcreg2:SI)))
>
> If we had previously determined that only bits 0..15 were live in DEST, 
> then we'll propagate that into the source registers.
>
> The goal is to ultimately transform something like
>
> (set (dest:mode) (any_extend:mode (reg:narrower_mode)))
>
> into
>
> (set (dest:mode) (subreg:mode (reg:narrower_mode)))
>
> Where the latter typically will get simplified and propagated away.
>
>
> Joern's code is a bit of a mess, but Jivan and I are slowly untangling 
> it from a correctness standpoint.  It'll also need the usual cleanups.

Ah, nice!  How configurable are the bit ranges?  We might be able to use
something similar to track lanes in a vector operation, to detect the
dead code in:

   ins v0.b[4], w0
   ...
   ins v0.b[4], w1

It sounds like the bit ranges you have now would do that for some
common/useful cases, even if it doesn't handle the general case.

Maybe dead lanes are better tracked at the gimple level though, not sure.
(But AArch64 might need to lower lane operations more than it does now if
we want gimple to handle it.)

Richard

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 0/7] ira/lra: Support subreg coalesce
  2023-11-11 15:33     ` Richard Sandiford
@ 2023-11-11 17:46       ` Jeff Law
  2023-11-12  1:16       ` 钟居哲
  1 sibling, 0 replies; 37+ messages in thread
From: Jeff Law @ 2023-11-11 17:46 UTC (permalink / raw)
  To: Lehua Ding, gcc-patches, vmakarov, juzhe.zhong, richard.sandiford



On 11/11/23 08:33, Richard Sandiford wrote:

>> Joern's code is a bit of a mess, but Jivan and I are slowly untangling
>> it from a correctness standpoint.  It'll also need the usual cleanups.
> 
> Ah, nice!  How configurable are the bit ranges?  We might be able to use
> something similar to track lanes in a vector operation, to detect the
> dead code in:
> 
>     ins v0.b[4], w0
>     ...
>     ins v0.b[4], w1
> 
> It sounds like the bit ranges you have now would do that for some
> common/useful cases, even if it doesn't handle the general case.
It could probably be extended to handle more cases.  Right now the 
regions tracked are static: bits 0..7, 8..15, 16..31 and 32..63.  I 
don't think extending it to additional regions would be terribly hard.

> 
> Maybe dead lanes are better tracked at the gimple level though, not sure.
> (But AArch64 might need to lower lane operations more than it does now if
> we want gimple to handle it.)
I'd think the best place depends on what you want to do with the dead 
lane information.  The more complex the transformation you want to make, 
the more likely gimple is the right spot.  If you're looking to do 
something simplistic, like what Joern's code does when it finds dead 
chunks, RTL seems like the natural choice.

jeff
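
As a concrete illustration of the lane-tracking idea discussed above, here is
a small standalone sketch (not taken from Joern's code or from the patches in
this thread; the names and the fixed 16-lane width are invented for the
example) that walks lane writes backwards with a bitmask and flags the dead
`ins v0.b[4], w0` from Richard's example:

```cpp
// Standalone toy, C++17.  Each "insn" fully defines one byte lane of v0
// (like ins v0.b[N], wX).  Scanning backwards with a lane bitmask: a write
// to a lane that is no longer live is a dead store.
#include <cstdint>
#include <cstdio>
#include <vector>

struct lane_write
{
  int lane;          // byte lane of v0 written by this insn
  const char *text;  // textual form, for reporting only
};

int main ()
{
  std::vector<lane_write> insns = {
    { 4, "ins v0.b[4], w0" },   // expected to be reported as dead
    { 2, "ins v0.b[2], w2" },
    { 4, "ins v0.b[4], w1" },
  };

  uint16_t live_lanes = 0xffff;  // assume all 16 byte lanes live at the end
  for (int i = (int) insns.size () - 1; i >= 0; --i)
    {
      uint16_t bit = (uint16_t) 1 << insns[i].lane;
      if (!(live_lanes & bit))
        std::printf ("dead lane write: %s\n", insns[i].text);
      live_lanes &= (uint16_t) ~bit;  // the definition kills the lane upwards
    }
  return 0;
}
```

Roughly speaking, the four static integer regions described above are the same
bookkeeping with fixed bit ranges; making the ranges per-lane is what the
question about configurability amounts to.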

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Re: [PATCH 0/7] ira/lra: Support subreg coalesce
  2023-11-11 15:33     ` Richard Sandiford
  2023-11-11 17:46       ` Jeff Law
@ 2023-11-12  1:16       ` 钟居哲
  2023-11-12 11:53         ` Richard Sandiford
  1 sibling, 1 reply; 37+ messages in thread
From: 钟居哲 @ 2023-11-12  1:16 UTC (permalink / raw)
  To: richard.sandiford, Jeff Law
  Cc: 丁乐华, gcc-patches, vmakarov

Hi, Richard.

>> Maybe dead lanes are better tracked at the gimple level though, not sure.
>> (But AArch64 might need to lower lane operations more than it does now if
>> we want gimple to handle it.)

We were trying to address this issue at the GIMPLE level at the beginning.
Tracking subreg lanes of tuple types may be enough for aarch64, since aarch64 only has tuple types.
However, for RVV, that is not enough to address all issues.
Consider the following situation:
https://godbolt.org/z/fhTvEjvr8 

You can see that, compared with LLVM, GCC emits many redundant move instructions "vmv1r.v",
since GCC is not able to track subreg liveness, whereas LLVM can.

The reasons why tracking sub-lanes in GIMPLE cannot address these redundant move issues for RVV:

1. RVV has tuple types like "vint8m1x2_t", which is essentially the same as the aarch64 tuple type "svint8x2_t".
    They are used by segment load/store, which is similar to the "ld2" instructions in ARM SVE (vec_load_lanes/vec_store_lanes).
    Supporting sub-lane tracking in GIMPLE can fix this situation for both RVV and ARM SVE.
    
2. However, we not only have "vint8m1x2_t", we also have "vint8m2_t" (LMUL = 2), which also occupies 2 registers
    but is not a tuple type; instead, it is a plain vector type. Such types are used by all simple operations.
    For example, "vadd" with vint8m1_t does a PLUS operation on a single vector register, whereas the same
    instruction "vadd" with vint8m2_t does a PLUS operation on 2 vector registers.  We can't define such
    types as tuple types for the following reasons:
    1). We also have tuple types for LMUL > 1; for example, "vint8m2x2_t" is a tuple type.
         If we define "vint8m2_t" as a tuple type, what about "vint8m2x2_t"?  A tuple of tuples, or an
         array of arrays?  It would make the types very strange.
    2). The RVV intrinsic documents define vint8m2x2_t as a tuple type, but vint8m2_t is not a tuple type. We are
         not able to change the documents.
    3). Clang has supported the RVV intrinsics for 3 years; vint8m2_t has not been a tuple type for those 3 years
         and is widely used, so changing the type definition would break the ecosystem.  For compatibility, we
         cannot define LMUL > 1 types as tuple types.

For these reasons, we need to be able to access the high part and the low part of vint8m2_t, so we provide
vget to generate subreg accesses to the vector mode.

So, at the discussion stage, we decided to address subpart access of vector modes in a more generic way,
which is to support subreg liveness tracking at the RTL level, so that it can address not only the issues seen
on ARM SVE but also the issues for LMUL > 1.

3. After we decided to support subreg liveness tracking in RTL, we studied LLVM.
    LLVM has a standalone pass, the register coalescer, right before its linear scan RA (greedy).
    So the first draft of our solution supported register coalescing before RA, which is open source:
    riscv-gcc/gcc/ira-coalesce.cc at riscv-gcc-rvv-next · riscv-collab/riscv-gcc (github.com),
    simulating the LLVM solution. However, we don't think such a solution is elegant, and we consulted
    Vlad.  Vlad suggested we should enhance IRA/LRA with subreg liveness tracking, which turned out to be
    a more reasonable and elegant approach. 

So, after several experiments and investigations, Lehua dedicated himself to producing this series of patches.
And we think Lehua's approach is a generic and optimal solution to fix these subreg problems.

Thanks.


juzhe.zhong@rivai.ai
 
From: Richard Sandiford
Date: 2023-11-11 23:33
To: Jeff Law
CC: Lehua Ding; gcc-patches; vmakarov; juzhe.zhong
Subject: Re: [PATCH 0/7] ira/lra: Support subreg coalesce
Jeff Law <jeffreyalaw@gmail.com> writes:
> On 11/8/23 02:40, Richard Sandiford wrote:
>> Lehua Ding <lehua.ding@rivai.ai> writes:
>>> Hi,
>>>
>>> These patchs try to support subreg coalesce feature in
>>> register allocation passes (ira and lra).
>> 
>> Thanks a lot for the series.  This is definitely something we've
>> needed for a while.
>> 
>> I probably won't be able to look at it in detail for a couple of weeks
>> (and the real review should come from Vlad anyway), but one initial
>> comment:
> Absolutely agreed on the above.
>
> The other thing to ponder.  Jivan and I have been banging on Joern's 
> sub-object tracking bits for a totally different problem in the RISC-V 
> space.  But there may be some overlap.
>
> Essentially Joern's code tracks liveness for a few chunks in registers. 
> bits 0..7, bits 8..15, bits 16..31 and bits 32..63.  This includes 
> propagating liveness from the destination through to the sources.  SO 
> for example if we have
>
> (set (reg:SI dest) (plus:SI (srcreg1:SI) (srcreg2:SI)))
>
> If we had previously determined that only bits 0..15 were live in DEST, 
> then we'll propagate that into the source registers.
>
> The goal is to ultimately transform something like
>
> (set (dest:mode) (any_extend:mode (reg:narrower_mode)))
>
> into
>
> (set (dest:mode) (subreg:mode (reg:narrower_mode)))
>
> Where the latter typically will get simplified and propagated away.
>
>
> Joern's code is a bit of a mess, but Jivan and I are slowly untangling 
> it from a correctness standpoint.  It'll also need the usual cleanups.
 
Ah, nice!  How configurable are the bit ranges?  We might be able to use
something similar to track lanes in a vector operation, to detect the
dead code in:
 
   ins v0.b[4], w0
   ...
   ins v0.b[4], w1
 
It sounds like the bit ranges you have now would do that for some
common/useful cases, even if it doesn't handle the general case.
 
Maybe dead lanes are better tracked at the gimple level though, not sure.
(But AArch64 might need to lower lane operations more than it does now if
we want gimple to handle it.)
 
Richard
 

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 0/7] ira/lra: Support subreg coalesce
  2023-11-10 16:00       ` Dimitar Dimitrov
@ 2023-11-12  6:06         ` Lehua Ding
  0 siblings, 0 replies; 37+ messages in thread
From: Lehua Ding @ 2023-11-12  6:06 UTC (permalink / raw)
  To: Dimitar Dimitrov; +Cc: gcc-patches, vmakarov, richard.sandiford, juzhe.zhong

Hi Dimitar,

On 2023/11/11 0:00, Dimitar Dimitrov wrote:
> On Fri, Nov 10, 2023 at 04:53:57PM +0800, Lehua Ding wrote:
>>>> The divide by zero error above is interesting. I'm not sure why
>>>> ira_reg_class_max_nregs[] yields 0 for the pseudo register 168 in
>>>> the following rtx:
>>>> (debug_insn 168 167 169 19 (var_location:SI encoding (reg/v:SI 168 [
>>>> encoding ])) -1
>>>>        (nil))
>>>
>>> I just cross compiled an arm-none-eabi compiler and didn't encounter
>>> this error, can you give me a little more config info about build? For
>>> example, flags_for_target, etc. Thanks again.
>>>
>>
>> Forgot, please also provide the version information of newlib code.
>>
> 
> These are the GIT commit hashes which I tested:
>    gcc 39d81b667373b0033f44702a4b532a4618dde9ff
>    binutils c96ceed9dce7617f270aa4742645706e535f74b7
>    newlib 39f734a857e2692224715b03b99fc7bd83e94a0f
> 
> This is the script I'm using to build arm-none-eabi:
>     https://github.com/dinuxbg/gnupru/blob/master/testing/manual-build-arm.sh
> The build steps and config parameters are easily seen there.
> 
> Note that the Linaro CI is also detecting issues. It hits ICEs when
> building libgcc:
>    https://patchwork.sourceware.org/project/gcc/patch/20231108034740.834590-8-lehua.ding@rivai.ai/

Thanks so much for the information, I can reproduce the problem now! I 
will fix these bugs in the V2 patches.

-- 
Best,
Lehua (RiVAI)
lehua.ding@rivai.ai


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 0/7] ira/lra: Support subreg coalesce
  2023-11-08 16:56 ` Dimitar Dimitrov
  2023-11-10  8:46   ` Lehua Ding
@ 2023-11-12 10:08   ` Lehua Ding
  1 sibling, 0 replies; 37+ messages in thread
From: Lehua Ding @ 2023-11-12 10:08 UTC (permalink / raw)
  To: Dimitar Dimitrov; +Cc: gcc-patches, vmakarov, richard.sandiford, juzhe.zhong

Hi Dimitar,

I fixed the problem you reported in the V2 patches 
(https://gcc.gnu.org/pipermail/gcc-patches/2023-November/636166.html); 
could you please help confirm this? Thank you very much.

On 2023/11/9 0:56, Dimitar Dimitrov wrote:
> On Wed, Nov 08, 2023 at 11:47:33AM +0800, Lehua Ding wrote:
>> Hi,
>>
>> These patchs try to support subreg coalesce feature in
>> register allocation passes (ira and lra).
> 
> Hi Lehua,
> 
> This patch set breaks the build for at least three embedded targets. See
> below.
> 
> For avr the GCC build fails with:
> /mnt/nvme/dinux/local-workspace/gcc/gcc/ira-lives.cc:149:39: error: call of overloaded ‘set_subreg_conflict_hard_regs(ira_allocno*&, int&)’ is ambiguous
>    149 |         set_subreg_conflict_hard_regs (OBJECT_ALLOCNO (obj), regno);
> 
> 
> For arm-none-eabi the newlib build fails with:
> /mnt/nvme/dinux/local-workspace/newlib/newlib/libm/math/e_jn.c:279:1: internal compiler error: Floating point exception
>    279 | }
>        | ^
> 0x1176e0f crash_signal
>          /mnt/nvme/dinux/local-workspace/gcc/gcc/toplev.cc:316
> 0xf6008d get_range_hard_regs(int, subreg_range const&)
>          /mnt/nvme/dinux/local-workspace/gcc/gcc/lra.cc:609
> 0xf6008d get_range_hard_regs(int, subreg_range const&)
>          /mnt/nvme/dinux/local-workspace/gcc/gcc/lra.cc:601
> 0xf60312 new_insn_reg
>          /mnt/nvme/dinux/local-workspace/gcc/gcc/lra.cc:658
> 0xf6064d add_regs_to_insn_regno_info
>          /mnt/nvme/dinux/local-workspace/gcc/gcc/lra.cc:1623
> 0xf62909 lra_update_insn_regno_info(rtx_insn*)
>          /mnt/nvme/dinux/local-workspace/gcc/gcc/lra.cc:1769
> 0xf62e46 lra_update_insn_regno_info(rtx_insn*)
>          /mnt/nvme/dinux/local-workspace/gcc/gcc/lra.cc:1762
> 0xf62e46 lra_push_insn_1
>          /mnt/nvme/dinux/local-workspace/gcc/gcc/lra.cc:1919
> 0xf62f2d lra_push_insn(rtx_insn*)
>          /mnt/nvme/dinux/local-workspace/gcc/gcc/lra.cc:1927
> 0xf62f2d push_insns
>          /mnt/nvme/dinux/local-workspace/gcc/gcc/lra.cc:1970
> 0xf63302 push_insns
>          /mnt/nvme/dinux/local-workspace/gcc/gcc/lra.cc:1966
> 0xf63302 lra(_IO_FILE*)
>          /mnt/nvme/dinux/local-workspace/gcc/gcc/lra.cc:2511
> 0xf0e399 do_reload
>          /mnt/nvme/dinux/local-workspace/gcc/gcc/ira.cc:5960
> 0xf0e399 execute
>          /mnt/nvme/dinux/local-workspace/gcc/gcc/ira.cc:6148
> 
> 
> For pru-elf the GCC build fails with:
> /mnt/nvme/dinux/local-workspace/gcc/libgcc/unwind-dw2-fde.c: In function 'linear_search_fdes':
> /mnt/nvme/dinux/local-workspace/gcc/libgcc/unwind-dw2-fde.c:1035:1: internal compiler error: Floating point exception
>   1035 | }
>        | ^
> 0x1694f2e crash_signal
>          /mnt/nvme/dinux/local-workspace/gcc/gcc/toplev.cc:316
> 0x1313178 get_range_hard_regs(int, subreg_range const&)
>          /mnt/nvme/dinux/local-workspace/gcc/gcc/lra.cc:609
> 0x131343a new_insn_reg
>          /mnt/nvme/dinux/local-workspace/gcc/gcc/lra.cc:658
> 0x13174f0 add_regs_to_insn_regno_info
>          /mnt/nvme/dinux/local-workspace/gcc/gcc/lra.cc:1608
> 0x1318479 lra_update_insn_regno_info(rtx_insn*)
>          /mnt/nvme/dinux/local-workspace/gcc/gcc/lra.cc:1769
> 0x13196ab lra_push_insn_1
>          /mnt/nvme/dinux/local-workspace/gcc/gcc/lra.cc:1919
> 0x13196de lra_push_insn(rtx_insn*)
>          /mnt/nvme/dinux/local-workspace/gcc/gcc/lra.cc:1927
> 0x13197da push_insns
>          /mnt/nvme/dinux/local-workspace/gcc/gcc/lra.cc:1970
> 0x131b6dc lra(_IO_FILE*)
>          /mnt/nvme/dinux/local-workspace/gcc/gcc/lra.cc:2511
> 0x129f237 do_reload
>          /mnt/nvme/dinux/local-workspace/gcc/gcc/ira.cc:5960
> 0x129f6c6 execute
>          /mnt/nvme/dinux/local-workspace/gcc/gcc/ira.cc:6148
> 
> 
> The divide by zero error above is interesting. I'm not sure why ira_reg_class_max_nregs[] yields 0 for the pseudo register 168 in the following rtx:
> (debug_insn 168 167 169 19 (var_location:SI encoding (reg/v:SI 168 [ encoding ])) -1
>       (nil))
> 
> Regards,
> Dimitar
> 

-- 
Best,
Lehua (RiVAI)
lehua.ding@rivai.ai


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 0/7] ira/lra: Support subreg coalesce
  2023-11-12  1:16       ` 钟居哲
@ 2023-11-12 11:53         ` Richard Sandiford
  2023-11-13  1:11           ` juzhe.zhong
  0 siblings, 1 reply; 37+ messages in thread
From: Richard Sandiford @ 2023-11-12 11:53 UTC (permalink / raw)
  To: 钟居哲
  Cc: Jeff Law, 丁乐华, gcc-patches, vmakarov

钟居哲 <juzhe.zhong@rivai.ai> writes:
> Hi, Richard.
>
>>> Maybe dead lanes are better tracked at the gimple level though, not sure.
>>> (But AArch64 might need to lower lane operations more than it does now if
>>> we want gimple to handle it.)
>
> We were trying to address such issue at GIMPLE leve at the beginning.
> Tracking subreg-lanes of tuple type may be enough for aarch64 since aarch64 only tuple types.
> However, for RVV, that's not enough to address all issues.
> Consider this following situation:
> https://godbolt.org/z/fhTvEjvr8 
>
> You can see comparing with LLVM, GCC has so many redundant mov instructions "vmv1r.v".
> Since GCC is not able to tracking subreg liveness, wheras LLVM can.
>
> The reason why tracking sub-lanes in GIMPLE can not address these redundant move issues for RVV:
>
> 1. RVV has tuple type like "vint8m1x2_t" which is totoally the same as aarch64 "svint8x1_t".
>     It used by segment load/store which is similiar instruction "ld2r" instruction in ARM SVE (vec_load_lanes/vec_store_lanes)
>     Support sub-lanes tracking in GIMPLE can fix this situation for both RVV and ARM SVE.
>     
> 2. However, we are not having "vint8m1x2_t", we also have "vint8m2_t" (LMUL =2) which also occupies 2 regsiters
>     which is not tuple type, instead, it is simple vector type. Such type is used by all simple operations.
>     For example, "vadd" with vint8m1_t is doing PLUS operation on single vector registers, wheras same
>     instruction "vadd“ with vint8m2_t is dong PLUS operation on 2 vector registers.  Such type we can't
>     define them as tuple type for following reasons:
>     1). we also have tuple type for LMUL > 1, for example, we also have "vint8m2x2_t" has tuple type.
>          If we define "vint8m2_t" as tuple type, How about "vint8m2x2_t" ? , Tuple type with tuple or
>          Array with array ? It makes type so strange.
>     2). RVV instrinsic doc define vint8m2x2_t as tuple type, but vint8m2_t not tuple type. We are not able
>          to change the documents.
>     3). Clang has supported RVV intrinsics 3 years ago, vint8m2_t is not tuple type for 3 years and widely
>          used, changing type definition will destroy ecosystem.  So for compability, we are not able define
>          LMUL > 1 as tuple type.
>
> For these reasons, we should be able to access highpart of vint8m2_t and lowpart of vint8m2_t, we provide
> vget to generate subreg access of the vector mode.
>
> So, at the discussion stage, we decided to address subpart access of vector mode in more generic way,
> which is support subreg liveness tracking in RTL level. So that it can not only address issues happens on ARM SVE,
> but also address issues for LMUL > 1.
>
> 3. After we decided to support subreg liveness tracking in RTL, we study LLVM.
>     Actually, LLVM has a standalone PASS right before their linear scan RA (greedy) call register coalescer.
>     So, the first draft of our solution is supporting register coalescing before RA which is opened source:
>     riscv-gcc/gcc/ira-coalesce.cc at riscv-gcc-rvv-next · riscv-collab/riscv-gcc (github.com)
>     by simulating LLVM solution. However, we don't think such solution is elegant and we have consulted
>     Vlad.  Vlad suggested we should enhance IRA/LRA with subreg liveness tracking which turns to be
>     more reasonable and elegant approach. 
>
> So, after Lehua several experiments and investigations, he dedicate himself produce this series of patches.
> And we think Lehua's approach should be generic and optimal solution to fix this subreg generic problems.

Ah, sorry, I caused a misunderstanding.  In the message quoted above,
I'd moved on from talking about tracking liveness of vectors in a tuple.
I was instead talking about tracking the liveness of individual lanes
in a single vector.

I was responding to Jeff's description of the bit-level liveness tracking
pass.  That pass solves a generic issue: redundant sign and zero extensions.
But it sounded like it could also be reused for tracking lanes of a vector
(by using different bit ranges from the ones that Jeff listed).

The thing that I was saying might be better done on gimple was tracking
lanes of an individual vector.  In other words, I was arguing against
my own question.

I should have changed the subject line when responding, sorry.

I wasn't suggesting that we should avoid subreg tracking in the RA.
That's definitely needed for AArch64, and in general.

Thanks,
Richard

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 0/7] ira/lra: Support subreg coalesce
  2023-11-09 20:24 ` Vladimir Makarov
  2023-11-10  7:59   ` Richard Biener
@ 2023-11-12 12:01   ` Lehua Ding
  2023-11-12 12:12     ` Lehua Ding
  2023-11-13 19:25     ` Vladimir Makarov
  1 sibling, 2 replies; 37+ messages in thread
From: Lehua Ding @ 2023-11-12 12:01 UTC (permalink / raw)
  To: Vladimir Makarov, gcc-patches; +Cc: richard.sandiford, juzhe.zhong

Hi Vladimir,

On 2023/11/10 4:24, Vladimir Makarov wrote:
> 
> On 11/7/23 22:47, Lehua Ding wrote:
>>
>> Lehua Ding (7):
>>    ira: Refactor the handling of register conflicts to make it more
>>      general
>>    ira: Add live_subreg problem and apply to ira pass
>>    ira: Support subreg live range track
>>    ira: Support subreg copy
>>    ira: Add all nregs >= 2 pseudos to tracke subreg list
>>    lra: Apply live_subreg df_problem to lra pass
>>    lra: Support subreg live range track and conflict detect
>>
> Thank you very much for addressing subreg RA.  It is a big work.  I 
> wanted to address this long time ago but have no time to do this by myself.
> 
> I tried to evaluate your patches on x86-64 (i7-9700k) release mode GCC. 
> I used -O3 for SPEC2017 compilation.
> 
> Here are the results:
> 
>                 baseline baseline(+patches)
> specint2017:  8.51 vs 8.58 (+0.8%)
> specfp2017:   21.1 vs 21.1 (+0%)
> compile time: 2426.41s vs 2580.58s (+6.4%)
> 
> Spec2017 average code size change: -0.07%
> 
> Improving specint by 0.8% is impressive for me.
> 
> Unfortunately, it is achieved by decreasing compilation speed by 6.4% 
> (although on smaller benchmark I saw only 3% slowdown). I don't know how 
> but we should mitigate this speed degradation.  May be we can find a hot 
> spot in the new code (but I think it is not a linear search pointed by 
> Richard Biener as the object vectors most probably contain 1-2 elements) 
> and this code spot can be improved, or we could use this only for 
> -O3/fast, or the code can be function or target dependent.
> 
> I also find GCC consumes more memory with the patches. May be it can be 
> improved too (although I am not sure about this).

Thanks for the specint performance data. I'll do my best to get the 
compile-time and memory issues fixed. I'm very curious to know whether 
the approach used to solve the subreg coalesce problem makes sense to you.

> I'll start to review the patches on the next week.  I don't expect that 
> I'll find something serious to reject the patches but again we should 
> work on mitigation of the compilation speed problem.  We can fill a new 
> PR for this and resolve the problem during the release cycle.

-- 
Best,
Lehua (RiVAI)
lehua.ding@rivai.ai


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 0/7] ira/lra: Support subreg coalesce
  2023-11-12 12:01   ` Lehua Ding
@ 2023-11-12 12:12     ` Lehua Ding
  2023-11-13 19:25     ` Vladimir Makarov
  1 sibling, 0 replies; 37+ messages in thread
From: Lehua Ding @ 2023-11-12 12:12 UTC (permalink / raw)
  To: Vladimir Makarov, gcc-patches; +Cc: richard.sandiford, juzhe.zhong

Hi Vladimir,

When you start your review, please review the v3 version, which fixes 
some ICE issues. Thanks.

https://gcc.gnu.org/pipermail/gcc-patches/2023-November/636178.html

On 2023/11/12 20:01, Lehua Ding wrote:
> Hi Vladimir,
> 
> On 2023/11/10 4:24, Vladimir Makarov wrote:
>>
>> On 11/7/23 22:47, Lehua Ding wrote:
>>>
>>> Lehua Ding (7):
>>>    ira: Refactor the handling of register conflicts to make it more
>>>      general
>>>    ira: Add live_subreg problem and apply to ira pass
>>>    ira: Support subreg live range track
>>>    ira: Support subreg copy
>>>    ira: Add all nregs >= 2 pseudos to tracke subreg list
>>>    lra: Apply live_subreg df_problem to lra pass
>>>    lra: Support subreg live range track and conflict detect
>>>
>> Thank you very much for addressing subreg RA.  It is a big work.  I 
>> wanted to address this long time ago but have no time to do this by 
>> myself.
>>
>> I tried to evaluate your patches on x86-64 (i7-9700k) release mode 
>> GCC. I used -O3 for SPEC2017 compilation.
>>
>> Here are the results:
>>
>>                 baseline baseline(+patches)
>> specint2017:  8.51 vs 8.58 (+0.8%)
>> specfp2017:   21.1 vs 21.1 (+0%)
>> compile time: 2426.41s vs 2580.58s (+6.4%)
>>
>> Spec2017 average code size change: -0.07%
>>
>> Improving specint by 0.8% is impressive for me.
>>
>> Unfortunately, it is achieved by decreasing compilation speed by 6.4% 
>> (although on smaller benchmark I saw only 3% slowdown). I don't know 
>> how but we should mitigate this speed degradation.  May be we can find 
>> a hot spot in the new code (but I think it is not a linear search 
>> pointed by Richard Biener as the object vectors most probably contain 
>> 1-2 elements) and this code spot can be improved, or we could use this 
>> only for -O3/fast, or the code can be function or target dependent.
>>
>> I also find GCC consumes more memory with the patches. May be it can 
>> be improved too (although I am not sure about this).
> 
> Thanks for the specint performance data. I'll do my best to get the 
> compile time and memory issues fixed. I'm very curious to know if the 
> way used to solve the subreg coalesce problem makes sense to you?
> 
>> I'll start to review the patches on the next week.  I don't expect 
>> that I'll find something serious to reject the patches but again we 
>> should work on mitigation of the compilation speed problem.  We can 
>> fill a new PR for this and resolve the problem during the release cycle.
> 

-- 
Best,
Lehua (RiVAI)
lehua.ding@rivai.ai

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Re: [PATCH 0/7] ira/lra: Support subreg coalesce
  2023-11-12 11:53         ` Richard Sandiford
@ 2023-11-13  1:11           ` juzhe.zhong
  2023-11-13  3:34             ` Lehua Ding
  0 siblings, 1 reply; 37+ messages in thread
From: juzhe.zhong @ 2023-11-13  1:11 UTC (permalink / raw)
  To: richard.sandiford
  Cc: jeffreyalaw, 丁乐华, gcc-patches, vmakarov

>> Ah, nice!  How configurable are the bit ranges?
I think Lehua's patch makes the bit ranges configurable,
since his patch allows targets to flexibly track subreg liveness according to REGMODE_NATURAL_SIZE:

+/* Return true if REGNO is a pseudo and REG_MODE occupies multiple registers.  */
+bool
+need_track_subreg (int regno, machine_mode reg_mode)
+{
+  poly_int64 total_size = GET_MODE_SIZE (reg_mode);
+  poly_int64 natural_size = REGMODE_NATURAL_SIZE (reg_mode);
+  return maybe_gt (total_size, natural_size)
+	 && multiple_p (total_size, natural_size)
+	 && regno >= FIRST_PSEUDO_REGISTER;
+}
It depends on how targets configure the REGMODE_NATURAL_SIZE target hook.

If the hook returns the QImode size, his patch would track subregs at byte (8-bit) granularity.
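
As a rough standalone illustration of that predicate (plain integers stand in
for poly_int64, and the FIRST_PSEUDO_REGISTER value and the byte sizes below
are made up; this is not the actual GCC code):

```cpp
// Toy model of the check above: a pseudo is tracked per subreg chunk when
// its mode spans several "natural" register-sized pieces.
#include <cassert>

constexpr int FIRST_PSEUDO_REGISTER = 68;  // placeholder, target-dependent

bool
need_track_subreg (int regno, int total_size, int natural_size)
{
  return total_size > natural_size
         && total_size % natural_size == 0
         && regno >= FIRST_PSEUDO_REGISTER;
}

int main ()
{
  // e.g. a 32-byte mode with a 16-byte natural size is tracked as two chunks...
  assert (need_track_subreg (134, 32, 16));
  // ...while a mode that fits a single natural-size register is not tracked.
  assert (!need_track_subreg (135, 16, 16));
  return 0;
}
```

With a REGMODE_NATURAL_SIZE of one byte, the same shape of check would accept
any multi-byte mode, which is what the byte-granularity remark above refers to.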


juzhe.zhong@rivai.ai
 
From: Richard Sandiford
Date: 2023-11-12 19:53
To: 钟居哲
CC: Jeff Law; 丁乐华; gcc-patches; vmakarov
Subject: Re: [PATCH 0/7] ira/lra: Support subreg coalesce
钟居哲 <juzhe.zhong@rivai.ai> writes:
> Hi, Richard.
>
>>> Maybe dead lanes are better tracked at the gimple level though, not sure.
>>> (But AArch64 might need to lower lane operations more than it does now if
>>> we want gimple to handle it.)
>
> We were trying to address such issue at GIMPLE leve at the beginning.
> Tracking subreg-lanes of tuple type may be enough for aarch64 since aarch64 only tuple types.
> However, for RVV, that's not enough to address all issues.
> Consider this following situation:
> https://godbolt.org/z/fhTvEjvr8 
>
> You can see comparing with LLVM, GCC has so many redundant mov instructions "vmv1r.v".
> Since GCC is not able to tracking subreg liveness, wheras LLVM can.
>
> The reason why tracking sub-lanes in GIMPLE can not address these redundant move issues for RVV:
>
> 1. RVV has tuple type like "vint8m1x2_t" which is totoally the same as aarch64 "svint8x1_t".
>     It used by segment load/store which is similiar instruction "ld2r" instruction in ARM SVE (vec_load_lanes/vec_store_lanes)
>     Support sub-lanes tracking in GIMPLE can fix this situation for both RVV and ARM SVE.
>     
> 2. However, we are not having "vint8m1x2_t", we also have "vint8m2_t" (LMUL =2) which also occupies 2 regsiters
>     which is not tuple type, instead, it is simple vector type. Such type is used by all simple operations.
>     For example, "vadd" with vint8m1_t is doing PLUS operation on single vector registers, wheras same
>     instruction "vadd“ with vint8m2_t is dong PLUS operation on 2 vector registers.  Such type we can't
>     define them as tuple type for following reasons:
>     1). we also have tuple type for LMUL > 1, for example, we also have "vint8m2x2_t" has tuple type.
>          If we define "vint8m2_t" as tuple type, How about "vint8m2x2_t" ? , Tuple type with tuple or
>          Array with array ? It makes type so strange.
>     2). RVV instrinsic doc define vint8m2x2_t as tuple type, but vint8m2_t not tuple type. We are not able
>          to change the documents.
>     3). Clang has supported RVV intrinsics 3 years ago, vint8m2_t is not tuple type for 3 years and widely
>          used, changing type definition will destroy ecosystem.  So for compability, we are not able define
>          LMUL > 1 as tuple type.
>
> For these reasons, we should be able to access highpart of vint8m2_t and lowpart of vint8m2_t, we provide
> vget to generate subreg access of the vector mode.
>
> So, at the discussion stage, we decided to address subpart access of vector mode in more generic way,
> which is support subreg liveness tracking in RTL level. So that it can not only address issues happens on ARM SVE,
> but also address issues for LMUL > 1.
>
> 3. After we decided to support subreg liveness tracking in RTL, we study LLVM.
>     Actually, LLVM has a standalone PASS right before their linear scan RA (greedy) call register coalescer.
>     So, the first draft of our solution is supporting register coalescing before RA which is opened source:
>     riscv-gcc/gcc/ira-coalesce.cc at riscv-gcc-rvv-next · riscv-collab/riscv-gcc (github.com)
>     by simulating LLVM solution. However, we don't think such solution is elegant and we have consulted
>     Vlad.  Vlad suggested we should enhance IRA/LRA with subreg liveness tracking which turns to be
>     more reasonable and elegant approach. 
>
> So, after Lehua several experiments and investigations, he dedicate himself produce this series of patches.
> And we think Lehua's approach should be generic and optimal solution to fix this subreg generic problems.
 
Ah, sorry, I caused a misunderstanding.  In the message quoted above,
I'd moved on from talking about tracking liveness of vectors in a tuple.
I was instead talking about tracking the liveness of individual lanes
in a single vector.
 
I was responding to Jeff's description of the bit-level liveness tracking
pass.  That pass solves a generic issue: redundant sign and zero extensions.
But it sounded like it could also be reused for tracking lanes of a vector
(by using different bit ranges from the ones that Jeff listed).
 
The thing that I was saying might be better done on gimple was tracking
lanes of an individual vector.  In other words, I was arguing against
my own question.
 
I should have changed the subject line when responding, sorry.
 
I wasn't suggesting that we should avoid subreg tracking in the RA.
That's definitely needed for AArch64, and in general.
 
Thanks,
Richard
 

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 0/7] ira/lra: Support subreg coalesce
  2023-11-13  1:11           ` juzhe.zhong
@ 2023-11-13  3:34             ` Lehua Ding
  0 siblings, 0 replies; 37+ messages in thread
From: Lehua Ding @ 2023-11-13  3:34 UTC (permalink / raw)
  To: juzhe.zhong, richard.sandiford; +Cc: jeffreyalaw, gcc-patches, vmakarov



On 2023/11/13 9:11, juzhe.zhong@rivai.ai wrote:
>>> Ah, nice!  How configurable are the bit ranges?
> I think Lehua's patch is configurable for bit ranges.
> Since his patch allow target flexible tracking subreg livenesss 
> according to REGMODE_NATURAL_SIZE
> 
> +/* Return true if REGNO is a pseudo and MODE is a multil regs size.  */
> +bool
> +need_track_subreg (int regno, machine_mode reg_mode)
> +{
> +  poly_int64 total_size = GET_MODE_SIZE (reg_mode);
> +  poly_int64 natural_size = REGMODE_NATURAL_SIZE (reg_mode);
> +  return maybe_gt (total_size, natural_size)
> +	 && multiple_p (total_size, natural_size)
> +	 && regno >= FIRST_PSEUDO_REGISTER;
> +}
> 
> It depends on how targets configure REGMODE_NATURAL_SIZE target hook.
> 
> If we return QImode size, his patch is enable tracking bit ranges 7 bits 
> subreg.

Yes, the current subreg_ranges class provides 
remove_range/add_range/remove_ranges/add_ranges interfaces to modify 
ranges. Each subreg_range contains start and end fields representing the 
half-open range [start, end). For the live_subreg problem, the value 
returned by REGMODE_NATURAL_SIZE is used as the unit; for bit-level 
tracking like the one on Jeff's side, a bit could be used as the unit.
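
A minimal sketch of that interface, following the description above (the real
class in the patches may differ in names and details; this one just keeps a
sorted vector of half-open ranges):

```cpp
// Hypothetical sketch, not the patch code: half-open ranges of subreg units
// (REGMODE_NATURAL_SIZE chunks, or bits for a bit-level tracker).
#include <algorithm>
#include <vector>

struct subreg_range
{
  int start;  // first live unit
  int end;    // one past the last live unit
};

class subreg_ranges
{
public:
  // Mark [start, end) live.
  void add_range (int start, int end)
  {
    ranges_.push_back ({start, end});
    normalize ();
  }

  // Mark [start, end) dead, splitting any range that straddles it.
  void remove_range (int start, int end)
  {
    std::vector<subreg_range> out;
    for (const subreg_range &r : ranges_)
      {
        if (r.start < start)
          out.push_back ({r.start, std::min (r.end, start)});
        if (r.end > end)
          out.push_back ({std::max (r.start, end), r.end});
      }
    ranges_ = out;
  }

  bool empty () const { return ranges_.empty (); }

private:
  // Keep the ranges sorted and merge overlapping or adjacent ones.
  void normalize ()
  {
    std::sort (ranges_.begin (), ranges_.end (),
               [] (const subreg_range &a, const subreg_range &b)
               { return a.start < b.start; });
    std::vector<subreg_range> merged;
    for (const subreg_range &r : ranges_)
      {
        if (!merged.empty () && r.start <= merged.back ().end)
          merged.back ().end = std::max (merged.back ().end, r.end);
        else
          merged.push_back (r);
      }
    ranges_ = merged;
  }

  std::vector<subreg_range> ranges_;
};
```

With REGMODE_NATURAL_SIZE units, for example, `add_range (0, 1)` would mark
the low half of a two-register pseudo live and `remove_range (1, 2)` would
kill the high half.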

-- 
Best,
Lehua (RiVAI)
lehua.ding@rivai.ai


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 0/7] ira/lra: Support subreg coalesce
  2023-11-12 12:01   ` Lehua Ding
  2023-11-12 12:12     ` Lehua Ding
@ 2023-11-13 19:25     ` Vladimir Makarov
  1 sibling, 0 replies; 37+ messages in thread
From: Vladimir Makarov @ 2023-11-13 19:25 UTC (permalink / raw)
  To: Lehua Ding, gcc-patches; +Cc: richard.sandiford, juzhe.zhong


On 11/12/23 07:01, Lehua Ding wrote:
> Thanks for the specint performance data. I'll do my best to get the 
> compile time and memory issues fixed. I'm very curious to know if the 
> way used to solve the subreg coalesce problem makes sense to you?
>
If it works, it is ok for me.  There is always room for an optimization 
even if it decreases compilation speed considerably.  We just need to 
keep the same speed for optimization levels <= 2.  We can put really 
expensive optimizations under -O3 or -Ofast.

Although the first thing I would try myself is to do the subreg liveness 
analysis only locally (inside BBs).  The majority of cases I saw where 
subreg RA could be improved were local (inside a BB).  With such an 
approach, we would probably have only a minor compilation speed slowdown 
and could use the optimization by default.
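
A rough sketch of what such a block-local scan could look like — all data
structures and names here are hypothetical, invented only to illustrate the
shape of the idea, not taken from the patches:

```cpp
// Hypothetical sketch of a purely block-local subreg liveness scan.
// A "chunk" is one REGMODE_NATURAL_SIZE-sized piece of a pseudo; at the
// block boundary everything in the initial live set is conservatively
// assumed live.
#include <set>
#include <utility>
#include <vector>

struct insn_info
{
  // (pseudo, chunk) pairs fully defined / used by this insn.
  std::vector<std::pair<int, int>> defs;
  std::vector<std::pair<int, int>> uses;
};

// For each insn in the block, compute the set of (pseudo, chunk) pairs
// that are live immediately after it, scanning backwards from live-out.
std::vector<std::set<std::pair<int, int>>>
local_subreg_liveness (const std::vector<insn_info> &bb,
                       std::set<std::pair<int, int>> live)
{
  std::vector<std::set<std::pair<int, int>>> live_after (bb.size ());
  for (int i = (int) bb.size () - 1; i >= 0; --i)
    {
      live_after[i] = live;
      for (const auto &d : bb[i].defs)
        live.erase (d);    // a full definition of the chunk kills it upwards
      for (const auto &u : bb[i].uses)
        live.insert (u);   // a use makes the chunk live upwards
    }
  return live_after;
}
```

Chunk-level conflicts would then be recorded only within the block, with
whole-pseudo conservatism at block boundaries, which is where the
compile-time saving relative to a global dataflow problem would come from.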


^ permalink raw reply	[flat|nested] 37+ messages in thread

end of thread, other threads:[~2023-11-13 19:26 UTC | newest]

Thread overview: 37+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-11-08  3:47 [PATCH 0/7] ira/lra: Support subreg coalesce Lehua Ding
2023-11-08  3:47 ` [PATCH 1/7] ira: Refactor the handling of register conflicts to make it more general Lehua Ding
2023-11-08  7:57   ` Richard Biener
2023-11-08  8:34     ` Lehua Ding
2023-11-08  3:47 ` [PATCH 2/7] ira: Add live_subreg problem and apply to ira pass Lehua Ding
2023-11-08  3:47 ` [PATCH 3/7] ira: Support subreg live range track Lehua Ding
2023-11-08  3:47 ` [PATCH 4/7] ira: Support subreg copy Lehua Ding
2023-11-08  3:47 ` [PATCH 5/7] ira: Add all nregs >= 2 pseudos to tracke subreg list Lehua Ding
2023-11-08  3:47 ` [PATCH 6/7] lra: Apply live_subreg df_problem to lra pass Lehua Ding
2023-11-08  3:47 ` [PATCH 7/7] lra: Support subreg live range track and conflict detect Lehua Ding
2023-11-08  3:55 ` [PATCH 0/7] ira/lra: Support subreg coalesce juzhe.zhong
2023-11-10  9:29   ` Lehua Ding
2023-11-08  9:40 ` Richard Sandiford
2023-11-08 19:13   ` Jeff Law
2023-11-10  9:43     ` Lehua Ding
2023-11-11 15:33     ` Richard Sandiford
2023-11-11 17:46       ` Jeff Law
2023-11-12  1:16       ` 钟居哲
2023-11-12 11:53         ` Richard Sandiford
2023-11-13  1:11           ` juzhe.zhong
2023-11-13  3:34             ` Lehua Ding
2023-11-10  9:26   ` Lehua Ding
2023-11-10 10:16     ` Richard Sandiford
2023-11-10 10:30       ` Lehua Ding
2023-11-10 10:39         ` Richard Sandiford
2023-11-10 14:28           ` Jeff Law
2023-11-08 16:56 ` Dimitar Dimitrov
2023-11-10  8:46   ` Lehua Ding
2023-11-10  8:53     ` Lehua Ding
2023-11-10 16:00       ` Dimitar Dimitrov
2023-11-12  6:06         ` Lehua Ding
2023-11-12 10:08   ` Lehua Ding
2023-11-09 20:24 ` Vladimir Makarov
2023-11-10  7:59   ` Richard Biener
2023-11-12 12:01   ` Lehua Ding
2023-11-12 12:12     ` Lehua Ding
2023-11-13 19:25     ` Vladimir Makarov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).