public inbox for gcc-patches@gcc.gnu.org
* [PATCH V2 0/7] ira/lra: Support subreg coalesce
@ 2023-11-12  9:58 Lehua Ding
  2023-11-12  9:58 ` [PATCH V2 1/7] df: Add DF_LIVE_SUBREG problem Lehua Ding
                   ` (7 more replies)
  0 siblings, 8 replies; 9+ messages in thread
From: Lehua Ding @ 2023-11-12  9:58 UTC (permalink / raw)
  To: gcc-patches; +Cc: vmakarov, richard.sandiford, juzhe.zhong, lehua.ding

Hi,

These patches try to support the subreg coalescing feature in
the register allocation passes (ira and lra).

Let's consider a RISC-V program (https://godbolt.org/z/ec51d91aT):

```
#include <riscv_vector.h>

void
foo (int32_t *in, int32_t *out, size_t m)
{
  vint32m2_t result = __riscv_vle32_v_i32m2 (in, 32);
  vint32m1_t v0 = __riscv_vget_v_i32m2_i32m1 (result, 0);
  vint32m1_t v1 = __riscv_vget_v_i32m2_i32m1 (result, 1);
  for (size_t i = 0; i < m; i++)
    {
      v0 = __riscv_vadd_vv_i32m1(v0, v0, 4);
      v1 = __riscv_vmul_vv_i32m1(v1, v1, 4);
    }
  *(vint32m1_t*)(out+4*0) = v0;
  *(vint32m1_t*)(out+4*1) = v1;
}
```

Before these patches:

```
foo:
	li	a5,32
	vsetvli	zero,a5,e32,m2,ta,ma
	vle32.v	v4,0(a0)
	vmv1r.v	v2,v4
	vmv1r.v	v1,v5
	beq	a2,zero,.L2
	li	a5,0
	vsetivli	zero,4,e32,m1,ta,ma
.L3:
	addi	a5,a5,1
	vadd.vv	v2,v2,v2
	vmul.vv	v1,v1,v1
	bne	a2,a5,.L3
.L2:
	vs1r.v	v2,0(a1)
	addi	a1,a1,16
	vs1r.v	v1,0(a1)
	ret
```

After these patches:

```
foo:
	li	a5,32
	vsetvli	zero,a5,e32,m2,ta,ma
	vle32.v	v2,0(a0)
	beq	a2,zero,.L2
	li	a5,0
	vsetivli	zero,4,e32,m1,ta,ma
.L3:
	addi	a5,a5,1
	vadd.vv	v2,v2,v2
	vmul.vv	v3,v3,v3
	bne	a2,a5,.L3
.L2:
	vs1r.v	v2,0(a1)
	addi	a1,a1,16
	vs1r.v	v3,0(a1)
	ret
```

As you can see, the two redundant vmv1r.v instructions were removed.
They appear because the current ira pass is conservative when
calculating the live ranges of pseudo registers that occupy multiple
hard registers, as in the following two RTL instructions, where r134
occupies two physical registers while r135 and r136 each occupy one.
At insn 12, ira considers the entire r134 pseudo register to be live,
so r135 conflicts with r134, as shown in the ira dump below.  When the
physical registers are then allocated, r135 and r136 are allocated
first because they are inside the loop body and have higher priority.
This makes it difficult to assign r136 so that it overlaps with r134,
i.e. to assign r136 to hr100, which would eliminate the need for the
vmv1r.v instruction.  Thus the two vmv1r.v instructions appear.

If we refine the live information of r134 down to individual subregs,
we can remove this conflict.  We can then create copies for the sets
that reference subregs, which increases the allocation priority of
r134 and lets pseudos with larger alignment requirements get their
hard registers first.  In RVV, a pseudo register that occupies two
physical registers must start at an even-numbered (2-aligned) hard register.

```
(insn 11 10 12 2 (set (reg/v:RVVM1SI 135 [ v0 ])
        (subreg:RVVM1SI (reg/v:RVVM2SI 134 [ result ]) 0)) "/app/example.c":7:19 998 {*movrvvm1si_whole}
     (nil))
(insn 12 11 13 2 (set (reg/v:RVVM1SI 136 [ v1 ])
        (subreg:RVVM1SI (reg/v:RVVM2SI 134 [ result ]) [16, 16])) "/app/example.c":8:19 998 {*movrvvm1si_whole}
     (expr_list:REG_DEAD (reg/v:RVVM2SI 134 [ result ])
        (nil)))
```

ira dump:

;; a1(r136,l0) conflicts: a3(r135,l0)
;;     total conflict hard regs:
;;     conflict hard regs:
;; a3(r135,l0) conflicts: a1(r136,l0) a6(r134,l0)
;;     total conflict hard regs:
;;     conflict hard regs:
;; a6(r134,l0) conflicts: a3(r135,l0)
;;     total conflict hard regs:
;;     conflict hard regs:
;;
;; ...
      Popping a1(r135,l0)  --         assign reg 97
      Popping a3(r136,l0)  --         assign reg 98
      Popping a4(r137,l0)  --         assign reg 15
      Popping a5(r140,l0)  --         assign reg 12
      Popping a10(r145,l0)  --         assign reg 12
      Popping a2(r139,l0)  --         assign reg 11
      Popping a9(r144,l0)  --         assign reg 11
      Popping a0(r142,l0)  --         assign reg 11
      Popping a6(r134,l0)  --         assign reg 100
      Popping a7(r143,l0)  --         assign reg 10
      Popping a8(r141,l0)  --         assign reg 15
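
To make the block-range idea above concrete, here is a minimal standalone
sketch (a hypothetical helper, not the patch's get_range, which works on
poly_int64 sizes) of how a subreg's byte offset maps to the half-open block
range that the series tracks; the 16-byte block size assumes VLEN=128:

```
#include <cassert>

/* Half-open block range [start, end), as used throughout the series.  */
struct subreg_range { int start, end; };

/* Map a subreg at byte OFFSET with size SUBREG_SIZE inside a pseudo whose
   natural block size is UNIT_SIZE to the blocks it touches.  */
static subreg_range
blocks_touched (int offset, int subreg_size, int unit_size)
{
  subreg_range r;
  r.start = offset / unit_size;
  r.end = (offset + subreg_size + unit_size - 1) / unit_size;
  return r;
}

int
main ()
{
  /* r134 is RVVM2SI: two 16-byte blocks when VLEN = 128.  */
  subreg_range v0 = blocks_touched (0, 16, 16);   /* insn 11 reads blocks [0, 1) */
  subreg_range v1 = blocks_touched (16, 16, 16);  /* insn 12 reads blocks [1, 2) */
  assert (v0.start == 0 && v0.end == 1);
  assert (v1.start == 1 && v1.end == 2);
  /* With per-block liveness, only block [1, 2) of r134 stays live after
     insn 11, so r135 can be allocated on top of r134's first block.  */
  return 0;
}
```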

AArch64 SVE has the same problem.  Consider the following
code (https://godbolt.org/z/MYrK7Ghaj):

```
#include <arm_sve.h>

int bar (svbool_t pg, int64_t* base, int n, int64_t *in1, int64_t *in2, int64_t*out)
{
  svint64x4_t result = svld4_s64 (pg, base);
  svint64_t v0 = svget4_s64(result, 0);
  svint64_t v1 = svget4_s64(result, 1);
  svint64_t v2 = svget4_s64(result, 2);
  svint64_t v3 = svget4_s64(result, 3);

  for (int i = 0; i < n; i += 1)
    {
        svint64_t v18 = svld1_s64(pg, in1);
        svint64_t v19 = svld1_s64(pg, in2);
        v0 = svmad_s64_z(pg, v0, v18, v19);
        v1 = svmad_s64_z(pg, v1, v18, v19);
        v2 = svmad_s64_z(pg, v2, v18, v19);
        v3 = svmad_s64_z(pg, v3, v18, v19);
    }
  svst1_s64(pg, out+0,v0);
  svst1_s64(pg, out+1,v1);
  svst1_s64(pg, out+2,v2);
  svst1_s64(pg, out+3,v3);
}
```

Before these patches:

```
bar:
	ld4d	{z4.d - z7.d}, p0/z, [x0]
	mov	z26.d, z4.d
	mov	z27.d, z5.d
	mov	z28.d, z6.d
	mov	z29.d, z7.d
	cmp	w1, 0
	...
```

After these patches:

```
bar:
	ld4d	{z28.d - z31.d}, p0/z, [x0]
	cmp	w1, 0
	...
```

Lehua Ding (7):
  df: Add DF_LIVE_SUBREG problem
  ira: Switch to live_subreg data
  ira: Support subreg live range track
  ira: Support subreg copy
  ira: Add all nregs >= 2 pseudos to track subreg list
  lra: Switch to live_subreg data flow
  lra: Support subreg live range track and conflict detect

 gcc/Makefile.in          |   1 +
 gcc/df-problems.cc       | 889 ++++++++++++++++++++++++++++++++++++++-
 gcc/df.h                 |  67 +++
 gcc/hard-reg-set.h       |  33 ++
 gcc/ira-build.cc         | 456 ++++++++++++++++----
 gcc/ira-color.cc         | 851 ++++++++++++++++++++++++++-----------
 gcc/ira-conflicts.cc     | 221 +++++++---
 gcc/ira-emit.cc          |  24 +-
 gcc/ira-int.h            |  67 ++-
 gcc/ira-lives.cc         | 507 ++++++++++++++++------
 gcc/ira.cc               |  73 ++--
 gcc/lra-assigns.cc       | 111 ++++-
 gcc/lra-coalesce.cc      |  20 +-
 gcc/lra-constraints.cc   | 111 +++--
 gcc/lra-int.h            |  33 ++
 gcc/lra-lives.cc         | 660 ++++++++++++++++++++++++-----
 gcc/lra-remat.cc         |  13 +-
 gcc/lra-spills.cc        |  22 +-
 gcc/lra.cc               | 139 +++++-
 gcc/regs.h               |   7 +
 gcc/subreg-live-range.cc | 628 +++++++++++++++++++++++++++
 gcc/subreg-live-range.h  | 333 +++++++++++++++
 gcc/timevar.def          |   1 +
 23 files changed, 4490 insertions(+), 777 deletions(-)
 create mode 100644 gcc/subreg-live-range.cc
 create mode 100644 gcc/subreg-live-range.h

-- 
2.36.3



* [PATCH V2 1/7] df: Add DF_LIVE_SUBREG problem
  2023-11-12  9:58 [PATCH V2 0/7] ira/lra: Support subreg coalesce Lehua Ding
@ 2023-11-12  9:58 ` Lehua Ding
  2023-11-12  9:58 ` [PATCH V2 2/7] ira: Switch to live_subreg data Lehua Ding
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: Lehua Ding @ 2023-11-12  9:58 UTC (permalink / raw)
  To: gcc-patches; +Cc: vmakarov, richard.sandiford, juzhe.zhong, lehua.ding

This patch adds a live_subreg problem that extends the original live_reg
problem to track the liveness of individual subregs.  We only track pseudo
registers whose mode size is a multiple of the natural size, and in practice
only a small fraction of them are ever accessed through subregs.  Compared
with the live_reg problem, the live_subreg problem produces the following
per-block data.  full_in/out means the entire pseudo is live in/out,
partial_in/out means only some subregs of the pseudo are live in/out, and
range_in/out records which parts of the pseudo are live.  all_in/out is the
union of full_in/out and partial_in/out:

  bitmap_head all_in, full_in;
  bitmap_head all_out, full_out;
  bitmap_head partial_in;
  bitmap_head partial_out;
  subregs_live *range_in = NULL;
  subregs_live *range_out = NULL;
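
As a usage sketch (GCC-internal context, illustrative only, using the
accessors added by this patch), a pass would add the problem, run the
solver, and then consult the per-block sets:

```
df_live_subreg_add_problem ();
df_analyze ();

basic_block bb;
FOR_EACH_BB_FN (bb, cfun)
  {
    bitmap all_in = DF_LIVE_SUBREG_IN (bb);            /* full_in | partial_in */
    bitmap partial_in = DF_LIVE_SUBREG_PARTIAL_IN (bb);
    subregs_live *range_in = DF_LIVE_SUBREG_RANGE_IN (bb);
    /* ... e.g. walk PARTIAL_IN and look up each regno's live block
       ranges in RANGE_IN ...  */
  }
```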

gcc/ChangeLog:

	* Makefile.in: Add new object file.
	* df-problems.cc (struct df_live_subreg_problem_data):
	The data of the new live_subreg problem.
	(need_track_subreg): New function.
	(get_range): Ditto.
	(remove_subreg_range): Ditto.
	(add_subreg_range): Ditto.
	(df_live_subreg_free_bb_info): Ditto.
	(df_live_subreg_alloc): Ditto.
	(df_live_subreg_reset): Ditto.
	(df_live_subreg_bb_local_compute): Ditto.
	(df_live_subreg_local_compute): Ditto.
	(df_live_subreg_init): Ditto.
	(df_live_subreg_check_result): Ditto.
	(df_live_subreg_confluence_0): Ditto.
	(df_live_subreg_confluence_n): Ditto.
	(df_live_subreg_transfer_function): Ditto.
	(df_live_subreg_finalize): Ditto.
	(df_live_subreg_free): Ditto.
	(df_live_subreg_top_dump): Ditto.
	(df_live_subreg_bottom_dump): Ditto.
	(df_live_subreg_add_problem): Ditto.
	* df.h (enum df_problem_id): Add live_subreg id.
	(DF_LIVE_SUBREG_INFO): Data accessor.
	(DF_LIVE_SUBREG_IN): Ditto.
	(DF_LIVE_SUBREG_OUT): Ditto.
	(DF_LIVE_SUBREG_FULL_IN): Ditto.
	(DF_LIVE_SUBREG_FULL_OUT): Ditto.
	(DF_LIVE_SUBREG_PARTIAL_IN): Ditto.
	(DF_LIVE_SUBREG_PARTIAL_OUT): Ditto.
	(DF_LIVE_SUBREG_RANGE_IN): Ditto.
	(DF_LIVE_SUBREG_RANGE_OUT): Ditto.
	(class subregs_live): New class.
	(class basic_block_subreg_live_info): Ditto.
	(class df_live_subreg_bb_info): Ditto.
	(df_live_subreg): Ditto.
	(df_live_subreg_add_problem): Ditto.
	(df_live_subreg_finalize): Ditto.
	(class subreg_range): Ditto.
	(need_track_subreg): Ditto.
	(remove_subreg_range): Ditto.
	(add_subreg_range): Ditto.
	(df_live_subreg_get_bb_info): Ditto.
	* regs.h (get_nblocks): Helper function.
	* timevar.def (TV_DF_LIVE_SUBREG): New timevar.
	* subreg-live-range.cc: New file.
	* subreg-live-range.h: New file.

---
 gcc/Makefile.in          |   1 +
 gcc/df-problems.cc       | 889 ++++++++++++++++++++++++++++++++++++++-
 gcc/df.h                 |  67 +++
 gcc/regs.h               |   7 +
 gcc/subreg-live-range.cc | 628 +++++++++++++++++++++++++++
 gcc/subreg-live-range.h  | 333 +++++++++++++++
 gcc/timevar.def          |   1 +
 7 files changed, 1925 insertions(+), 1 deletion(-)
 create mode 100644 gcc/subreg-live-range.cc
 create mode 100644 gcc/subreg-live-range.h

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 29cec21c825..e4403b5a30c 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1675,6 +1675,7 @@ OBJS = \
 	store-motion.o \
 	streamer-hooks.o \
 	stringpool.o \
+	subreg-live-range.o \
 	substring-locations.o \
 	target-globals.o \
 	targhooks.o \
diff --git a/gcc/df-problems.cc b/gcc/df-problems.cc
index d2cfaf7f50f..2585c762fd1 100644
--- a/gcc/df-problems.cc
+++ b/gcc/df-problems.cc
@@ -28,6 +28,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "target.h"
 #include "rtl.h"
 #include "df.h"
+#include "subreg-live-range.h"
 #include "memmodel.h"
 #include "tm_p.h"
 #include "insn-config.h"
@@ -1344,8 +1345,894 @@ df_lr_verify_transfer_functions (void)
   bitmap_clear (&all_blocks);
 }
 
+/*----------------------------------------------------------------------------
+   REGISTER AND SUBREG LIVES
+   Like DF_LR, but with fine-grained tracking of subreg liveness.
+   ----------------------------------------------------------------------------*/
+
+/* Private data used to verify the solution for this problem.  */
+struct df_live_subreg_problem_data
+{
+  /* An obstack for the bitmaps we need for this problem.  */
+  bitmap_obstack live_subreg_bitmaps;
+  bool has_subreg_live_p;
+};
+
+/* Helper functions */
+
+/* Return true if REGNO is a pseudo and REG_MODE covers multiple blocks.  */
+bool
+need_track_subreg (int regno, machine_mode reg_mode)
+{
+  poly_int64 total_size = GET_MODE_SIZE (reg_mode);
+  poly_int64 natural_size = REGMODE_NATURAL_SIZE (reg_mode);
+  return maybe_gt (total_size, natural_size)
+	 && multiple_p (total_size, natural_size)
+	 && regno >= FIRST_PSEUDO_REGISTER;
+}
+
+/* Return subreg_range of REF.  */
+static subreg_range
+get_range (df_ref ref)
+{
+  rtx reg = DF_REF_REAL_REG (ref);
+  machine_mode reg_mode = GET_MODE (reg);
+
+  if (!read_modify_subreg_p (DF_REF_REG (ref)))
+    return subreg_range (0, get_nblocks (reg_mode));
+
+  rtx subreg = DF_REF_REG (ref);
+  machine_mode subreg_mode = GET_MODE (subreg);
+  poly_int64 offset = SUBREG_BYTE (subreg);
+  int nblocks = get_nblocks (reg_mode);
+  poly_int64 unit_size = REGMODE_NATURAL_SIZE (reg_mode);
+  poly_int64 subreg_size = GET_MODE_SIZE (subreg_mode);
+  poly_int64 left = offset + subreg_size;
+
+  int subreg_start = -1;
+  int subreg_nblocks = -1;
+  for (int i = 0; i < nblocks; i += 1)
+    {
+      poly_int64 right = unit_size * (i + 1);
+      if (subreg_start < 0 && maybe_lt (offset, right))
+	subreg_start = i;
+      if (subreg_nblocks < 0 && maybe_le (left, right))
+	{
+	  subreg_nblocks = i + 1 - subreg_start;
+	  break;
+	}
+    }
+  gcc_assert (subreg_start >= 0 && subreg_nblocks > 0);
+
+  return subreg_range (subreg_start, subreg_start + subreg_nblocks);
+}
+
+/* Remove RANGE of REGNO from BB_INFO's use sets.  */
+void
+remove_subreg_range (basic_block_subreg_live_info *bb_info, unsigned int regno,
+		     machine_mode mode, const subreg_range &range)
+{
+  int max = get_nblocks (mode);
+  bitmap full = &bb_info->full_use;
+  bitmap partial = &bb_info->partial_use;
+  subregs_live *range_live = bb_info->range_use;
+
+  if (!range.full_p (max))
+    {
+      if (bitmap_bit_p (full, regno))
+	{
+	  bitmap_clear_bit (full, regno);
+	  gcc_assert (!bitmap_bit_p (partial, regno));
+	  gcc_assert (range_live->empty_p (regno));
+	  subreg_ranges temp = subreg_ranges (max);
+	  temp.make_full ();
+	  temp.remove_range (max, range);
+	  range_live->add_ranges (regno, temp);
+	  bitmap_set_bit (partial, regno);
+	  return;
+	}
+      else if (bitmap_bit_p (partial, regno))
+	{
+	  range_live->remove_range (regno, max, range);
+	  if (range_live->empty_p (regno))
+	    bitmap_clear_bit (partial, regno);
+	}
+    }
+  else if (bitmap_bit_p (full, regno))
+    {
+      bitmap_clear_bit (full, regno);
+      gcc_assert (!bitmap_bit_p (partial, regno));
+    }
+  else if (bitmap_bit_p (partial, regno))
+    {
+      bitmap_clear_bit (partial, regno);
+      range_live->remove_live (regno);
+    }
+}
+
+/* Remove REF from BB_INFO use; return true if it's a tracked subreg.  */
+bool
+remove_subreg_range (basic_block_subreg_live_info *bb_info, df_ref ref)
+{
+  unsigned int regno = DF_REF_REGNO (ref);
+  machine_mode mode = GET_MODE (DF_REF_REAL_REG (ref));
+  bool subreg_p = read_modify_subreg_p (DF_REF_REG (ref));
+
+  if (need_track_subreg (regno, mode))
+    {
+      remove_subreg_range (bb_info, regno, mode, get_range (ref));
+      return subreg_p;
+    }
+  else
+    {
+      bitmap_clear_bit (&bb_info->full_use, regno);
+      gcc_assert (!bitmap_bit_p (&bb_info->partial_use, regno));
+      gcc_assert (!bitmap_bit_p (&bb_info->partial_def, regno));
+      return false;
+    }
+}
+
+/* Add RANGE of REGNO to BB_INFO's def or use sets.  */
+void
+add_subreg_range (basic_block_subreg_live_info *bb_info, unsigned int regno,
+		  machine_mode mode, const subreg_range &range, bool is_def)
+{
+  int max = get_nblocks (mode);
+  bitmap full = is_def ? &bb_info->full_def : &bb_info->full_use;
+  bitmap partial = is_def ? &bb_info->partial_def : &bb_info->partial_use;
+  subregs_live *range_live = is_def ? bb_info->range_def : bb_info->range_use;
+
+  if (!range.full_p (max))
+    {
+      if (bitmap_bit_p (full, regno))
+	return;
+      range_live->add_range (regno, max, range);
+      if (range_live->full_p (regno))
+	{
+	  bitmap_set_bit (full, regno);
+	  gcc_assert (bitmap_bit_p (partial, regno));
+	  bitmap_clear_bit (partial, regno);
+	  range_live->remove_live (regno);
+	}
+      else if (!bitmap_bit_p (partial, regno))
+	bitmap_set_bit (partial, regno);
+    }
+  else if (!bitmap_bit_p (full, regno))
+    {
+      bitmap_set_bit (full, regno);
+      if (bitmap_bit_p (partial, regno))
+	{
+	  bitmap_clear_bit (partial, regno);
+	  range_live->remove_live (regno);
+	}
+    }
+}
+
+/* Add REF to BB_INFO def/use; return true if it's a tracked subreg.  */
+bool
+add_subreg_range (basic_block_subreg_live_info *bb_info, df_ref ref,
+		  bool is_def)
+{
+  unsigned int regno = DF_REF_REGNO (ref);
+  machine_mode mode = GET_MODE (DF_REF_REAL_REG (ref));
+  bool subreg_p = read_modify_subreg_p (DF_REF_REG (ref));
+
+  if (need_track_subreg (regno, mode))
+    {
+      add_subreg_range (bb_info, regno, mode, get_range (ref), is_def);
+      return subreg_p;
+    }
+  else
+    {
+      bitmap full = is_def ? &bb_info->full_def : &bb_info->full_use;
+      bitmap partial = is_def ? &bb_info->partial_def : &bb_info->partial_use;
+      bitmap_set_bit (full, regno);
+      gcc_assert (!bitmap_bit_p (partial, regno));
+
+      if (is_def && DF_REF_FLAGS (ref) & (DF_REF_PARTIAL | DF_REF_CONDITIONAL))
+	add_subreg_range (bb_info, ref, false);
+      return false;
+    }
+}
+
+/* Free basic block info.  */
+
+static void
+df_live_subreg_free_bb_info (basic_block bb ATTRIBUTE_UNUSED, void *vbb_info)
+{
+  df_live_subreg_bb_info *bb_info = (df_live_subreg_bb_info *) vbb_info;
+  if (bb_info)
+    {
+      delete bb_info->range_def;
+      bb_info->range_def = NULL;
+      delete bb_info->range_use;
+      bb_info->range_use = NULL;
+      delete bb_info->range_in;
+      bb_info->range_in = NULL;
+      delete bb_info->range_out;
+      bb_info->range_out = NULL;
+
+      bitmap_clear (&bb_info->full_use);
+      bitmap_clear (&bb_info->partial_use);
+      bitmap_clear (&bb_info->full_def);
+      bitmap_clear (&bb_info->partial_def);
+      bitmap_clear (&bb_info->all_in);
+      bitmap_clear (&bb_info->full_in);
+      bitmap_clear (&bb_info->partial_in);
+      bitmap_clear (&bb_info->all_out);
+      bitmap_clear (&bb_info->full_out);
+      bitmap_clear (&bb_info->partial_out);
+    }
+}
+
+/* Allocate or reset bitmaps for DF_LIVE_SUBREG blocks. The solution bits are
+   not touched unless the block is new.  */
+
+static void
+df_live_subreg_alloc (bitmap all_blocks ATTRIBUTE_UNUSED)
+{
+  struct df_live_subreg_problem_data *problem_data;
+  df_grow_bb_info (df_live_subreg);
+  if (df_live_subreg->problem_data)
+    problem_data
+      = (struct df_live_subreg_problem_data *) df_live_subreg->problem_data;
+  else
+    {
+      problem_data = XNEW (struct df_live_subreg_problem_data);
+      df_live_subreg->problem_data = problem_data;
+
+      bitmap_obstack_initialize (&problem_data->live_subreg_bitmaps);
+      problem_data->has_subreg_live_p = false;
+    }
+
+  basic_block bb;
+  FOR_EACH_BB_FN (bb, cfun)
+    bitmap_set_bit (df_live_subreg->out_of_date_transfer_functions, bb->index);
+
+  bitmap_set_bit (df_live_subreg->out_of_date_transfer_functions, ENTRY_BLOCK);
+  bitmap_set_bit (df_live_subreg->out_of_date_transfer_functions, EXIT_BLOCK);
+
+  unsigned int bb_index;
+  bitmap_iterator bi;
+  EXECUTE_IF_SET_IN_BITMAP (df_live_subreg->out_of_date_transfer_functions, 0,
+			    bb_index, bi)
+    {
+      df_live_subreg_bb_info *bb_info = df_live_subreg_get_bb_info (bb_index);
+
+      /* When bitmaps are already initialized, just clear them.  */
+      if (bb_info->full_use.obstack)
+	{
+	  bitmap_clear (&bb_info->full_def);
+	  bitmap_clear (&bb_info->partial_def);
+	  bitmap_clear (&bb_info->full_use);
+	  bitmap_clear (&bb_info->partial_use);
+	  bitmap_clear (&bb_info->all_in);
+	  bitmap_clear (&bb_info->full_in);
+	  bitmap_clear (&bb_info->partial_in);
+	  bitmap_clear (&bb_info->all_out);
+	  bitmap_clear (&bb_info->full_out);
+	  bitmap_clear (&bb_info->partial_out);
+	}
+      else
+	{
+	  bitmap_initialize (&bb_info->full_def,
+			     &problem_data->live_subreg_bitmaps);
+	  bitmap_initialize (&bb_info->partial_def,
+			     &problem_data->live_subreg_bitmaps);
+	  bitmap_initialize (&bb_info->full_use,
+			     &problem_data->live_subreg_bitmaps);
+	  bitmap_initialize (&bb_info->partial_use,
+			     &problem_data->live_subreg_bitmaps);
+	  bitmap_initialize (&bb_info->all_in,
+			     &problem_data->live_subreg_bitmaps);
+	  bitmap_initialize (&bb_info->full_in,
+			     &problem_data->live_subreg_bitmaps);
+	  bitmap_initialize (&bb_info->partial_in,
+			     &problem_data->live_subreg_bitmaps);
+	  bitmap_initialize (&bb_info->all_out,
+			     &problem_data->live_subreg_bitmaps);
+	  bitmap_initialize (&bb_info->full_out,
+			     &problem_data->live_subreg_bitmaps);
+	  bitmap_initialize (&bb_info->partial_out,
+			     &problem_data->live_subreg_bitmaps);
+	}
+
+      if (bb_info->range_def)
+	{
+	  bb_info->range_def->clear ();
+	  bb_info->range_use->clear ();
+	  bb_info->range_in->clear ();
+	  bb_info->range_out->clear ();
+	}
+      else
+	{
+	  bb_info->range_def = new subregs_live ();
+	  bb_info->range_use = new subregs_live ();
+	  bb_info->range_in = new subregs_live ();
+	  bb_info->range_out = new subregs_live ();
+	}
+    }
+  df_live_subreg->optional_p = true;
+}
+
+/* Reset the global solution for recalculation.  */
+
+static void
+df_live_subreg_reset (bitmap all_blocks)
+{
+  unsigned int bb_index;
+  bitmap_iterator bi;
+
+  EXECUTE_IF_SET_IN_BITMAP (all_blocks, 0, bb_index, bi)
+    {
+      df_live_subreg_bb_info *bb_info = df_live_subreg_get_bb_info (bb_index);
+      gcc_assert (bb_info);
+      bitmap_clear (&bb_info->all_in);
+      bitmap_clear (&bb_info->full_in);
+      bitmap_clear (&bb_info->partial_in);
+      bitmap_clear (&bb_info->all_out);
+      bitmap_clear (&bb_info->full_out);
+      bitmap_clear (&bb_info->partial_out);
+      bb_info->range_in->clear ();
+      bb_info->range_out->clear ();
+    }
+}
+
+/* Compute local live register info for basic block BB.  */
+
+static void
+df_live_subreg_bb_local_compute (unsigned int bb_index)
+{
+  basic_block bb = BASIC_BLOCK_FOR_FN (cfun, bb_index);
+  df_live_subreg_bb_info *bb_info = df_live_subreg_get_bb_info (bb_index);
+  df_live_subreg_problem_data *problem_data
+    = (df_live_subreg_problem_data *) df_live_subreg->problem_data;
+  rtx_insn *insn;
+  df_ref def, use;
+
+  /* Process the registers set in an exception handler.  */
+  FOR_EACH_ARTIFICIAL_DEF (def, bb_index)
+    if ((DF_REF_FLAGS (def) & DF_REF_AT_TOP) == 0)
+      {
+	problem_data->has_subreg_live_p
+	  |= add_subreg_range (bb_info, def, true);
+	problem_data->has_subreg_live_p |= remove_subreg_range (bb_info, def);
+      }
+
+  /* Process the hardware registers that are always live.  */
+  FOR_EACH_ARTIFICIAL_USE (use, bb_index)
+    /* Add use to set of uses in this BB.  */
+    if ((DF_REF_FLAGS (use) & DF_REF_AT_TOP) == 0)
+      problem_data->has_subreg_live_p |= add_subreg_range (bb_info, use);
+
+  FOR_BB_INSNS_REVERSE (bb, insn)
+    {
+      if (!NONDEBUG_INSN_P (insn))
+	continue;
+
+      df_insn_info *insn_info = DF_INSN_INFO_GET (insn);
+      FOR_EACH_INSN_INFO_DEF (def, insn_info)
+	{
+	  problem_data->has_subreg_live_p |= remove_subreg_range (bb_info, def);
+	  problem_data->has_subreg_live_p
+	    |= add_subreg_range (bb_info, def, true);
+	}
+
+      FOR_EACH_INSN_INFO_USE (use, insn_info)
+	{
+	  unsigned int regno = DF_REF_REGNO (use);
+	  machine_mode mode = GET_MODE (DF_REF_REAL_REG (use));
+	  /* Ignore the use of SET_DEST which is (subreg (reg) offset).  */
+	  if (need_track_subreg (regno, mode)
+	      && DF_REF_FLAGS (use) & (DF_REF_READ_WRITE | DF_REF_SUBREG))
+	    continue;
+	  problem_data->has_subreg_live_p |= add_subreg_range (bb_info, use);
+	}
+    }
+
+  /* Process the registers set in an exception handler or the hard
+     frame pointer if this block is the target of a non local
+     goto.  */
+  FOR_EACH_ARTIFICIAL_DEF (def, bb_index)
+    if (DF_REF_FLAGS (def) & DF_REF_AT_TOP)
+      {
+	problem_data->has_subreg_live_p
+	  |= add_subreg_range (bb_info, def, true);
+	problem_data->has_subreg_live_p |= remove_subreg_range (bb_info, def);
+      }
+
+#ifdef EH_USES
+  /* Process the uses that are live into an exception handler.  */
+  FOR_EACH_ARTIFICIAL_USE (use, bb_index)
+    /* Add use to set of uses in this BB.  */
+    if (DF_REF_FLAGS (use) & DF_REF_AT_TOP)
+      problem_data->has_subreg_live_p |= add_subreg_range (bb_info, use);
+#endif
+}
+
+/* Compute local live register info for each basic block within BLOCKS.  */
+
+static void
+df_live_subreg_local_compute (bitmap all_blocks ATTRIBUTE_UNUSED)
+{
+  unsigned int bb_index, i;
+  bitmap_iterator bi;
+
+  bitmap_clear (&df->hardware_regs_used);
+
+  /* The all-important stack pointer must always be live.  */
+  bitmap_set_bit (&df->hardware_regs_used, STACK_POINTER_REGNUM);
+
+  /* Global regs are always live, too.  */
+  for (i = 0; i < FIRST_PSEUDO_REGISTER; i++)
+    if (global_regs[i])
+      bitmap_set_bit (&df->hardware_regs_used, i);
+
+  /* Before reload, there are a few registers that must be forced
+     live everywhere -- which might not already be the case for
+     blocks within infinite loops.  */
+  if (!reload_completed)
+    {
+      unsigned int pic_offset_table_regnum = PIC_OFFSET_TABLE_REGNUM;
+      /* Any reference to any pseudo before reload is a potential
+	 reference of the frame pointer.  */
+      bitmap_set_bit (&df->hardware_regs_used, FRAME_POINTER_REGNUM);
+
+      /* Pseudos with argument area equivalences may require
+	 reloading via the argument pointer.  */
+      if (FRAME_POINTER_REGNUM != ARG_POINTER_REGNUM
+	  && fixed_regs[ARG_POINTER_REGNUM])
+	bitmap_set_bit (&df->hardware_regs_used, ARG_POINTER_REGNUM);
+
+      /* Any constant, or pseudo with constant equivalences, may
+	 require reloading from memory using the pic register.  */
+      if (pic_offset_table_regnum != INVALID_REGNUM
+	  && fixed_regs[pic_offset_table_regnum])
+	bitmap_set_bit (&df->hardware_regs_used, pic_offset_table_regnum);
+    }
+
+  EXECUTE_IF_SET_IN_BITMAP (df_live_subreg->out_of_date_transfer_functions, 0,
+			    bb_index, bi)
+    {
+      if (bb_index == EXIT_BLOCK)
+	{
+	  /* The exit block is special for this problem and its bits are
+	     computed from thin air.  */
+	  class df_live_subreg_bb_info *bb_info
+	    = df_live_subreg_get_bb_info (EXIT_BLOCK);
+	  bitmap_copy (&bb_info->full_use, df->exit_block_uses);
+	}
+      else
+	df_live_subreg_bb_local_compute (bb_index);
+    }
+
+  bitmap_clear (df_live_subreg->out_of_date_transfer_functions);
+}
+
+/* Initialize the solution vectors.  */
+
+static void
+df_live_subreg_init (bitmap all_blocks)
+{
+  unsigned int bb_index;
+  bitmap_iterator bi;
+
+  EXECUTE_IF_SET_IN_BITMAP (all_blocks, 0, bb_index, bi)
+    {
+      df_live_subreg_bb_info *bb_info = df_live_subreg_get_bb_info (bb_index);
+      bitmap_copy (&bb_info->full_in, &bb_info->full_use);
+      bitmap_copy (&bb_info->partial_in, &bb_info->partial_use);
+      bb_info->range_in->copy_lives (*bb_info->range_use);
+      bitmap_clear (&bb_info->full_out);
+      bitmap_clear (&bb_info->partial_out);
+      bb_info->range_out->clear ();
+    }
+}
+
+/* Check that the computed result is self-consistent.  */
+static void
+df_live_subreg_check_result (bitmap full, bitmap partial,
+			     subregs_live *partial_live)
+{
+  unsigned int regno;
+  bitmap_iterator bi;
+  gcc_assert (!bitmap_intersect_p (full, partial));
+  EXECUTE_IF_SET_IN_BITMAP (full, 0, regno, bi)
+    gcc_assert (partial_live->empty_p (regno));
+  EXECUTE_IF_SET_IN_BITMAP (partial, 0, regno, bi)
+    gcc_assert (!partial_live->empty_p (regno));
+}
+
+/* Confluence function that processes infinite loops.  This might be a
+   noreturn function that throws.  And even if it isn't, getting the
+   unwind info right helps debugging.  */
+static void
+df_live_subreg_confluence_0 (basic_block bb)
+{
+  bitmap full_out = &df_live_subreg_get_bb_info (bb->index)->full_out;
+  if (bb != EXIT_BLOCK_PTR_FOR_FN (cfun))
+    bitmap_copy (full_out, &df->hardware_regs_used);
+}
+
+/* Confluence function that ignores fake edges.  */
+
+static bool
+df_live_subreg_confluence_n (edge e)
+{
+  df_live_subreg_problem_data *problem_data
+    = (df_live_subreg_problem_data *) df_live_subreg->problem_data;
+  class df_live_subreg_bb_info *src_bb_info
+    = df_live_subreg_get_bb_info (e->src->index);
+  class df_live_subreg_bb_info *dest_bb_info
+    = df_live_subreg_get_bb_info (e->dest->index);
+
+  if (!problem_data->has_subreg_live_p)
+    {
+      bool changed = false;
+
+      /* Call-clobbered registers die across exception and call edges.
+	 Conservatively treat partially-clobbered registers as surviving
+	 across the edges; they might or might not, depending on what
+	 mode they have.  */
+      /* ??? Abnormal call edges ignored for the moment, as this gets
+	 confused by sibling call edges, which crashes reg-stack.  */
+      if (e->flags & EDGE_EH)
+	{
+	  bitmap_view<HARD_REG_SET> eh_kills (eh_edge_abi.full_reg_clobbers ());
+	  changed
+	    = bitmap_ior_and_compl_into (&src_bb_info->full_out,
+					 &dest_bb_info->full_in, eh_kills);
+	}
+      else
+	changed
+	  = bitmap_ior_into (&src_bb_info->full_out, &dest_bb_info->full_in);
+
+      changed
+	|= bitmap_ior_into (&src_bb_info->full_out, &df->hardware_regs_used);
+      return changed;
+    }
+
+  /* Some subreg liveness needs to be tracked.  Calculation formula:
+       temp_full means regnos that become fully live after merging:
+	 1. partial in one of out/in and full in the other, or
+	 2. partial in both out and in, and the merged range is full.
+       temp_range means:
+	 the live ranges of the regnos that remain partially live.
+       src_bb_info->partial_out = (src_bb_info->partial_out |
+				   dest_bb_info->partial_in) & ~temp_full
+       src_bb_info->range_out = copy(temp_range)
+       src_bb_info->full_out |= dest_bb_info->full_in | temp_full
+       */
+  subregs_live temp_range;
+  temp_range.add_lives (*src_bb_info->range_out);
+  temp_range.add_lives (*dest_bb_info->range_in);
+
+  bitmap_head temp_partial_all;
+  bitmap_initialize (&temp_partial_all, &bitmap_default_obstack);
+  bitmap_ior (&temp_partial_all, &src_bb_info->partial_out,
+	      &dest_bb_info->partial_in);
+
+  bitmap_head temp_full;
+  bitmap_initialize (&temp_full, &bitmap_default_obstack);
+
+  /* Collect regno that become full after merge src_bb_info->partial_out
+     and dest_bb_info->partial_in.  */
+  unsigned int regno;
+  bitmap_iterator bi;
+  EXECUTE_IF_SET_IN_BITMAP (&temp_partial_all, FIRST_PSEUDO_REGISTER, regno, bi)
+    {
+      if (bitmap_bit_p (&src_bb_info->full_out, regno)
+	  || bitmap_bit_p (&dest_bb_info->full_in, regno))
+	{
+	  bitmap_set_bit (&temp_full, regno);
+	  temp_range.remove_live (regno);
+	  continue;
+	}
+      else if (!bitmap_bit_p (&src_bb_info->partial_out, regno)
+	       || !bitmap_bit_p (&dest_bb_info->partial_in, regno))
+	continue;
+
+      subreg_ranges temp = src_bb_info->range_out->lives.at (regno);
+      temp.add_ranges (dest_bb_info->range_in->lives.at (regno));
+      if (temp.full_p ())
+	{
+	  bitmap_set_bit (&temp_full, regno);
+	  temp_range.remove_live (regno);
+	}
+    }
+
+  /* Calculating src_bb_info->partial_out and src_bb_info->range_out.  */
+  bool changed = bitmap_and_compl (&src_bb_info->partial_out, &temp_partial_all,
+				   &temp_full);
+  changed |= src_bb_info->range_out->copy_lives (temp_range);
+
+  /* Calculating src_bb_info->full_out.  */
+  bitmap_ior_into (&temp_full, &dest_bb_info->full_in);
+
+  /* Call-clobbered registers die across exception and call edges.
+     Conservatively treat partially-clobbered registers as surviving
+     across the edges; they might or might not, depending on what
+     mode they have.  */
+  /* ??? Abnormal call edges ignored for the moment, as this gets
+     confused by sibling call edges, which crashes reg-stack.  */
+  if (e->flags & EDGE_EH)
+    {
+      bitmap_view<HARD_REG_SET> eh_kills (eh_edge_abi.full_reg_clobbers ());
+      changed |= bitmap_ior_and_compl_into (&src_bb_info->full_out, &temp_full,
+					    eh_kills);
+    }
+  else
+    changed |= bitmap_ior_into (&src_bb_info->full_out, &temp_full);
+
+  changed |= bitmap_ior_into (&src_bb_info->full_out, &df->hardware_regs_used);
+
+  bitmap_clear (&temp_full);
+  bitmap_clear (&temp_partial_all);
+
+  df_live_subreg_check_result (&src_bb_info->full_out,
+			       &src_bb_info->partial_out,
+			       src_bb_info->range_out);
+  return changed;
+}
+
+/* Transfer function.  */
+
+static bool
+df_live_subreg_transfer_function (int bb_index)
+{
+  class df_live_subreg_bb_info *bb_info = df_live_subreg_get_bb_info (bb_index);
+  df_live_subreg_problem_data *problem_data
+    = (df_live_subreg_problem_data *) df_live_subreg->problem_data;
+  if (!problem_data->has_subreg_live_p)
+    {
+      bitmap in = &bb_info->full_in;
+      bitmap out = &bb_info->full_out;
+      bitmap use = &bb_info->full_use;
+      bitmap def = &bb_info->full_def;
+
+      return bitmap_ior_and_compl (in, use, out, def);
+    }
+
+  /* Some subreg liveness needs to be tracked; follow the calculation
+     formula below:
+       all_def = full_def | partial_def
+       temp_partial_out = ((full_out & partial_def)
+			   | (partial_out & ~all_def)
+			   | (partial_out minus partial_def, where not empty))
+			  & ~full_use
+       temp_partial_be_full = (temp_partial_out & partial_use) merged to full
+       full_in = full_use | (full_out & ~all_def) | temp_partial_be_full
+       partial_in = (temp_partial_out | partial_use) & ~temp_partial_be_full  */
+  unsigned int regno;
+  bitmap_iterator bi;
+  bool changed = false;
+  bitmap_head temp_partial_out;
+  bitmap_head temp_partial_be_full;
+  bitmap_head all_def;
+  subregs_live temp_range_out;
+  bitmap_initialize (&temp_partial_out, &bitmap_default_obstack);
+  bitmap_initialize (&temp_partial_be_full, &bitmap_default_obstack);
+  bitmap_initialize (&all_def, &bitmap_default_obstack);
+
+  bitmap_ior (&all_def, &bb_info->full_def, &bb_info->partial_def);
+
+  /* temp_partial_out = (full_out & partial_def) */
+  bitmap_and (&temp_partial_out, &bb_info->full_out, &bb_info->partial_def);
+  EXECUTE_IF_SET_IN_BITMAP (&temp_partial_out, FIRST_PSEUDO_REGISTER, regno, bi)
+    {
+      subreg_ranges temp (bb_info->range_def->lives.at (regno).max);
+      temp.make_full ();
+      temp.remove_ranges (bb_info->range_def->lives.at (regno));
+      temp_range_out.add_ranges (regno, temp);
+    }
+
+  /* temp_partial_out |= (partial_out & ~all_def) */
+  bitmap_ior_and_compl_into (&temp_partial_out, &bb_info->partial_out,
+			     &all_def);
+  EXECUTE_IF_AND_COMPL_IN_BITMAP (&bb_info->partial_out, &all_def,
+				  FIRST_PSEUDO_REGISTER, regno, bi)
+    {
+      temp_range_out.add_ranges (regno, bb_info->range_out->lives.at (regno));
+    }
+
+  /* temp_partial_out |= (partial_out minus partial_def, where not empty) */
+  EXECUTE_IF_AND_IN_BITMAP (&bb_info->partial_out, &bb_info->partial_def, 0,
+			    regno, bi)
+    {
+      subreg_ranges temp = bb_info->range_out->lives.at (regno);
+      temp.remove_ranges (bb_info->range_def->lives.at (regno));
+      if (!temp.empty_p ())
+	{
+	  bitmap_set_bit (&temp_partial_out, regno);
+	  temp_range_out.add_ranges (regno, temp);
+	}
+    }
+
+  /* temp_partial_out = temp_partial_out & ~full_use */
+  bitmap_and_compl_into (&temp_partial_out, &bb_info->full_use);
+  EXECUTE_IF_SET_IN_BITMAP (&bb_info->full_use, 0, regno, bi)
+    if (!temp_range_out.empty_p (regno))
+      temp_range_out.remove_live (regno);
+
+  /* temp_partial_be_full = (temp_partial_out & partial_use) merged to full
+   */
+  temp_range_out.add_lives (*bb_info->range_use);
+  /* Remove every range that is in partial_use and in full_out but not in all_def.
+   */
+  EXECUTE_IF_SET_IN_BITMAP (&bb_info->full_out, 0, regno, bi)
+    if (!bitmap_bit_p (&all_def, regno) && !temp_range_out.empty_p (regno))
+      temp_range_out.remove_live (regno);
+
+  EXECUTE_IF_AND_IN_BITMAP (&temp_partial_out, &bb_info->partial_use, 0, regno,
+			    bi)
+    {
+      subreg_ranges temp = temp_range_out.lives.at (regno);
+      temp.add_ranges (bb_info->range_use->lives.at (regno));
+      if (temp.full_p ())
+	{
+	  bitmap_set_bit (&temp_partial_be_full, regno);
+	  temp_range_out.remove_live (regno);
+	}
+    }
+
+  /* Calculating full_in.  */
+  bitmap_ior_and_compl_into (&temp_partial_be_full, &bb_info->full_out,
+			     &all_def);
+  changed |= bitmap_ior (&bb_info->full_in, &temp_partial_be_full,
+			 &bb_info->full_use);
+
+  /* Calculating partial_in and range_in.  */
+  bitmap_ior_into (&temp_partial_out, &bb_info->partial_use);
+  changed |= bitmap_and_compl (&bb_info->partial_in, &temp_partial_out,
+			       &temp_partial_be_full);
+  changed |= bb_info->range_in->copy_lives (temp_range_out);
+
+  bitmap_clear (&temp_partial_out);
+  bitmap_clear (&temp_partial_be_full);
+  bitmap_clear (&all_def);
+
+  df_live_subreg_check_result (&bb_info->full_in, &bb_info->partial_in,
+			       bb_info->range_in);
+
+  return changed;
+}
+
+/* Compute all_in/all_out as the union of the full and partial sets.  */
+
+void
+df_live_subreg_finalize (bitmap all_blocks)
+{
+  unsigned int bb_index;
+  bitmap_iterator bi;
+  EXECUTE_IF_SET_IN_BITMAP (all_blocks, 0, bb_index, bi)
+    {
+      class df_live_subreg_bb_info *bb_info
+	= df_live_subreg_get_bb_info (bb_index);
+      gcc_assert (bb_info);
+      bitmap_ior (&bb_info->all_in, &bb_info->full_in, &bb_info->partial_in);
+      bitmap_ior (&bb_info->all_out, &bb_info->full_out, &bb_info->partial_out);
+    }
+}
+
+/* Free all storage associated with the problem.  */
+
+static void
+df_live_subreg_free (void)
+{
+  df_live_subreg_problem_data *problem_data
+    = (df_live_subreg_problem_data *) df_live_subreg->problem_data;
+  if (df_live_subreg->block_info)
+    {
+      df_live_subreg->block_info_size = 0;
+      free (df_live_subreg->block_info);
+      df_live_subreg->block_info = NULL;
+      bitmap_obstack_release (&problem_data->live_subreg_bitmaps);
+      free (df_live_subreg->problem_data);
+      df_live_subreg->problem_data = NULL;
+    }
+
+  BITMAP_FREE (df_live_subreg->out_of_date_transfer_functions);
+  free (df_live_subreg);
+}
+
+/* Debugging info at top of bb.  */
+
+static void
+df_live_subreg_top_dump (basic_block bb, FILE *file)
+{
+  df_live_subreg_bb_info *bb_info = df_live_subreg_get_bb_info (bb->index);
+  if (!bb_info)
+    return;
+
+  fprintf (file, ";; subreg live all in  \t");
+  df_print_regset (file, &bb_info->all_in);
+  fprintf (file, ";;   subreg live full in  \t");
+  df_print_regset (file, &bb_info->full_in);
+  fprintf (file, ";;   subreg live partial in  \t");
+  df_print_regset (file, &bb_info->partial_in);
+  fprintf (file, ";;   subreg live range in  \t");
+  bb_info->range_in->dump (file, "");
+
+  fprintf (file, "\n;;   subreg live full use  \t");
+  df_print_regset (file, &bb_info->full_use);
+  fprintf (file, ";;   subreg live partial use  \t");
+  df_print_regset (file, &bb_info->partial_use);
+  fprintf (file, ";;   subreg live range use  \t");
+  bb_info->range_use->dump (file, "");
+
+  fprintf (file, "\n;;   subreg live full def  \t");
+  df_print_regset (file, &bb_info->full_def);
+  fprintf (file, ";;   subreg live partial def  \t");
+  df_print_regset (file, &bb_info->partial_def);
+  fprintf (file, ";;   subreg live range def \t");
+  bb_info->range_def->dump (file, "");
+}
+
+/* Debugging info at bottom of bb.  */
+
+static void
+df_live_subreg_bottom_dump (basic_block bb, FILE *file)
+{
+  df_live_subreg_bb_info *bb_info = df_live_subreg_get_bb_info (bb->index);
+  if (!bb_info)
+    return;
+
+  fprintf (file, ";; subreg live all out  \t");
+  df_print_regset (file, &bb_info->all_out);
+  fprintf (file, ";;   subreg live full out  \t");
+  df_print_regset (file, &bb_info->full_out);
+  fprintf (file, ";;   subreg live partial out  \t");
+  df_print_regset (file, &bb_info->partial_out);
+  fprintf (file, ";;   subreg live range out  \t");
+  bb_info->range_out->dump (file, "");
+}
+
+/* All of the information associated with every instance of the problem.  */
+
+static const struct df_problem problem_LIVE_SUBREG = {
+  DF_LIVE_SUBREG,		    /* Problem id.  */
+  DF_BACKWARD,			    /* Direction.  */
+  df_live_subreg_alloc,		    /* Allocate the problem specific data.  */
+  df_live_subreg_reset,		    /* Reset global information.  */
+  df_live_subreg_free_bb_info,	    /* Free basic block info.  */
+  df_live_subreg_local_compute,	    /* Local compute function.  */
+  df_live_subreg_init,		    /* Init the solution specific data.  */
+  df_worklist_dataflow,		    /* Worklist solver.  */
+  df_live_subreg_confluence_0,	    /* Confluence operator 0.  */
+  df_live_subreg_confluence_n,	    /* Confluence operator n.  */
+  df_live_subreg_transfer_function, /* Transfer function.  */
+  df_live_subreg_finalize,	    /* Finalize function.  */
+  df_live_subreg_free,		    /* Free all of the problem information.  */
+  df_live_subreg_free,	      /* Remove this problem from the stack of dataflow
+				 problems.  */
+  NULL,			      /* Debugging.  */
+  df_live_subreg_top_dump,    /* Debugging start block.  */
+  df_live_subreg_bottom_dump, /* Debugging end block.  */
+  NULL,			      /* Debugging start insn.  */
+  NULL,			      /* Debugging end insn.  */
+  NULL,			      /* Incremental solution verify start.  */
+  NULL,			      /* Incremental solution verify end.  */
+  &problem_LR,		      /* Dependent problem.  */
+  sizeof (df_live_subreg_bb_info), /* Size of entry of block_info array. */
+  TV_DF_LIVE_SUBREG,		   /* Timing variable.  */
+  false /* Reset blocks on dropping out of blocks_to_analyze.  */
+};
+
+/* Create a new DATAFLOW instance and add it to an existing instance
+   of DF.  The returned structure is what is used to get at the
+   solution.  */
+
+void
+df_live_subreg_add_problem (void)
+{
+  df_add_problem (&problem_LIVE_SUBREG);
+
+  /* These will be initialized when df_scan_blocks processes each
+     block.  */
+  df_live_subreg->out_of_date_transfer_functions
+    = BITMAP_ALLOC (&df_bitmap_obstack);
+}
 
-\f
 /*----------------------------------------------------------------------------
    LIVE AND MAY-INITIALIZED REGISTERS.
 
diff --git a/gcc/df.h b/gcc/df.h
index 402657a7076..50a6cf99863 100644
--- a/gcc/df.h
+++ b/gcc/df.h
@@ -47,6 +47,7 @@ enum df_problem_id
   {
     DF_SCAN,
     DF_LR,                /* Live Registers backward. */
+    DF_LIVE_SUBREG,       /* Live Ranges and Live Subreg */
     DF_LIVE,              /* Live Registers & Uninitialized Registers */
     DF_RD,                /* Reaching Defs. */
     DF_CHAIN,             /* Def-Use and/or Use-Def Chains. */
@@ -619,6 +620,7 @@ public:
 #define DF_SCAN_BB_INFO(BB) (df_scan_get_bb_info ((BB)->index))
 #define DF_RD_BB_INFO(BB) (df_rd_get_bb_info ((BB)->index))
 #define DF_LR_BB_INFO(BB) (df_lr_get_bb_info ((BB)->index))
+#define DF_LIVE_SUBREG_INFO(BB) (df_live_subreg_get_bb_info ((BB)->index))
 #define DF_LIVE_BB_INFO(BB) (df_live_get_bb_info ((BB)->index))
 #define DF_WORD_LR_BB_INFO(BB) (df_word_lr_get_bb_info ((BB)->index))
 #define DF_MD_BB_INFO(BB) (df_md_get_bb_info ((BB)->index))
@@ -632,6 +634,15 @@ public:
 #define DF_MIR_IN(BB) (&DF_MIR_BB_INFO (BB)->in)
 #define DF_MIR_OUT(BB) (&DF_MIR_BB_INFO (BB)->out)
 
+#define DF_LIVE_SUBREG_IN(BB) (&DF_LIVE_SUBREG_INFO (BB)->all_in)
+#define DF_LIVE_SUBREG_OUT(BB) (&DF_LIVE_SUBREG_INFO (BB)->all_out)
+#define DF_LIVE_SUBREG_FULL_IN(BB) (&DF_LIVE_SUBREG_INFO (BB)->full_in)
+#define DF_LIVE_SUBREG_FULL_OUT(BB) (&DF_LIVE_SUBREG_INFO (BB)->full_out)
+#define DF_LIVE_SUBREG_PARTIAL_IN(BB) (&DF_LIVE_SUBREG_INFO (BB)->partial_in)
+#define DF_LIVE_SUBREG_PARTIAL_OUT(BB) (&DF_LIVE_SUBREG_INFO (BB)->partial_out)
+#define DF_LIVE_SUBREG_RANGE_IN(BB) (DF_LIVE_SUBREG_INFO (BB)->range_in)
+#define DF_LIVE_SUBREG_RANGE_OUT(BB) (DF_LIVE_SUBREG_INFO (BB)->range_out)
+
 /* These macros are used by passes that are not tolerant of
    uninitialized variables.  This intolerance should eventually
    be fixed.  */
@@ -878,6 +889,32 @@ public:
   bitmap_head out;   /* At the bottom of the block.  */
 };
 
+class subregs_live;
+
+class basic_block_subreg_live_info
+{
+public:
+  bitmap_head full_def;
+  bitmap_head full_use;
+  /* Only for pseudo registers.  */
+  bitmap_head partial_def;
+  bitmap_head partial_use;
+  subregs_live *range_def = NULL;
+  subregs_live *range_use = NULL;
+};
+
+/* Live registers and live ranges, including subreg-level liveness.  */
+class df_live_subreg_bb_info : public basic_block_subreg_live_info
+{
+public:
+  bitmap_head all_in, full_in;
+  bitmap_head all_out, full_out;
+  /* Only for pseudo registers.  */
+  bitmap_head partial_in;
+  bitmap_head partial_out;
+  subregs_live *range_in = NULL;
+  subregs_live *range_out = NULL;
+};
 
 /* Uninitialized registers.  All bitmaps are referenced by the
    register number.  Anded results of the forwards and backward live
@@ -946,6 +983,7 @@ extern class df_d *df;
 #define df_note    (df->problems_by_index[DF_NOTE])
 #define df_md      (df->problems_by_index[DF_MD])
 #define df_mir     (df->problems_by_index[DF_MIR])
+#define df_live_subreg (df->problems_by_index[DF_LIVE_SUBREG])
 
 /* This symbol turns on checking that each modification of the cfg has
   been identified to the appropriate df routines.  It is not part of
@@ -1031,6 +1069,25 @@ extern void df_lr_add_problem (void);
 extern void df_lr_verify_transfer_functions (void);
 extern void df_live_verify_transfer_functions (void);
 extern void df_live_add_problem (void);
+extern void
+df_live_subreg_add_problem (void);
+extern void
+df_live_subreg_finalize (bitmap all_blocks);
+class subreg_range;
+extern bool
+need_track_subreg (int regno, machine_mode mode);
+extern void
+remove_subreg_range (basic_block_subreg_live_info *bb_info, unsigned int regno,
+		     machine_mode mode, const subreg_range &range);
+extern bool
+remove_subreg_range (basic_block_subreg_live_info *bb_info, df_ref ref);
+extern void
+add_subreg_range (basic_block_subreg_live_info *bb_info, unsigned int regno,
+		  machine_mode mode, const subreg_range &range,
+		  bool is_def = false);
+extern bool
+add_subreg_range (basic_block_subreg_live_info *bb_info, df_ref ref,
+		  bool is_def = false);
 extern void df_live_set_all_dirty (void);
 extern void df_chain_add_problem (unsigned int);
 extern void df_word_lr_add_problem (void);
@@ -1124,6 +1181,16 @@ df_lr_get_bb_info (unsigned int index)
     return NULL;
 }
 
+inline class df_live_subreg_bb_info *
+df_live_subreg_get_bb_info (unsigned int index)
+{
+  if (index < df_live_subreg->block_info_size)
+    return &(
+      (class df_live_subreg_bb_info *) df_live_subreg->block_info)[index];
+  else
+    return NULL;
+}
+
 inline class df_md_bb_info *
 df_md_get_bb_info (unsigned int index)
 {
diff --git a/gcc/regs.h b/gcc/regs.h
index aea093ed795..84c6bdb980c 100644
--- a/gcc/regs.h
+++ b/gcc/regs.h
@@ -389,4 +389,11 @@ range_in_hard_reg_set_p (const_hard_reg_set set, unsigned regno, int nregs)
   return true;
 }
 
+/* Return the number of blocks that MODE covers.  One block equals the mode's
+   natural size, so nblocks satisfies the following inequality:
+     (nblocks - 1) * natural_size < GET_MODE_SIZE (mode)
+       <= nblocks * natural_size. */
+#define get_nblocks(mode)                                                      \
+  (exact_div (GET_MODE_SIZE (mode), REGMODE_NATURAL_SIZE (mode)).to_constant ())
+
 #endif /* GCC_REGS_H */
diff --git a/gcc/subreg-live-range.cc b/gcc/subreg-live-range.cc
new file mode 100644
index 00000000000..43a5eafedf1
--- /dev/null
+++ b/gcc/subreg-live-range.cc
@@ -0,0 +1,628 @@
+/* SUBREG live range track classes for DF & IRA & LRA.
+   Copyright (C) 2023 Free Software Foundation, Inc.
+   Contributed by Lehua Ding (lehua.ding@rivai.ai), RiVAI Technologies Ltd.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   <http://www.gnu.org/licenses/>.  */
+
+#include "subreg-live-range.h"
+#include "selftest.h"
+#include "print-rtl.h"
+
+/* class subreg_range */
+void
+subreg_range::dump (FILE *file) const
+{
+  fprintf (file, "[%d, %d)", start, end);
+}
+
+/* class subreg_ranges */
+bool
+subreg_ranges::add_range (int max, const subreg_range &new_range)
+{
+  subreg_range range = new_range;
+  if (full_p ())
+    return false;
+  else if (max == 1)
+    {
+      gcc_assert (range.start == 0 && range.end == 1);
+      make_full ();
+      return true;
+    }
+
+  if (this->max == 1)
+    change_max (max);
+
+  gcc_assert (this->max == max);
+  gcc_assert (range.start < range.end);
+
+  bool changed = empty_p ();
+  auto it = ranges.begin ();
+  while (it != ranges.end ())
+    {
+      const subreg_range &r = *it;
+      gcc_assert (r.start < r.end);
+
+      /* The possible positional relationship of R and RANGE.
+	 1~5 means R.start's possible position relative to RANGE
+	 A~G means R.end's possible position relative to RANGE
+	 caseN means: when R.start is at position N, R.end can be at the
+	 following positions.
+
+		     RANGE.start     RANGE.end
+			  [               )
+			  |               |
+	R.start   1       2       3       4       5
+	R.end             |               |
+	  case1       A   B       C       D       E
+	  case2           |       C       D       E
+	  case3           |           F   D       E
+	  case4           |               |       E
+	  case5           |               |               G
+
+	*/
+
+      /* R.start at 1 position.   */
+      if (r.start < range.start)
+	{
+	  /* R.end at A position. That means R and RANGE do not overlap.  */
+	  if (r.end < range.start)
+	    it++;
+	  /* R.end at B/C position. That means RANGE's left part overlaps R's
+	     right part. Expand RANGE.start to R.start and remove R.  */
+	  else if (r.end < range.end)
+	    {
+	      changed = true;
+	      range.start = r.start;
+	      it = ranges.erase (it);
+	    }
+	  /* R.end at D/E position. That means R already contains RANGE, nothing
+	     to do.  */
+	  else
+	    return false;
+	}
+      /* R.start at 2 position.  */
+      else if (r.start == range.start)
+	{
+	  /* R.end at C/D position. That means RANGE contains R, remove R and
+	     insert RANGE.  */
+	  if (r.end < range.end)
+	    {
+	      changed = true;
+	      it = ranges.erase (it);
+	    }
+	  /* R.end at E position. That means R already contains RANGE, nothing
+	     to do.  */
+	  else
+	    return false;
+	}
+      /* R.start at 3 position.  */
+      else if (r.start > range.start && r.start < range.end)
+	{
+	  /* R.end at F/D position. That means RANGE contains R, just remove R
+	     and insert RANGE later.  */
+	  if (r.end <= range.end)
+	    {
+	      changed = true;
+	      it = ranges.erase (it);
+	    }
+	  /* R.end at E position.  That means RANGE's right part overlaps R's
+	     left part. Expand RANGE.end to R.end and remove R.  */
+	  else if (r.end > range.end)
+	    {
+	      changed = true;
+	      range.end = r.end;
+	      it = ranges.erase (it);
+	      break;
+	    }
+	}
+      /* R.start at 4 position and R.end at E position. That means RANGE and R
+	 are adjacent and can be merged. */
+      else if (r.start == range.end)
+	{
+	  changed = true;
+	  range.end = r.end;
+	  it = ranges.erase (it);
+	}
+      /* R.start at 5 position and R.end at G position. That means R and RANGE
+	 do not overlap.  */
+      else
+	break;
+    }
+  ranges.insert (range);
+  return changed;
+}
+
+bool
+subreg_ranges::remove_range (int max, const subreg_range &range)
+{
+  if (empty_p ())
+    return false;
+  else if (max == 1)
+    {
+      gcc_assert (range.start == 0 && range.end == 1);
+      make_empty ();
+      return true;
+    }
+
+  if (this->max == 1)
+    {
+      gcc_assert (full_p ());
+      change_max (max);
+    }
+  gcc_assert (this->max == max);
+  gcc_assert (range.start < range.end);
+
+  bool changed = false;
+  auto it = ranges.begin ();
+  std::set<subreg_range> new_ranges;
+  while (it != ranges.end ())
+    {
+      auto &r = *it;
+      gcc_assert (r.start < r.end);
+
+      /* The possible positional relationship of R and RANGE.
+	 1~5 means R.start's possible position relative to RANGE
+	 A~G means R.end's possible position relative to RANGE
+	 caseN means: when R.start is at position N, R.end can be at the
+	 following positions.
+
+		     RANGE.start     RANGE.end
+			  [               )
+			  |               |
+	R.start   1       2       3       4       5
+	R.end             |               |
+	  case1       A   B       C       D       E
+	  case2           |       C       D       E
+	  case3           |           F   D       E
+	  case4           |               |       E
+	  case5           |               |               G
+
+	*/
+
+      /* R.start at 1 position.  */
+      if (r.start < range.start)
+	{
+	  /* R.end at A/B position. That means RANGE and R do not overlap,
+	     nothing to remove.  */
+	  if (r.end <= range.start)
+	    it++;
+	  /* R.end at C/D position. That means R's right part overlaps RANGE;
+	     shrink R.end to RANGE.start.  */
+	  else if (r.end <= range.end)
+	    {
+	      changed = true;
+	      new_ranges.insert (subreg_range (r.start, range.start));
+	      it = ranges.erase (it);
+	    }
+	  /* R.end at E position. That means RANGE lies in the middle of R,
+	     so split R into two ranges: [R.start, RANGE.start) and
+	     [RANGE.end, R.end).  */
+	  else
+	    {
+	      changed = true;
+	      new_ranges.insert (subreg_range (r.start, range.start));
+	      new_ranges.insert (subreg_range (range.end, r.end));
+	      it = ranges.erase (it);
+	      break;
+	    }
+	}
+      /* R.start at 2 position.  */
+      else if (r.start == range.start)
+	{
+	  /* R.end at C/D position. That means RANGE contains R, remove R.  */
+	  if (r.end <= range.end)
+	    {
+	      changed = true;
+	      it = ranges.erase (it);
+	    }
+	  /* R.end at E position. That means RANGE covers R's left part;
+	     shrink R.start to RANGE.end.  */
+	  else
+	    {
+	      changed = true;
+	      new_ranges.insert (subreg_range (range.end, r.end));
+	      it = ranges.erase (it);
+	      break;
+	    }
+	}
+      /* R.start at 3 position. */
+      else if (r.start > range.start && r.start < range.end)
+	{
+	  /* R.end at F/D position. That means RANGE contains R, remove R.  */
+	  if (r.end <= range.end)
+	    {
+	      changed = true;
+	      it = ranges.erase (it);
+	    }
+	  /* R.end at E position. That means RANGE's right part overlaps R's left
+	     part, shrink R.start to RANGE.end.  */
+	  else
+	    {
+	      changed = true;
+	      new_ranges.insert (subreg_range (range.end, r.end));
+	      it = ranges.erase (it);
+	      break;
+	    }
+	}
+      /* R.start at 4/5 position. That means RANGE and R do not overlap.  */
+      else
+	break;
+    }
+  for (auto &r : new_ranges)
+    add_range (this->max, r);
+  return changed;
+}
+
+bool
+subreg_ranges::add_ranges (const subreg_ranges &sr)
+{
+  gcc_assert (max == sr.max || max == 1 || sr.max == 1);
+
+  if (full_p () || sr.empty_p ())
+    return false;
+  else if (sr.full_p ())
+    {
+      make_full ();
+      return true;
+    }
+
+  bool changed = false;
+  for (auto &r : sr.ranges)
+    changed |= add_range (sr.max, r);
+  return changed;
+}
+
+bool
+subreg_ranges::remove_ranges (const subreg_ranges &sr)
+{
+  if (empty_p () || sr.empty_p ())
+    return false;
+  else if (sr.full_p ())
+    {
+      make_empty ();
+      return true;
+    }
+
+  gcc_assert (max == sr.max || max == 1 || sr.max == 1);
+
+  bool changed = false;
+  for (auto &r : sr.ranges)
+    changed |= remove_range (sr.max, r);
+  return changed;
+}
+
+bool
+subreg_ranges::same_p (const subreg_ranges &sr) const
+{
+  if (max == 1 || sr.max == 1)
+    return (empty_p () && sr.empty_p ()) || (full_p () && sr.full_p ());
+  else if (max == sr.max)
+    {
+      if (ranges.size () != sr.ranges.size ())
+	return false;
+      /* Make sure that the elements in each position are the same.  */
+      auto it1 = ranges.begin ();
+      auto it2 = sr.ranges.begin ();
+      while (it1 != ranges.end ())
+	{
+	  const subreg_range &r1 = *it1;
+	  const subreg_range &r2 = *it2;
+	  if (r1.start != r2.start || r1.end != r2.end)
+	    return false;
+	  it1++;
+	  it2++;
+	}
+      return true;
+    }
+  else
+    gcc_unreachable ();
+}
+
+bool
+subreg_ranges::include_ranges_p (const subreg_ranges &sr) const
+{
+  gcc_assert (max == sr.max || max == 1 || sr.max == 1);
+  if (full_p ())
+    return true;
+  if (empty_p () && sr.empty_p ())
+    return true;
+  if (same_p (sr))
+    return true;
+
+  for (const auto &r : sr.ranges)
+    if (!include_range_p (sr.max, r))
+      return false;
+  return true;
+}
+
+bool
+subreg_ranges::include_range_p (int max, const subreg_range &range) const
+{
+  gcc_assert (this->max == max);
+  for (const auto &r : ranges)
+    {
+      if (r.start <= range.start && r.end >= range.end)
+	return true;
+      else if (r.start >= range.end)
+	return false;
+    }
+  return false;
+}
+
+void
+subreg_ranges::dump (FILE *file) const
+{
+  if (empty_p ())
+    {
+      fprintf (file, "empty");
+      return;
+    }
+  else if (full_p ())
+    {
+      fprintf (file, "full");
+      return;
+    }
+
+  fprintf (file, "patial(max:%d", max);
+  fprintf (file, " {");
+  for (auto &range : ranges)
+    {
+      fprintf (file, " ");
+      range.dump (file);
+    }
+  fprintf (file, " })");
+}
+
+/* class subregs_live */
+bool
+subregs_live::copy_lives (const subregs_live &sl)
+{
+  bool changed = false;
+  subregs_live temp;
+  for (auto &kv : sl.lives)
+    {
+      unsigned int regno = kv.first;
+      const subreg_ranges &sr = kv.second;
+      if (lives.count (regno) == 0 && !sr.empty_p ())
+	{
+	  changed = true;
+	  temp.add_ranges (regno, sr);
+	}
+      else if (lives.count (regno) != 0)
+	{
+	  changed |= !lives.at (regno).same_p (sr);
+	  temp.add_ranges (regno, sr);
+	}
+    }
+
+  for (auto &kv : lives)
+    {
+      unsigned int regno = kv.first;
+      subreg_ranges &sr = kv.second;
+      if (temp.lives.count (regno) == 0 && !sr.empty_p ())
+	changed = true;
+    }
+  lives = temp.lives;
+  return changed;
+}
+
+bool
+subregs_live::add_lives (const subregs_live &sl)
+{
+  bool changed = false;
+  for (auto &kv : sl.lives)
+    {
+      unsigned int regno = kv.first;
+      const subreg_ranges &sr = kv.second;
+      if (sr.empty_p ())
+	continue;
+
+      if (lives.count (regno) == 0)
+	{
+	  changed = true;
+	  lives.insert ({regno, sr});
+	}
+      else
+	changed |= lives.at (regno).add_ranges (sr);
+    }
+  return changed;
+}
+
+bool
+subregs_live::remove_lives (const subregs_live &sl)
+{
+  bool changed = false;
+  for (auto &kv : sl.lives)
+    {
+      unsigned int regno = kv.first;
+      const subreg_ranges &sr = kv.second;
+      if (sr.empty_p ())
+	continue;
+
+      if (lives.count (regno) != 0)
+	{
+	  changed |= lives.at (regno).remove_ranges (sr);
+	  if (lives.at (regno).empty_p ())
+	    lives.erase (regno);
+	}
+    }
+  return changed;
+}
+
+void
+subregs_live::dump (FILE *file, const char *indent) const
+{
+  if (lives.empty ())
+    {
+      fprintf (file, "%sempty\n", indent);
+      return;
+    }
+  fprintf (file, "%s", indent);
+  for (auto &kv : lives)
+    {
+      const subreg_ranges &sr = kv.second;
+      if (sr.empty_p ())
+	continue;
+      fprintf (file, "%d ", kv.first);
+      if (!sr.full_p ())
+	{
+	  sr.dump (file);
+	  fprintf (file, "  ");
+	}
+    }
+  fprintf (file, "\n");
+}
+
+/* class live_point */
+void
+live_point::dump (FILE *file) const
+{
+  if (!use_reg.empty_p ())
+    {
+      fprintf (file, "use ");
+      use_reg.dump (file);
+      if (!def_reg.empty_p ())
+	{
+	  fprintf (file, ", def ");
+	  def_reg.dump (file);
+	}
+    }
+  else if (!def_reg.empty_p ())
+    {
+      fprintf (file, "def ");
+      def_reg.dump (file);
+    }
+  else
+    gcc_unreachable ();
+}
+
+/* class live_points */
+void
+live_points::dump (FILE *file) const
+{
+  fprintf (file, "%u :", id);
+  if (points.empty ())
+    {
+      fprintf (file, " empty");
+      return;
+    }
+  for (const auto &kv : points)
+    {
+      fprintf (file, " ");
+      kv.second.dump (file);
+      fprintf (file, " at point %u;", kv.first);
+    }
+}
+
+/* class subregs_live_points */
+void
+subregs_live_points::dump (FILE *file) const
+{
+  if (subreg_points.empty ())
+    {
+      fprintf (file, ";;     empty\n");
+      return;
+    }
+  for (const auto &kv : subreg_points)
+    {
+      fprintf (file, ";;     ");
+      kv.second.dump (file);
+      fprintf (file, "\n");
+    }
+}
+
+/* Define some useful debug functions.  */
+
+DEBUG_FUNCTION void
+debug (const subreg_range &r)
+{
+  r.dump (stderr);
+}
+
+DEBUG_FUNCTION void
+debug (const subreg_ranges &sr)
+{
+  sr.dump (stderr);
+}
+
+DEBUG_FUNCTION void
+debug (const subregs_live &l)
+{
+  l.dump (stderr, "");
+}
+
+DEBUG_FUNCTION void
+debug (const subregs_live *l)
+{
+  debug (*l);
+}
+
+DEBUG_FUNCTION void
+debug (const live_point &l)
+{
+  l.dump (stderr);
+}
+
+DEBUG_FUNCTION void
+debug (const live_points &ls)
+{
+  ls.dump (stderr);
+}
+
+DEBUG_FUNCTION void
+debug (const subregs_live_points &sls)
+{
+  sls.dump (stderr);
+}
+
+DEBUG_FUNCTION void
+debug (const subregs_live_points *sls)
+{
+  debug (*sls);
+}
+
+#if CHECKING_P
+
+namespace selftest {
+
+class subreg_range_tests
+{
+public:
+  static void run ()
+  {
+    /* class subreg_range tests.  */
+    subreg_range r1 = subreg_range (1, 2);
+    subreg_range r2 = subreg_range (2, 3);
+    subreg_range r3 = subreg_range (2, 3);
+    ASSERT_FALSE (r1.same_p (r2));
+    ASSERT_TRUE (r2.same_p (r3));
+    ASSERT_TRUE (r1 < r2);
+    ASSERT_FALSE (r2 < r1);
+
+    /* class subreg_ranges tests.  */
+  }
+};
+
+void
+subreg_live_range_tests ()
+{
+  subreg_range_tests::run ();
+}
+
+} // namespace selftest
+
+#endif /* CHECKING_P */
diff --git a/gcc/subreg-live-range.h b/gcc/subreg-live-range.h
new file mode 100644
index 00000000000..76e442d08e8
--- /dev/null
+++ b/gcc/subreg-live-range.h
@@ -0,0 +1,333 @@
+/* SUBREG live range track classes for DF & IRA & LRA.
+   Copyright (C) 2023 Free Software Foundation, Inc.
+   Contributed by Lehua Ding (lehua.ding@rivai.ai), RiVAI Technologies Ltd.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef GCC_SUBREG_LIVE_RANGE_H
+#define GCC_SUBREG_LIVE_RANGE_H
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include <set>
+#include <map>
+
+/* class subreg_range represents the byte range [start, end) of a reg.  */
+class subreg_range
+{
+public:
+  int start; /* Range start point.  */
+  int end;   /* Range end point.  */
+
+  subreg_range (int start, int end) : start (start), end (end)
+  {
+    gcc_assert (start < end);
+  }
+
+  /* For sorting.  */
+  bool operator<(const subreg_range &r) const
+  {
+    if (end <= r.start)
+      return true;
+    else if (start >= r.end)
+      return false;
+    else
+      /* Cannot sort overlapping ranges.  */
+      gcc_unreachable ();
+  }
+  /* Return true if R is the same range as this one.  */
+  bool same_p (const subreg_range &r) const
+  {
+    return start == r.start && end == r.end;
+  }
+
+  /* Return true if the range covers the whole [0, MAX) range.  */
+  bool full_p (int max) const { return start == 0 && end == max; }
+
+  /* Debug methods.  */
+  void dump (FILE *file) const;
+};
+
+/* class subreg_ranges represents multiple disjoint and discontinuous
+   subreg_range objects.  */
+class subreg_ranges
+{
+public:
+  /* The maximum boundary value of the ranges.  For a hard register with
+     unknown mode, max is set to 1.  */
+  int max;
+  std::set<subreg_range> ranges;
+
+  subreg_ranges () : max (1) {}
+  subreg_ranges (int max) : max (max) { gcc_assert (max >= 1); }
+
+  /* Modify ranges.  */
+  /* Return true if ranges changed.  */
+  bool add_range (int max, const subreg_range &range);
+  /* Return true if ranges changed.  */
+  bool remove_range (int max, const subreg_range &range);
+  /* Add SR, return true if ranges changed.  */
+  bool add_ranges (const subreg_ranges &sr);
+  /* Remove the ranges in SR from this, return true if ranges changed.  */
+  bool remove_ranges (const subreg_ranges &sr);
+  /* Make range empty.  */
+  void make_empty () { ranges.clear (); }
+  /* Make range full.  */
+  void make_full ()
+  {
+    make_empty ();
+    ranges.insert (subreg_range (0, max));
+  }
+  /* Change max to MAX and adjust the ranges accordingly.  */
+  void change_max (int max)
+  {
+    gcc_assert (this->max == 1);
+    this->max = max;
+    if (full_p ())
+      make_full ();
+  }
+
+  /* Predicates.  */
+  bool full_p () const
+  {
+    if (ranges.size () != 1)
+      return false;
+    const subreg_range &r = *ranges.begin ();
+    return r.start == 0 && r.end == max;
+  }
+  bool empty_p () const { return ranges.empty (); }
+  bool same_p (const subreg_ranges &sr) const;
+  bool same_p (int max, const subreg_range &range) const
+  {
+    subreg_ranges sr = subreg_ranges (max);
+    sr.add_range (max, range);
+    return same_p (sr);
+  }
+  bool include_ranges_p (const subreg_ranges &sr) const;
+  bool include_range_p (int max, const subreg_range &range) const;
+
+  /* Debug methods.  */
+  void dump (FILE *file) const;
+};
+
+/* class subregs_live records the live subreg_ranges of registers.  */
+class subregs_live
+{
+public:
+  /* The key is usually the register's regno.  */
+  std::map<unsigned int, subreg_ranges> lives;
+
+  /* Add/clear live range.  */
+  bool add_range (unsigned int regno, int max, const subreg_range &range)
+  {
+    if (lives.count (regno) == 0)
+      lives.insert ({regno, subreg_ranges (max)});
+    return lives.at (regno).add_range (max, range);
+  }
+  bool remove_range (unsigned int regno, int max, const subreg_range &range)
+  {
+    if (lives.count (regno) != 0)
+      {
+	bool changed = lives.at (regno).remove_range (max, range);
+	if (lives.at (regno).empty_p ())
+	  remove_live (regno);
+	return changed;
+      }
+    return false;
+  }
+  /* Add RANGES for register REGNO, creating the entry if it does not
+     exist yet.  */
+  void add_ranges (unsigned int regno, const subreg_ranges &ranges)
+  {
+    if (lives.count (regno) == 0)
+      lives.insert ({regno, ranges});
+    else
+      lives.at (regno).add_ranges (ranges);
+  }
+  bool copy_lives (const subregs_live &sl);
+  bool add_lives (const subregs_live &sl);
+  bool remove_lives (const subregs_live &sl);
+  void remove_live (unsigned int regno) { lives.erase (regno); }
+  /* Remove all registers' live ranges.  */
+  void clear () { lives.clear (); }
+  void clear (unsigned min_regno)
+  {
+    if (lives.lower_bound (min_regno) != lives.end ())
+      lives.erase (lives.lower_bound (min_regno), lives.end ());
+  }
+
+  /* Return true if regno's live range is full.  */
+  bool full_p (unsigned int regno) const
+  {
+    return lives.count (regno) != 0 && lives.at (regno).full_p ();
+  }
+  /* Return true if regno's live range is empty.  */
+  bool empty_p (unsigned int regno) const
+  {
+    return lives.count (regno) == 0 || lives.at (regno).empty_p ();
+  }
+  /* Return true if SL is the same as this.  */
+  bool same_p (const subregs_live &sl)
+  {
+    if (lives.size () != sl.lives.size ())
+      return false;
+    for (auto &kv : lives)
+      {
+	unsigned int regno = kv.first;
+	if (sl.empty_p (regno))
+	  return false;
+	const subreg_ranges &sr = kv.second;
+	if (!sr.same_p (sl.lives.at (regno)))
+	  return false;
+      }
+    return true;
+  }
+
+  /* Debug methods.  */
+  void dump (FILE *file, const char *indent = ";;     ") const;
+};
+
+class live_point
+{
+public:
+  int point;
+  /* Subreg ranges defined at the current point.  */
+  subreg_ranges def_reg;
+  /* Subreg ranges used at the current point.  */
+  subreg_ranges use_reg;
+
+  live_point (int max, const subreg_range &range, bool is_def)
+    : def_reg (max), use_reg (max)
+  {
+    add_range (max, range, is_def);
+  }
+  live_point (const subreg_ranges &sr, bool is_def)
+    : def_reg (sr.max), use_reg (sr.max)
+  {
+    add_ranges (sr, is_def);
+  }
+  live_point (int point, int max) : point (point), def_reg (max), use_reg (max)
+  {}
+
+  void add_range (int max, const subreg_range &r, bool is_def)
+  {
+    if (is_def)
+      def_reg.add_range (max, r);
+    else
+      use_reg.add_range (max, r);
+  }
+
+  void add_ranges (const subreg_ranges &sr, bool is_def)
+  {
+    if (is_def)
+      def_reg.add_ranges (sr);
+    else
+      use_reg.add_ranges (sr);
+  }
+
+  void dump (FILE *file) const;
+};
+
+class live_points
+{
+public:
+  int id;
+  int max;
+  std::map<int, live_point> points;
+
+  live_points (int id, int max) : id (id), max (max) {}
+
+  void add_point (int max, const subreg_range &range, bool is_def, int point)
+  {
+    gcc_assert (this->max == max || this->max == 1 || max == 1);
+    if (points.count (point) == 0)
+      points.insert ({point, {max, range, is_def}});
+    else
+      points.at (point).add_range (max, range, is_def);
+  }
+  void dump (FILE *file) const;
+};
+
+class subregs_live_points
+{
+public:
+  std::map<int, live_points> subreg_points;
+  std::map<int, int> last_start_points;
+  std::map<int, subreg_ranges> subreg_live_ranges;
+
+  void add_point (int id, int max, const subreg_range &range, bool is_def,
+		  int point)
+  {
+    if (!is_def && empty_live_p (id))
+      {
+	if (last_start_points.count (id) == 0)
+	  last_start_points.insert ({id, point});
+	else
+	  last_start_points.at (id) = point;
+      }
+
+    if (subreg_points.count (id) == 0)
+      subreg_points.insert ({id, live_points (id, max)});
+
+    subreg_points.at (id).add_point (max, range, is_def, point);
+
+    if (subreg_live_ranges.count (id) == 0)
+      subreg_live_ranges.insert ({id, subreg_ranges (max)});
+
+    if (is_def)
+      subreg_live_ranges.at (id).remove_range (max, range);
+    else
+      subreg_live_ranges.at (id).add_range (max, range);
+  }
+
+  void add_range (int id, int max, const subreg_range &range, bool is_def)
+  {
+    if (subreg_live_ranges.count (id) == 0)
+      subreg_live_ranges.insert ({id, subreg_ranges (max)});
+
+    if (is_def)
+      subreg_live_ranges.at (id).remove_range (max, range);
+    else
+      subreg_live_ranges.at (id).add_range (max, range);
+  }
+
+  bool full_live_p (int id)
+  {
+    return subreg_live_ranges.count (id) != 0
+	   && subreg_live_ranges.at (id).full_p ();
+  }
+
+  bool empty_live_p (int id)
+  {
+    return subreg_live_ranges.count (id) == 0
+	   || subreg_live_ranges.at (id).empty_p ();
+  }
+
+  int get_start_point (int id)
+  {
+    int start_point = last_start_points.at (id);
+    gcc_assert (start_point != -1);
+    return start_point;
+  }
+
+  void clear_live_ranges () { subreg_live_ranges.clear (); }
+
+  /* Debug methods.  */
+  void dump (FILE *file) const;
+};
+
+#endif /* GCC_SUBREG_LIVE_RANGE_H */
diff --git a/gcc/timevar.def b/gcc/timevar.def
index d21b08c030d..7c173d3c7c8 100644
--- a/gcc/timevar.def
+++ b/gcc/timevar.def
@@ -120,6 +120,7 @@ DEFTIMEVAR (TV_DF_SCAN		     , "df scan insns")
 DEFTIMEVAR (TV_DF_MD		     , "df multiple defs")
 DEFTIMEVAR (TV_DF_RD		     , "df reaching defs")
 DEFTIMEVAR (TV_DF_LR		     , "df live regs")
+DEFTIMEVAR (TV_DF_LIVE_SUBREG	     , "df live subregs")
 DEFTIMEVAR (TV_DF_LIVE		     , "df live&initialized regs")
 DEFTIMEVAR (TV_DF_MIR		     , "df must-initialized regs")
 DEFTIMEVAR (TV_DF_CHAIN		     , "df use-def / def-use chains")
-- 
2.36.3


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH V2 2/7] ira: Switch to live_subreg data
  2023-11-12  9:58 [PATCH V2 0/7] ira/lra: Support subreg coalesce Lehua Ding
  2023-11-12  9:58 ` [PATCH V2 1/7] df: Add DF_LIVE_SUBREG problem Lehua Ding
@ 2023-11-12  9:58 ` Lehua Ding
  2023-11-12  9:58 ` [PATCH V2 3/7] ira: Support subreg live range track Lehua Ding
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: Lehua Ding @ 2023-11-12  9:58 UTC (permalink / raw)
  To: gcc-patches; +Cc: vmakarov, richard.sandiford, juzhe.zhong, lehua.ding

This patch switches the uses of the live_reg data to the live_subreg data.
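
The change is mechanical: calls to df_get_live_in/df_get_live_out are
replaced by the DF_LIVE_SUBREG_IN/DF_LIVE_SUBREG_OUT accessors of the
new problem.  A minimal before/after sketch (illustrative only, not a
hunk from this patch):

```
  /* Before: query the plain live-regs problem.  */
  bitmap live_in = df_get_live_in (bb);

  /* After: query the live-subreg problem; the regset still records
     which pseudos are live, while the per-pseudo subreg ranges are
     kept separately and only used from the following patches on.  */
  bitmap live_in_subreg = DF_LIVE_SUBREG_IN (bb);
```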

gcc/ChangeLog:

	* ira-build.cc (create_bb_allocnos): Switch.
	(create_loop_allocnos): Ditto.
	* ira-color.cc (ira_loop_edge_freq): Ditto.
	* ira-emit.cc (generate_edge_moves): Ditto.
	(add_ranges_and_copies): Ditto.
	* ira-lives.cc (process_out_of_region_eh_regs): Ditto.
	(add_conflict_from_region_landing_pads): Ditto.
	(process_bb_node_lives): Ditto.
	* ira.cc (find_moveable_pseudos): Ditto.
	(interesting_dest_for_shprep_1): Ditto.
	(allocate_initial_values): Ditto.
	(ira): Ditto.

---
 gcc/ira-build.cc |  7 ++++---
 gcc/ira-color.cc |  8 ++++----
 gcc/ira-emit.cc  | 12 ++++++------
 gcc/ira-lives.cc |  7 ++++---
 gcc/ira.cc       | 16 +++++++++-------
 5 files changed, 27 insertions(+), 23 deletions(-)

diff --git a/gcc/ira-build.cc b/gcc/ira-build.cc
index 93e46033170..f931c6e304c 100644
--- a/gcc/ira-build.cc
+++ b/gcc/ira-build.cc
@@ -1919,7 +1919,8 @@ create_bb_allocnos (ira_loop_tree_node_t bb_node)
       create_insn_allocnos (PATTERN (insn), NULL, false);
   /* It might be a allocno living through from one subloop to
      another.  */
-  EXECUTE_IF_SET_IN_REG_SET (df_get_live_in (bb), FIRST_PSEUDO_REGISTER, i, bi)
+  EXECUTE_IF_SET_IN_REG_SET (DF_LIVE_SUBREG_IN (bb), FIRST_PSEUDO_REGISTER,
+			     i, bi)
     if (ira_curr_regno_allocno_map[i] == NULL)
       ira_create_allocno (i, false, ira_curr_loop_tree_node);
 }
@@ -1935,9 +1936,9 @@ create_loop_allocnos (edge e)
   bitmap_iterator bi;
   ira_loop_tree_node_t parent;
 
-  live_in_regs = df_get_live_in (e->dest);
+  live_in_regs = DF_LIVE_SUBREG_IN (e->dest);
   border_allocnos = ira_curr_loop_tree_node->border_allocnos;
-  EXECUTE_IF_SET_IN_REG_SET (df_get_live_out (e->src),
+  EXECUTE_IF_SET_IN_REG_SET (DF_LIVE_SUBREG_OUT (e->src),
 			     FIRST_PSEUDO_REGISTER, i, bi)
     if (bitmap_bit_p (live_in_regs, i))
       {
diff --git a/gcc/ira-color.cc b/gcc/ira-color.cc
index f2e8ea34152..4aa3e316282 100644
--- a/gcc/ira-color.cc
+++ b/gcc/ira-color.cc
@@ -2783,8 +2783,8 @@ ira_loop_edge_freq (ira_loop_tree_node_t loop_node, int regno, bool exit_p)
       FOR_EACH_EDGE (e, ei, loop_node->loop->header->preds)
 	if (e->src != loop_node->loop->latch
 	    && (regno < 0
-		|| (bitmap_bit_p (df_get_live_out (e->src), regno)
-		    && bitmap_bit_p (df_get_live_in (e->dest), regno))))
+		|| (bitmap_bit_p (DF_LIVE_SUBREG_OUT (e->src), regno)
+		    && bitmap_bit_p (DF_LIVE_SUBREG_IN (e->dest), regno))))
 	  freq += EDGE_FREQUENCY (e);
     }
   else
@@ -2792,8 +2792,8 @@ ira_loop_edge_freq (ira_loop_tree_node_t loop_node, int regno, bool exit_p)
       auto_vec<edge> edges = get_loop_exit_edges (loop_node->loop);
       FOR_EACH_VEC_ELT (edges, i, e)
 	if (regno < 0
-	    || (bitmap_bit_p (df_get_live_out (e->src), regno)
-		&& bitmap_bit_p (df_get_live_in (e->dest), regno)))
+	    || (bitmap_bit_p (DF_LIVE_SUBREG_OUT (e->src), regno)
+		&& bitmap_bit_p (DF_LIVE_SUBREG_IN (e->dest), regno)))
 	  freq += EDGE_FREQUENCY (e);
     }
 
diff --git a/gcc/ira-emit.cc b/gcc/ira-emit.cc
index bcc4f09f7c4..84ed482e568 100644
--- a/gcc/ira-emit.cc
+++ b/gcc/ira-emit.cc
@@ -510,8 +510,8 @@ generate_edge_moves (edge e)
     return;
   src_map = src_loop_node->regno_allocno_map;
   dest_map = dest_loop_node->regno_allocno_map;
-  regs_live_in_dest = df_get_live_in (e->dest);
-  regs_live_out_src = df_get_live_out (e->src);
+  regs_live_in_dest = DF_LIVE_SUBREG_IN (e->dest);
+  regs_live_out_src = DF_LIVE_SUBREG_OUT (e->src);
   EXECUTE_IF_SET_IN_REG_SET (regs_live_in_dest,
 			     FIRST_PSEUDO_REGISTER, regno, bi)
     if (bitmap_bit_p (regs_live_out_src, regno))
@@ -1229,16 +1229,16 @@ add_ranges_and_copies (void)
 	 destination block) to use for searching allocnos by their
 	 regnos because of subsequent IR flattening.  */
       node = IRA_BB_NODE (bb)->parent;
-      bitmap_copy (live_through, df_get_live_in (bb));
+      bitmap_copy (live_through, DF_LIVE_SUBREG_IN (bb));
       add_range_and_copies_from_move_list
 	(at_bb_start[bb->index], node, live_through, REG_FREQ_FROM_BB (bb));
-      bitmap_copy (live_through, df_get_live_out (bb));
+      bitmap_copy (live_through, DF_LIVE_SUBREG_OUT (bb));
       add_range_and_copies_from_move_list
 	(at_bb_end[bb->index], node, live_through, REG_FREQ_FROM_BB (bb));
       FOR_EACH_EDGE (e, ei, bb->succs)
 	{
-	  bitmap_and (live_through,
-		      df_get_live_in (e->dest), df_get_live_out (bb));
+	  bitmap_and (live_through, DF_LIVE_SUBREG_IN (e->dest),
+		      DF_LIVE_SUBREG_OUT (bb));
 	  add_range_and_copies_from_move_list
 	    ((move_t) e->aux, node, live_through,
 	     REG_FREQ_FROM_EDGE_FREQ (EDGE_FREQUENCY (e)));
diff --git a/gcc/ira-lives.cc b/gcc/ira-lives.cc
index 81af5c06460..05e2be12a26 100644
--- a/gcc/ira-lives.cc
+++ b/gcc/ira-lives.cc
@@ -1194,7 +1194,8 @@ process_out_of_region_eh_regs (basic_block bb)
   if (! eh_p)
     return;
 
-  EXECUTE_IF_SET_IN_BITMAP (df_get_live_out (bb), FIRST_PSEUDO_REGISTER, i, bi)
+  EXECUTE_IF_SET_IN_BITMAP (DF_LIVE_SUBREG_OUT (bb), FIRST_PSEUDO_REGISTER, i,
+			    bi)
     {
       ira_allocno_t a = ira_curr_regno_allocno_map[i];
       for (int n = ALLOCNO_NUM_OBJECTS (a) - 1; n >= 0; n--)
@@ -1228,7 +1229,7 @@ add_conflict_from_region_landing_pads (eh_region region, ira_object_t obj,
       if ((landing_label = lp->landing_pad) != NULL
 	  && (landing_bb = BLOCK_FOR_INSN (landing_label)) != NULL
 	  && (region->type != ERT_CLEANUP
-	      || bitmap_bit_p (df_get_live_in (landing_bb),
+	      || bitmap_bit_p (DF_LIVE_SUBREG_IN (landing_bb),
 			       ALLOCNO_REGNO (a))))
 	{
 	  HARD_REG_SET new_conflict_regs
@@ -1265,7 +1266,7 @@ process_bb_node_lives (ira_loop_tree_node_t loop_tree_node)
 	  high_pressure_start_point[ira_pressure_classes[i]] = -1;
 	}
       curr_bb_node = loop_tree_node;
-      reg_live_out = df_get_live_out (bb);
+      reg_live_out = DF_LIVE_SUBREG_OUT (bb);
       sparseset_clear (objects_live);
       REG_SET_TO_HARD_REG_SET (hard_regs_live, reg_live_out);
       hard_regs_live &= ~(eliminable_regset | ira_no_alloc_regs);
diff --git a/gcc/ira.cc b/gcc/ira.cc
index d7530f01380..c7f27b17002 100644
--- a/gcc/ira.cc
+++ b/gcc/ira.cc
@@ -4735,8 +4735,8 @@ find_moveable_pseudos (void)
       bitmap_initialize (local, 0);
       bitmap_initialize (transp, 0);
       bitmap_initialize (moveable, 0);
-      bitmap_copy (live, df_get_live_out (bb));
-      bitmap_and_into (live, df_get_live_in (bb));
+      bitmap_copy (live, DF_LIVE_SUBREG_OUT (bb));
+      bitmap_and_into (live, DF_LIVE_SUBREG_IN (bb));
       bitmap_copy (transp, live);
       bitmap_clear (moveable);
       bitmap_clear (live);
@@ -5036,7 +5036,8 @@ interesting_dest_for_shprep_1 (rtx set, basic_block call_dom)
   rtx dest = SET_DEST (set);
   if (!REG_P (src) || !HARD_REGISTER_P (src)
       || !REG_P (dest) || HARD_REGISTER_P (dest)
-      || (call_dom && !bitmap_bit_p (df_get_live_in (call_dom), REGNO (dest))))
+      || (call_dom
+	  && !bitmap_bit_p (DF_LIVE_SUBREG_IN (call_dom), REGNO (dest))))
     return NULL;
   return dest;
 }
@@ -5514,10 +5515,10 @@ allocate_initial_values (void)
 		  /* Update global register liveness information.  */
 		  FOR_EACH_BB_FN (bb, cfun)
 		    {
-		      if (REGNO_REG_SET_P (df_get_live_in (bb), regno))
-			SET_REGNO_REG_SET (df_get_live_in (bb), new_regno);
-		      if (REGNO_REG_SET_P (df_get_live_out (bb), regno))
-			SET_REGNO_REG_SET (df_get_live_out (bb), new_regno);
+		      if (REGNO_REG_SET_P (DF_LIVE_SUBREG_IN (bb), regno))
+			SET_REGNO_REG_SET (DF_LIVE_SUBREG_IN (bb), new_regno);
+		      if (REGNO_REG_SET_P (DF_LIVE_SUBREG_OUT (bb), regno))
+			SET_REGNO_REG_SET (DF_LIVE_SUBREG_OUT (bb), new_regno);
 		    }
 		}
 	    }
@@ -5679,6 +5680,7 @@ ira (FILE *f)
   if (optimize > 1)
     df_remove_problem (df_live);
   gcc_checking_assert (df_live == NULL);
+  df_live_subreg_add_problem ();
 
   if (flag_checking)
     df->changeable_flags |= DF_VERIFY_SCHEDULED;
-- 
2.36.3


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH V2 3/7] ira: Support subreg live range track
  2023-11-12  9:58 [PATCH V2 0/7] ira/lra: Support subreg coalesce Lehua Ding
  2023-11-12  9:58 ` [PATCH V2 1/7] df: Add DF_LIVE_SUBREG problem Lehua Ding
  2023-11-12  9:58 ` [PATCH V2 2/7] ira: Switch to live_subreg data Lehua Ding
@ 2023-11-12  9:58 ` Lehua Ding
  2023-11-12  9:58 ` [PATCH V2 4/7] ira: Support subreg copy Lehua Ding
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: Lehua Ding @ 2023-11-12  9:58 UTC (permalink / raw)
  To: gcc-patches; +Cc: vmakarov, richard.sandiford, juzhe.zhong, lehua.ding

This patch supports tracking subreg liveness.  It first extends the
fixed array ira_object_t objects[2] to std::vector<ira_object_t>
objects, which can hold an arbitrary number of objects; the extra
objects record the subreg accesses found in the program as well as the
partial_in and partial_out parts of the basic block live in/out sets.

It then changes the way conflicts between registers are detected: for
example, if object a conflicts with object b, the offset and size of
each object relative to the allocno it belongs to must be taken into
account when computing the conflicting hard registers between the two
allocnos, as sketched below.
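
As a rough illustration of that computation (a hand-written sketch
only: the helper below is hypothetical, and the real logic lives in
assign_hard_reg and setup_profitable_hard_regs in this patch, which
also handle REG_WORDS_BIG_ENDIAN):

```
/* Given that an object of allocno A spanning hard regs
   [start + obj_start, start + obj_start + obj_nregs) conflicts with an
   object already occupying [conflict_start, conflict_start + conflict_nregs),
   mark every start hard register for A that would make the two ranges
   overlap.  Assumes !REG_WORDS_BIG_ENDIAN.  */
static void
exclude_overlapping_starts (int obj_start, int obj_nregs,
			    int conflict_start, int conflict_nregs,
			    HARD_REG_SET *excluded)
{
  /* [lo, hi] is exactly the set of start regnos for which
     start + obj_start < conflict_start + conflict_nregs
     && conflict_start < start + obj_start + obj_nregs.  */
  int lo = conflict_start - obj_start - obj_nregs + 1;
  int hi = conflict_start + conflict_nregs - 1 - obj_start;
  for (int start = lo; start <= hi; start++)
    if (start >= 0 && start < FIRST_PSEUDO_REGISTER)
      SET_HARD_REG_BIT (*excluded, start);
}
```

With whole-register tracking such a conflict excludes the conflicting
allocno's full ira_reg_mode_hard_regset; restricting the exclusion to
the obj_nregs + conflict_nregs - 1 overlapping start registers is what
allows the subregs of different allocnos to share hard registers.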

gcc/ChangeLog:

	* hard-reg-set.h (struct HARD_REG_SET): New shift operator.
	* ira-build.cc (ira_create_object): Adjust.
	(find_object): New.
	(find_object_anyway): New.
	(ira_create_allocno): Adjust.
	(get_range): New.
	(ira_copy_allocno_objects): New.
	(merge_hard_reg_conflicts): Adjust copy.
	(create_cap_allocno): Adjust.
	(find_subreg_p): New.
	(add_subregs): New.
	(create_insn_allocnos): Collect subreg.
	(create_bb_allocnos): Ditto.
	(move_allocno_live_ranges): Adjust.
	(copy_allocno_live_ranges): Adjust.
	(setup_min_max_allocno_live_range_point): Adjust.
	* ira-color.cc (INCLUDE_MAP): include map.
	(setup_left_conflict_sizes_p): Adjust conflict size.
	(setup_profitable_hard_regs): Adjust.
	(get_conflict_and_start_profitable_regs): Adjust.
	(check_hard_reg_p): Adjust conflict check.
	(assign_hard_reg): Adjust.
	(push_allocno_to_stack): Adjust conflict size.
	(improve_allocation): Adjust.
	* ira-conflicts.cc (record_object_conflict): Simplify.
	(build_object_conflicts): Adjust.
	(build_conflicts): Adjust.
	(print_allocno_conflicts): Adjust.
	* ira-emit.cc (modify_move_list): Adjust.
	* ira-int.h (struct ira_object): Adjust struct.
	(struct ira_allocno): Adjust struct.
	(ALLOCNO_NUM_OBJECTS): New accessor.
	(ALLOCNO_UNIT_SIZE): Ditto.
	(ALLOCNO_TRACK_SUBREG_P): Ditto.
	(ALLOCNO_NREGS): Ditto.
	(OBJECT_SUBWORD): Ditto.
	(OBJECT_INDEX): Ditto.
	(OBJECT_START): Ditto.
	(OBJECT_NREGS): Ditto.
	(find_object): Exported.
	(find_object_anyway): Ditto.
	(ira_copy_allocno_objects): Ditto.
	(has_subreg_object_p): Ditto.
	(get_full_object): Ditto.
	* ira-lives.cc (INCLUDE_VECTOR): Include vector.
	(add_onflict_hard_regs): New.
	(add_onflict_hard_reg): New.
	(make_hard_regno_dead): Adjust.
	(make_object_live): Adjust.
	(update_allocno_pressure_excess_length): Adjust.
	(make_object_dead): Adjust.
	(mark_pseudo_regno_live): Adjust.
	(add_subreg_point): New.
	(mark_pseudo_object_live): Adjust.
	(mark_pseudo_regno_subword_live): Adjust.
	(mark_pseudo_regno_subreg_live): Adjust.
	(mark_pseudo_regno_subregs_live): Adjust.
	(mark_pseudo_reg_live): Adjust.
	(mark_pseudo_regno_dead): Adjust.
	(mark_pseudo_object_dead): Adjust.
	(mark_pseudo_regno_subword_dead): Adjust.
	(mark_pseudo_regno_subreg_dead): Adjust.
	(mark_pseudo_reg_dead): Adjust.
	(process_single_reg_class_operands): Adjust.
	(process_out_of_region_eh_regs): Adjust.
	(add_conflict_from_region_landing_pads): Adjust.
	(process_bb_node_lives): Adjust.
	(class subreg_live_item): New class.
	(create_subregs_live_ranges): New function.
	(ira_create_allocno_live_ranges): Adjust.
	* ira.cc (check_allocation): Adjust.

---
 gcc/hard-reg-set.h   |  33 +++
 gcc/ira-build.cc     | 235 +++++++++++++++++---
 gcc/ira-color.cc     | 302 +++++++++++++++++---------
 gcc/ira-conflicts.cc |  48 ++---
 gcc/ira-emit.cc      |   2 +-
 gcc/ira-int.h        |  57 ++++-
 gcc/ira-lives.cc     | 500 ++++++++++++++++++++++++++++++++-----------
 gcc/ira.cc           |  52 ++---
 8 files changed, 907 insertions(+), 322 deletions(-)

diff --git a/gcc/hard-reg-set.h b/gcc/hard-reg-set.h
index b0bb9bce074..760eadba186 100644
--- a/gcc/hard-reg-set.h
+++ b/gcc/hard-reg-set.h
@@ -113,6 +113,39 @@ struct HARD_REG_SET
     return !operator== (other);
   }
 
+  HARD_REG_SET
+  operator>> (unsigned int shift_amount) const
+  {
+    if (shift_amount == 0)
+      return *this;
+
+    HARD_REG_SET res = *this;
+    unsigned int total_bits = sizeof (HARD_REG_ELT_TYPE) * 8;
+    if (shift_amount >= total_bits)
+      {
+	unsigned int n_elt = shift_amount / total_bits;
+	shift_amount -= n_elt * total_bits;
+	for (unsigned int i = 0; i + n_elt < ARRAY_SIZE (elts); i += 1)
+	  res.elts[i] = elts[i + n_elt];
+	/* Clear the upper N_ELT elements.  */
+	for (unsigned int i = 0; i < n_elt; i += 1)
+	  res.elts[ARRAY_SIZE (elts) - 1 - i] = 0;
+      }
+
+    if (shift_amount > 0)
+      {
+	/* Bits carried in from the next higher element.  */
+	HARD_REG_ELT_TYPE left = 0;
+	for (int i = ARRAY_SIZE (elts) - 1; i >= 0; --i)
+	  {
+	    HARD_REG_ELT_TYPE elt = res.elts[i];
+	    res.elts[i] = (elt >> shift_amount) | left;
+	    left = elt << (total_bits - shift_amount);
+	  }
+      }
+    return res;
+  }
+
   HARD_REG_ELT_TYPE elts[HARD_REG_SET_LONGS];
 };
 typedef const HARD_REG_SET &const_hard_reg_set;
diff --git a/gcc/ira-build.cc b/gcc/ira-build.cc
index f931c6e304c..a32693e69e4 100644
--- a/gcc/ira-build.cc
+++ b/gcc/ira-build.cc
@@ -29,10 +29,12 @@ along with GCC; see the file COPYING3.  If not see
 #include "insn-config.h"
 #include "regs.h"
 #include "memmodel.h"
+#include "tm_p.h"
 #include "ira.h"
 #include "ira-int.h"
 #include "sparseset.h"
 #include "cfgloop.h"
+#include "subreg-live-range.h"
 
 static ira_copy_t find_allocno_copy (ira_allocno_t, ira_allocno_t, rtx_insn *,
 				     ira_loop_tree_node_t);
@@ -442,13 +444,12 @@ initiate_allocnos (void)
 
 /* Create and return an object corresponding to a new allocno A.  */
 static ira_object_t
-ira_create_object (ira_allocno_t a, int subword)
+ira_create_object (ira_allocno_t a, int start, int nregs)
 {
   enum reg_class aclass = ALLOCNO_CLASS (a);
   ira_object_t obj = object_pool.allocate ();
 
   OBJECT_ALLOCNO (obj) = a;
-  OBJECT_SUBWORD (obj) = subword;
   OBJECT_CONFLICT_ID (obj) = ira_objects_num;
   OBJECT_CONFLICT_VEC_P (obj) = false;
   OBJECT_CONFLICT_ARRAY (obj) = NULL;
@@ -460,12 +461,75 @@ ira_create_object (ira_allocno_t a, int subword)
   OBJECT_MIN (obj) = INT_MAX;
   OBJECT_MAX (obj) = -1;
   OBJECT_LIVE_RANGES (obj) = NULL;
+  OBJECT_START (obj) = start;
+  OBJECT_NREGS (obj) = nregs;
+  OBJECT_INDEX (obj) = ALLOCNO_NUM_OBJECTS (a);
 
   ira_object_id_map_vec.safe_push (obj);
   ira_object_id_map
     = ira_object_id_map_vec.address ();
   ira_objects_num = ira_object_id_map_vec.length ();
 
+  a->objects.push_back (obj);
+
+  return obj;
+}
+
+/* Return the object in allocno A that matches START & NREGS.  */
+ira_object_t
+find_object (ira_allocno_t a, int start, int nregs)
+{
+  for (ira_object_t obj : a->objects)
+    {
+      if (OBJECT_START (obj) == start && OBJECT_NREGS (obj) == nregs)
+	return obj;
+    }
+  return NULL;
+}
+
+ira_object_t
+find_object (ira_allocno_t a, poly_int64 offset, poly_int64 size)
+{
+  enum reg_class aclass = ALLOCNO_CLASS (a);
+  machine_mode mode = ALLOCNO_MODE (a);
+  int nregs = ira_reg_class_max_nregs[aclass][mode];
+
+  if (!has_subreg_object_p (a)
+      || maybe_eq (GET_MODE_SIZE (ALLOCNO_MODE (a)), size))
+    return find_object (a, 0, nregs);
+
+  gcc_assert (maybe_lt (size, GET_MODE_SIZE (ALLOCNO_MODE (a)))
+	      && maybe_le (offset + size, GET_MODE_SIZE (ALLOCNO_MODE (a))));
+
+  int subreg_start = -1;
+  int subreg_nregs = -1;
+  for (int i = 0; i < nregs; i += 1)
+    {
+      poly_int64 right = ALLOCNO_UNIT_SIZE (a) * (i + 1);
+      if (subreg_start < 0 && maybe_lt (offset, right))
+	{
+	  subreg_start = i;
+	}
+      if (subreg_nregs < 0 && maybe_le (offset + size, right))
+	{
+	  subreg_nregs = i + 1 - subreg_start;
+	  break;
+	}
+    }
+  gcc_assert (subreg_start >= 0 && subreg_nregs > 0);
+  return find_object (a, subreg_start, subreg_nregs);
+}
+
+/* Return the object in allocno A that matches START & NREGS.  Create it
+   when not found.  */
+ira_object_t
+find_object_anyway (ira_allocno_t a, int start, int nregs)
+{
+  ira_object_t obj = find_object (a, start, nregs);
+  if (obj == NULL && ALLOCNO_TRACK_SUBREG_P (a))
+    obj = ira_create_object (a, start, nregs);
+
+  gcc_assert (obj != NULL);
   return obj;
 }
 
@@ -525,7 +589,8 @@ ira_create_allocno (int regno, bool cap_p,
   ALLOCNO_MEMORY_COST (a) = 0;
   ALLOCNO_UPDATED_MEMORY_COST (a) = 0;
   ALLOCNO_EXCESS_PRESSURE_POINTS_NUM (a) = 0;
-  ALLOCNO_NUM_OBJECTS (a) = 0;
+  ALLOCNO_UNIT_SIZE (a) = 0;
+  ALLOCNO_TRACK_SUBREG_P (a) = false;
 
   ALLOCNO_ADD_DATA (a) = NULL;
   allocno_vec.safe_push (a);
@@ -549,6 +614,51 @@ ira_set_allocno_class (ira_allocno_t a, enum reg_class aclass)
       OBJECT_CONFLICT_HARD_REGS (obj) |= ~reg_class_contents[aclass];
       OBJECT_TOTAL_CONFLICT_HARD_REGS (obj) |= ~reg_class_contents[aclass];
     }
+
+  if (aclass == NO_REGS)
+    return;
+  /* Set the unit size of one register.  */
+  machine_mode mode = ALLOCNO_MODE (a);
+  int nregs = ira_reg_class_max_nregs[aclass][mode];
+  if (nregs == 2 && maybe_eq (GET_MODE_SIZE (mode), nregs * UNITS_PER_WORD))
+    {
+      ALLOCNO_UNIT_SIZE (a) = UNITS_PER_WORD;
+      ALLOCNO_TRACK_SUBREG_P (a) = true;
+      return;
+    }
+}
+
+/* Return the subreg range of rtx SUBREG.  */
+static subreg_range
+get_range (rtx subreg)
+{
+  gcc_assert (read_modify_subreg_p (subreg));
+  rtx reg = SUBREG_REG (subreg);
+  machine_mode reg_mode = GET_MODE (reg);
+
+  machine_mode subreg_mode = GET_MODE (subreg);
+  int nblocks = get_nblocks (reg_mode);
+  poly_int64 unit_size = REGMODE_NATURAL_SIZE (reg_mode);
+
+  poly_int64 offset = SUBREG_BYTE (subreg);
+  poly_int64 left = offset + GET_MODE_SIZE (subreg_mode);
+
+  int subreg_start = -1;
+  int subreg_nblocks = -1;
+  for (int i = 0; i < nblocks; i += 1)
+    {
+      poly_int64 right = unit_size * (i + 1);
+      if (subreg_start < 0 && maybe_lt (offset, right))
+	subreg_start = i;
+      if (subreg_nblocks < 0 && maybe_le (left, right))
+	{
+	  subreg_nblocks = i + 1 - subreg_start;
+	  break;
+	}
+    }
+  gcc_assert (subreg_start >= 0 && subreg_nblocks > 0);
+
+  return subreg_range (subreg_start, subreg_start + subreg_nblocks);
 }
 
 /* Determine the number of objects we should associate with allocno A
@@ -558,15 +668,37 @@ ira_create_allocno_objects (ira_allocno_t a)
 {
   machine_mode mode = ALLOCNO_MODE (a);
   enum reg_class aclass = ALLOCNO_CLASS (a);
-  int n = ira_reg_class_max_nregs[aclass][mode];
-  int i;
+  int nregs = ira_reg_class_max_nregs[aclass][mode];
 
-  if (n != 2 || maybe_ne (GET_MODE_SIZE (mode), n * UNITS_PER_WORD))
-    n = 1;
+  ira_create_object (a, 0, nregs);
 
-  ALLOCNO_NUM_OBJECTS (a) = n;
-  for (i = 0; i < n; i++)
-    ALLOCNO_OBJECT (a, i) = ira_create_object (a, i);
+  if (aclass == NO_REGS || !ALLOCNO_TRACK_SUBREG_P (a) || a->subregs.empty ())
+    return;
+
+  int nblocks = get_nblocks (ALLOCNO_MODE (a));
+  int times = nblocks / ALLOCNO_NREGS (a);
+  gcc_assert (times >= 1 && nblocks % ALLOCNO_NREGS (a) == 0);
+
+  for (const auto &range : a->subregs)
+    {
+      int start = range.start / times;
+      int end = CEIL (range.end, times);
+      if (find_object (a, start, end - start) != NULL)
+	continue;
+      ira_create_object (a, start, end - start);
+    }
+
+  a->subregs.clear ();
+}
+
+/* Copy the objects from FROM to TO.  */
+void
+ira_copy_allocno_objects (ira_allocno_t to, ira_allocno_t from)
+{
+  ira_allocno_object_iterator oi;
+  ira_object_t obj;
+  FOR_EACH_ALLOCNO_OBJECT (from, obj, oi)
+    ira_create_object (to, OBJECT_START (obj), OBJECT_NREGS (obj));
 }
 
 /* For each allocno, set ALLOCNO_NUM_OBJECTS and create the
@@ -590,11 +722,11 @@ merge_hard_reg_conflicts (ira_allocno_t from, ira_allocno_t to,
 			  bool total_only)
 {
   int i;
-  gcc_assert (ALLOCNO_NUM_OBJECTS (to) == ALLOCNO_NUM_OBJECTS (from));
-  for (i = 0; i < ALLOCNO_NUM_OBJECTS (to); i++)
+  for (i = 0; i < ALLOCNO_NUM_OBJECTS (from); i++)
     {
       ira_object_t from_obj = ALLOCNO_OBJECT (from, i);
-      ira_object_t to_obj = ALLOCNO_OBJECT (to, i);
+      ira_object_t to_obj = find_object_anyway (to, OBJECT_START (from_obj),
+						OBJECT_NREGS (from_obj));
 
       if (!total_only)
 	OBJECT_CONFLICT_HARD_REGS (to_obj)
@@ -888,7 +1020,7 @@ create_cap_allocno (ira_allocno_t a)
   ALLOCNO_WMODE (cap) = ALLOCNO_WMODE (a);
   aclass = ALLOCNO_CLASS (a);
   ira_set_allocno_class (cap, aclass);
-  ira_create_allocno_objects (cap);
+  ira_copy_allocno_objects (cap, a);
   ALLOCNO_CAP_MEMBER (cap) = a;
   ALLOCNO_CAP (a) = cap;
   ALLOCNO_CLASS_COST (cap) = ALLOCNO_CLASS_COST (a);
@@ -1830,6 +1962,26 @@ ira_traverse_loop_tree (bool bb_p, ira_loop_tree_node_t loop_node,
 /* The basic block currently being processed.  */
 static basic_block curr_bb;
 
+/* Return true if A's subregs already contains a range equal to R.  */
+static bool
+find_subreg_p (ira_allocno_t a, const subreg_range &r)
+{
+  for (const auto &item : a->subregs)
+    if (item.start == r.start && item.end == r.end)
+      return true;
+  return false;
+}
+
+/* Add the subreg ranges in SR (from the DF_LIVE_SUBREG problem) to A.  */
+static void
+add_subregs (ira_allocno_t a, const subreg_ranges &sr)
+{
+  gcc_assert (get_nblocks (ALLOCNO_MODE (a)) == (unsigned) sr.max);
+  for (const subreg_range &r : sr.ranges)
+    if (!find_subreg_p (a, r))
+      a->subregs.push_back (r);
+}
+
 /* This recursive function creates allocnos corresponding to
    pseudo-registers containing in X.  True OUTPUT_P means that X is
    an lvalue.  OUTER corresponds to the parent expression of X.  */
@@ -1859,6 +2011,14 @@ create_insn_allocnos (rtx x, rtx outer, bool output_p)
 		}
 	    }
 
+	  /* Collect subreg reference.  */
+	  if (outer != NULL && read_modify_subreg_p (outer))
+	    {
+	      const subreg_range r = get_range (outer);
+	      if (!find_subreg_p (a, r))
+		a->subregs.push_back (r);
+	    }
+
 	  ALLOCNO_NREFS (a)++;
 	  ALLOCNO_FREQ (a) += REG_FREQ_FROM_BB (curr_bb);
 	  if (output_p)
@@ -1919,10 +2079,28 @@ create_bb_allocnos (ira_loop_tree_node_t bb_node)
       create_insn_allocnos (PATTERN (insn), NULL, false);
   /* It might be a allocno living through from one subloop to
      another.  */
-  EXECUTE_IF_SET_IN_REG_SET (DF_LIVE_SUBREG_IN (bb), FIRST_PSEUDO_REGISTER,
+  EXECUTE_IF_SET_IN_REG_SET (DF_LIVE_SUBREG_FULL_IN (bb), FIRST_PSEUDO_REGISTER,
 			     i, bi)
     if (ira_curr_regno_allocno_map[i] == NULL)
       ira_create_allocno (i, false, ira_curr_loop_tree_node);
+
+  EXECUTE_IF_SET_IN_REG_SET (DF_LIVE_SUBREG_PARTIAL_IN (bb),
+			     FIRST_PSEUDO_REGISTER, i, bi)
+    {
+      if (ira_curr_regno_allocno_map[i] == NULL)
+	ira_create_allocno (i, false, ira_curr_loop_tree_node);
+      add_subregs (ira_curr_regno_allocno_map[i],
+		   DF_LIVE_SUBREG_RANGE_IN (bb)->lives.at (i));
+    }
+
+  EXECUTE_IF_SET_IN_REG_SET (DF_LIVE_SUBREG_PARTIAL_OUT (bb),
+			     FIRST_PSEUDO_REGISTER, i, bi)
+    {
+      if (ira_curr_regno_allocno_map[i] == NULL)
+	ira_create_allocno (i, false, ira_curr_loop_tree_node);
+      add_subregs (ira_curr_regno_allocno_map[i],
+		   DF_LIVE_SUBREG_RANGE_OUT (bb)->lives.at (i));
+    }
 }
 
 /* Create allocnos corresponding to pseudo-registers living on edge E
@@ -2137,20 +2315,20 @@ move_allocno_live_ranges (ira_allocno_t from, ira_allocno_t to)
   int i;
   int n = ALLOCNO_NUM_OBJECTS (from);
 
-  gcc_assert (n == ALLOCNO_NUM_OBJECTS (to));
-
   for (i = 0; i < n; i++)
     {
       ira_object_t from_obj = ALLOCNO_OBJECT (from, i);
-      ira_object_t to_obj = ALLOCNO_OBJECT (to, i);
+      ira_object_t to_obj = find_object_anyway (to, OBJECT_START (from_obj),
+						OBJECT_NREGS (from_obj));
       live_range_t lr = OBJECT_LIVE_RANGES (from_obj);
 
       if (internal_flag_ira_verbose > 4 && ira_dump_file != NULL)
 	{
 	  fprintf (ira_dump_file,
-		   "      Moving ranges of a%dr%d to a%dr%d: ",
+		   "      Moving ranges of a%dr%d_obj%d to a%dr%d_obj%d: ",
 		   ALLOCNO_NUM (from), ALLOCNO_REGNO (from),
-		   ALLOCNO_NUM (to), ALLOCNO_REGNO (to));
+		   OBJECT_INDEX (from_obj), ALLOCNO_NUM (to),
+		   ALLOCNO_REGNO (to), OBJECT_INDEX (to_obj));
 	  ira_print_live_range_list (ira_dump_file, lr);
 	}
       change_object_in_range_list (lr, to_obj);
@@ -2166,12 +2344,11 @@ copy_allocno_live_ranges (ira_allocno_t from, ira_allocno_t to)
   int i;
   int n = ALLOCNO_NUM_OBJECTS (from);
 
-  gcc_assert (n == ALLOCNO_NUM_OBJECTS (to));
-
   for (i = 0; i < n; i++)
     {
       ira_object_t from_obj = ALLOCNO_OBJECT (from, i);
-      ira_object_t to_obj = ALLOCNO_OBJECT (to, i);
+      ira_object_t to_obj = find_object_anyway (to, OBJECT_START (from_obj),
+						OBJECT_NREGS (from_obj));
       live_range_t lr = OBJECT_LIVE_RANGES (from_obj);
 
       if (internal_flag_ira_verbose > 4 && ira_dump_file != NULL)
@@ -2783,15 +2960,17 @@ setup_min_max_allocno_live_range_point (void)
 		ira_assert (OBJECT_LIVE_RANGES (obj) == NULL);
 		OBJECT_MAX (obj) = 0;
 		OBJECT_MIN (obj) = 1;
-		continue;
 	      }
 	    ira_assert (ALLOCNO_CAP_MEMBER (a) == NULL);
 	    /* Accumulation of range info.  */
 	    if (ALLOCNO_CAP (a) != NULL)
 	      {
-		for (cap = ALLOCNO_CAP (a); cap != NULL; cap = ALLOCNO_CAP (cap))
+		for (cap = ALLOCNO_CAP (a); cap != NULL;
+		     cap = ALLOCNO_CAP (cap))
 		  {
-		    ira_object_t cap_obj = ALLOCNO_OBJECT (cap, j);
+		    ira_object_t cap_obj = find_object (cap, OBJECT_START (obj),
+							OBJECT_NREGS (obj));
+		    gcc_assert (cap_obj != NULL);
 		    if (OBJECT_MAX (cap_obj) < OBJECT_MAX (obj))
 		      OBJECT_MAX (cap_obj) = OBJECT_MAX (obj);
 		    if (OBJECT_MIN (cap_obj) > OBJECT_MIN (obj))
@@ -2802,7 +2981,9 @@ setup_min_max_allocno_live_range_point (void)
 	    if ((parent = ALLOCNO_LOOP_TREE_NODE (a)->parent) == NULL)
 	      continue;
 	    parent_a = parent->regno_allocno_map[i];
-	    parent_obj = ALLOCNO_OBJECT (parent_a, j);
+	    parent_obj
+	      = find_object (parent_a, OBJECT_START (obj), OBJECT_NREGS (obj));
+	    gcc_assert (parent_obj != NULL);
 	    if (OBJECT_MAX (parent_obj) < OBJECT_MAX (obj))
 	      OBJECT_MAX (parent_obj) = OBJECT_MAX (obj);
 	    if (OBJECT_MIN (parent_obj) > OBJECT_MIN (obj))
diff --git a/gcc/ira-color.cc b/gcc/ira-color.cc
index 4aa3e316282..8aed25144b9 100644
--- a/gcc/ira-color.cc
+++ b/gcc/ira-color.cc
@@ -19,6 +19,7 @@ along with GCC; see the file COPYING3.  If not see
 <http://www.gnu.org/licenses/>.  */
 
 #include "config.h"
+#define INCLUDE_MAP
 #include "system.h"
 #include "coretypes.h"
 #include "backend.h"
@@ -852,18 +853,17 @@ setup_left_conflict_sizes_p (ira_allocno_t a)
   node_preorder_num = node->preorder_num;
   node_set = node->hard_regs->set;
   node_check_tick++;
+  /* Collect conflict objects.  */
+  std::map<int, bitmap> allocno_conflict_regs;
   for (k = 0; k < nobj; k++)
     {
       ira_object_t obj = ALLOCNO_OBJECT (a, k);
       ira_object_t conflict_obj;
       ira_object_conflict_iterator oci;
-      
+
       FOR_EACH_OBJECT_CONFLICT (obj, conflict_obj, oci)
 	{
-	  int size;
- 	  ira_allocno_t conflict_a = OBJECT_ALLOCNO (conflict_obj);
-	  allocno_hard_regs_node_t conflict_node, temp_node;
-	  HARD_REG_SET conflict_node_set;
+	  ira_allocno_t conflict_a = OBJECT_ALLOCNO (conflict_obj);
 	  allocno_color_data_t conflict_data;
 
 	  conflict_data = ALLOCNO_COLOR_DATA (conflict_a);
@@ -872,6 +872,24 @@ setup_left_conflict_sizes_p (ira_allocno_t a)
 					     conflict_data
 					     ->profitable_hard_regs))
 	    continue;
+	  int num = ALLOCNO_NUM (conflict_a);
+	  if (allocno_conflict_regs.count (num) == 0)
+	    allocno_conflict_regs.insert ({num, ira_allocate_bitmap ()});
+	  bitmap_head temp;
+	  bitmap_initialize (&temp, &reg_obstack);
+	  bitmap_set_range (&temp, OBJECT_START (conflict_obj),
+			    OBJECT_NREGS (conflict_obj));
+	  bitmap_and_compl_into (&temp, allocno_conflict_regs.at (num));
+	  int size = bitmap_count_bits (&temp);
+	  bitmap_clear (&temp);
+	  if (size == 0)
+	    continue;
+
+	  bitmap_set_range (allocno_conflict_regs.at (num),
+			    OBJECT_START (conflict_obj),
+			    OBJECT_NREGS (conflict_obj));
+	  allocno_hard_regs_node_t conflict_node, temp_node;
+	  HARD_REG_SET conflict_node_set;
 	  conflict_node = conflict_data->hard_regs_node;
 	  conflict_node_set = conflict_node->hard_regs->set;
 	  if (hard_reg_set_subset_p (node_set, conflict_node_set))
@@ -886,14 +904,13 @@ setup_left_conflict_sizes_p (ira_allocno_t a)
 	      temp_node->check = node_check_tick;
 	      temp_node->conflict_size = 0;
 	    }
-	  size = (ira_reg_class_max_nregs
-		  [ALLOCNO_CLASS (conflict_a)][ALLOCNO_MODE (conflict_a)]);
-	  if (ALLOCNO_NUM_OBJECTS (conflict_a) > 1)
-	    /* We will deal with the subwords individually.  */
-	    size = 1;
 	  temp_node->conflict_size += size;
 	}
     }
+  /* Free the temporary per-allocno conflict bitmaps.  */
+  for (auto &kv : allocno_conflict_regs)
+    ira_free_bitmap (kv.second);
+
   for (i = 0; i < data->hard_regs_subnodes_num; i++)
     {
       allocno_hard_regs_node_t temp_node;
@@ -1031,7 +1048,7 @@ static void
 setup_profitable_hard_regs (void)
 {
   unsigned int i;
-  int j, k, nobj, hard_regno, nregs, class_size;
+  int j, k, nobj, hard_regno, class_size;
   ira_allocno_t a;
   bitmap_iterator bi;
   enum reg_class aclass;
@@ -1076,7 +1093,6 @@ setup_profitable_hard_regs (void)
 	  || (hard_regno = ALLOCNO_HARD_REGNO (a)) < 0)
 	continue;
       mode = ALLOCNO_MODE (a);
-      nregs = hard_regno_nregs (hard_regno, mode);
       nobj = ALLOCNO_NUM_OBJECTS (a);
       for (k = 0; k < nobj; k++)
 	{
@@ -1088,24 +1104,39 @@ setup_profitable_hard_regs (void)
 	    {
 	      ira_allocno_t conflict_a = OBJECT_ALLOCNO (conflict_obj);
 
-	      /* We can process the conflict allocno repeatedly with
-		 the same result.  */
-	      if (nregs == nobj && nregs > 1)
+	      if (!has_subreg_object_p (a))
 		{
-		  int num = OBJECT_SUBWORD (conflict_obj);
-		  
-		  if (REG_WORDS_BIG_ENDIAN)
-		    CLEAR_HARD_REG_BIT
-		      (ALLOCNO_COLOR_DATA (conflict_a)->profitable_hard_regs,
-		       hard_regno + nobj - num - 1);
-		  else
-		    CLEAR_HARD_REG_BIT
-		      (ALLOCNO_COLOR_DATA (conflict_a)->profitable_hard_regs,
-		       hard_regno + num);
+		  ALLOCNO_COLOR_DATA (conflict_a)->profitable_hard_regs
+		    &= ~ira_reg_mode_hard_regset[hard_regno][mode];
+		  continue;
+		}
+
+	      /* Clear all hard regs occupied by obj.  */
+	      if (REG_WORDS_BIG_ENDIAN)
+		{
+		  int start_regno
+		    = hard_regno + ALLOCNO_NREGS (a) - 1 - OBJECT_START (obj);
+		  for (int i = 0; i < OBJECT_NREGS (obj); i += 1)
+		    {
+		      int regno = start_regno - i;
+		      if (regno >= 0 && regno < FIRST_PSEUDO_REGISTER)
+			CLEAR_HARD_REG_BIT (
+			  ALLOCNO_COLOR_DATA (conflict_a)->profitable_hard_regs,
+			  regno);
+		    }
 		}
 	      else
-		ALLOCNO_COLOR_DATA (conflict_a)->profitable_hard_regs
-		  &= ~ira_reg_mode_hard_regset[hard_regno][mode];
+		{
+		  int start_regno = hard_regno + OBJECT_START (obj);
+		  for (int i = 0; i < OBJECT_NREGS (obj); i += 1)
+		    {
+		      int regno = start_regno + i;
+		      if (regno >= 0 && regno < FIRST_PSEUDO_REGISTER)
+			CLEAR_HARD_REG_BIT (
+			  ALLOCNO_COLOR_DATA (conflict_a)->profitable_hard_regs,
+			  regno);
+		    }
+		}
 	    }
 	}
     }
@@ -1677,18 +1708,25 @@ update_conflict_hard_regno_costs (int *costs, enum reg_class aclass,
    aligned.  */
 static inline void
 get_conflict_and_start_profitable_regs (ira_allocno_t a, bool retry_p,
-					HARD_REG_SET *conflict_regs,
+					HARD_REG_SET *start_conflict_regs,
 					HARD_REG_SET *start_profitable_regs)
 {
   int i, nwords;
   ira_object_t obj;
 
   nwords = ALLOCNO_NUM_OBJECTS (a);
-  for (i = 0; i < nwords; i++)
-    {
-      obj = ALLOCNO_OBJECT (a, i);
-      conflict_regs[i] = OBJECT_TOTAL_CONFLICT_HARD_REGS (obj);
-    }
+  CLEAR_HARD_REG_SET (*start_conflict_regs);
+  if (has_subreg_object_p (a))
+    for (i = 0; i < nwords; i++)
+      {
+	obj = ALLOCNO_OBJECT (a, i);
+	for (int j = 0; j < OBJECT_NREGS (obj); j += 1)
+	  *start_conflict_regs |= OBJECT_TOTAL_CONFLICT_HARD_REGS (obj)
+				  >> (OBJECT_START (obj) + j);
+      }
+  else
+    *start_conflict_regs
+      = OBJECT_TOTAL_CONFLICT_HARD_REGS (get_full_object (a));
   if (retry_p)
     *start_profitable_regs
       = (reg_class_contents[ALLOCNO_CLASS (a)]
@@ -1702,9 +1740,9 @@ get_conflict_and_start_profitable_regs (ira_allocno_t a, bool retry_p,
    PROFITABLE_REGS and whose objects have CONFLICT_REGS.  */
 static inline bool
 check_hard_reg_p (ira_allocno_t a, int hard_regno,
-		  HARD_REG_SET *conflict_regs, HARD_REG_SET profitable_regs)
+		  HARD_REG_SET start_conflict_regs,
+		  HARD_REG_SET profitable_regs)
 {
-  int j, nwords, nregs;
   enum reg_class aclass;
   machine_mode mode;
 
@@ -1716,28 +1754,17 @@ check_hard_reg_p (ira_allocno_t a, int hard_regno,
   /* Checking only profitable hard regs.  */
   if (! TEST_HARD_REG_BIT (profitable_regs, hard_regno))
     return false;
-  nregs = hard_regno_nregs (hard_regno, mode);
-  nwords = ALLOCNO_NUM_OBJECTS (a);
-  for (j = 0; j < nregs; j++)
+
+  if (has_subreg_object_p (a))
+    return !TEST_HARD_REG_BIT (start_conflict_regs, hard_regno);
+  else
     {
-      int k;
-      int set_to_test_start = 0, set_to_test_end = nwords;
-      
-      if (nregs == nwords)
-	{
-	  if (REG_WORDS_BIG_ENDIAN)
-	    set_to_test_start = nwords - j - 1;
-	  else
-	    set_to_test_start = j;
-	  set_to_test_end = set_to_test_start + 1;
-	}
-      for (k = set_to_test_start; k < set_to_test_end; k++)
-	if (TEST_HARD_REG_BIT (conflict_regs[k], hard_regno + j))
-	  break;
-      if (k != set_to_test_end)
-	break;
+      int nregs = hard_regno_nregs (hard_regno, mode);
+      for (int i = 0; i < nregs; i += 1)
+	if (TEST_HARD_REG_BIT (start_conflict_regs, hard_regno + i))
+	  return false;
+      return true;
     }
-  return j == nregs;
 }
 
 /* Return number of registers needed to be saved and restored at
@@ -1945,7 +1972,7 @@ spill_soft_conflicts (ira_allocno_t a, bitmap allocnos_to_spill,
 static bool
 assign_hard_reg (ira_allocno_t a, bool retry_p)
 {
-  HARD_REG_SET conflicting_regs[2], profitable_hard_regs;
+  HARD_REG_SET start_conflicting_regs, profitable_hard_regs;
   int i, j, hard_regno, best_hard_regno, class_size;
   int cost, mem_cost, min_cost, full_cost, min_full_cost, nwords, word;
   int *a_costs;
@@ -1962,8 +1989,7 @@ assign_hard_reg (ira_allocno_t a, bool retry_p)
   HARD_REG_SET soft_conflict_regs = {};
 
   ira_assert (! ALLOCNO_ASSIGNED_P (a));
-  get_conflict_and_start_profitable_regs (a, retry_p,
-					  conflicting_regs,
+  get_conflict_and_start_profitable_regs (a, retry_p, &start_conflicting_regs,
 					  &profitable_hard_regs);
   aclass = ALLOCNO_CLASS (a);
   class_size = ira_class_hard_regs_num[aclass];
@@ -2041,7 +2067,6 @@ assign_hard_reg (ira_allocno_t a, bool retry_p)
 		      (hard_regno, ALLOCNO_MODE (conflict_a),
 		       reg_class_contents[aclass])))
 		{
-		  int n_objects = ALLOCNO_NUM_OBJECTS (conflict_a);
 		  int conflict_nregs;
 
 		  mode = ALLOCNO_MODE (conflict_a);
@@ -2076,24 +2101,95 @@ assign_hard_reg (ira_allocno_t a, bool retry_p)
 			    note_conflict (r);
 			}
 		    }
+		  else if (has_subreg_object_p (a))
+		    {
+		      /* Set start_conflicting_regs for every start hard reg
+			 causing obj and conflict_obj to overlap.  Positions:
+					   +--------------+
+					   | conflict_obj |
+					   +--------------+
+
+			       +-----------+              +-----------+
+			       |   obj     |     ...      |   obj     |
+			       +-----------+              +-----------+
+
+			Point: A                  B       C
+
+			Start hard regs from point A to point C cause an overlap.
+			For REG_WORDS_BIG_ENDIAN:
+			   A = hard_regno + ALLOCNO_NREGS (conflict_a) - 1
+			       - OBJECT_START (conflict_obj)
+			       - OBJECT_NREGS (obj) + 1
+			   C = A + OBJECT_NREGS (obj)
+			       + OBJECT_NREGS (conflict_obj) - 2
+			For !REG_WORDS_BIG_ENDIAN:
+			   A = hard_regno + OBJECT_START (conflict_obj)
+			       - OBJECT_NREGS (obj) + 1
+			   C = A + OBJECT_NREGS (obj)
+			       + OBJECT_NREGS (conflict_obj) - 2
+			 */
+		      int start_regno;
+		      int conflict_allocno_nregs, conflict_object_nregs,
+			conflict_object_start;
+		      if (has_subreg_object_p (conflict_a))
+			{
+			  conflict_allocno_nregs = ALLOCNO_NREGS (conflict_a);
+			  conflict_object_nregs = OBJECT_NREGS (conflict_obj);
+			  conflict_object_start = OBJECT_START (conflict_obj);
+			}
+		      else
+			{
+			  conflict_allocno_nregs = conflict_object_nregs
+			    = hard_regno_nregs (hard_regno, mode);
+			  conflict_object_start = 0;
+			}
+		      if (REG_WORDS_BIG_ENDIAN)
+			{
+			  int A = hard_regno + conflict_allocno_nregs - 1
+				  - conflict_object_start - OBJECT_NREGS (obj)
+				  + 1;
+			  start_regno = A + OBJECT_NREGS (obj) - 1
+					+ OBJECT_START (obj) - ALLOCNO_NREGS (a)
+					+ 1;
+			}
+		      else
+			{
+			  int A = hard_regno + conflict_object_start
+				  - OBJECT_NREGS (obj) + 1;
+			  start_regno = A - OBJECT_START (obj);
+			}
+
+		      for (int i = 0;
+			   i <= OBJECT_NREGS (obj) + conflict_object_nregs - 2;
+			   i += 1)
+			{
+			  int regno = start_regno + i;
+			  if (regno >= 0 && regno < FIRST_PSEUDO_REGISTER)
+			    SET_HARD_REG_BIT (start_conflicting_regs, regno);
+			}
+		      if (hard_reg_set_subset_p (profitable_hard_regs,
+						 start_conflicting_regs))
+			goto fail;
+		    }
 		  else
 		    {
-		      if (conflict_nregs == n_objects && conflict_nregs > 1)
+		      if (has_subreg_object_p (conflict_a))
 			{
-			  int num = OBJECT_SUBWORD (conflict_obj);
-
-			  if (REG_WORDS_BIG_ENDIAN)
-			    SET_HARD_REG_BIT (conflicting_regs[word],
-					      hard_regno + n_objects - num - 1);
-			  else
-			    SET_HARD_REG_BIT (conflicting_regs[word],
-					      hard_regno + num);
+			  int start_hard_regno
+			    = REG_WORDS_BIG_ENDIAN
+				? hard_regno + ALLOCNO_NREGS (conflict_a)
+				    - OBJECT_START (conflict_obj)
+				: hard_regno + OBJECT_START (conflict_obj);
+			  for (int i = 0; i < OBJECT_NREGS (conflict_obj);
+			       i += 1)
+			    SET_HARD_REG_BIT (start_conflicting_regs,
+					      start_hard_regno + i);
 			}
 		      else
-			conflicting_regs[word]
+			start_conflicting_regs
 			  |= ira_reg_mode_hard_regset[hard_regno][mode];
 		      if (hard_reg_set_subset_p (profitable_hard_regs,
-						 conflicting_regs[word]))
+						 start_conflicting_regs))
 			goto fail;
 		    }
 		}
@@ -2160,8 +2256,8 @@ assign_hard_reg (ira_allocno_t a, bool retry_p)
 	  && FIRST_STACK_REG <= hard_regno && hard_regno <= LAST_STACK_REG)
 	continue;
 #endif
-      if (! check_hard_reg_p (a, hard_regno,
-			      conflicting_regs, profitable_hard_regs))
+      if (!check_hard_reg_p (a, hard_regno, start_conflicting_regs,
+			     profitable_hard_regs))
 	continue;
       cost = costs[i];
       full_cost = full_costs[i];
@@ -2667,21 +2763,16 @@ push_allocno_to_stack (ira_allocno_t a)
 {
   enum reg_class aclass;
   allocno_color_data_t data, conflict_data;
-  int size, i, n = ALLOCNO_NUM_OBJECTS (a);
-    
+  int i, n = ALLOCNO_NUM_OBJECTS (a);
+
   data = ALLOCNO_COLOR_DATA (a);
   data->in_graph_p = false;
   allocno_stack_vec.safe_push (a);
   aclass = ALLOCNO_CLASS (a);
   if (aclass == NO_REGS)
     return;
-  size = ira_reg_class_max_nregs[aclass][ALLOCNO_MODE (a)];
-  if (n > 1)
-    {
-      /* We will deal with the subwords individually.  */
-      gcc_assert (size == ALLOCNO_NUM_OBJECTS (a));
-      size = 1;
-    }
+  /* Track the conflict object regs already counted for each allocno.  */
+  std::map<int, bitmap> allocno_conflict_regs;
   for (i = 0; i < n; i++)
     {
       ira_object_t obj = ALLOCNO_OBJECT (a, i);
@@ -2706,6 +2797,21 @@ push_allocno_to_stack (ira_allocno_t a)
 	    continue;
 	  ira_assert (bitmap_bit_p (coloring_allocno_bitmap,
 				    ALLOCNO_NUM (conflict_a)));
+
+	  int num = ALLOCNO_NUM (conflict_a);
+	  if (allocno_conflict_regs.count (num) == 0)
+	    allocno_conflict_regs.insert ({num, ira_allocate_bitmap ()});
+	  bitmap_head temp;
+	  bitmap_initialize (&temp, &reg_obstack);
+	  bitmap_set_range (&temp, OBJECT_START (obj), OBJECT_NREGS (obj));
+	  bitmap_and_compl_into (&temp, allocno_conflict_regs.at (num));
+	  int size = bitmap_count_bits (&temp);
+	  bitmap_clear (&temp);
+	  if (size == 0)
+	    continue;
+
+	  bitmap_set_range (allocno_conflict_regs.at (num), OBJECT_START (obj),
+			    OBJECT_NREGS (obj));
 	  if (update_left_conflict_sizes_p (conflict_a, a, size))
 	    {
 	      delete_allocno_from_bucket
@@ -2721,6 +2827,9 @@ push_allocno_to_stack (ira_allocno_t a)
 	  
 	}
     }
+
+  for (auto &kv : allocno_conflict_regs)
+    ira_free_bitmap (kv.second);
 }
 
 /* Put ALLOCNO onto the coloring stack and remove it from its bucket.
@@ -3154,7 +3263,7 @@ improve_allocation (void)
   machine_mode mode;
   int *allocno_costs;
   int costs[FIRST_PSEUDO_REGISTER];
-  HARD_REG_SET conflicting_regs[2], profitable_hard_regs;
+  HARD_REG_SET start_conflicting_regs, profitable_hard_regs;
   ira_allocno_t a;
   bitmap_iterator bi;
   int saved_nregs;
@@ -3193,7 +3302,7 @@ improve_allocation (void)
 		     - allocno_copy_cost_saving (a, hregno));
       try_p = false;
       get_conflict_and_start_profitable_regs (a, false,
-					      conflicting_regs,
+					      &start_conflicting_regs,
 					      &profitable_hard_regs);
       class_size = ira_class_hard_regs_num[aclass];
       mode = ALLOCNO_MODE (a);
@@ -3202,8 +3311,8 @@ improve_allocation (void)
       for (j = 0; j < class_size; j++)
 	{
 	  hregno = ira_class_hard_regs[aclass][j];
-	  if (! check_hard_reg_p (a, hregno,
-				  conflicting_regs, profitable_hard_regs))
+	  if (!check_hard_reg_p (a, hregno, start_conflicting_regs,
+				 profitable_hard_regs))
 	    continue;
 	  ira_assert (ira_class_hard_reg_index[aclass][hregno] == j);
 	  k = allocno_costs == NULL ? 0 : j;
@@ -3287,16 +3396,15 @@ improve_allocation (void)
 		}
 	      conflict_nregs = hard_regno_nregs (conflict_hregno,
 						 ALLOCNO_MODE (conflict_a));
-	      auto note_conflict = [&](int r)
-		{
-		  if (check_hard_reg_p (a, r,
-					conflicting_regs, profitable_hard_regs))
-		    {
-		      if (spill_a)
-			SET_HARD_REG_BIT (soft_conflict_regs, r);
-		      costs[r] += spill_cost;
-		    }
-		};
+	      auto note_conflict = [&] (int r) {
+		if (check_hard_reg_p (a, r, start_conflicting_regs,
+				      profitable_hard_regs))
+		  {
+		    if (spill_a)
+		      SET_HARD_REG_BIT (soft_conflict_regs, r);
+		    costs[r] += spill_cost;
+		  }
+	      };
 	      for (r = conflict_hregno;
 		   r >= 0 && (int) end_hard_regno (mode, r) > conflict_hregno;
 		   r--)
@@ -3314,8 +3422,8 @@ improve_allocation (void)
       for (j = 0; j < class_size; j++)
 	{
 	  hregno = ira_class_hard_regs[aclass][j];
-	  if (check_hard_reg_p (a, hregno,
-				conflicting_regs, profitable_hard_regs)
+	  if (check_hard_reg_p (a, hregno, start_conflicting_regs,
+				profitable_hard_regs)
 	      && min_cost > costs[hregno])
 	    {
 	      best = hregno;
diff --git a/gcc/ira-conflicts.cc b/gcc/ira-conflicts.cc
index a4d93c8d734..0585ad10043 100644
--- a/gcc/ira-conflicts.cc
+++ b/gcc/ira-conflicts.cc
@@ -60,23 +60,8 @@ static IRA_INT_TYPE **conflicts;
 static void
 record_object_conflict (ira_object_t obj1, ira_object_t obj2)
 {
-  ira_allocno_t a1 = OBJECT_ALLOCNO (obj1);
-  ira_allocno_t a2 = OBJECT_ALLOCNO (obj2);
-  int w1 = OBJECT_SUBWORD (obj1);
-  int w2 = OBJECT_SUBWORD (obj2);
-  int id1, id2;
-
-  /* Canonicalize the conflict.  If two identically-numbered words
-     conflict, always record this as a conflict between words 0.  That
-     is the only information we need, and it is easier to test for if
-     it is collected in each allocno's lowest-order object.  */
-  if (w1 == w2 && w1 > 0)
-    {
-      obj1 = ALLOCNO_OBJECT (a1, 0);
-      obj2 = ALLOCNO_OBJECT (a2, 0);
-    }
-  id1 = OBJECT_CONFLICT_ID (obj1);
-  id2 = OBJECT_CONFLICT_ID (obj2);
+  int id1 = OBJECT_CONFLICT_ID (obj1);
+  int id2 = OBJECT_CONFLICT_ID (obj2);
 
   SET_MINMAX_SET_BIT (conflicts[id1], id2, OBJECT_MIN (obj1),
 		      OBJECT_MAX (obj1));
@@ -606,8 +591,8 @@ build_object_conflicts (ira_object_t obj)
   if (parent_a == NULL)
     return;
   ira_assert (ALLOCNO_CLASS (a) == ALLOCNO_CLASS (parent_a));
-  ira_assert (ALLOCNO_NUM_OBJECTS (a) == ALLOCNO_NUM_OBJECTS (parent_a));
-  parent_obj = ALLOCNO_OBJECT (parent_a, OBJECT_SUBWORD (obj));
+  parent_obj
+    = find_object_anyway (parent_a, OBJECT_START (obj), OBJECT_NREGS (obj));
   parent_num = OBJECT_CONFLICT_ID (parent_obj);
   parent_min = OBJECT_MIN (parent_obj);
   parent_max = OBJECT_MAX (parent_obj);
@@ -616,7 +601,6 @@ build_object_conflicts (ira_object_t obj)
     {
       ira_object_t another_obj = ira_object_id_map[i];
       ira_allocno_t another_a = OBJECT_ALLOCNO (another_obj);
-      int another_word = OBJECT_SUBWORD (another_obj);
 
       ira_assert (ira_reg_classes_intersect_p
 		  [ALLOCNO_CLASS (a)][ALLOCNO_CLASS (another_a)]);
@@ -627,11 +611,11 @@ build_object_conflicts (ira_object_t obj)
       ira_assert (ALLOCNO_NUM (another_parent_a) >= 0);
       ira_assert (ALLOCNO_CLASS (another_a)
 		  == ALLOCNO_CLASS (another_parent_a));
-      ira_assert (ALLOCNO_NUM_OBJECTS (another_a)
-		  == ALLOCNO_NUM_OBJECTS (another_parent_a));
       SET_MINMAX_SET_BIT (conflicts[parent_num],
-			  OBJECT_CONFLICT_ID (ALLOCNO_OBJECT (another_parent_a,
-							      another_word)),
+			  OBJECT_CONFLICT_ID (
+			    find_object_anyway (another_parent_a,
+						OBJECT_START (another_obj),
+						OBJECT_NREGS (another_obj))),
 			  parent_min, parent_max);
     }
 }
@@ -659,9 +643,10 @@ build_conflicts (void)
 	    build_object_conflicts (obj);
 	    for (cap = ALLOCNO_CAP (a); cap != NULL; cap = ALLOCNO_CAP (cap))
 	      {
-		ira_object_t cap_obj = ALLOCNO_OBJECT (cap, j);
-		gcc_assert (ALLOCNO_NUM_OBJECTS (cap) == ALLOCNO_NUM_OBJECTS (a));
-		build_object_conflicts (cap_obj);
+		  ira_object_t cap_obj
+		    = find_object_anyway (cap, OBJECT_START (obj),
+					  OBJECT_NREGS (obj));
+		  build_object_conflicts (cap_obj);
 	      }
 	  }
       }
@@ -736,7 +721,8 @@ print_allocno_conflicts (FILE * file, bool reg_p, ira_allocno_t a)
 	}
 
       if (n > 1)
-	fprintf (file, "\n;;   subobject %d:", i);
+	fprintf (file, "\n;;   subobject s%d,n%d,f%d:", OBJECT_START (obj),
+		 OBJECT_NREGS (obj), ALLOCNO_NREGS (a));
       FOR_EACH_OBJECT_CONFLICT (obj, conflict_obj, oci)
 	{
 	  ira_allocno_t conflict_a = OBJECT_ALLOCNO (conflict_obj);
@@ -746,8 +732,10 @@ print_allocno_conflicts (FILE * file, bool reg_p, ira_allocno_t a)
 	    {
 	      fprintf (file, " a%d(r%d", ALLOCNO_NUM (conflict_a),
 		       ALLOCNO_REGNO (conflict_a));
-	      if (ALLOCNO_NUM_OBJECTS (conflict_a) > 1)
-		fprintf (file, ",w%d", OBJECT_SUBWORD (conflict_obj));
+	      if (has_subreg_object_p (conflict_a))
+		  fprintf (file, ",s%d,n%d,f%d", OBJECT_START (conflict_obj),
+			   OBJECT_NREGS (conflict_obj),
+			   ALLOCNO_NREGS (conflict_a));
 	      if ((bb = ALLOCNO_LOOP_TREE_NODE (conflict_a)->bb) != NULL)
 		fprintf (file, ",b%d", bb->index);
 	      else
diff --git a/gcc/ira-emit.cc b/gcc/ira-emit.cc
index 84ed482e568..9dc7f3c655e 100644
--- a/gcc/ira-emit.cc
+++ b/gcc/ira-emit.cc
@@ -854,7 +854,7 @@ modify_move_list (move_t list)
 		ALLOCNO_MODE (new_allocno) = ALLOCNO_MODE (set_move->to);
 		ira_set_allocno_class (new_allocno,
 				       ALLOCNO_CLASS (set_move->to));
-		ira_create_allocno_objects (new_allocno);
+		ira_copy_allocno_objects (new_allocno, set_move->to);
 		ALLOCNO_ASSIGNED_P (new_allocno) = true;
 		ALLOCNO_HARD_REGNO (new_allocno) = -1;
 		ALLOCNO_EMIT_DATA (new_allocno)->reg
diff --git a/gcc/ira-int.h b/gcc/ira-int.h
index 0685e1f4e8d..9095a8227f7 100644
--- a/gcc/ira-int.h
+++ b/gcc/ira-int.h
@@ -23,6 +23,8 @@ along with GCC; see the file COPYING3.  If not see
 
 #include "recog.h"
 #include "function-abi.h"
+#include <vector>
+#include "subreg-live-range.h"
 
 /* To provide consistency in naming, all IRA external variables,
    functions, common typedefs start with prefix ira_.  */
@@ -222,11 +224,13 @@ extern int ira_max_point;
 extern live_range_t *ira_start_point_ranges, *ira_finish_point_ranges;
 
 /* A structure representing conflict information for an allocno
-   (or one of its subwords).  */
+   (or one of its subregs).  */
 struct ira_object
 {
   /* The allocno associated with this record.  */
   ira_allocno_t allocno;
+  /* Index in allocno->objects array.  */
+  unsigned int index;
   /* Vector of accumulated conflicting conflict_redords with NULL end
      marker (if OBJECT_CONFLICT_VEC_P is true) or conflict bit vector
      otherwise.  */
@@ -236,10 +240,9 @@ struct ira_object
      ranges in the list are not intersected and ordered by decreasing
      their program points*.  */
   live_range_t live_ranges;
-  /* The subword within ALLOCNO which is represented by this object.
-     Zero means the lowest-order subword (or the entire allocno in case
-     it is not being tracked in subwords).  */
-  int subword;
+  /* Represent that the OBJECT occupies registers [start, start + nregs)
+     of its ALLOCNO.  */
+  int start, nregs;
   /* Allocated size of the conflicts array.  */
   unsigned int conflicts_array_size;
   /* A unique number for every instance of this structure, which is used
@@ -295,6 +298,11 @@ struct ira_allocno
      reload (at this point pseudo-register has only one allocno) which
      did not get stack slot yet.  */
   signed int hard_regno : 16;
+  /* Unit size of one register allocated for the allocno.  Only used to
+     compute the start and nregs of the subregs being tracked.  */
+  poly_int64 unit_size;
+  /* Whether subreg live ranges need to be tracked for the allocno.  */
+  bool track_subreg_p;
   /* A bitmask of the ABIs used by calls that occur while the allocno
      is live.  */
   unsigned int crossed_calls_abis : NUM_ABI_IDS;
@@ -353,8 +361,6 @@ struct ira_allocno
      register class living at the point than number of hard-registers
      of the class available for the allocation.  */
   int excess_pressure_points_num;
-  /* The number of objects tracked in the following array.  */
-  int num_objects;
   /* Accumulated frequency of calls which given allocno
      intersects.  */
   int call_freq;
@@ -387,8 +393,11 @@ struct ira_allocno
   /* An array of structures describing conflict information and live
      ranges for each object associated with the allocno.  There may be
      more than one such object in cases where the allocno represents a
-     multi-word register.  */
-  ira_object_t objects[2];
+     multi-hardreg pseudo.  */
+  std::vector<ira_object_t> objects;
+  /* An array of structures describing the subreg ranges (start and end)
+     tracked for this allocno.  */
+  std::vector<subreg_range> subregs;
   /* Registers clobbered by intersected calls.  */
    HARD_REG_SET crossed_calls_clobbered_regs;
   /* Array of usage costs (accumulated and the one updated during
@@ -468,8 +477,12 @@ struct ira_allocno
 #define ALLOCNO_EXCESS_PRESSURE_POINTS_NUM(A) \
   ((A)->excess_pressure_points_num)
 #define ALLOCNO_OBJECT(A,N) ((A)->objects[N])
-#define ALLOCNO_NUM_OBJECTS(A) ((A)->num_objects)
+#define ALLOCNO_NUM_OBJECTS(A) ((int) (A)->objects.size ())
 #define ALLOCNO_ADD_DATA(A) ((A)->add_data)
+#define ALLOCNO_UNIT_SIZE(A) ((A)->unit_size)
+#define ALLOCNO_TRACK_SUBREG_P(A) ((A)->track_subreg_p)
+#define ALLOCNO_NREGS(A)                                                       \
+  (ira_reg_class_max_nregs[ALLOCNO_CLASS (A)][ALLOCNO_MODE (A)])
 
 /* Typedef for pointer to the subsequent structure.  */
 typedef struct ira_emit_data *ira_emit_data_t;
@@ -511,7 +524,7 @@ allocno_emit_reg (ira_allocno_t a)
 }
 
 #define OBJECT_ALLOCNO(O) ((O)->allocno)
-#define OBJECT_SUBWORD(O) ((O)->subword)
+#define OBJECT_INDEX(O) ((O)->index)
 #define OBJECT_CONFLICT_ARRAY(O) ((O)->conflicts_array)
 #define OBJECT_CONFLICT_VEC(O) ((ira_object_t *)(O)->conflicts_array)
 #define OBJECT_CONFLICT_BITVEC(O) ((IRA_INT_TYPE *)(O)->conflicts_array)
@@ -524,6 +537,8 @@ allocno_emit_reg (ira_allocno_t a)
 #define OBJECT_MAX(O) ((O)->max)
 #define OBJECT_CONFLICT_ID(O) ((O)->id)
 #define OBJECT_LIVE_RANGES(O) ((O)->live_ranges)
+#define OBJECT_START(O) ((O)->start)
+#define OBJECT_NREGS(O) ((O)->nregs)
 
 /* Map regno -> allocnos with given regno (see comments for
    allocno member `next_regno_allocno').  */
@@ -1041,6 +1056,12 @@ extern void ira_free_cost_vector (int *, reg_class_t);
 extern void ira_flattening (int, int);
 extern bool ira_build (void);
 extern void ira_destroy (void);
+extern ira_object_t
+find_object (ira_allocno_t, int, int);
+extern ira_object_t find_object (ira_allocno_t, poly_int64, poly_int64);
+extern ira_object_t
+find_object_anyway (ira_allocno_t, int, int);
+extern void ira_copy_allocno_objects (ira_allocno_t, ira_allocno_t);
 
 /* ira-costs.cc */
 extern void ira_init_costs_once (void);
@@ -1708,4 +1729,18 @@ ira_caller_save_loop_spill_p (ira_allocno_t a, ira_allocno_t subloop_a,
   return call_cost && call_cost >= spill_cost;
 }
 
+/* Return true if allocno A has a subreg object.  */
+inline bool
+has_subreg_object_p (ira_allocno_t a)
+{
+  return ALLOCNO_NUM_OBJECTS (a) > 1;
+}
+
+/* Return the full object of allocno A.  */
+inline ira_object_t
+get_full_object (ira_allocno_t a)
+{
+  return find_object (a, 0, ALLOCNO_NREGS (a));
+}
+
 #endif /* GCC_IRA_INT_H */
diff --git a/gcc/ira-lives.cc b/gcc/ira-lives.cc
index 05e2be12a26..9ca9e5548da 100644
--- a/gcc/ira-lives.cc
+++ b/gcc/ira-lives.cc
@@ -19,6 +19,7 @@ along with GCC; see the file COPYING3.  If not see
 <http://www.gnu.org/licenses/>.  */
 
 #include "config.h"
+#define INCLUDE_VECTOR
 #include "system.h"
 #include "coretypes.h"
 #include "backend.h"
@@ -35,6 +36,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "sparseset.h"
 #include "function-abi.h"
 #include "except.h"
+#include "subreg-live-range.h"
 
 /* The code in this file is similar to one in global but the code
    works on the allocno basis and creates live ranges instead of
@@ -91,6 +93,9 @@ static alternative_mask preferred_alternatives;
    we should not add a conflict with the copy's destination operand.  */
 static rtx ignore_reg_for_conflicts;
 
+/* Store the def/use points of registers that have subreg objects.  */
+static class subregs_live_points *subreg_live_points;
+
 /* Record hard register REGNO as now being live.  */
 static void
 make_hard_regno_live (int regno)
@@ -98,6 +103,33 @@ make_hard_regno_live (int regno)
   SET_HARD_REG_BIT (hard_regs_live, regno);
 }
 
+/* Update conflict hard regs of allocno A for its currently live subregs.  */
+static void
+add_onflict_hard_regs (ira_allocno_t a, HARD_REG_SET regs)
+{
+  gcc_assert (has_subreg_object_p (a));
+
+  if (subreg_live_points->subreg_live_ranges.count (ALLOCNO_NUM (a)) == 0)
+    return;
+
+  for (const subreg_range &r :
+       subreg_live_points->subreg_live_ranges.at (ALLOCNO_NUM (a)).ranges)
+    {
+      ira_object_t obj = find_object_anyway (a, r.start, r.end - r.start);
+      OBJECT_CONFLICT_HARD_REGS (obj) |= regs;
+      OBJECT_TOTAL_CONFLICT_HARD_REGS (obj) |= regs;
+    }
+}
+
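+/* Record a conflict of allocno A with the single hard register REGNO.  */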
+static void
+add_onflict_hard_reg (ira_allocno_t a, unsigned int regno)
+{
+  HARD_REG_SET set;
+  CLEAR_HARD_REG_SET (set);
+  SET_HARD_REG_BIT (set, regno);
+  add_onflict_hard_regs (a, set);
+}
+
 /* Process the definition of hard register REGNO.  This updates
    hard_regs_live and hard reg conflict information for living allocnos.  */
 static void
@@ -113,8 +145,13 @@ make_hard_regno_dead (int regno)
 	     == (unsigned int) ALLOCNO_REGNO (OBJECT_ALLOCNO (obj)))
 	continue;
 
-      SET_HARD_REG_BIT (OBJECT_CONFLICT_HARD_REGS (obj), regno);
-      SET_HARD_REG_BIT (OBJECT_TOTAL_CONFLICT_HARD_REGS (obj), regno);
+      if (has_subreg_object_p (OBJECT_ALLOCNO (obj)))
+	add_onflict_hard_reg (OBJECT_ALLOCNO (obj), regno);
+      else
+	{
+	  SET_HARD_REG_BIT (OBJECT_CONFLICT_HARD_REGS (obj), regno);
+	  SET_HARD_REG_BIT (OBJECT_TOTAL_CONFLICT_HARD_REGS (obj), regno);
+	}
     }
   CLEAR_HARD_REG_BIT (hard_regs_live, regno);
 }
@@ -127,9 +164,29 @@ make_object_live (ira_object_t obj)
   sparseset_set_bit (objects_live, OBJECT_CONFLICT_ID (obj));
 
   live_range_t lr = OBJECT_LIVE_RANGES (obj);
-  if (lr == NULL
-      || (lr->finish != curr_point && lr->finish + 1 != curr_point))
-    ira_add_live_range_to_object (obj, curr_point, -1);
+  if (lr == NULL || (lr->finish != curr_point && lr->finish + 1 != curr_point))
+    {
+      ira_add_live_range_to_object (obj, curr_point, -1);
+      if (internal_flag_ira_verbose > 8 && ira_dump_file != NULL)
+	{
+	  fprintf (ira_dump_file,
+		   "     add new live_range for a%d(r%d): [%d...-1]\n",
+		   ALLOCNO_NUM (OBJECT_ALLOCNO (obj)),
+		   ALLOCNO_REGNO (OBJECT_ALLOCNO (obj)), curr_point);
+	}
+    }
+  else
+    {
+      if (internal_flag_ira_verbose > 8 && ira_dump_file != NULL)
+	{
+	  fprintf (
+	    ira_dump_file,
+	    "     use old live_range for a%d(r%d): [%d...%d], curr: %d\n",
+	    ALLOCNO_NUM (OBJECT_ALLOCNO (obj)),
+	    ALLOCNO_REGNO (OBJECT_ALLOCNO (obj)), lr->start, lr->finish,
+	    curr_point);
+	}
+    }
 }
 
 /* Update ALLOCNO_EXCESS_PRESSURE_POINTS_NUM for the allocno
@@ -140,7 +197,6 @@ update_allocno_pressure_excess_length (ira_object_t obj)
   ira_allocno_t a = OBJECT_ALLOCNO (obj);
   int start, i;
   enum reg_class aclass, pclass, cl;
-  live_range_t p;
 
   aclass = ALLOCNO_CLASS (a);
   pclass = ira_pressure_class_translate[aclass];
@@ -152,10 +208,18 @@ update_allocno_pressure_excess_length (ira_object_t obj)
 	continue;
       if (high_pressure_start_point[cl] < 0)
 	continue;
-      p = OBJECT_LIVE_RANGES (obj);
-      ira_assert (p != NULL);
-      start = (high_pressure_start_point[cl] > p->start
-	       ? high_pressure_start_point[cl] : p->start);
+      int start_point;
+      if (has_subreg_object_p (a))
+	start_point = subreg_live_points->get_start_point (ALLOCNO_NUM (a));
+      else
+	{
+	  live_range_t p = OBJECT_LIVE_RANGES (obj);
+	  ira_assert (p != NULL);
+	  start_point = p->start;
+	}
+      start = (high_pressure_start_point[cl] > start_point
+		 ? high_pressure_start_point[cl]
+		 : start_point);
       ALLOCNO_EXCESS_PRESSURE_POINTS_NUM (a) += curr_point - start + 1;
     }
 }
@@ -201,6 +265,14 @@ make_object_dead (ira_object_t obj)
     CLEAR_HARD_REG_BIT (OBJECT_TOTAL_CONFLICT_HARD_REGS (obj), regno);
 
   lr = OBJECT_LIVE_RANGES (obj);
+  if (internal_flag_ira_verbose > 8 && ira_dump_file != NULL)
+    {
+      fprintf (ira_dump_file,
+	       "     finish a live_range a%d(r%d): [%d...%d] => [%d...%d]\n",
+	       ALLOCNO_NUM (OBJECT_ALLOCNO (obj)),
+	       ALLOCNO_REGNO (OBJECT_ALLOCNO (obj)), lr->start, lr->finish,
+	       lr->start, curr_point);
+    }
   ira_assert (lr != NULL);
   lr->finish = curr_point;
   update_allocno_pressure_excess_length (obj);
@@ -295,77 +367,144 @@ pseudo_regno_single_word_and_live_p (int regno)
   return sparseset_bit_p (objects_live, OBJECT_CONFLICT_ID (obj));
 }
 
-/* Mark the pseudo register REGNO as live.  Update all information about
-   live ranges and register pressure.  */
+/* Record that OBJ is defined (IS_DEF) or used at the current point.  */
 static void
-mark_pseudo_regno_live (int regno)
+add_subreg_point (ira_object_t obj, bool is_def, bool is_dec = true)
 {
-  ira_allocno_t a = ira_curr_regno_allocno_map[regno];
-  enum reg_class pclass;
-  int i, n, nregs;
-
-  if (a == NULL)
-    return;
+  ira_allocno_t a = OBJECT_ALLOCNO (obj);
+  if (is_def)
+    {
+      OBJECT_CONFLICT_HARD_REGS (obj) |= hard_regs_live;
+      OBJECT_TOTAL_CONFLICT_HARD_REGS (obj) |= hard_regs_live;
+      if (is_dec)
+	{
+	  enum reg_class pclass
+	    = ira_pressure_class_translate[ALLOCNO_CLASS (a)];
+	  dec_register_pressure (pclass, ALLOCNO_NREGS (a));
+	}
+      update_allocno_pressure_excess_length (obj);
+    }
+  else
+    {
+      enum reg_class pclass = ira_pressure_class_translate[ALLOCNO_CLASS (a)];
+      inc_register_pressure (pclass, ALLOCNO_NREGS (a));
+    }
 
-  /* Invalidate because it is referenced.  */
-  allocno_saved_at_call[ALLOCNO_NUM (a)] = 0;
+  subreg_range r = subreg_range (
+    {OBJECT_START (obj), OBJECT_START (obj) + OBJECT_NREGS (obj)});
+  subreg_live_points->add_point (ALLOCNO_NUM (a), ALLOCNO_NREGS (a), r, is_def,
+				 curr_point);
 
-  n = ALLOCNO_NUM_OBJECTS (a);
-  pclass = ira_pressure_class_translate[ALLOCNO_CLASS (a)];
-  nregs = ira_reg_class_max_nregs[ALLOCNO_CLASS (a)][ALLOCNO_MODE (a)];
-  if (n > 1)
+  if (internal_flag_ira_verbose > 8 && ira_dump_file != NULL)
     {
-      /* We track every subobject separately.  */
-      gcc_assert (nregs == n);
-      nregs = 1;
+      fprintf (ira_dump_file, "     %s a%d(r%d", is_def ? "def" : "use",
+	       ALLOCNO_NUM (a), ALLOCNO_REGNO (a));
+      if (ALLOCNO_CLASS (a) != NO_REGS
+	  && ALLOCNO_NREGS (a) != OBJECT_NREGS (obj))
+	fprintf (ira_dump_file, " [subreg: start %d, nregs %d]",
+		 OBJECT_START (obj), OBJECT_NREGS (obj));
+      else
+	fprintf (ira_dump_file, " [full: nregs %d]", OBJECT_NREGS (obj));
+      fprintf (ira_dump_file, ") at point %d\n", curr_point);
     }
 
-  for (i = 0; i < n; i++)
-    {
-      ira_object_t obj = ALLOCNO_OBJECT (a, i);
+  gcc_assert (has_subreg_object_p (a));
+  gcc_assert (subreg_live_points->subreg_live_ranges.count (ALLOCNO_NUM (a))
+	      != 0);
+
+  const subreg_ranges &sr
+    = subreg_live_points->subreg_live_ranges.at (ALLOCNO_NUM (a));
+  ira_object_t main_obj = find_object (a, 0, ALLOCNO_NREGS (a));
+  gcc_assert (main_obj != NULL);
+  if (sr.empty_p ()
+      && sparseset_bit_p (objects_live, OBJECT_CONFLICT_ID (main_obj)))
+    sparseset_clear_bit (objects_live, OBJECT_CONFLICT_ID (main_obj));
+  else if (!sr.empty_p ()
+	   && !sparseset_bit_p (objects_live, OBJECT_CONFLICT_ID (main_obj)))
+    sparseset_set_bit (objects_live, OBJECT_CONFLICT_ID (main_obj));
+}
+
+/* Mark the object OBJ as live.  */
+static void
+mark_pseudo_object_live (ira_allocno_t a, ira_object_t obj)
+{
+  /* Invalidate because it is referenced.  */
+  allocno_saved_at_call[ALLOCNO_NUM (a)] = 0;
 
+  if (has_subreg_object_p (a))
+    add_subreg_point (obj, false);
+  else
+    {
       if (sparseset_bit_p (objects_live, OBJECT_CONFLICT_ID (obj)))
-	continue;
+	return;
 
-      inc_register_pressure (pclass, nregs);
+      enum reg_class pclass = ira_pressure_class_translate[ALLOCNO_CLASS (a)];
+      inc_register_pressure (pclass, ALLOCNO_NREGS (a));
       make_object_live (obj);
     }
 }
 
+/* Mark the pseudo register REGNO as live.  Update all information about
+   live ranges and register pressure.  */
+static void
+mark_pseudo_regno_live (int regno)
+{
+  ira_allocno_t a = ira_curr_regno_allocno_map[regno];
+
+  if (a == NULL)
+    return;
+
+  int nregs = ira_reg_class_max_nregs[ALLOCNO_CLASS (a)][ALLOCNO_MODE (a)];
+  ira_object_t obj = find_object (a, 0, nregs);
+  gcc_assert (obj != NULL);
+
+  mark_pseudo_object_live (a, obj);
+}
+
 /* Like mark_pseudo_regno_live, but try to only mark one subword of
    the pseudo as live.  SUBWORD indicates which; a value of 0
    indicates the low part.  */
 static void
-mark_pseudo_regno_subword_live (int regno, int subword)
+mark_pseudo_regno_subreg_live (int regno, rtx subreg)
 {
   ira_allocno_t a = ira_curr_regno_allocno_map[regno];
-  int n;
-  enum reg_class pclass;
-  ira_object_t obj;
 
   if (a == NULL)
     return;
 
-  /* Invalidate because it is referenced.  */
-  allocno_saved_at_call[ALLOCNO_NUM (a)] = 0;
+  ira_object_t obj
+    = find_object (a, SUBREG_BYTE (subreg), GET_MODE_SIZE (GET_MODE (subreg)));
+  gcc_assert (obj != NULL);
+
+  mark_pseudo_object_live (a, obj);
+}
 
-  n = ALLOCNO_NUM_OBJECTS (a);
-  if (n == 1)
+/* Mark objects in subreg ranges SR as live.  Update all information about
+   live ranges and register pressure.  */
+static void
+mark_pseudo_regno_subregs_live (int regno, const subreg_ranges &sr)
+{
+  ira_allocno_t a = ira_curr_regno_allocno_map[regno];
+  if (a == NULL)
+    return;
+
+  if (!ALLOCNO_TRACK_SUBREG_P (a))
     {
       mark_pseudo_regno_live (regno);
       return;
     }
 
-  pclass = ira_pressure_class_translate[ALLOCNO_CLASS (a)];
-  gcc_assert
-    (n == ira_reg_class_max_nregs[ALLOCNO_CLASS (a)][ALLOCNO_MODE (a)]);
-  obj = ALLOCNO_OBJECT (a, subword);
-
-  if (sparseset_bit_p (objects_live, OBJECT_CONFLICT_ID (obj)))
-    return;
-
-  inc_register_pressure (pclass, 1);
-  make_object_live (obj);
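+  /* The ranges in SR are in units TIMES times finer than A's hard
+     registers; scale them down to object register indices.  */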
+  int times = sr.max / ALLOCNO_NREGS (a);
+  gcc_assert (sr.max >= ALLOCNO_NREGS (a)
+	      && times * ALLOCNO_NREGS (a) == sr.max);
+  for (const subreg_range &range : sr.ranges)
+    {
+      int start = range.start / times;
+      int end = CEIL (range.end, times);
+      ira_object_t obj = find_object (a, start, end - start);
+      gcc_assert (obj != NULL);
+      mark_pseudo_object_live (a, obj);
+    }
 }
 
 /* Mark the register REG as live.  Store a 1 in hard_regs_live for
@@ -403,10 +542,7 @@ static void
 mark_pseudo_reg_live (rtx orig_reg, unsigned regno)
 {
   if (read_modify_subreg_p (orig_reg))
-    {
-      mark_pseudo_regno_subword_live (regno,
-				      subreg_lowpart_p (orig_reg) ? 0 : 1);
-    }
+    mark_pseudo_regno_subreg_live (regno, orig_reg);
   else
     mark_pseudo_regno_live (regno);
 }
@@ -427,72 +563,59 @@ mark_ref_live (df_ref ref)
     mark_hard_reg_live (reg);
 }
 
-/* Mark the pseudo register REGNO as dead.  Update all information about
-   live ranges and register pressure.  */
+/* Mark the object OBJ of allocno A as dead.  */
 static void
-mark_pseudo_regno_dead (int regno)
+mark_pseudo_object_dead (ira_allocno_t a, ira_object_t obj)
 {
-  ira_allocno_t a = ira_curr_regno_allocno_map[regno];
-  int n, i, nregs;
-  enum reg_class cl;
-
-  if (a == NULL)
-    return;
-
   /* Invalidate because it is referenced.  */
   allocno_saved_at_call[ALLOCNO_NUM (a)] = 0;
 
-  n = ALLOCNO_NUM_OBJECTS (a);
-  cl = ira_pressure_class_translate[ALLOCNO_CLASS (a)];
-  nregs = ira_reg_class_max_nregs[ALLOCNO_CLASS (a)][ALLOCNO_MODE (a)];
-  if (n > 1)
-    {
-      /* We track every subobject separately.  */
-      gcc_assert (nregs == n);
-      nregs = 1;
-    }
-  for (i = 0; i < n; i++)
+  if (has_subreg_object_p (a))
+    add_subreg_point (obj, true);
+  else
     {
-      ira_object_t obj = ALLOCNO_OBJECT (a, i);
       if (!sparseset_bit_p (objects_live, OBJECT_CONFLICT_ID (obj)))
-	continue;
+	return;
 
-      dec_register_pressure (cl, nregs);
+      enum reg_class cl = ira_pressure_class_translate[ALLOCNO_CLASS (a)];
+      dec_register_pressure (cl, ALLOCNO_NREGS (a));
       make_object_dead (obj);
     }
 }
 
-/* Like mark_pseudo_regno_dead, but called when we know that only part of the
-   register dies.  SUBWORD indicates which; a value of 0 indicates the low part.  */
+/* Mark the pseudo register REGNO as dead.  Update all information about
+   live ranges and register pressure.  */
 static void
-mark_pseudo_regno_subword_dead (int regno, int subword)
+mark_pseudo_regno_dead (int regno)
 {
   ira_allocno_t a = ira_curr_regno_allocno_map[regno];
-  int n;
-  enum reg_class cl;
-  ira_object_t obj;
 
   if (a == NULL)
     return;
 
-  /* Invalidate because it is referenced.  */
-  allocno_saved_at_call[ALLOCNO_NUM (a)] = 0;
+  int nregs = ira_reg_class_max_nregs[ALLOCNO_CLASS (a)][ALLOCNO_MODE (a)];
+  ira_object_t obj = find_object (a, 0, nregs);
+  gcc_assert (obj != NULL);
 
-  n = ALLOCNO_NUM_OBJECTS (a);
-  if (n == 1)
-    /* The allocno as a whole doesn't die in this case.  */
-    return;
+  mark_pseudo_object_dead (a, obj);
+}
 
-  cl = ira_pressure_class_translate[ALLOCNO_CLASS (a)];
-  gcc_assert
-    (n == ira_reg_class_max_nregs[ALLOCNO_CLASS (a)][ALLOCNO_MODE (a)]);
+/* Like mark_pseudo_regno_dead, but called when we know that only part of
+   the register dies.  SUBREG is the subreg reference identifying which
+   part.  */
+static void
+mark_pseudo_regno_subreg_dead (int regno, rtx subreg)
+{
+  ira_allocno_t a = ira_curr_regno_allocno_map[regno];
 
-  obj = ALLOCNO_OBJECT (a, subword);
-  if (!sparseset_bit_p (objects_live, OBJECT_CONFLICT_ID (obj)))
+  if (a == NULL)
     return;
 
-  dec_register_pressure (cl, 1);
-  make_object_dead (obj);
+  ira_object_t obj
+    = find_object (a, SUBREG_BYTE (subreg), GET_MODE_SIZE (GET_MODE (subreg)));
+  gcc_assert (obj != NULL);
+
+  mark_pseudo_object_dead (a, obj);
 }
 
 /* Process the definition of hard register REG.  This updates hard_regs_live
@@ -528,10 +651,7 @@ static void
 mark_pseudo_reg_dead (rtx orig_reg, unsigned regno)
 {
   if (read_modify_subreg_p (orig_reg))
-    {
-      mark_pseudo_regno_subword_dead (regno,
-				      subreg_lowpart_p (orig_reg) ? 0 : 1);
-    }
+    mark_pseudo_regno_subreg_dead (regno, orig_reg);
   else
     mark_pseudo_regno_dead (regno);
 }
@@ -1059,8 +1179,15 @@ process_single_reg_class_operands (bool in_p, int freq)
 	      /* We could increase costs of A instead of making it
 		 conflicting with the hard register.  But it works worse
 		 because it will be spilled in reload in anyway.  */
-	      OBJECT_CONFLICT_HARD_REGS (obj) |= reg_class_contents[cl];
-	      OBJECT_TOTAL_CONFLICT_HARD_REGS (obj) |= reg_class_contents[cl];
+	      if (has_subreg_object_p (a))
+		add_onflict_hard_regs (OBJECT_ALLOCNO (obj),
+				       reg_class_contents[cl]);
+	      else
+		{
+		    OBJECT_CONFLICT_HARD_REGS (obj) |= reg_class_contents[cl];
+		    OBJECT_TOTAL_CONFLICT_HARD_REGS (obj)
+		      |= reg_class_contents[cl];
+		}
 	    }
 	}
     }
@@ -1198,17 +1325,15 @@ process_out_of_region_eh_regs (basic_block bb)
 			    bi)
     {
       ira_allocno_t a = ira_curr_regno_allocno_map[i];
-      for (int n = ALLOCNO_NUM_OBJECTS (a) - 1; n >= 0; n--)
+      ira_object_t obj = find_object (a, 0, ALLOCNO_NREGS (a));
+      for (int k = 0;; k++)
 	{
-	  ira_object_t obj = ALLOCNO_OBJECT (a, n);
-	  for (int k = 0; ; k++)
-	    {
-	      unsigned int regno = EH_RETURN_DATA_REGNO (k);
-	      if (regno == INVALID_REGNUM)
-		break;
-	      SET_HARD_REG_BIT (OBJECT_CONFLICT_HARD_REGS (obj), regno);
-	      SET_HARD_REG_BIT (OBJECT_TOTAL_CONFLICT_HARD_REGS (obj), regno);
-	    }
+	  unsigned int regno = EH_RETURN_DATA_REGNO (k);
+	  if (regno == INVALID_REGNUM)
+	    break;
+
+	  SET_HARD_REG_BIT (OBJECT_CONFLICT_HARD_REGS (obj), regno);
+	  SET_HARD_REG_BIT (OBJECT_TOTAL_CONFLICT_HARD_REGS (obj), regno);
 	}
     }
 }
@@ -1234,8 +1359,13 @@ add_conflict_from_region_landing_pads (eh_region region, ira_object_t obj,
 	{
 	  HARD_REG_SET new_conflict_regs
 	    = callee_abi.mode_clobbers (ALLOCNO_MODE (a));
-	  OBJECT_CONFLICT_HARD_REGS (obj) |= new_conflict_regs;
-	  OBJECT_TOTAL_CONFLICT_HARD_REGS (obj) |= new_conflict_regs;
+	  if (has_subreg_object_p (a))
+	    add_onflict_hard_regs (a, new_conflict_regs);
+	  else
+	    {
+	      OBJECT_CONFLICT_HARD_REGS (obj) |= new_conflict_regs;
+	      OBJECT_TOTAL_CONFLICT_HARD_REGS (obj) |= new_conflict_regs;
+	    }
 	  return;
 	}
     }
@@ -1260,6 +1390,10 @@ process_bb_node_lives (ira_loop_tree_node_t loop_tree_node)
   bb = loop_tree_node->bb;
   if (bb != NULL)
     {
+      if (internal_flag_ira_verbose > 2 && ira_dump_file != NULL)
+	fprintf (ira_dump_file, "\n   BB exit(l%d): point = %d\n",
+		 loop_tree_node->parent->loop_num, curr_point);
+
       for (i = 0; i < ira_pressure_classes_num; i++)
 	{
 	  curr_reg_pressure[ira_pressure_classes[i]] = 0;
@@ -1268,6 +1402,7 @@ process_bb_node_lives (ira_loop_tree_node_t loop_tree_node)
       curr_bb_node = loop_tree_node;
       reg_live_out = DF_LIVE_SUBREG_OUT (bb);
       sparseset_clear (objects_live);
+      subreg_live_points->clear_live_ranges ();
       REG_SET_TO_HARD_REG_SET (hard_regs_live, reg_live_out);
       hard_regs_live &= ~(eliminable_regset | ira_no_alloc_regs);
       for (i = 0; i < FIRST_PSEUDO_REGISTER; i++)
@@ -1291,9 +1426,17 @@ process_bb_node_lives (ira_loop_tree_node_t loop_tree_node)
 			    <= ira_class_hard_regs_num[cl]);
 	      }
 	  }
-      EXECUTE_IF_SET_IN_BITMAP (reg_live_out, FIRST_PSEUDO_REGISTER, j, bi)
+      EXECUTE_IF_SET_IN_BITMAP (DF_LIVE_SUBREG_FULL_OUT (bb),
+				FIRST_PSEUDO_REGISTER, j, bi)
 	mark_pseudo_regno_live (j);
 
+      EXECUTE_IF_SET_IN_BITMAP (DF_LIVE_SUBREG_PARTIAL_OUT (bb),
+				FIRST_PSEUDO_REGISTER, j, bi)
+	{
+	  mark_pseudo_regno_subregs_live (
+	    j, DF_LIVE_SUBREG_RANGE_OUT (bb)->lives.at (j));
+	}
+
 #ifdef EH_RETURN_DATA_REGNO
       process_out_of_region_eh_regs (bb);
 #endif
@@ -1408,8 +1551,18 @@ process_bb_node_lives (ira_loop_tree_node_t loop_tree_node)
 			  && (find_reg_note (insn, REG_SETJMP, NULL_RTX)
 			      != NULL_RTX)))
 		    {
-		      SET_HARD_REG_SET (OBJECT_CONFLICT_HARD_REGS (obj));
-		      SET_HARD_REG_SET (OBJECT_TOTAL_CONFLICT_HARD_REGS (obj));
+		      if (has_subreg_object_p (a))
+			{
+			  HARD_REG_SET regs;
+			  SET_HARD_REG_SET (regs);
+			  add_onflict_hard_regs (a, regs);
+			}
+		      else
+			{
+			  SET_HARD_REG_SET (OBJECT_CONFLICT_HARD_REGS (obj));
+			  SET_HARD_REG_SET (
+			    OBJECT_TOTAL_CONFLICT_HARD_REGS (obj));
+			}
 		    }
 		  eh_region r;
 		  if (can_throw_internal (insn)
@@ -1455,7 +1608,14 @@ process_bb_node_lives (ira_loop_tree_node_t loop_tree_node)
 	  
 	  /* Mark each used value as live.  */
 	  FOR_EACH_INSN_USE (use, insn)
-	    mark_ref_live (use);
+	    {
+	      unsigned regno = DF_REF_REGNO (use);
+	      ira_allocno_t a = ira_curr_regno_allocno_map[regno];
+	      if (a && has_subreg_object_p (a)
+		  && DF_REF_FLAGS (use) & (DF_REF_READ_WRITE | DF_REF_SUBREG))
+		  continue;
+	      mark_ref_live (use);
+	    }
 
 	  process_single_reg_class_operands (true, freq);
 
@@ -1485,6 +1645,10 @@ process_bb_node_lives (ira_loop_tree_node_t loop_tree_node)
 	}
       ignore_reg_for_conflicts = NULL_RTX;
 
+      if (internal_flag_ira_verbose > 2 && ira_dump_file != NULL)
+	fprintf (ira_dump_file, "\n   BB head(l%d): point = %d\n",
+		 loop_tree_node->parent->loop_num, curr_point);
+
       if (bb_has_eh_pred (bb))
 	for (j = 0; ; ++j)
 	  {
@@ -1538,10 +1702,15 @@ process_bb_node_lives (ira_loop_tree_node_t loop_tree_node)
 	}
 
       EXECUTE_IF_SET_IN_SPARSESET (objects_live, i)
-	make_object_dead (ira_object_id_map[i]);
+      {
+	ira_object_t obj = ira_object_id_map[i];
+	if (has_subreg_object_p (OBJECT_ALLOCNO (obj)))
+	  add_subreg_point (obj, true, false);
+	else
+	  make_object_dead (obj);
+      }
 
       curr_point++;
-
     }
   /* Propagate register pressure to upper loop tree nodes.  */
   if (loop_tree_node != ira_loop_tree_root)
@@ -1742,6 +1911,86 @@ ira_debug_live_ranges (void)
   print_live_ranges (stderr);
 }
 
+/* A span of subreg liveness: the subreg ranges in SUBREG are live from
+   point START to point FINISH.  */
+class subreg_live_item
+{
+public:
+  subreg_ranges subreg;
+  int start, finish;
+};
+
+/* Create subreg live ranges from the objects' def/use point info.  */
+static void
+create_subregs_live_ranges ()
+{
+  for (const auto &subreg_point_it : subreg_live_points->subreg_points)
+    {
+      unsigned int allocno_num = subreg_point_it.first;
+      const class live_points &points = subreg_point_it.second;
+      ira_allocno_t a = ira_allocnos[allocno_num];
+      std::vector<subreg_live_item> temps;
+      gcc_assert (has_subreg_object_p (a));
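+      /* A use point opens (or extends) a live item and a def point closes
+	 it, splitting the item when the def covers only part of the live
+	 subreg ranges.  */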
+      for (const auto &point_it : points.points)
+	{
+	  int point = point_it.first;
+	  const live_point &regs = point_it.second;
+	  gcc_assert (temps.empty () || temps.back ().finish <= point);
+	  if (!regs.use_reg.empty_p ())
+	    {
+	      if (temps.empty ())
+		temps.push_back ({regs.use_reg, point, -1});
+	      else if (temps.back ().finish == -1)
+		{
+		  if (!temps.back ().subreg.same_p (regs.use_reg))
+		    {
+		      if (temps.back ().start == point)
+			temps.back ().subreg.add_ranges (regs.use_reg);
+		      else
+			{
+			  temps.back ().finish = point - 1;
+
+			  subreg_ranges temp = regs.use_reg;
+			  temp.add_ranges (temps.back ().subreg);
+			  temps.push_back ({temp, point, -1});
+			}
+		    }
+		}
+	      else if (temps.back ().subreg.same_p (regs.use_reg)
+		       && (temps.back ().finish == point
+			   || temps.back ().finish + 1 == point))
+		temps.back ().finish = -1;
+	      else
+		temps.push_back ({regs.use_reg, point, -1});
+	    }
+	  if (!regs.def_reg.empty_p ())
+	    {
+	      gcc_assert (!temps.empty ());
+	      if (regs.def_reg.include_ranges_p (temps.back ().subreg))
+		temps.back ().finish = point;
+	      else if (temps.back ().subreg.include_ranges_p (regs.def_reg))
+		{
+		  temps.back ().finish = point;
+
+		  subreg_ranges diff = temps.back ().subreg;
+		  diff.remove_ranges (regs.def_reg);
+		  temps.push_back ({diff, point + 1, -1});
+		}
+	      else
+		gcc_unreachable ();
+	    }
+	}
+      for (const subreg_live_item &item : temps)
+	for (const subreg_range &r : item.subreg.ranges)
+	  {
+	    ira_object_t obj = find_object_anyway (a, r.start, r.end - r.start);
+	    live_range_t lr = OBJECT_LIVE_RANGES (obj);
+	    if (lr != NULL && lr->finish + 1 == item.start)
+	      lr->finish = item.finish;
+	    else
+	      ira_add_live_range_to_object (obj, item.start, item.finish);
+	  }
+    }
+}
+
 /* The main entry function creates live ranges, set up
    CONFLICT_HARD_REGS and TOTAL_CONFLICT_HARD_REGS for objects, and
    calculate register pressure info.  */
@@ -1755,13 +2004,20 @@ ira_create_allocno_live_ranges (void)
   allocno_saved_at_call
     = (int *) ira_allocate (ira_allocnos_num * sizeof (int));
   memset (allocno_saved_at_call, 0, ira_allocnos_num * sizeof (int));
+  subreg_live_points = new subregs_live_points ();
   ira_traverse_loop_tree (true, ira_loop_tree_root, NULL,
 			  process_bb_node_lives);
   ira_max_point = curr_point;
+  create_subregs_live_ranges ();
   create_start_finish_chains ();
   if (internal_flag_ira_verbose > 2 && ira_dump_file != NULL)
-    print_live_ranges (ira_dump_file);
+    {
+      fprintf (ira_dump_file, ";; subreg live points:\n");
+      subreg_live_points->dump (ira_dump_file);
+      print_live_ranges (ira_dump_file);
+    }
   /* Clean up.  */
+  delete subreg_live_points;
   ira_free (allocno_saved_at_call);
   sparseset_free (objects_live);
   sparseset_free (allocnos_processed);
diff --git a/gcc/ira.cc b/gcc/ira.cc
index c7f27b17002..9ea57d3b1ea 100644
--- a/gcc/ira.cc
+++ b/gcc/ira.cc
@@ -2623,7 +2623,7 @@ static void
 check_allocation (void)
 {
   ira_allocno_t a;
-  int hard_regno, nregs, conflict_nregs;
+  int hard_regno;
   ira_allocno_iterator ai;
 
   FOR_EACH_ALLOCNO (a, ai)
@@ -2634,28 +2634,18 @@ check_allocation (void)
       if (ALLOCNO_CAP_MEMBER (a) != NULL
 	  || (hard_regno = ALLOCNO_HARD_REGNO (a)) < 0)
 	continue;
-      nregs = hard_regno_nregs (hard_regno, ALLOCNO_MODE (a));
-      if (nregs == 1)
-	/* We allocated a single hard register.  */
-	n = 1;
-      else if (n > 1)
-	/* We allocated multiple hard registers, and we will test
-	   conflicts in a granularity of single hard regs.  */
-	nregs = 1;
 
       for (i = 0; i < n; i++)
 	{
 	  ira_object_t obj = ALLOCNO_OBJECT (a, i);
 	  ira_object_t conflict_obj;
 	  ira_object_conflict_iterator oci;
-	  int this_regno = hard_regno;
-	  if (n > 1)
-	    {
-	      if (REG_WORDS_BIG_ENDIAN)
-		this_regno += n - i - 1;
-	      else
-		this_regno += i;
-	    }
+	  int this_regno;
+	  if (REG_WORDS_BIG_ENDIAN)
+	    this_regno = hard_regno + ALLOCNO_NREGS (a) - 1 - OBJECT_START (obj)
+			 - OBJECT_NREGS (obj) + 1;
+	  else
+	    this_regno = hard_regno + OBJECT_START (obj);
 	  FOR_EACH_OBJECT_CONFLICT (obj, conflict_obj, oci)
 	    {
 	      ira_allocno_t conflict_a = OBJECT_ALLOCNO (conflict_obj);
@@ -2665,24 +2655,18 @@ check_allocation (void)
 	      if (ira_soft_conflict (a, conflict_a))
 		continue;
 
-	      conflict_nregs = hard_regno_nregs (conflict_hard_regno,
-						 ALLOCNO_MODE (conflict_a));
-
-	      if (ALLOCNO_NUM_OBJECTS (conflict_a) > 1
-		  && conflict_nregs == ALLOCNO_NUM_OBJECTS (conflict_a))
-		{
-		  if (REG_WORDS_BIG_ENDIAN)
-		    conflict_hard_regno += (ALLOCNO_NUM_OBJECTS (conflict_a)
-					    - OBJECT_SUBWORD (conflict_obj) - 1);
-		  else
-		    conflict_hard_regno += OBJECT_SUBWORD (conflict_obj);
-		  conflict_nregs = 1;
-		}
+	      if (REG_WORDS_BIG_ENDIAN)
+		conflict_hard_regno = conflict_hard_regno
+				      + ALLOCNO_NREGS (conflict_a) - 1
+				      - OBJECT_START (conflict_obj)
+				      - OBJECT_NREGS (conflict_obj) + 1;
+	      else
+		conflict_hard_regno
+		  = conflict_hard_regno + OBJECT_START (conflict_obj);
 
-	      if ((conflict_hard_regno <= this_regno
-		 && this_regno < conflict_hard_regno + conflict_nregs)
-		|| (this_regno <= conflict_hard_regno
-		    && conflict_hard_regno < this_regno + nregs))
+	      if (!(this_regno + OBJECT_NREGS (obj) <= conflict_hard_regno
+		    || conflict_hard_regno + OBJECT_NREGS (conflict_obj)
+			 <= this_regno))
 		{
 		  fprintf (stderr, "bad allocation for %d and %d\n",
 			   ALLOCNO_REGNO (a), ALLOCNO_REGNO (conflict_a));
-- 
2.36.3


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH V2 4/7] ira: Support subreg copy
  2023-11-12  9:58 [PATCH V2 0/7] ira/lra: Support subreg coalesce Lehua Ding
                   ` (2 preceding siblings ...)
  2023-11-12  9:58 ` [PATCH V2 3/7] ira: Support subreg live range track Lehua Ding
@ 2023-11-12  9:58 ` Lehua Ding
  2023-11-12  9:58 ` [PATCH V2 5/7] ira: Add all nregs >= 2 pseudos to tracke subreg list Lehua Ding
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: Lehua Ding @ 2023-11-12  9:58 UTC (permalink / raw)
  To: gcc-patches; +Cc: vmakarov, richard.sandiford, juzhe.zhong, lehua.ding

This patch changes copies so that they are created between objects rather
than between allocnos.
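
As a rough illustration (a sketch with simplified names, not the actual
declarations, which live in ira-int.h), the endpoints of a copy record move
from whole allocnos to objects, i.e. [start, start + nregs) slices of an
allocno:

```
/* Sketch only.  With object endpoints, a subreg move such as
     (set (reg:M r2) (subreg:M (reg:W r1) off))
   can be recorded as a copy between r2's full object and the slice of r1
   covered by the subreg, so the coloring pass can try to place the two in
   overlapping hard registers.  */
struct copy_before { ira_allocno_t first, second; int freq; };
struct copy_after  { ira_object_t  first, second; int freq; };
```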

gcc/ChangeLog:

	* ira-build.cc (find_allocno_copy): Removed.
	(find_object): New.
	(ira_create_copy): Adjust.
	(add_allocno_copy_to_list): Adjust.
	(swap_allocno_copy_ends_if_necessary): Adjust.
	(ira_add_allocno_copy): Adjust.
	(print_copy): Adjust.
	(print_allocno_copies): Adjust.
	(ira_flattening): Adjust.
	* ira-color.cc (INCLUDE_VECTOR): Include vector.
	(struct allocno_color_data): Adjust.
	(struct allocno_hard_regs_subnode): Adjust.
	(form_allocno_hard_regs_nodes_forest): Adjust.
	(update_left_conflict_sizes_p): Adjust.
	(struct update_cost_queue_elem): Adjust.
	(queue_update_cost): Adjust.
	(get_next_update_cost): Adjust.
	(update_costs_from_allocno): Adjust.
	(update_conflict_hard_regno_costs): Adjust.
	(assign_hard_reg): Adjust.
	(objects_conflict_by_live_ranges_p): New.
	(allocno_thread_conflict_p): Adjust.
	(object_thread_conflict_p): Ditto.
	(merge_threads): Ditto.
	(form_threads_from_copies): Ditto.
	(form_threads_from_bucket): Ditto.
	(form_threads_from_colorable_allocno): Ditto.
	(init_allocno_threads): Ditto.
	(add_allocno_to_bucket): Ditto.
	(delete_allocno_from_bucket): Ditto.
	(allocno_copy_cost_saving): Ditto.
	(color_allocnos): Ditto.
	(color_pass): Ditto.
	(update_curr_costs): Ditto.
	(coalesce_allocnos): Ditto.
	(ira_reuse_stack_slot): Ditto.
	(ira_initiate_assign): Ditto.
	(ira_finish_assign): Ditto.
	* ira-conflicts.cc (allocnos_conflict_for_copy_p): Ditto.
	(REG_SUBREG_P): Ditto.
	(subreg_move_p): New.
	(regs_non_conflict_for_copy_p): New.
	(subreg_reg_align_and_times_p): New.
	(process_regs_for_copy): Ditto.
	(add_insn_allocno_copies): Ditto.
	(propagate_copies): Ditto.
	* ira-emit.cc (add_range_and_copies_from_move_list): Ditto.
	* ira-int.h (struct ira_allocno_copy): Ditto.
	(ira_add_allocno_copy): Ditto.
	(find_object): Exported.
	(subreg_move_p): Exported.
	* ira.cc (print_redundant_copies): Exported.

---
 gcc/ira-build.cc     | 154 +++++++-----
 gcc/ira-color.cc     | 541 +++++++++++++++++++++++++++++++------------
 gcc/ira-conflicts.cc | 173 +++++++++++---
 gcc/ira-emit.cc      |  10 +-
 gcc/ira-int.h        |  10 +-
 gcc/ira.cc           |   5 +-
 6 files changed, 646 insertions(+), 247 deletions(-)

diff --git a/gcc/ira-build.cc b/gcc/ira-build.cc
index a32693e69e4..13f0f7336ed 100644
--- a/gcc/ira-build.cc
+++ b/gcc/ira-build.cc
@@ -36,9 +36,6 @@ along with GCC; see the file COPYING3.  If not see
 #include "cfgloop.h"
 #include "subreg-live-range.h"
 
-static ira_copy_t find_allocno_copy (ira_allocno_t, ira_allocno_t, rtx_insn *,
-				     ira_loop_tree_node_t);
-
 /* The root of the loop tree corresponding to the all function.  */
 ira_loop_tree_node_t ira_loop_tree_root;
 
@@ -520,6 +517,16 @@ find_object (ira_allocno_t a, poly_int64 offset, poly_int64 size)
   return find_object (a, subreg_start, subreg_nregs);
 }
 
+/* Return the object in allocno A corresponding to REG.  */
+ira_object_t
+find_object (ira_allocno_t a, rtx reg)
+{
+  if (has_subreg_object_p (a) && read_modify_subreg_p (reg))
+    return find_object (a, SUBREG_BYTE (reg), GET_MODE_SIZE (GET_MODE (reg)));
+  else
+    return find_object (a, 0, ALLOCNO_NREGS (a));
+}
+
 /* Return the object in allocno A which match START & NREGS.  Create when not
    found.  */
 ira_object_t
@@ -1503,27 +1510,36 @@ initiate_copies (void)
 /* Return copy connecting A1 and A2 and originated from INSN of
    LOOP_TREE_NODE if any.  */
 static ira_copy_t
-find_allocno_copy (ira_allocno_t a1, ira_allocno_t a2, rtx_insn *insn,
+find_allocno_copy (ira_object_t obj1, ira_object_t obj2, rtx_insn *insn,
 		   ira_loop_tree_node_t loop_tree_node)
 {
   ira_copy_t cp, next_cp;
-  ira_allocno_t another_a;
+  ira_object_t another_obj;
 
+  ira_allocno_t a1 = OBJECT_ALLOCNO (obj1);
   for (cp = ALLOCNO_COPIES (a1); cp != NULL; cp = next_cp)
     {
-      if (cp->first == a1)
+      ira_allocno_t first_a = OBJECT_ALLOCNO (cp->first);
+      ira_allocno_t second_a = OBJECT_ALLOCNO (cp->second);
+      if (first_a == a1)
 	{
 	  next_cp = cp->next_first_allocno_copy;
-	  another_a = cp->second;
+	  if (cp->first == obj1)
+	    another_obj = cp->second;
+	  else
+	    continue;
 	}
-      else if (cp->second == a1)
+      else if (second_a == a1)
 	{
 	  next_cp = cp->next_second_allocno_copy;
-	  another_a = cp->first;
+	  if (cp->second == obj1)
+	    another_obj = cp->first;
+	  else
+	    continue;
 	}
       else
 	gcc_unreachable ();
-      if (another_a == a2 && cp->insn == insn
+      if (another_obj == obj2 && cp->insn == insn
 	  && cp->loop_tree_node == loop_tree_node)
 	return cp;
     }
@@ -1533,7 +1549,7 @@ find_allocno_copy (ira_allocno_t a1, ira_allocno_t a2, rtx_insn *insn,
 /* Create and return copy with given attributes LOOP_TREE_NODE, FIRST,
    SECOND, FREQ, CONSTRAINT_P, and INSN.  */
 ira_copy_t
-ira_create_copy (ira_allocno_t first, ira_allocno_t second, int freq,
+ira_create_copy (ira_object_t first, ira_object_t second, int freq,
 		 bool constraint_p, rtx_insn *insn,
 		 ira_loop_tree_node_t loop_tree_node)
 {
@@ -1557,28 +1573,29 @@ ira_create_copy (ira_allocno_t first, ira_allocno_t second, int freq,
 static void
 add_allocno_copy_to_list (ira_copy_t cp)
 {
-  ira_allocno_t first = cp->first, second = cp->second;
+  ira_object_t first = cp->first, second = cp->second;
+  ira_allocno_t a1 = OBJECT_ALLOCNO (first), a2 = OBJECT_ALLOCNO (second);
 
   cp->prev_first_allocno_copy = NULL;
   cp->prev_second_allocno_copy = NULL;
-  cp->next_first_allocno_copy = ALLOCNO_COPIES (first);
+  cp->next_first_allocno_copy = ALLOCNO_COPIES (a1);
   if (cp->next_first_allocno_copy != NULL)
     {
-      if (cp->next_first_allocno_copy->first == first)
+      if (OBJECT_ALLOCNO (cp->next_first_allocno_copy->first) == a1)
 	cp->next_first_allocno_copy->prev_first_allocno_copy = cp;
       else
 	cp->next_first_allocno_copy->prev_second_allocno_copy = cp;
     }
-  cp->next_second_allocno_copy = ALLOCNO_COPIES (second);
+  cp->next_second_allocno_copy = ALLOCNO_COPIES (a2);
   if (cp->next_second_allocno_copy != NULL)
     {
-      if (cp->next_second_allocno_copy->second == second)
+      if (OBJECT_ALLOCNO (cp->next_second_allocno_copy->second) == a2)
 	cp->next_second_allocno_copy->prev_second_allocno_copy = cp;
       else
 	cp->next_second_allocno_copy->prev_first_allocno_copy = cp;
     }
-  ALLOCNO_COPIES (first) = cp;
-  ALLOCNO_COPIES (second) = cp;
+  ALLOCNO_COPIES (a1) = cp;
+  ALLOCNO_COPIES (a2) = cp;
 }
 
 /* Make a copy CP a canonical copy where number of the
@@ -1586,7 +1603,8 @@ add_allocno_copy_to_list (ira_copy_t cp)
 static void
 swap_allocno_copy_ends_if_necessary (ira_copy_t cp)
 {
-  if (ALLOCNO_NUM (cp->first) <= ALLOCNO_NUM (cp->second))
+  if (ALLOCNO_NUM (OBJECT_ALLOCNO (cp->first))
+      <= ALLOCNO_NUM (OBJECT_ALLOCNO (cp->second)))
     return;
 
   std::swap (cp->first, cp->second);
@@ -1595,11 +1613,10 @@ swap_allocno_copy_ends_if_necessary (ira_copy_t cp)
 }
 
 /* Create (or update frequency if the copy already exists) and return
-   the copy of allocnos FIRST and SECOND with frequency FREQ
-   corresponding to move insn INSN (if any) and originated from
-   LOOP_TREE_NODE.  */
+   the copy of objects FIRST and SECOND with frequency FREQ corresponding to
+   move insn INSN (if any) and originated from LOOP_TREE_NODE.  */
 ira_copy_t
-ira_add_allocno_copy (ira_allocno_t first, ira_allocno_t second, int freq,
+ira_add_allocno_copy (ira_object_t first, ira_object_t second, int freq,
 		      bool constraint_p, rtx_insn *insn,
 		      ira_loop_tree_node_t loop_tree_node)
 {
@@ -1618,15 +1635,38 @@ ira_add_allocno_copy (ira_allocno_t first, ira_allocno_t second, int freq,
   return cp;
 }
 
+/* Create (or update frequency if the copy already exists) and return
+   the copy of allocnos FIRST and SECOND with frequency FREQ
+   corresponding to move insn INSN (if any) and originated from
+   LOOP_TREE_NODE.  */
+ira_copy_t
+ira_add_allocno_copy (ira_allocno_t first, ira_allocno_t second, int freq,
+		      bool constraint_p, rtx_insn *insn,
+		      ira_loop_tree_node_t loop_tree_node)
+{
+  ira_object_t obj1 = get_full_object (first);
+  ira_object_t obj2 = get_full_object (second);
+  gcc_assert (obj1 != NULL && obj2 != NULL);
+  return ira_add_allocno_copy (obj1, obj2, freq, constraint_p, insn,
+			       loop_tree_node);
+}
+
 /* Print info about copy CP into file F.  */
 static void
 print_copy (FILE *f, ira_copy_t cp)
 {
-  fprintf (f, "  cp%d:a%d(r%d)<->a%d(r%d)@%d:%s\n", cp->num,
-	   ALLOCNO_NUM (cp->first), ALLOCNO_REGNO (cp->first),
-	   ALLOCNO_NUM (cp->second), ALLOCNO_REGNO (cp->second), cp->freq,
-	   cp->insn != NULL
-	   ? "move" : cp->constraint_p ? "constraint" : "shuffle");
+  ira_allocno_t a1 = OBJECT_ALLOCNO (cp->first);
+  ira_allocno_t a2 = OBJECT_ALLOCNO (cp->second);
+  fprintf (f, "  cp%d:a%d(r%d", cp->num, ALLOCNO_NUM (a1), ALLOCNO_REGNO (a1));
+  if (ALLOCNO_NREGS (a1) != OBJECT_NREGS (cp->first))
+    fprintf (f, "_obj%d", OBJECT_INDEX (cp->first));
+  fprintf (f, ")<->a%d(r%d", ALLOCNO_NUM (a2), ALLOCNO_REGNO (a2));
+  if (ALLOCNO_NREGS (a2) != OBJECT_NREGS (cp->second))
+    fprintf (f, "_obj%d", OBJECT_INDEX (cp->second));
+  fprintf (f, ")@%d:%s\n", cp->freq,
+	   cp->insn != NULL   ? "move"
+	   : cp->constraint_p ? "constraint"
+			      : "shuffle");
 }
 
 DEBUG_FUNCTION void
@@ -1673,24 +1713,25 @@ ira_debug_copies (void)
 static void
 print_allocno_copies (FILE *f, ira_allocno_t a)
 {
-  ira_allocno_t another_a;
+  ira_object_t another_obj;
   ira_copy_t cp, next_cp;
 
   fprintf (f, " a%d(r%d):", ALLOCNO_NUM (a), ALLOCNO_REGNO (a));
   for (cp = ALLOCNO_COPIES (a); cp != NULL; cp = next_cp)
     {
-      if (cp->first == a)
+      if (OBJECT_ALLOCNO (cp->first) == a)
 	{
 	  next_cp = cp->next_first_allocno_copy;
-	  another_a = cp->second;
+	  another_obj = cp->second;
 	}
-      else if (cp->second == a)
+      else if (OBJECT_ALLOCNO (cp->second) == a)
 	{
 	  next_cp = cp->next_second_allocno_copy;
-	  another_a = cp->first;
+	  another_obj = cp->first;
 	}
       else
 	gcc_unreachable ();
+      ira_allocno_t another_a = OBJECT_ALLOCNO (another_obj);
       fprintf (f, " cp%d:a%d(r%d)@%d", cp->num,
 	       ALLOCNO_NUM (another_a), ALLOCNO_REGNO (another_a), cp->freq);
     }
@@ -3480,25 +3521,21 @@ ira_flattening (int max_regno_before_emit, int ira_max_point_before_emit)
      copies.  */
   FOR_EACH_COPY (cp, ci)
     {
-      if (ALLOCNO_CAP_MEMBER (cp->first) != NULL
-	  || ALLOCNO_CAP_MEMBER (cp->second) != NULL)
+      ira_allocno_t a1 = OBJECT_ALLOCNO (cp->first);
+      ira_allocno_t a2 = OBJECT_ALLOCNO (cp->second);
+      if (ALLOCNO_CAP_MEMBER (a1) != NULL || ALLOCNO_CAP_MEMBER (a2) != NULL)
 	{
 	  if (internal_flag_ira_verbose > 4 && ira_dump_file != NULL)
-	    fprintf
-	      (ira_dump_file, "      Remove cp%d:%c%dr%d-%c%dr%d\n",
-	       cp->num, ALLOCNO_CAP_MEMBER (cp->first) != NULL ? 'c' : 'a',
-	       ALLOCNO_NUM (cp->first),
-	       REGNO (allocno_emit_reg (cp->first)),
-	       ALLOCNO_CAP_MEMBER (cp->second) != NULL ? 'c' : 'a',
-	       ALLOCNO_NUM (cp->second),
-	       REGNO (allocno_emit_reg (cp->second)));
+	    fprintf (ira_dump_file, "      Remove cp%d:%c%dr%d-%c%dr%d\n",
+		     cp->num, ALLOCNO_CAP_MEMBER (a1) != NULL ? 'c' : 'a',
+		     ALLOCNO_NUM (a1), REGNO (allocno_emit_reg (a1)),
+		     ALLOCNO_CAP_MEMBER (a2) != NULL ? 'c' : 'a',
+		     ALLOCNO_NUM (a2), REGNO (allocno_emit_reg (a2)));
 	  cp->loop_tree_node = NULL;
 	  continue;
 	}
-      first
-	= regno_top_level_allocno_map[REGNO (allocno_emit_reg (cp->first))];
-      second
-	= regno_top_level_allocno_map[REGNO (allocno_emit_reg (cp->second))];
+      first = regno_top_level_allocno_map[REGNO (allocno_emit_reg (a1))];
+      second = regno_top_level_allocno_map[REGNO (allocno_emit_reg (a2))];
       node = cp->loop_tree_node;
       if (node == NULL)
 	keep_p = true; /* It copy generated in ira-emit.cc.  */
@@ -3506,8 +3543,8 @@ ira_flattening (int max_regno_before_emit, int ira_max_point_before_emit)
 	{
 	  /* Check that the copy was not propagated from level on
 	     which we will have different pseudos.  */
-	  node_first = node->regno_allocno_map[ALLOCNO_REGNO (cp->first)];
-	  node_second = node->regno_allocno_map[ALLOCNO_REGNO (cp->second)];
+	  node_first = node->regno_allocno_map[ALLOCNO_REGNO (a1)];
+	  node_second = node->regno_allocno_map[ALLOCNO_REGNO (a2)];
 	  keep_p = ((REGNO (allocno_emit_reg (first))
 		     == REGNO (allocno_emit_reg (node_first)))
 		     && (REGNO (allocno_emit_reg (second))
@@ -3516,18 +3553,18 @@ ira_flattening (int max_regno_before_emit, int ira_max_point_before_emit)
       if (keep_p)
 	{
 	  cp->loop_tree_node = ira_loop_tree_root;
-	  cp->first = first;
-	  cp->second = second;
+	  cp->first = find_object_anyway (first, OBJECT_START (cp->first),
+					  OBJECT_NREGS (cp->first));
+	  cp->second = find_object_anyway (second, OBJECT_START (cp->second),
+					   OBJECT_NREGS (cp->second));
 	}
       else
 	{
 	  cp->loop_tree_node = NULL;
 	  if (internal_flag_ira_verbose > 4 && ira_dump_file != NULL)
 	    fprintf (ira_dump_file, "      Remove cp%d:a%dr%d-a%dr%d\n",
-		     cp->num, ALLOCNO_NUM (cp->first),
-		     REGNO (allocno_emit_reg (cp->first)),
-		     ALLOCNO_NUM (cp->second),
-		     REGNO (allocno_emit_reg (cp->second)));
+		     cp->num, ALLOCNO_NUM (a1), REGNO (allocno_emit_reg (a1)),
+		     ALLOCNO_NUM (a2), REGNO (allocno_emit_reg (a2)));
 	}
     }
   /* Remove unnecessary allocnos on lower levels of the loop tree.  */
@@ -3563,9 +3600,10 @@ ira_flattening (int max_regno_before_emit, int ira_max_point_before_emit)
 	  finish_copy (cp);
 	  continue;
 	}
-      ira_assert
-	(ALLOCNO_LOOP_TREE_NODE (cp->first) == ira_loop_tree_root
-	 && ALLOCNO_LOOP_TREE_NODE (cp->second) == ira_loop_tree_root);
+      ira_assert (ALLOCNO_LOOP_TREE_NODE (OBJECT_ALLOCNO (cp->first))
+		    == ira_loop_tree_root
+		  && ALLOCNO_LOOP_TREE_NODE (OBJECT_ALLOCNO (cp->second))
+		       == ira_loop_tree_root);
       add_allocno_copy_to_list (cp);
       swap_allocno_copy_ends_if_necessary (cp);
     }
diff --git a/gcc/ira-color.cc b/gcc/ira-color.cc
index 8aed25144b9..099312bcdb3 100644
--- a/gcc/ira-color.cc
+++ b/gcc/ira-color.cc
@@ -20,6 +20,7 @@ along with GCC; see the file COPYING3.  If not see
 
 #include "config.h"
 #define INCLUDE_MAP
+#define INCLUDE_VECTOR
 #include "system.h"
 #include "coretypes.h"
 #include "backend.h"
@@ -150,11 +151,18 @@ struct allocno_color_data
   struct update_cost_record *update_cost_records;
   /* Threads.  We collect allocnos connected by copies into threads
      and try to assign hard regs to allocnos by threads.  */
-  /* Allocno representing all thread.  */
-  ira_allocno_t first_thread_allocno;
+  /* The head objects for all threads.  */
+  ira_object_t *first_thread_objects;
   /* Allocnos in thread forms a cycle list through the following
      member.  */
-  ira_allocno_t next_thread_allocno;
+  ira_object_t *next_thread_objects;
+  /* The allocno that all the threads share.  */
+  ira_allocno_t first_thread_allocno;
+  /* The start offset relative to first_thread_allocno.  */
+  int first_thread_offset;
+  /* All allocnos belonging to the thread.  */
+  bitmap thread_allocnos;
+  /* The frequency sum of all allocnos in the thread.  */
   /* All thread frequency.  Defined only for first thread allocno.  */
   int thread_freq;
   /* Sum of frequencies of hard register preferences of the allocno.  */
@@ -188,6 +196,9 @@ static bitmap coloring_allocno_bitmap;
    allocnos.  */
 static bitmap consideration_allocno_bitmap;
 
+/* Bitmap of allocnos which are not trivially colorable.  */
+static bitmap uncolorable_allocno_set;
+
 /* All allocnos sorted according their priorities.  */
 static ira_allocno_t *sorted_allocnos;
 
@@ -647,9 +658,13 @@ struct allocno_hard_regs_subnode
      Overall conflict size is
      left_conflict_subnodes_size
        + MIN (max_node_impact - left_conflict_subnodes_size,
-              left_conflict_size)
+	      left_conflict_size)
+     Use MIN here to ensure that the total conflict does not exceed
+     max_node_impact.
   */
+  /* The total conflict size of subnodes.  */
   short left_conflict_subnodes_size;
+  /* The maximum number of registers that the current node can use.  */
   short max_node_impact;
 };
 
@@ -758,6 +773,8 @@ form_allocno_hard_regs_nodes_forest (void)
       collect_allocno_hard_regs_cover (hard_regs_roots,
 				       allocno_data->profitable_hard_regs);
       allocno_hard_regs_node = NULL;
+      /* Find the ancestor node in the forest which covers all nodes.  The
+	 ancestor is the smallest superset of profitable_hard_regs.  */
       for (j = 0; hard_regs_node_vec.iterate (j, &node); j++)
 	allocno_hard_regs_node
 	  = (j == 0
@@ -990,6 +1007,8 @@ update_left_conflict_sizes_p (ira_allocno_t a,
 					removed_node->hard_regs->set));
   start = node_preorder_num * allocno_hard_regs_nodes_num;
   i = allocno_hard_regs_subnode_index[start + removed_node->preorder_num];
+  /* i < 0 means removed_node is the parent of node, rather than node being
+     the parent of removed_node.  */
   if (i < 0)
     i = 0;
   subnodes = allocno_hard_regs_subnodes + data->hard_regs_subnodes_start;
@@ -999,6 +1018,7 @@ update_left_conflict_sizes_p (ira_allocno_t a,
 	      - subnodes[i].left_conflict_subnodes_size,
 	      subnodes[i].left_conflict_size));
   subnodes[i].left_conflict_size -= size;
+  /* Update all ancestors for subnode i.  */
   for (;;)
     {
       conflict_size
@@ -1242,6 +1262,9 @@ struct update_cost_queue_elem
      connecting this allocno to the one being allocated.  */
   int divisor;
 
+  /* Hard register regno assigned to current ALLOCNO.  */
+  int hard_regno;
+
   /* Allocno from which we started chaining costs of connected
      allocnos. */
   ira_allocno_t start;
@@ -1308,7 +1331,7 @@ start_update_cost (void)
 /* Add (ALLOCNO, START, FROM, DIVISOR) to the end of update_cost_queue, unless
    ALLOCNO is already in the queue, or has NO_REGS class.  */
 static inline void
-queue_update_cost (ira_allocno_t allocno, ira_allocno_t start,
+queue_update_cost (ira_allocno_t allocno, int hard_regno, ira_allocno_t start,
 		   ira_allocno_t from, int divisor)
 {
   struct update_cost_queue_elem *elem;
@@ -1317,6 +1340,7 @@ queue_update_cost (ira_allocno_t allocno, ira_allocno_t start,
   if (elem->check != update_cost_check
       && ALLOCNO_CLASS (allocno) != NO_REGS)
     {
+      elem->hard_regno = hard_regno;
       elem->check = update_cost_check;
       elem->start = start;
       elem->from = from;
@@ -1334,8 +1358,8 @@ queue_update_cost (ira_allocno_t allocno, ira_allocno_t start,
    false if the queue was empty, otherwise make (*ALLOCNO, *START,
    *FROM, *DIVISOR) describe the removed element.  */
 static inline bool
-get_next_update_cost (ira_allocno_t *allocno, ira_allocno_t *start,
-		      ira_allocno_t *from, int *divisor)
+get_next_update_cost (ira_allocno_t *allocno, int *hard_regno,
+		      ira_allocno_t *start, ira_allocno_t *from, int *divisor)
 {
   struct update_cost_queue_elem *elem;
 
@@ -1348,6 +1372,8 @@ get_next_update_cost (ira_allocno_t *allocno, ira_allocno_t *start,
   *from = elem->from;
   *divisor = elem->divisor;
   update_cost_queue = elem->next;
+  if (hard_regno != NULL)
+    *hard_regno = elem->hard_regno;
   return true;
 }
 
@@ -1449,31 +1475,41 @@ update_costs_from_allocno (ira_allocno_t allocno, int hard_regno,
   enum reg_class rclass, aclass;
   ira_allocno_t another_allocno, start = allocno, from = NULL;
   ira_copy_t cp, next_cp;
+  ira_object_t another_obj;
+  unsigned int obj_index1, obj_index2;
 
   rclass = REGNO_REG_CLASS (hard_regno);
   do
     {
+      gcc_assert (hard_regno >= 0);
       mode = ALLOCNO_MODE (allocno);
       ira_init_register_move_cost_if_necessary (mode);
       for (cp = ALLOCNO_COPIES (allocno); cp != NULL; cp = next_cp)
 	{
-	  if (cp->first == allocno)
+	  if (OBJECT_ALLOCNO (cp->first) == allocno)
 	    {
+	      obj_index1 = OBJECT_INDEX (cp->first);
+	      obj_index2 = OBJECT_INDEX (cp->second);
 	      next_cp = cp->next_first_allocno_copy;
-	      another_allocno = cp->second;
+	      another_obj = cp->second;
 	    }
-	  else if (cp->second == allocno)
+	  else if (OBJECT_ALLOCNO (cp->second) == allocno)
 	    {
+	      obj_index1 = OBJECT_INDEX (cp->second);
+	      obj_index2 = OBJECT_INDEX (cp->first);
 	      next_cp = cp->next_second_allocno_copy;
-	      another_allocno = cp->first;
+	      another_obj = cp->first;
 	    }
 	  else
 	    gcc_unreachable ();
 
+	  another_allocno = OBJECT_ALLOCNO (another_obj);
 	  if (another_allocno == from
 	      || (ALLOCNO_COLOR_DATA (another_allocno) != NULL
-		  && (ALLOCNO_COLOR_DATA (allocno)->first_thread_allocno
-		      != ALLOCNO_COLOR_DATA (another_allocno)->first_thread_allocno)))
+		  && (ALLOCNO_COLOR_DATA (allocno)
+			->first_thread_objects[obj_index1]
+		      != ALLOCNO_COLOR_DATA (another_allocno)
+			   ->first_thread_objects[obj_index2])))
 	    continue;
 
 	  aclass = ALLOCNO_CLASS (another_allocno);
@@ -1482,6 +1518,8 @@ update_costs_from_allocno (ira_allocno_t allocno, int hard_regno,
 	      || ALLOCNO_ASSIGNED_P (another_allocno))
 	    continue;
 
+	  ira_allocno_t first_allocno = OBJECT_ALLOCNO (cp->first);
+	  ira_allocno_t second_allocno = OBJECT_ALLOCNO (cp->second);
 	  /* If we have different modes use the smallest one.  It is
 	     a sub-register move.  It is hard to predict what LRA
 	     will reload (the pseudo or its sub-register) but LRA
@@ -1489,14 +1527,21 @@ update_costs_from_allocno (ira_allocno_t allocno, int hard_regno,
 	     register classes bigger modes might be invalid,
 	     e.g. DImode for AREG on x86.  For such cases the
 	     register move cost will be maximal.  */
-	  mode = narrower_subreg_mode (ALLOCNO_MODE (cp->first),
-				       ALLOCNO_MODE (cp->second));
+	  mode = narrower_subreg_mode (ALLOCNO_MODE (first_allocno),
+				       ALLOCNO_MODE (second_allocno));
 
 	  ira_init_register_move_cost_if_necessary (mode);
 
-	  cost = (cp->second == allocno
-		  ? ira_register_move_cost[mode][rclass][aclass]
-		  : ira_register_move_cost[mode][aclass][rclass]);
+	  cost = (second_allocno == allocno
+		    ? ira_register_move_cost[mode][rclass][aclass]
+		    : ira_register_move_cost[mode][aclass][rclass]);
+	  /* Adjust the hard regno for another_allocno for subreg copy.  */
+	  int start_regno = hard_regno;
+	  if (cp->insn && subreg_move_p (cp->first, cp->second))
+	    {
+	      int diff = OBJECT_START (cp->first) - OBJECT_START (cp->second);
+	      start_regno += (first_allocno == allocno ? diff : -diff);
+	    }
 	  if (decr_p)
 	    cost = -cost;
 
@@ -1505,25 +1550,30 @@ update_costs_from_allocno (ira_allocno_t allocno, int hard_regno,
 
 	  if (internal_flag_ira_verbose > 5 && ira_dump_file != NULL)
 	    fprintf (ira_dump_file,
-		     "          a%dr%d (hr%d): update cost by %d, conflict cost by %d\n",
-		     ALLOCNO_NUM (another_allocno), ALLOCNO_REGNO (another_allocno),
-		     hard_regno, update_cost, update_conflict_cost);
+		     "          a%dr%d (hr%d): update cost by %d, conflict "
+		     "cost by %d\n",
+		     ALLOCNO_NUM (another_allocno),
+		     ALLOCNO_REGNO (another_allocno), start_regno, update_cost,
+		     update_conflict_cost);
 	  if (update_cost == 0)
 	    continue;
 
-	  if (! update_allocno_cost (another_allocno, hard_regno,
-				     update_cost, update_conflict_cost))
+	  if (start_regno < 0
+	      || (start_regno + ALLOCNO_NREGS (another_allocno))
+		   > FIRST_PSEUDO_REGISTER
+	      || !update_allocno_cost (another_allocno, start_regno,
+				       update_cost, update_conflict_cost))
 	    continue;
-	  queue_update_cost (another_allocno, start, allocno,
+	  queue_update_cost (another_allocno, start_regno, start, allocno,
 			     divisor * COST_HOP_DIVISOR);
 	  if (record_p && ALLOCNO_COLOR_DATA (another_allocno) != NULL)
 	    ALLOCNO_COLOR_DATA (another_allocno)->update_cost_records
-	      = get_update_cost_record (hard_regno, divisor,
-					ALLOCNO_COLOR_DATA (another_allocno)
-					->update_cost_records);
+	      = get_update_cost_record (
+		start_regno, divisor,
+		ALLOCNO_COLOR_DATA (another_allocno)->update_cost_records);
 	}
-    }
-  while (get_next_update_cost (&allocno, &start, &from, &divisor));
+  } while (
+    get_next_update_cost (&allocno, &hard_regno, &start, &from, &divisor));
 }
 
 /* Decrease preferred ALLOCNO hard register costs and costs of
@@ -1632,23 +1682,25 @@ update_conflict_hard_regno_costs (int *costs, enum reg_class aclass,
   enum reg_class another_aclass;
   ira_allocno_t allocno, another_allocno, start, from;
   ira_copy_t cp, next_cp;
+  ira_object_t another_obj;
 
-  while (get_next_update_cost (&allocno, &start, &from, &divisor))
+  while (get_next_update_cost (&allocno, NULL, &start, &from, &divisor))
     for (cp = ALLOCNO_COPIES (allocno); cp != NULL; cp = next_cp)
       {
-	if (cp->first == allocno)
+	if (OBJECT_ALLOCNO (cp->first) == allocno)
 	  {
 	    next_cp = cp->next_first_allocno_copy;
-	    another_allocno = cp->second;
+	    another_obj = cp->second;
 	  }
-	else if (cp->second == allocno)
+	else if (OBJECT_ALLOCNO (cp->second) == allocno)
 	  {
 	    next_cp = cp->next_second_allocno_copy;
-	    another_allocno = cp->first;
+	    another_obj = cp->first;
 	  }
 	else
 	  gcc_unreachable ();
 
+	another_allocno = OBJECT_ALLOCNO (another_obj);
 	another_aclass = ALLOCNO_CLASS (another_allocno);
 	if (another_allocno == from
 	    || ALLOCNO_ASSIGNED_P (another_allocno)
@@ -1696,7 +1748,8 @@ update_conflict_hard_regno_costs (int *costs, enum reg_class aclass,
 			   * COST_HOP_DIVISOR
 			   * COST_HOP_DIVISOR
 			   * COST_HOP_DIVISOR))
-	  queue_update_cost (another_allocno, start, from, divisor * COST_HOP_DIVISOR);
+	  queue_update_cost (another_allocno, -1, start, from,
+			     divisor * COST_HOP_DIVISOR);
       }
 }
 
@@ -2034,6 +2087,11 @@ assign_hard_reg (ira_allocno_t a, bool retry_p)
       FOR_EACH_OBJECT_CONFLICT (obj, conflict_obj, oci)
         {
 	  ira_allocno_t conflict_a = OBJECT_ALLOCNO (conflict_obj);
+
+	  if (ALLOCNO_COLOR_DATA (a)->first_thread_allocno
+	      == ALLOCNO_COLOR_DATA (conflict_a)->first_thread_allocno)
+	    continue;
+
 	  enum reg_class conflict_aclass;
 	  allocno_color_data_t data = ALLOCNO_COLOR_DATA (conflict_a);
 
@@ -2225,7 +2283,8 @@ assign_hard_reg (ira_allocno_t a, bool retry_p)
 		      continue;
 		    full_costs[j] -= conflict_costs[k];
 		  }
-	      queue_update_cost (conflict_a, conflict_a, NULL, COST_HOP_DIVISOR);
+	      queue_update_cost (conflict_a, -1, conflict_a, NULL,
+				 COST_HOP_DIVISOR);
 	    }
 	}
     }
@@ -2239,7 +2298,7 @@ assign_hard_reg (ira_allocno_t a, bool retry_p)
   if (! retry_p)
     {
       start_update_cost ();
-      queue_update_cost (a, a, NULL, COST_HOP_DIVISOR);
+      queue_update_cost (a, -1, a, NULL, COST_HOP_DIVISOR);
       update_conflict_hard_regno_costs (full_costs, aclass, false);
     }
   min_cost = min_full_cost = INT_MAX;
@@ -2264,17 +2323,17 @@ assign_hard_reg (ira_allocno_t a, bool retry_p)
       if (!HONOR_REG_ALLOC_ORDER)
 	{
 	  if ((saved_nregs = calculate_saved_nregs (hard_regno, mode)) != 0)
-	  /* We need to save/restore the hard register in
-	     epilogue/prologue.  Therefore we increase the cost.  */
-	  {
-	    rclass = REGNO_REG_CLASS (hard_regno);
-	    add_cost = ((ira_memory_move_cost[mode][rclass][0]
-		         + ira_memory_move_cost[mode][rclass][1])
+	    /* We need to save/restore the hard register in
+	       epilogue/prologue.  Therefore we increase the cost.  */
+	    {
+	      rclass = REGNO_REG_CLASS (hard_regno);
+	      add_cost = ((ira_memory_move_cost[mode][rclass][0]
+			   + ira_memory_move_cost[mode][rclass][1])
 		        * saved_nregs / hard_regno_nregs (hard_regno,
 							  mode) - 1);
-	    cost += add_cost;
-	    full_cost += add_cost;
-	  }
+	      cost += add_cost;
+	      full_cost += add_cost;
+	    }
 	}
       if (min_cost > cost)
 	min_cost = cost;
@@ -2393,54 +2452,173 @@ copy_freq_compare_func (const void *v1p, const void *v2p)
   return cp1->num - cp2->num;
 }
 
-\f
+/* Return true if object OBJ1 conflict with OBJ2.  */
+static bool
+objects_conflict_by_live_ranges_p (ira_object_t obj1, ira_object_t obj2)
+{
+  rtx reg1, reg2;
+  ira_allocno_t a1 = OBJECT_ALLOCNO (obj1);
+  ira_allocno_t a2 = OBJECT_ALLOCNO (obj2);
+  if (a1 == a2)
+    return false;
+  reg1 = regno_reg_rtx[ALLOCNO_REGNO (a1)];
+  reg2 = regno_reg_rtx[ALLOCNO_REGNO (a2)];
+  if (reg1 != NULL && reg2 != NULL
+      && ORIGINAL_REGNO (reg1) == ORIGINAL_REGNO (reg2))
+    return false;
+
+  /* We don't keep live ranges for caps because they can be quite big.
+     Use ranges of non-cap allocno from which caps are created.  */
+  a1 = get_cap_member (a1);
+  a2 = get_cap_member (a2);
+
+  obj1 = find_object (a1, OBJECT_START (obj1), OBJECT_NREGS (obj1));
+  obj2 = find_object (a2, OBJECT_START (obj2), OBJECT_NREGS (obj2));
+  return ira_live_ranges_intersect_p (OBJECT_LIVE_RANGES (obj1),
+				      OBJECT_LIVE_RANGES (obj2));
+}
 
-/* Return true if any allocno from thread of A1 conflicts with any
-   allocno from thread A2.  */
+/* Return true if any object from thread of OBJ1 conflicts with any
+   object from thread OBJ2.  */
 static bool
-allocno_thread_conflict_p (ira_allocno_t a1, ira_allocno_t a2)
+object_thread_conflict_p (ira_object_t obj1, ira_object_t obj2)
 {
-  ira_allocno_t a, conflict_a;
+  ira_allocno_t a1 = OBJECT_ALLOCNO (obj1);
+  ira_allocno_t a2 = OBJECT_ALLOCNO (obj2);
+
+  gcc_assert (
+    obj1 != obj2
+    && ALLOCNO_COLOR_DATA (a1)->first_thread_objects[OBJECT_INDEX (obj1)]
+	 == obj1
+    && ALLOCNO_COLOR_DATA (a2)->first_thread_objects[OBJECT_INDEX (obj2)]
+	 == obj2);
+
+  ira_allocno_t first_thread_allocno1
+    = ALLOCNO_COLOR_DATA (a1)->first_thread_allocno;
+  ira_allocno_t first_thread_allocno2
+    = ALLOCNO_COLOR_DATA (a2)->first_thread_allocno;
+
+  int offset
+    = (ALLOCNO_COLOR_DATA (a1)->first_thread_offset + OBJECT_START (obj1))
+      - (ALLOCNO_COLOR_DATA (a2)->first_thread_offset + OBJECT_START (obj2));
+
+  /* Collect thread_allocnos info of both threads for the conflict check.  */
+  bitmap thread_allocnos1
+    = ALLOCNO_COLOR_DATA (first_thread_allocno1)->thread_allocnos;
+  bitmap thread_allocnos2
+    = ALLOCNO_COLOR_DATA (first_thread_allocno2)->thread_allocnos;
+  gcc_assert (!bitmap_empty_p (thread_allocnos1)
+	      && !bitmap_empty_p (thread_allocnos2));
+  std::vector<ira_object_t> thread_objects_2;
 
-  for (a = ALLOCNO_COLOR_DATA (a2)->next_thread_allocno;;
-       a = ALLOCNO_COLOR_DATA (a)->next_thread_allocno)
+  unsigned int i;
+  bitmap_iterator bi;
+  EXECUTE_IF_SET_IN_BITMAP (thread_allocnos2, 0, i, bi)
     {
-      for (conflict_a = ALLOCNO_COLOR_DATA (a1)->next_thread_allocno;;
-	   conflict_a = ALLOCNO_COLOR_DATA (conflict_a)->next_thread_allocno)
-	{
-	  if (allocnos_conflict_by_live_ranges_p (a, conflict_a))
-	    return true;
-	  if (conflict_a == a1)
-	    break;
-	}
-      if (a == a2)
-	break;
+      ira_allocno_object_iterator oi;
+      ira_object_t obj;
+      FOR_EACH_ALLOCNO_OBJECT (ira_allocnos[i], obj, oi)
+	thread_objects_2.push_back (obj);
+    }
+
+  EXECUTE_IF_SET_IN_BITMAP (thread_allocnos1, 0, i, bi)
+    {
+      ira_allocno_object_iterator oi;
+      ira_object_t obj;
+      ira_allocno_t a = ira_allocnos[i];
+      FOR_EACH_ALLOCNO_OBJECT (ira_allocnos[i], obj, oi)
+	for (ira_object_t other_obj : thread_objects_2)
+	  {
+	    int thread_start1 = ALLOCNO_COLOR_DATA (a)->first_thread_offset
+				+ OBJECT_START (obj);
+	    int thread_start2 = ALLOCNO_COLOR_DATA (OBJECT_ALLOCNO (other_obj))
+				  ->first_thread_offset
+				+ offset + OBJECT_START (other_obj);
+	    if (!(thread_start1 + OBJECT_NREGS (obj) <= thread_start2
+		  || thread_start2 + OBJECT_NREGS (other_obj) <= thread_start1)
+		&& objects_conflict_by_live_ranges_p (obj, other_obj))
+	      return true;
+	  }
     }
+
   return false;
 }
 
-/* Merge two threads given correspondingly by their first allocnos T1
-   and T2 (more accurately merging T2 into T1).  */
+/* Merge two threads given correspondingly by their first objects OBJ1
+   and OBJ2 (more accurately merging OBJ2 into OBJ1).  */
 static void
-merge_threads (ira_allocno_t t1, ira_allocno_t t2)
+merge_threads (ira_object_t obj1, ira_object_t obj2)
 {
-  ira_allocno_t a, next, last;
+  ira_allocno_t a1 = OBJECT_ALLOCNO (obj1);
+  ira_allocno_t a2 = OBJECT_ALLOCNO (obj2);
+
+  gcc_assert (
+    obj1 != obj2
+    && ALLOCNO_COLOR_DATA (a1)->first_thread_objects[OBJECT_INDEX (obj1)]
+	 == obj1
+    && ALLOCNO_COLOR_DATA (a2)->first_thread_objects[OBJECT_INDEX (obj2)]
+	 == obj2);
+
+  ira_allocno_t first_thread_allocno1
+    = ALLOCNO_COLOR_DATA (a1)->first_thread_allocno;
+  ira_allocno_t first_thread_allocno2
+    = ALLOCNO_COLOR_DATA (a2)->first_thread_allocno;
+
+  gcc_assert (first_thread_allocno1 != first_thread_allocno2);
 
-  gcc_assert (t1 != t2
-	      && ALLOCNO_COLOR_DATA (t1)->first_thread_allocno == t1
-	      && ALLOCNO_COLOR_DATA (t2)->first_thread_allocno == t2);
-  for (last = t2, a = ALLOCNO_COLOR_DATA (t2)->next_thread_allocno;;
-       a = ALLOCNO_COLOR_DATA (a)->next_thread_allocno)
+  int offset
+    = (ALLOCNO_COLOR_DATA (a1)->first_thread_offset + OBJECT_START (obj1))
+      - (ALLOCNO_COLOR_DATA (a2)->first_thread_offset + OBJECT_START (obj2));
+
+  /* Update first_thread_allocno and thread_allocnos info.  */
+  unsigned int i;
+  bitmap_iterator bi;
+  bitmap thread_allocnos2
+    = ALLOCNO_COLOR_DATA (first_thread_allocno2)->thread_allocnos;
+  bitmap thread_allocnos1
+    = ALLOCNO_COLOR_DATA (first_thread_allocno1)->thread_allocnos;
+  gcc_assert (!bitmap_empty_p (thread_allocnos1)
+	      && !bitmap_empty_p (thread_allocnos2));
+  EXECUTE_IF_SET_IN_BITMAP (thread_allocnos2, 0, i, bi)
+    {
+      ira_allocno_t a = ira_allocnos[i];
+      gcc_assert (ALLOCNO_COLOR_DATA (a)->first_thread_allocno
+		  == first_thread_allocno2);
+      /* Update first_thread_allocno and first_thread_offset fields.  */
+      ALLOCNO_COLOR_DATA (a)->first_thread_allocno = first_thread_allocno1;
+      ALLOCNO_COLOR_DATA (a)->first_thread_offset += offset;
+      bitmap_set_bit (thread_allocnos1, i);
+    }
+  bitmap_clear (thread_allocnos2);
+  ira_free_bitmap (thread_allocnos2);
+  ALLOCNO_COLOR_DATA (first_thread_allocno2)->thread_allocnos = NULL;
+
+  ira_object_t last_obj = obj2;
+  for (ira_object_t next_obj
+       = ALLOCNO_COLOR_DATA (a2)->next_thread_objects[OBJECT_INDEX (obj2)];
+       ; next_obj = ALLOCNO_COLOR_DATA (OBJECT_ALLOCNO (next_obj))
+		      ->next_thread_objects[OBJECT_INDEX (next_obj)])
     {
-      ALLOCNO_COLOR_DATA (a)->first_thread_allocno = t1;
-      if (a == t2)
+      ira_allocno_t next_a = OBJECT_ALLOCNO (next_obj);
+      ALLOCNO_COLOR_DATA (next_a)->first_thread_objects[OBJECT_INDEX (next_obj)]
+	= obj1;
+      gcc_assert (ALLOCNO_COLOR_DATA (next_a)->first_thread_allocno
+		  == first_thread_allocno1);
+      gcc_assert (bitmap_bit_p (thread_allocnos1, ALLOCNO_NUM (next_a)));
+      if (next_obj == obj2)
 	break;
-      last = a;
+      last_obj = next_obj;
     }
-  next = ALLOCNO_COLOR_DATA (t1)->next_thread_allocno;
-  ALLOCNO_COLOR_DATA (t1)->next_thread_allocno = t2;
-  ALLOCNO_COLOR_DATA (last)->next_thread_allocno = next;
-  ALLOCNO_COLOR_DATA (t1)->thread_freq += ALLOCNO_COLOR_DATA (t2)->thread_freq;
+  /* Add OBJ2's thread chain to OBJ1's.  */
+  ira_object_t temp_obj
+    = ALLOCNO_COLOR_DATA (a1)->next_thread_objects[OBJECT_INDEX (obj1)];
+  ALLOCNO_COLOR_DATA (a1)->next_thread_objects[OBJECT_INDEX (obj1)] = obj2;
+  ALLOCNO_COLOR_DATA (OBJECT_ALLOCNO (last_obj))
+    ->next_thread_objects[OBJECT_INDEX (last_obj)]
+    = temp_obj;
+
+  ALLOCNO_COLOR_DATA (first_thread_allocno1)->thread_freq
+    += ALLOCNO_COLOR_DATA (first_thread_allocno2)->thread_freq;
 }
 
 /* Create threads by processing CP_NUM copies from sorted copies.  We
@@ -2448,7 +2626,6 @@ merge_threads (ira_allocno_t t1, ira_allocno_t t2)
 static void
 form_threads_from_copies (int cp_num)
 {
-  ira_allocno_t a, thread1, thread2;
   ira_copy_t cp;
 
   qsort (sorted_copies, cp_num, sizeof (ira_copy_t), copy_freq_compare_func);
@@ -2457,33 +2634,43 @@ form_threads_from_copies (int cp_num)
   for (int i = 0; i < cp_num; i++)
     {
       cp = sorted_copies[i];
-      thread1 = ALLOCNO_COLOR_DATA (cp->first)->first_thread_allocno;
-      thread2 = ALLOCNO_COLOR_DATA (cp->second)->first_thread_allocno;
-      if (thread1 == thread2)
+      ira_allocno_t first_a = OBJECT_ALLOCNO (cp->first);
+      ira_allocno_t second_a = OBJECT_ALLOCNO (cp->second);
+      ira_object_t thread1 = ALLOCNO_COLOR_DATA (first_a)
+			       ->first_thread_objects[OBJECT_INDEX (cp->first)];
+      ira_object_t thread2
+	= ALLOCNO_COLOR_DATA (second_a)
+	    ->first_thread_objects[OBJECT_INDEX (cp->second)];
+      if (thread1 == thread2
+	  || ALLOCNO_COLOR_DATA (first_a)->first_thread_allocno
+	       == ALLOCNO_COLOR_DATA (second_a)->first_thread_allocno)
 	continue;
-      if (! allocno_thread_conflict_p (thread1, thread2))
+      if (!object_thread_conflict_p (thread1, thread2))
 	{
 	  if (internal_flag_ira_verbose > 3 && ira_dump_file != NULL)
-	    fprintf
-		(ira_dump_file,
-		 "        Forming thread by copy %d:a%dr%d-a%dr%d (freq=%d):\n",
-		 cp->num, ALLOCNO_NUM (cp->first), ALLOCNO_REGNO (cp->first),
-		 ALLOCNO_NUM (cp->second), ALLOCNO_REGNO (cp->second),
-		 cp->freq);
+	    fprintf (
+	      ira_dump_file,
+	      "        Forming thread by copy %d:a%dr%d-a%dr%d (freq=%d):\n",
+	      cp->num, ALLOCNO_NUM (first_a), ALLOCNO_REGNO (first_a),
+	      ALLOCNO_NUM (second_a), ALLOCNO_REGNO (second_a), cp->freq);
 	  merge_threads (thread1, thread2);
 	  if (internal_flag_ira_verbose > 3 && ira_dump_file != NULL)
 	    {
-	      thread1 = ALLOCNO_COLOR_DATA (thread1)->first_thread_allocno;
-	      fprintf (ira_dump_file, "          Result (freq=%d): a%dr%d(%d)",
-		       ALLOCNO_COLOR_DATA (thread1)->thread_freq,
-		       ALLOCNO_NUM (thread1), ALLOCNO_REGNO (thread1),
-		       ALLOCNO_FREQ (thread1));
-	      for (a = ALLOCNO_COLOR_DATA (thread1)->next_thread_allocno;
-		   a != thread1;
-		   a = ALLOCNO_COLOR_DATA (a)->next_thread_allocno)
-		fprintf (ira_dump_file, " a%dr%d(%d)",
-			 ALLOCNO_NUM (a), ALLOCNO_REGNO (a),
-			 ALLOCNO_FREQ (a));
+	      ira_allocno_t a1 = OBJECT_ALLOCNO (thread1);
+	      ira_allocno_t first_thread_allocno
+		= ALLOCNO_COLOR_DATA (a1)->first_thread_allocno;
+	      fprintf (ira_dump_file, "          Result (freq=%d):",
+		       ALLOCNO_COLOR_DATA (first_thread_allocno)->thread_freq);
+	      unsigned int i;
+	      bitmap_iterator bi;
+	      EXECUTE_IF_SET_IN_BITMAP (
+		ALLOCNO_COLOR_DATA (first_thread_allocno)->thread_allocnos, 0,
+		i, bi)
+		{
+		  ira_allocno_t a = ira_allocnos[i];
+		  fprintf (ira_dump_file, " a%dr%d(%d)", ALLOCNO_NUM (a),
+			   ALLOCNO_REGNO (a), ALLOCNO_FREQ (a));
+		}
 	      fprintf (ira_dump_file, "\n");
 	    }
 	}
@@ -2503,13 +2690,27 @@ form_threads_from_bucket (ira_allocno_t bucket)
     {
       for (cp = ALLOCNO_COPIES (a); cp != NULL; cp = next_cp)
 	{
-	  if (cp->first == a)
+	  bool intersect_p = hard_reg_set_intersect_p (
+	    ALLOCNO_COLOR_DATA (OBJECT_ALLOCNO (cp->first))
+	      ->profitable_hard_regs,
+	    ALLOCNO_COLOR_DATA (OBJECT_ALLOCNO (cp->second))
+	      ->profitable_hard_regs);
+	  if (OBJECT_ALLOCNO (cp->first) == a)
 	    {
 	      next_cp = cp->next_first_allocno_copy;
+	      if (!intersect_p)
+		continue;
+	      sorted_copies[cp_num++] = cp;
+	    }
+	  else if (OBJECT_ALLOCNO (cp->second) == a)
+	    {
+	      next_cp = cp->next_second_allocno_copy;
+	      if (!intersect_p
+		  || !bitmap_bit_p (uncolorable_allocno_set,
+				    ALLOCNO_NUM (OBJECT_ALLOCNO (cp->first))))
+		continue;
 	      sorted_copies[cp_num++] = cp;
 	    }
-	  else if (cp->second == a)
-	    next_cp = cp->next_second_allocno_copy;
 	  else
 	    gcc_unreachable ();
 	}
@@ -2531,15 +2732,15 @@ form_threads_from_colorable_allocno (ira_allocno_t a)
 	     ALLOCNO_NUM (a), ALLOCNO_REGNO (a));
   for (cp = ALLOCNO_COPIES (a); cp != NULL; cp = next_cp)
     {
-      if (cp->first == a)
+      if (OBJECT_ALLOCNO (cp->first) == a)
 	{
 	  next_cp = cp->next_first_allocno_copy;
-	  another_a = cp->second;
+	  another_a = OBJECT_ALLOCNO (cp->second);
 	}
-      else if (cp->second == a)
+      else if (OBJECT_ALLOCNO (cp->second) == a)
 	{
 	  next_cp = cp->next_second_allocno_copy;
-	  another_a = cp->first;
+	  another_a = OBJECT_ALLOCNO (cp->first);
 	}
       else
 	gcc_unreachable ();
@@ -2564,8 +2765,16 @@ init_allocno_threads (void)
     {
       a = ira_allocnos[j];
       /* Set up initial thread data: */
-      ALLOCNO_COLOR_DATA (a)->first_thread_allocno
-	= ALLOCNO_COLOR_DATA (a)->next_thread_allocno = a;
+      for (int i = 0; i < ALLOCNO_NUM_OBJECTS (a); i += 1)
+	{
+	  ira_object_t obj = ALLOCNO_OBJECT (a, i);
+	  ALLOCNO_COLOR_DATA (a)->first_thread_objects[i]
+	    = ALLOCNO_COLOR_DATA (a)->next_thread_objects[i] = obj;
+	}
+      ALLOCNO_COLOR_DATA (a)->first_thread_allocno = a;
+      ALLOCNO_COLOR_DATA (a)->first_thread_offset = 0;
+      ALLOCNO_COLOR_DATA (a)->thread_allocnos = ira_allocate_bitmap ();
+      bitmap_set_bit (ALLOCNO_COLOR_DATA (a)->thread_allocnos, ALLOCNO_NUM (a));
       ALLOCNO_COLOR_DATA (a)->thread_freq = ALLOCNO_FREQ (a);
       ALLOCNO_COLOR_DATA (a)->hard_reg_prefs = 0;
       for (pref = ALLOCNO_PREFS (a); pref != NULL; pref = pref->next_pref)
@@ -2608,6 +2817,9 @@ add_allocno_to_bucket (ira_allocno_t a, ira_allocno_t *bucket_ptr)
   ira_allocno_t first_a;
   allocno_color_data_t data;
 
+  if (bucket_ptr == &uncolorable_allocno_bucket)
+    bitmap_set_bit (uncolorable_allocno_set, ALLOCNO_NUM (a));
+
   if (bucket_ptr == &uncolorable_allocno_bucket
       && ALLOCNO_CLASS (a) != NO_REGS)
     {
@@ -2734,6 +2946,9 @@ delete_allocno_from_bucket (ira_allocno_t allocno, ira_allocno_t *bucket_ptr)
 {
   ira_allocno_t prev_allocno, next_allocno;
 
+  if (bucket_ptr == &uncolorable_allocno_bucket)
+    bitmap_clear_bit (uncolorable_allocno_set, ALLOCNO_NUM (allocno));
+
   if (bucket_ptr == &uncolorable_allocno_bucket
       && ALLOCNO_CLASS (allocno) != NO_REGS)
     {
@@ -3227,16 +3442,23 @@ allocno_copy_cost_saving (ira_allocno_t allocno, int hard_regno)
     rclass = ALLOCNO_CLASS (allocno);
   for (cp = ALLOCNO_COPIES (allocno); cp != NULL; cp = next_cp)
     {
-      if (cp->first == allocno)
+      if (OBJECT_ALLOCNO (cp->first) == allocno)
 	{
 	  next_cp = cp->next_first_allocno_copy;
-	  if (ALLOCNO_HARD_REGNO (cp->second) != hard_regno)
+	  ira_allocno_t another_a = OBJECT_ALLOCNO (cp->second);
+	  if (ALLOCNO_HARD_REGNO (another_a) > -1
+	      && hard_regno + OBJECT_START (cp->first)
+		   != ALLOCNO_HARD_REGNO (another_a)
+			+ OBJECT_START (cp->second))
 	    continue;
 	}
-      else if (cp->second == allocno)
+      else if (OBJECT_ALLOCNO (cp->second) == allocno)
 	{
 	  next_cp = cp->next_second_allocno_copy;
-	  if (ALLOCNO_HARD_REGNO (cp->first) != hard_regno)
+	  ira_allocno_t another_a = OBJECT_ALLOCNO (cp->first);
+	  if (ALLOCNO_HARD_REGNO (another_a) > -1
+	      && hard_regno + OBJECT_START (cp->second)
+		   != ALLOCNO_HARD_REGNO (another_a) + OBJECT_START (cp->first))
 	    continue;
 	}
       else
@@ -3643,6 +3865,7 @@ color_allocnos (void)
       /* Put the allocnos into the corresponding buckets.  */
       colorable_allocno_bucket = NULL;
       uncolorable_allocno_bucket = NULL;
+      bitmap_clear (uncolorable_allocno_set);
       EXECUTE_IF_SET_IN_BITMAP (coloring_allocno_bitmap, 0, i, bi)
 	{
 	  a = ira_allocnos[i];
@@ -3740,10 +3963,12 @@ color_pass (ira_loop_tree_node_t loop_tree_node)
   bitmap_copy (coloring_allocno_bitmap, loop_tree_node->all_allocnos);
   bitmap_copy (consideration_allocno_bitmap, coloring_allocno_bitmap);
   n = 0;
+  size_t obj_n = 0;
   EXECUTE_IF_SET_IN_BITMAP (consideration_allocno_bitmap, 0, j, bi)
     {
       a = ira_allocnos[j];
       n++;
+      obj_n += ALLOCNO_NUM_OBJECTS (a);
       if (! ALLOCNO_ASSIGNED_P (a))
 	continue;
       bitmap_clear_bit (coloring_allocno_bitmap, ALLOCNO_NUM (a));
@@ -3752,20 +3977,29 @@ color_pass (ira_loop_tree_node_t loop_tree_node)
     = (allocno_color_data_t) ira_allocate (sizeof (struct allocno_color_data)
 					   * n);
   memset (allocno_color_data, 0, sizeof (struct allocno_color_data) * n);
+  ira_object_t *thread_objects
+    = (ira_object_t *) ira_allocate (sizeof (ira_object_t *) * obj_n * 2);
+  memset (thread_objects, 0, sizeof (ira_object_t *) * obj_n * 2);
   curr_allocno_process = 0;
   n = 0;
+  size_t obj_offset = 0;
   EXECUTE_IF_SET_IN_BITMAP (consideration_allocno_bitmap, 0, j, bi)
     {
       a = ira_allocnos[j];
       ALLOCNO_ADD_DATA (a) = allocno_color_data + n;
+      ALLOCNO_COLOR_DATA (a)->first_thread_objects
+	= thread_objects + obj_offset;
+      obj_offset += ALLOCNO_NUM_OBJECTS (a);
+      ALLOCNO_COLOR_DATA (a)->next_thread_objects = thread_objects + obj_offset;
+      obj_offset += ALLOCNO_NUM_OBJECTS (a);
       n++;
     }
+  gcc_assert (obj_n * 2 == obj_offset);
   init_allocno_threads ();
   /* Color all mentioned allocnos including transparent ones.  */
   color_allocnos ();
   /* Process caps.  They are processed just once.  */
-  if (flag_ira_region == IRA_REGION_MIXED
-      || flag_ira_region == IRA_REGION_ALL)
+  if (flag_ira_region == IRA_REGION_MIXED || flag_ira_region == IRA_REGION_ALL)
     EXECUTE_IF_SET_IN_BITMAP (loop_tree_node->all_allocnos, 0, j, bi)
       {
 	a = ira_allocnos[j];
@@ -3881,12 +4115,22 @@ color_pass (ira_loop_tree_node_t loop_tree_node)
 	    }
 	}
     }
-  ira_free (allocno_color_data);
   EXECUTE_IF_SET_IN_BITMAP (consideration_allocno_bitmap, 0, j, bi)
     {
       a = ira_allocnos[j];
+      gcc_assert (a != NULL);
+      ALLOCNO_COLOR_DATA (a)->first_thread_objects = NULL;
+      ALLOCNO_COLOR_DATA (a)->next_thread_objects = NULL;
+      if (ALLOCNO_COLOR_DATA (a)->thread_allocnos != NULL)
+	{
+	  bitmap_clear (ALLOCNO_COLOR_DATA (a)->thread_allocnos);
+	  ira_free_bitmap (ALLOCNO_COLOR_DATA (a)->thread_allocnos);
+	  ALLOCNO_COLOR_DATA (a)->thread_allocnos = NULL;
+	}
       ALLOCNO_ADD_DATA (a) = NULL;
     }
+  ira_free (allocno_color_data);
+  ira_free (thread_objects);
 }
 
 /* Initialize the common data for coloring and calls functions to do
@@ -4080,15 +4324,17 @@ update_curr_costs (ira_allocno_t a)
   ira_init_register_move_cost_if_necessary (mode);
   for (cp = ALLOCNO_COPIES (a); cp != NULL; cp = next_cp)
     {
-      if (cp->first == a)
+      ira_allocno_t first_a = OBJECT_ALLOCNO (cp->first);
+      ira_allocno_t second_a = OBJECT_ALLOCNO (cp->second);
+      if (first_a == a)
 	{
 	  next_cp = cp->next_first_allocno_copy;
-	  another_a = cp->second;
+	  another_a = second_a;
 	}
-      else if (cp->second == a)
+      else if (second_a == a)
 	{
 	  next_cp = cp->next_second_allocno_copy;
-	  another_a = cp->first;
+	  another_a = first_a;
 	}
       else
 	gcc_unreachable ();
@@ -4100,9 +4346,8 @@ update_curr_costs (ira_allocno_t a)
       i = ira_class_hard_reg_index[aclass][hard_regno];
       if (i < 0)
 	continue;
-      cost = (cp->first == a
-	      ? ira_register_move_cost[mode][rclass][aclass]
-	      : ira_register_move_cost[mode][aclass][rclass]);
+      cost = (first_a == a ? ira_register_move_cost[mode][rclass][aclass]
+			   : ira_register_move_cost[mode][aclass][rclass]);
       ira_allocate_and_set_or_copy_costs
 	(&ALLOCNO_UPDATED_HARD_REG_COSTS (a), aclass, ALLOCNO_CLASS_COST (a),
 	 ALLOCNO_HARD_REG_COSTS (a));
@@ -4349,21 +4594,23 @@ coalesce_allocnos (void)
 	continue;
       for (cp = ALLOCNO_COPIES (a); cp != NULL; cp = next_cp)
 	{
-	  if (cp->first == a)
+	  ira_allocno_t first_a = OBJECT_ALLOCNO (cp->first);
+	  ira_allocno_t second_a = OBJECT_ALLOCNO (cp->second);
+	  if (first_a == a)
 	    {
 	      next_cp = cp->next_first_allocno_copy;
-	      regno = ALLOCNO_REGNO (cp->second);
+	      regno = ALLOCNO_REGNO (second_a);
 	      /* For priority coloring we coalesce allocnos only with
 		 the same allocno class not with intersected allocno
 		 classes as it were possible.  It is done for
 		 simplicity.  */
 	      if ((cp->insn != NULL || cp->constraint_p)
-		  && ALLOCNO_ASSIGNED_P (cp->second)
-		  && ALLOCNO_HARD_REGNO (cp->second) < 0
-		  && ! ira_equiv_no_lvalue_p (regno))
+		  && ALLOCNO_ASSIGNED_P (second_a)
+		  && ALLOCNO_HARD_REGNO (second_a) < 0
+		  && !ira_equiv_no_lvalue_p (regno))
 		sorted_copies[cp_num++] = cp;
 	    }
-	  else if (cp->second == a)
+	  else if (second_a == a)
 	    next_cp = cp->next_second_allocno_copy;
 	  else
 	    gcc_unreachable ();
@@ -4376,17 +4623,18 @@ coalesce_allocnos (void)
       for (i = 0; i < cp_num; i++)
 	{
 	  cp = sorted_copies[i];
-	  if (! coalesced_allocno_conflict_p (cp->first, cp->second))
+	  ira_allocno_t first_a = OBJECT_ALLOCNO (cp->first);
+	  ira_allocno_t second_a = OBJECT_ALLOCNO (cp->second);
+	  if (!coalesced_allocno_conflict_p (first_a, second_a))
 	    {
 	      allocno_coalesced_p = true;
 	      if (internal_flag_ira_verbose > 3 && ira_dump_file != NULL)
-		fprintf
-		  (ira_dump_file,
-		   "      Coalescing copy %d:a%dr%d-a%dr%d (freq=%d)\n",
-		   cp->num, ALLOCNO_NUM (cp->first), ALLOCNO_REGNO (cp->first),
-		   ALLOCNO_NUM (cp->second), ALLOCNO_REGNO (cp->second),
-		   cp->freq);
-	      merge_allocnos (cp->first, cp->second);
+		fprintf (ira_dump_file,
+			 "      Coalescing copy %d:a%dr%d-a%dr%d (freq=%d)\n",
+			 cp->num, ALLOCNO_NUM (first_a),
+			 ALLOCNO_REGNO (first_a), ALLOCNO_NUM (second_a),
+			 ALLOCNO_REGNO (second_a), cp->freq);
+	      merge_allocnos (first_a, second_a);
 	      i++;
 	      break;
 	    }
@@ -4395,8 +4643,11 @@ coalesce_allocnos (void)
       for (n = 0; i < cp_num; i++)
 	{
 	  cp = sorted_copies[i];
-	  if (allocno_coalesce_data[ALLOCNO_NUM (cp->first)].first
-	      != allocno_coalesce_data[ALLOCNO_NUM (cp->second)].first)
+	  if (allocno_coalesce_data[ALLOCNO_NUM (OBJECT_ALLOCNO (cp->first))]
+		.first
+	      != allocno_coalesce_data[ALLOCNO_NUM (
+					 OBJECT_ALLOCNO (cp->second))]
+		   .first)
 	    sorted_copies[n++] = cp;
 	}
       cp_num = n;
@@ -5070,15 +5321,15 @@ ira_reuse_stack_slot (int regno, poly_uint64 inherent_size,
 	       cp != NULL;
 	       cp = next_cp)
 	    {
-	      if (cp->first == allocno)
+	      if (OBJECT_ALLOCNO (cp->first) == allocno)
 		{
 		  next_cp = cp->next_first_allocno_copy;
-		  another_allocno = cp->second;
+		  another_allocno = OBJECT_ALLOCNO (cp->second);
 		}
-	      else if (cp->second == allocno)
+	      else if (OBJECT_ALLOCNO (cp->second) == allocno)
 		{
 		  next_cp = cp->next_second_allocno_copy;
-		  another_allocno = cp->first;
+		  another_allocno = OBJECT_ALLOCNO (cp->first);
 		}
 	      else
 		gcc_unreachable ();
@@ -5274,6 +5525,7 @@ ira_initiate_assign (void)
     = (ira_allocno_t *) ira_allocate (sizeof (ira_allocno_t)
 				      * ira_allocnos_num);
   consideration_allocno_bitmap = ira_allocate_bitmap ();
+  uncolorable_allocno_set = ira_allocate_bitmap ();
   initiate_cost_update ();
   allocno_priorities = (int *) ira_allocate (sizeof (int) * ira_allocnos_num);
   sorted_copies = (ira_copy_t *) ira_allocate (ira_copies_num
@@ -5286,6 +5538,7 @@ ira_finish_assign (void)
 {
   ira_free (sorted_allocnos);
   ira_free_bitmap (consideration_allocno_bitmap);
+  ira_free_bitmap (uncolorable_allocno_set);
   finish_cost_update ();
   ira_free (allocno_priorities);
   ira_free (sorted_copies);
diff --git a/gcc/ira-conflicts.cc b/gcc/ira-conflicts.cc
index 0585ad10043..7aeed7202ce 100644
--- a/gcc/ira-conflicts.cc
+++ b/gcc/ira-conflicts.cc
@@ -173,25 +173,115 @@ build_conflict_bit_table (void)
   sparseset_free (objects_live);
   return true;
 }
-\f
-/* Return true iff allocnos A1 and A2 cannot be allocated to the same
-   register due to conflicts.  */
 
-static bool
-allocnos_conflict_for_copy_p (ira_allocno_t a1, ira_allocno_t a2)
+/* Check that X is REG or SUBREG of REG.  */
+#define REG_SUBREG_P(x)                                                        \
+  (REG_P (x) || (GET_CODE (x) == SUBREG && REG_P (SUBREG_REG (x))))
+
+/* Return true if a move between OBJ1 and OBJ2 would be a subreg move.  */
+bool
+subreg_move_p (ira_object_t obj1, ira_object_t obj2)
 {
-  /* Due to the fact that we canonicalize conflicts (see
-     record_object_conflict), we only need to test for conflicts of
-     the lowest order words.  */
-  ira_object_t obj1 = ALLOCNO_OBJECT (a1, 0);
-  ira_object_t obj2 = ALLOCNO_OBJECT (a2, 0);
+  ira_allocno_t a1 = OBJECT_ALLOCNO (obj1);
+  ira_allocno_t a2 = OBJECT_ALLOCNO (obj2);
+  return ALLOCNO_CLASS (a1) != NO_REGS && ALLOCNO_CLASS (a2) != NO_REGS
+	 && (ALLOCNO_TRACK_SUBREG_P (a1) || ALLOCNO_TRACK_SUBREG_P (a2))
+	 && OBJECT_NREGS (obj1) == OBJECT_NREGS (obj2)
+	 && (OBJECT_NREGS (obj1) != ALLOCNO_NREGS (a1)
+	     || OBJECT_NREGS (obj2) != ALLOCNO_NREGS (a2));
+}
 
-  return OBJECTS_CONFLICT_P (obj1, obj2);
+/* Return true if ORIG_DEST_REG and ORIG_SRC_REG form a subreg move.  */
+bool
+subreg_move_p (rtx orig_dest_reg, rtx orig_src_reg)
+{
+  gcc_assert (REG_SUBREG_P (orig_dest_reg) && REG_SUBREG_P (orig_src_reg));
+  rtx reg1
+    = SUBREG_P (orig_dest_reg) ? SUBREG_REG (orig_dest_reg) : orig_dest_reg;
+  rtx reg2 = SUBREG_P (orig_src_reg) ? SUBREG_REG (orig_src_reg) : orig_src_reg;
+  if (HARD_REGISTER_P (reg1) || HARD_REGISTER_P (reg2))
+    return false;
+  ira_allocno_t a1 = ira_curr_regno_allocno_map[REGNO (reg1)];
+  ira_allocno_t a2 = ira_curr_regno_allocno_map[REGNO (reg2)];
+  ira_object_t obj1 = find_object (a1, orig_dest_reg);
+  ira_object_t obj2 = find_object (a2, orig_src_reg);
+  return subreg_move_p (obj1, obj2);
 }
 
-/* Check that X is REG or SUBREG of REG.  */
-#define REG_SUBREG_P(x)							\
-   (REG_P (x) || (GET_CODE (x) == SUBREG && REG_P (SUBREG_REG (x))))
+/* Return true if OBJ1 and OBJ2 can be allocated to the same register.  */
+static bool
+regs_non_conflict_for_copy_p (ira_object_t obj1, ira_object_t obj2,
+			      bool is_move, bool offset_equal)
+{
+  ira_allocno_t a1 = OBJECT_ALLOCNO (obj1);
+  ira_allocno_t a2 = OBJECT_ALLOCNO (obj2);
+  if (is_move && subreg_move_p (obj1, obj2))
+    {
+      if (OBJECTS_CONFLICT_P (obj1, obj2))
+	return false;
+      /* Assume a1 is allocated to hard register `OBJECT_START (obj2)` and a2
+	 to hard register `OBJECT_START (obj1)`, so both objects can use the
+	 same hard register `OBJECT_START (obj1) + OBJECT_START (obj2)`.  */
+      int start_regno1 = OBJECT_START (obj2);
+      int start_regno2 = OBJECT_START (obj1);
+
+      ira_object_t obj_a, obj_b;
+      ira_allocno_object_iterator oi_a, oi_b;
+      FOR_EACH_ALLOCNO_OBJECT (a1, obj_a, oi_a)
+	FOR_EACH_ALLOCNO_OBJECT (a2, obj_b, oi_b)
+	  /* If a conflict between a1 and a2 would prevent this allocation,
+	     then obj1 and obj2 cannot form a copy.  */
+	  if (OBJECTS_CONFLICT_P (obj_a, obj_b)
+	      && !(start_regno1 + OBJECT_START (obj_a) + OBJECT_NREGS (obj_a)
+		     <= (start_regno2 + OBJECT_START (obj_b))
+		   || start_regno2 + OBJECT_START (obj_b) + OBJECT_NREGS (obj_b)
+			<= (start_regno1 + OBJECT_START (obj_a))))
+	      return false;
+
+      return true;
+    }
+  else
+    {
+      /* For the normal case, make sure full_obj1 and full_obj2 can be
+	 allocated to the same register.  */
+      ira_object_t full_obj1 = find_object (a1, 0, ALLOCNO_NREGS (a1));
+      ira_object_t full_obj2 = find_object (a2, 0, ALLOCNO_NREGS (a2));
+      return !OBJECTS_CONFLICT_P (full_obj1, full_obj2) && offset_equal;
+    }
+}
+
+/* Return true if the offset and size of ORIG_REG are multiples of
+   ALLOCNO_UNIT_SIZE (A).  Used to forbid creating a copy for the RTL below,
+   which contains a subreg move (from testsuite/gcc.dg/vect/vect-simd-20.c on
+   AArch64).  Suppose both pseudos are allocated to the fourth register, that
+   is, pseudo 127 is allocated to w4 and pseudo 149 to x4 and x5.  Then the
+   third instruction can be safely deleted without affecting the result of
+   pseudo 149.  But when the second instruction is executed, the upper 32 bits
+   of x4 are set to 0 (the behavior of the add instruction), i.e. the result
+   of pseudo 149 is modified: its bits 32..63 are set to 0, which is not the
+   desired result.
+
+     (set (reg:SI 127)
+	  (subreg:SI (reg:TI 149) 0))
+     ...
+     (set (reg:SI 127)
+	  (plus:SI (reg:SI 127)
+		   (reg:SI 180)))
+     ...
+     (set (zero_extract:DI (subreg:DI (reg:TI 149) 0)
+			   (const_int 32 [0x20])
+			   (const_int 0 [0]))
+	  (subreg:DI (reg:SI 127) 0))  */
+static bool
+subreg_reg_align_and_times_p (ira_allocno_t a, rtx orig_reg)
+{
+  if (!has_subreg_object_p (a) || !SUBREG_P (orig_reg))
+    return true;
+
+  return multiple_p (SUBREG_BYTE (orig_reg), ALLOCNO_UNIT_SIZE (a))
+	 && multiple_p (GET_MODE_SIZE (GET_MODE (orig_reg)),
+			ALLOCNO_UNIT_SIZE (a));
+}
 
 /* Return X if X is a REG, otherwise it should be SUBREG of REG and
    the function returns the reg in this case.  *OFFSET will be set to
@@ -237,8 +327,9 @@ get_freq_for_shuffle_copy (int freq)
    SINGLE_INPUT_OP_HAS_CSTR_P is only meaningful when constraint_p
    is true, see function ira_get_dup_out_num for its meaning.  */
 static bool
-process_regs_for_copy (rtx reg1, rtx reg2, bool constraint_p, rtx_insn *insn,
-		       int freq, bool single_input_op_has_cstr_p = true)
+process_regs_for_copy (rtx orig_reg1, rtx orig_reg2, bool constraint_p,
+		       rtx_insn *insn, int freq,
+		       bool single_input_op_has_cstr_p = true)
 {
   int allocno_preferenced_hard_regno, index, offset1, offset2;
   int cost, conflict_cost, move_cost;
@@ -248,10 +339,10 @@ process_regs_for_copy (rtx reg1, rtx reg2, bool constraint_p, rtx_insn *insn,
   machine_mode mode;
   ira_copy_t cp;
 
-  gcc_assert (REG_SUBREG_P (reg1) && REG_SUBREG_P (reg2));
-  only_regs_p = REG_P (reg1) && REG_P (reg2);
-  reg1 = go_through_subreg (reg1, &offset1);
-  reg2 = go_through_subreg (reg2, &offset2);
+  gcc_assert (REG_SUBREG_P (orig_reg1) && REG_SUBREG_P (orig_reg2));
+  only_regs_p = REG_P (orig_reg1) && REG_P (orig_reg2);
+  rtx reg1 = go_through_subreg (orig_reg1, &offset1);
+  rtx reg2 = go_through_subreg (orig_reg2, &offset2);
   /* Set up hard regno preferenced by allocno.  If allocno gets the
      hard regno the copy (or potential move) insn will be removed.  */
   if (HARD_REGISTER_P (reg1))
@@ -270,13 +361,17 @@ process_regs_for_copy (rtx reg1, rtx reg2, bool constraint_p, rtx_insn *insn,
     {
       ira_allocno_t a1 = ira_curr_regno_allocno_map[REGNO (reg1)];
       ira_allocno_t a2 = ira_curr_regno_allocno_map[REGNO (reg2)];
+      ira_object_t obj1 = find_object (a1, orig_reg1);
+      ira_object_t obj2 = find_object (a2, orig_reg2);
 
-      if (!allocnos_conflict_for_copy_p (a1, a2)
-	  && offset1 == offset2
+      if (subreg_reg_align_and_times_p (a1, orig_reg1)
+	  && subreg_reg_align_and_times_p (a2, orig_reg2)
+	  && regs_non_conflict_for_copy_p (obj1, obj2, insn != NULL,
+					   offset1 == offset2)
 	  && ordered_p (GET_MODE_PRECISION (ALLOCNO_MODE (a1)),
 			GET_MODE_PRECISION (ALLOCNO_MODE (a2))))
 	{
-	  cp = ira_add_allocno_copy (a1, a2, freq, constraint_p, insn,
+	  cp = ira_add_allocno_copy (obj1, obj2, freq, constraint_p, insn,
 				     ira_curr_loop_tree_node);
 	  bitmap_set_bit (ira_curr_loop_tree_node->local_copies, cp->num);
 	  return true;
@@ -438,16 +533,15 @@ add_insn_allocno_copies (rtx_insn *insn)
   freq = REG_FREQ_FROM_BB (BLOCK_FOR_INSN (insn));
   if (freq == 0)
     freq = 1;
-  if ((set = single_set (insn)) != NULL_RTX
-      && REG_SUBREG_P (SET_DEST (set)) && REG_SUBREG_P (SET_SRC (set))
-      && ! side_effects_p (set)
-      && find_reg_note (insn, REG_DEAD,
-			REG_P (SET_SRC (set))
-			? SET_SRC (set)
-			: SUBREG_REG (SET_SRC (set))) != NULL_RTX)
+  if ((set = single_set (insn)) != NULL_RTX && REG_SUBREG_P (SET_DEST (set))
+      && REG_SUBREG_P (SET_SRC (set)) && !side_effects_p (set)
+      && (find_reg_note (insn, REG_DEAD,
+			 REG_P (SET_SRC (set)) ? SET_SRC (set)
+					       : SUBREG_REG (SET_SRC (set)))
+	    != NULL_RTX
+	  || subreg_move_p (SET_DEST (set), SET_SRC (set))))
     {
-      process_regs_for_copy (SET_SRC (set), SET_DEST (set),
-			     false, insn, freq);
+      process_regs_for_copy (SET_SRC (set), SET_DEST (set), false, insn, freq);
       return;
     }
   /* Fast check of possibility of constraint or shuffle copies.  If
@@ -521,16 +615,23 @@ propagate_copies (void)
 
   FOR_EACH_COPY (cp, ci)
     {
-      a1 = cp->first;
-      a2 = cp->second;
+      a1 = OBJECT_ALLOCNO (cp->first);
+      a2 = OBJECT_ALLOCNO (cp->second);
       if (ALLOCNO_LOOP_TREE_NODE (a1) == ira_loop_tree_root)
 	continue;
       ira_assert ((ALLOCNO_LOOP_TREE_NODE (a2) != ira_loop_tree_root));
       parent_a1 = ira_parent_or_cap_allocno (a1);
       parent_a2 = ira_parent_or_cap_allocno (a2);
+      ira_object_t parent_obj1
+	= find_object_anyway (parent_a1, OBJECT_START (cp->first),
+			      OBJECT_NREGS (cp->first));
+      ira_object_t parent_obj2
+	= find_object_anyway (parent_a2, OBJECT_START (cp->second),
+			      OBJECT_NREGS (cp->second));
       ira_assert (parent_a1 != NULL && parent_a2 != NULL);
-      if (! allocnos_conflict_for_copy_p (parent_a1, parent_a2))
-	ira_add_allocno_copy (parent_a1, parent_a2, cp->freq,
+      if (regs_non_conflict_for_copy_p (parent_obj1, parent_obj2,
+					cp->insn != NULL, true))
+	ira_add_allocno_copy (parent_obj1, parent_obj2, cp->freq,
 			      cp->constraint_p, cp->insn, cp->loop_tree_node);
     }
 }
diff --git a/gcc/ira-emit.cc b/gcc/ira-emit.cc
index 9dc7f3c655e..30ff46980f5 100644
--- a/gcc/ira-emit.cc
+++ b/gcc/ira-emit.cc
@@ -1129,11 +1129,11 @@ add_range_and_copies_from_move_list (move_t list, ira_loop_tree_node_t node,
       update_costs (to, false, freq);
       cp = ira_add_allocno_copy (from, to, freq, false, move->insn, NULL);
       if (internal_flag_ira_verbose > 2 && ira_dump_file != NULL)
-	fprintf (ira_dump_file, "    Adding cp%d:a%dr%d-a%dr%d\n",
-		 cp->num, ALLOCNO_NUM (cp->first),
-		 REGNO (allocno_emit_reg (cp->first)),
-		 ALLOCNO_NUM (cp->second),
-		 REGNO (allocno_emit_reg (cp->second)));
+	fprintf (ira_dump_file, "    Adding cp%d:a%dr%d-a%dr%d\n", cp->num,
+		 ALLOCNO_NUM (OBJECT_ALLOCNO (cp->first)),
+		 REGNO (allocno_emit_reg (OBJECT_ALLOCNO (cp->first))),
+		 ALLOCNO_NUM (OBJECT_ALLOCNO (cp->second)),
+		 REGNO (allocno_emit_reg (OBJECT_ALLOCNO (cp->second))));
 
       nr = ALLOCNO_NUM_OBJECTS (from);
       for (i = 0; i < nr; i++)
diff --git a/gcc/ira-int.h b/gcc/ira-int.h
index 9095a8227f7..963e533e448 100644
--- a/gcc/ira-int.h
+++ b/gcc/ira-int.h
@@ -594,9 +594,9 @@ struct ira_allocno_copy
 {
   /* The unique order number of the copy node starting with 0.  */
   int num;
-  /* Allocnos connected by the copy.  The first allocno should have
+  /* Objects connected by the copy.  The first allocno should have
      smaller order number than the second one.  */
-  ira_allocno_t first, second;
+  ira_object_t first, second;
   /* Execution frequency of the copy.  */
   int freq;
   bool constraint_p;
@@ -1046,6 +1046,9 @@ extern void ira_remove_allocno_prefs (ira_allocno_t);
 extern ira_copy_t ira_create_copy (ira_allocno_t, ira_allocno_t,
 				   int, bool, rtx_insn *,
 				   ira_loop_tree_node_t);
+extern ira_copy_t
+ira_add_allocno_copy (ira_object_t, ira_object_t, int, bool, rtx_insn *,
+		      ira_loop_tree_node_t);
 extern ira_copy_t ira_add_allocno_copy (ira_allocno_t, ira_allocno_t, int,
 					bool, rtx_insn *,
 					ira_loop_tree_node_t);
@@ -1059,6 +1062,7 @@ extern void ira_destroy (void);
 extern ira_object_t
 find_object (ira_allocno_t, int, int);
 extern ira_object_t find_object (ira_allocno_t, poly_int64, poly_int64);
+extern ira_object_t find_object (ira_allocno_t, rtx);
 ira_object_t
 find_object_anyway (ira_allocno_t a, int start, int nregs);
 extern void ira_copy_allocno_objects (ira_allocno_t, ira_allocno_t);
@@ -1087,6 +1091,8 @@ extern void ira_implicitly_set_insn_hard_regs (HARD_REG_SET *,
 /* ira-conflicts.cc */
 extern void ira_debug_conflicts (bool);
 extern void ira_build_conflicts (void);
+extern bool subreg_move_p (ira_object_t, ira_object_t);
+extern bool subreg_move_p (rtx, rtx);
 
 /* ira-color.cc */
 extern ira_allocno_t ira_soft_conflict (ira_allocno_t, ira_allocno_t);
diff --git a/gcc/ira.cc b/gcc/ira.cc
index 9ea57d3b1ea..280ca47a999 100644
--- a/gcc/ira.cc
+++ b/gcc/ira.cc
@@ -2853,14 +2853,15 @@ print_redundant_copies (void)
       if (hard_regno >= 0)
 	continue;
       for (cp = ALLOCNO_COPIES (a); cp != NULL; cp = next_cp)
-	if (cp->first == a)
+	if (OBJECT_ALLOCNO (cp->first) == a)
 	  next_cp = cp->next_first_allocno_copy;
 	else
 	  {
 	    next_cp = cp->next_second_allocno_copy;
 	    if (internal_flag_ira_verbose > 4 && ira_dump_file != NULL
 		&& cp->insn != NULL_RTX
-		&& ALLOCNO_HARD_REGNO (cp->first) == hard_regno)
+		&& ALLOCNO_HARD_REGNO (OBJECT_ALLOCNO (cp->first))
+		     == hard_regno)
 	      fprintf (ira_dump_file,
 		       "        Redundant move from %d(freq %d):%d\n",
 		       INSN_UID (cp->insn), cp->freq, hard_regno);
-- 
2.36.3


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH V2 5/7] ira: Add all nregs >= 2 pseudos to track subreg list
  2023-11-12  9:58 [PATCH V2 0/7] ira/lra: Support subreg coalesce Lehua Ding
                   ` (3 preceding siblings ...)
  2023-11-12  9:58 ` [PATCH V2 4/7] ira: Support subreg copy Lehua Ding
@ 2023-11-12  9:58 ` Lehua Ding
  2023-11-12  9:58 ` [PATCH V2 6/7] lra: Switch to live_subreg data flow Lehua Ding
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: Lehua Ding @ 2023-11-12  9:58 UTC (permalink / raw)
  To: gcc-patches; +Cc: vmakarov, richard.sandiford, juzhe.zhong, lehua.ding

This patch relaxes the subreg tracking capability to all pseudos that
occupy more than one hard register.
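
As a rough sketch (not the patch itself), the condition under which an
allocno is now put on the tracked-subreg list looks like the following;
`track_subregs_p` is a hypothetical wrapper name, while ALLOCNO_NREGS and
has_same_nregs are the IRA macro and the helper added below, and the code
assumes the usual IRA headers (ira-int.h) are in scope:

```
/* Sketch only.  Old rule: track only 2-register pseudos whose mode size
   is exactly 2 * UNITS_PER_WORD.  New rule: track any pseudo needing more
   than one hard register, provided every hard register in its class needs
   the same number of registers for the allocno's mode.  */
static bool
track_subregs_p (ira_allocno_t a)
{
  return ALLOCNO_NREGS (a) > 1 && has_same_nregs (a);
}
```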

gcc/ChangeLog:

	* ira-build.cc (get_reg_unit_size): New.
	(has_same_nregs): New.
	(ira_set_allocno_class): Adjust.

---
 gcc/ira-build.cc | 41 ++++++++++++++++++++++++++++++++++++-----
 1 file changed, 36 insertions(+), 5 deletions(-)

diff --git a/gcc/ira-build.cc b/gcc/ira-build.cc
index 13f0f7336ed..f88aeeeeaef 100644
--- a/gcc/ira-build.cc
+++ b/gcc/ira-build.cc
@@ -607,6 +607,37 @@ ira_create_allocno (int regno, bool cap_p,
   return a;
 }
 
+/* Return the size of a single register of allocno A.  */
+static poly_int64
+get_reg_unit_size (ira_allocno_t a)
+{
+  enum reg_class aclass = ALLOCNO_CLASS (a);
+  gcc_assert (aclass != NO_REGS);
+  machine_mode mode = ALLOCNO_MODE (a);
+  int nregs = ALLOCNO_NREGS (a);
+  poly_int64 block_size = REGMODE_NATURAL_SIZE (mode);
+  int nblocks = get_nblocks (mode);
+  gcc_assert (nblocks % nregs == 0);
+  return block_size * (nblocks / nregs);
+}
+
+/* Return true if TARGET_CLASS_MAX_NREGS and TARGET_HARD_REGNO_NREGS give the
+   same result for allocno A.  Note that some targets do not implement these
+   two hooks consistently, and such cases need to be fixed one by one.  For
+   example, for V3x1DI mode on AArch64, TARGET_CLASS_MAX_NREGS returns 2 but
+   TARGET_HARD_REGNO_NREGS returns 3.  They conflict and need to be repaired
+   in the AArch64 hooks.  */
+static bool
+has_same_nregs (ira_allocno_t a)
+{
+  for (int i = 0; i < FIRST_PSEUDO_REGISTER; i++)
+    if (REGNO_REG_CLASS (i) != NO_REGS
+	&& reg_class_subset_p (REGNO_REG_CLASS (i), ALLOCNO_CLASS (a))
+	&& ALLOCNO_NREGS (a) != hard_regno_nregs (i, ALLOCNO_MODE (a)))
+      return false;
+  return true;
+}
+
 /* Set up register class for A and update its conflict hard
    registers.  */
 void
@@ -624,12 +655,12 @@ ira_set_allocno_class (ira_allocno_t a, enum reg_class aclass)
 
   if (aclass == NO_REGS)
     return;
-  /* SET the unit_size of one register.  */
-  machine_mode mode = ALLOCNO_MODE (a);
-  int nregs = ira_reg_class_max_nregs[aclass][mode];
-  if (nregs == 2 && maybe_eq (GET_MODE_SIZE (mode), nregs * UNITS_PER_WORD))
+  gcc_assert (!ALLOCNO_TRACK_SUBREG_P (a));
+  /* Set the unit size and track_subreg_p flag for pseudos which need to
+     occupy multiple hard regs.  */
+  if (ALLOCNO_NREGS (a) > 1 && has_same_nregs (a))
     {
-      ALLOCNO_UNIT_SIZE (a) = UNITS_PER_WORD;
+      ALLOCNO_UNIT_SIZE (a) = get_reg_unit_size (a);
       ALLOCNO_TRACK_SUBREG_P (a) = true;
       return;
     }
-- 
2.36.3


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH V2 6/7] lra: Switch to live_subreg data flow
  2023-11-12  9:58 [PATCH V2 0/7] ira/lra: Support subreg coalesce Lehua Ding
                   ` (4 preceding siblings ...)
  2023-11-12  9:58 ` [PATCH V2 5/7] ira: Add all nregs >= 2 pseudos to track subreg list Lehua Ding
@ 2023-11-12  9:58 ` Lehua Ding
  2023-11-12  9:58 ` [PATCH V2 7/7] lra: Support subreg live range track and conflict detect Lehua Ding
  2023-11-12 12:08 ` [PATCH V2 0/7] ira/lra: Support subreg coalesce Lehua Ding
  7 siblings, 0 replies; 9+ messages in thread
From: Lehua Ding @ 2023-11-12  9:58 UTC (permalink / raw)
  To: gcc-patches; +Cc: vmakarov, richard.sandiford, juzhe.zhong, lehua.ding

This patch switches LRA from the regular live-register data to the
live_subreg data.  The situation is more complicated than in IRA
because LRA also modifies this data, so the live_subreg information
has to be kept up to date and recalculated.
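
For readers of the hunks below, here is a minimal sketch of the
relationship the code assumes between the three per-BB live sets of the
DF_LIVE_SUBREG problem added earlier in this series (the exact
definitions live in that patch; the fragment assumes a basic_block bb in
scope and is illustrative only):

```
/* Sketch only:
     all     = DF_LIVE_SUBREG_IN (bb)          pseudos with any part live
     full    = DF_LIVE_SUBREG_FULL_IN (bb)     pseudos whose whole value is live
     partial = DF_LIVE_SUBREG_PARTIAL_IN (bb)  pseudos with only a subrange live
   full and partial are kept disjoint and together cover "all", which is
   what e.g. update_live_info in lra-coalesce.cc relies on when renaming
   coalesced pseudos.  */
bitmap all = DF_LIVE_SUBREG_IN (bb);
bitmap full = DF_LIVE_SUBREG_FULL_IN (bb);
bitmap partial = DF_LIVE_SUBREG_PARTIAL_IN (bb);
gcc_checking_assert (!bitmap_intersect_p (full, partial));
```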

gcc/ChangeLog:

	* lra-coalesce.cc (update_live_info):
	Adjust to new live subreg data.
	(lra_coalesce): Ditto.
	* lra-constraints.cc (update_ebb_live_info): Ditto.
	(get_live_on_other_edges): Ditto.
	(inherit_in_ebb): Ditto.
	(lra_inheritance): Ditto.
	(fix_bb_live_info): Ditto.
	(remove_inheritance_pseudos): Ditto.
	* lra-int.h (GCC_LRA_INT_H): Ditto.
	* lra-lives.cc (class bb_data_pseudos): Ditto.
	(make_hard_regno_live): Ditto.
	(make_hard_regno_dead): Ditto.
	(mark_regno_live): Ditto.
	(mark_regno_dead): Ditto.
	(live_trans_fun): Ditto.
	(live_con_fun_0): Ditto.
	(live_con_fun_n): Ditto.
	(initiate_live_solver): Ditto.
	(finish_live_solver): Ditto.
	(process_bb_lives): Ditto.
	(lra_create_live_ranges_1): Ditto.
	* lra-remat.cc (dump_candidates_and_remat_bb_data): Ditto.
	(calculate_livein_cands): Ditto.
	(do_remat): Ditto.
	* lra-spills.cc (spill_pseudos): Ditto.

---
 gcc/lra-coalesce.cc    |  20 ++-
 gcc/lra-constraints.cc |  93 +++++++++---
 gcc/lra-int.h          |   2 +
 gcc/lra-lives.cc       | 328 ++++++++++++++++++++++++++++++++---------
 gcc/lra-remat.cc       |  13 +-
 gcc/lra-spills.cc      |  22 ++-
 6 files changed, 374 insertions(+), 104 deletions(-)

diff --git a/gcc/lra-coalesce.cc b/gcc/lra-coalesce.cc
index 04a5bbd714b..abfc54f1cc2 100644
--- a/gcc/lra-coalesce.cc
+++ b/gcc/lra-coalesce.cc
@@ -188,19 +188,25 @@ static bitmap_head used_pseudos_bitmap;
 /* Set up USED_PSEUDOS_BITMAP, and update LR_BITMAP (a BB live info
    bitmap).  */
 static void
-update_live_info (bitmap lr_bitmap)
+update_live_info (bitmap all, bitmap full, bitmap partial)
 {
   unsigned int j;
   bitmap_iterator bi;
 
   bitmap_clear (&used_pseudos_bitmap);
-  EXECUTE_IF_AND_IN_BITMAP (&coalesced_pseudos_bitmap, lr_bitmap,
+  EXECUTE_IF_AND_IN_BITMAP (&coalesced_pseudos_bitmap, all,
 			    FIRST_PSEUDO_REGISTER, j, bi)
     bitmap_set_bit (&used_pseudos_bitmap, first_coalesced_pseudo[j]);
   if (! bitmap_empty_p (&used_pseudos_bitmap))
     {
-      bitmap_and_compl_into (lr_bitmap, &coalesced_pseudos_bitmap);
-      bitmap_ior_into (lr_bitmap, &used_pseudos_bitmap);
+      bitmap_and_compl_into (all, &coalesced_pseudos_bitmap);
+      bitmap_ior_into (all, &used_pseudos_bitmap);
+
+      bitmap_and_compl_into (full, &coalesced_pseudos_bitmap);
+      bitmap_ior_and_compl_into (full, &used_pseudos_bitmap, partial);
+
+      bitmap_and_compl_into (partial, &coalesced_pseudos_bitmap);
+      bitmap_ior_and_compl_into (partial, &used_pseudos_bitmap, full);
     }
 }
 
@@ -303,8 +309,10 @@ lra_coalesce (void)
   bitmap_initialize (&used_pseudos_bitmap, &reg_obstack);
   FOR_EACH_BB_FN (bb, cfun)
     {
-      update_live_info (df_get_live_in (bb));
-      update_live_info (df_get_live_out (bb));
+      update_live_info (DF_LIVE_SUBREG_IN (bb), DF_LIVE_SUBREG_FULL_IN (bb),
+			DF_LIVE_SUBREG_PARTIAL_IN (bb));
+      update_live_info (DF_LIVE_SUBREG_OUT (bb), DF_LIVE_SUBREG_FULL_OUT (bb),
+			DF_LIVE_SUBREG_PARTIAL_OUT (bb));
       FOR_BB_INSNS_SAFE (bb, insn, next)
 	if (INSN_P (insn)
 	    && bitmap_bit_p (&involved_insns_bitmap, INSN_UID (insn)))
diff --git a/gcc/lra-constraints.cc b/gcc/lra-constraints.cc
index 0607c8be7cb..c3ad846b97b 100644
--- a/gcc/lra-constraints.cc
+++ b/gcc/lra-constraints.cc
@@ -6571,34 +6571,75 @@ update_ebb_live_info (rtx_insn *head, rtx_insn *tail)
 	{
 	  if (prev_bb != NULL)
 	    {
-	      /* Update df_get_live_in (prev_bb):  */
+	      /* Update subreg live (prev_bb):  */
+	      bitmap subreg_all_in = DF_LIVE_SUBREG_IN (prev_bb);
+	      bitmap subreg_full_in = DF_LIVE_SUBREG_FULL_IN (prev_bb);
+	      bitmap subreg_partial_in = DF_LIVE_SUBREG_PARTIAL_IN (prev_bb);
+	      subregs_live *range_in = DF_LIVE_SUBREG_RANGE_IN (prev_bb);
 	      EXECUTE_IF_SET_IN_BITMAP (&check_only_regs, 0, j, bi)
 		if (bitmap_bit_p (&live_regs, j))
-		  bitmap_set_bit (df_get_live_in (prev_bb), j);
-		else
-		  bitmap_clear_bit (df_get_live_in (prev_bb), j);
+		  {
+		    bitmap_set_bit (subreg_all_in, j);
+		    bitmap_set_bit (subreg_full_in, j);
+		    if (bitmap_bit_p (subreg_partial_in, j))
+		      {
+			bitmap_clear_bit (subreg_partial_in, j);
+			range_in->remove_live (j);
+		      }
+		  }
+		else if (bitmap_bit_p (subreg_all_in, j))
+		  {
+		    bitmap_clear_bit (subreg_all_in, j);
+		    bitmap_clear_bit (subreg_full_in, j);
+		    if (bitmap_bit_p (subreg_partial_in, j))
+		      {
+			bitmap_clear_bit (subreg_partial_in, j);
+			range_in->remove_live (j);
+		      }
+		  }
 	    }
+	  bitmap subreg_all_out = DF_LIVE_SUBREG_OUT (curr_bb);
 	  if (curr_bb != last_bb)
 	    {
-	      /* Update df_get_live_out (curr_bb):  */
+	      /* Update subreg live (curr_bb):  */
+	      bitmap subreg_all_out = DF_LIVE_SUBREG_OUT (curr_bb);
+	      bitmap subreg_full_out = DF_LIVE_SUBREG_FULL_OUT (curr_bb);
+	      bitmap subreg_partial_out = DF_LIVE_SUBREG_PARTIAL_OUT (curr_bb);
+	      subregs_live *range_out = DF_LIVE_SUBREG_RANGE_OUT (curr_bb);
 	      EXECUTE_IF_SET_IN_BITMAP (&check_only_regs, 0, j, bi)
 		{
 		  live_p = bitmap_bit_p (&live_regs, j);
 		  if (! live_p)
 		    FOR_EACH_EDGE (e, ei, curr_bb->succs)
-		      if (bitmap_bit_p (df_get_live_in (e->dest), j))
+		      if (bitmap_bit_p (DF_LIVE_SUBREG_IN (e->dest), j))
 			{
 			  live_p = true;
 			  break;
 			}
 		  if (live_p)
-		    bitmap_set_bit (df_get_live_out (curr_bb), j);
-		  else
-		    bitmap_clear_bit (df_get_live_out (curr_bb), j);
+		    {
+		      bitmap_set_bit (subreg_all_out, j);
+		      bitmap_set_bit (subreg_full_out, j);
+		      if (bitmap_bit_p (subreg_partial_out, j))
+			{
+			  bitmap_clear_bit (subreg_partial_out, j);
+			  range_out->remove_live (j);
+			}
+		    }
+		  else if (bitmap_bit_p (subreg_all_out, j))
+		    {
+		      bitmap_clear_bit (subreg_all_out, j);
+		      bitmap_clear_bit (subreg_full_out, j);
+		      if (bitmap_bit_p (subreg_partial_out, j))
+			{
+			  bitmap_clear_bit (subreg_partial_out, j);
+			  range_out->remove_live (j);
+			}
+		    }
 		}
 	    }
 	  prev_bb = curr_bb;
-	  bitmap_and (&live_regs, &check_only_regs, df_get_live_out (curr_bb));
+	  bitmap_and (&live_regs, &check_only_regs, subreg_all_out);
 	}
       if (! NONDEBUG_INSN_P (curr_insn))
 	continue;
@@ -6715,7 +6756,7 @@ get_live_on_other_edges (basic_block from, basic_block to, bitmap res)
   bitmap_clear (res);
   FOR_EACH_EDGE (e, ei, from->succs)
     if (e->dest != to)
-      bitmap_ior_into (res, df_get_live_in (e->dest));
+      bitmap_ior_into (res, DF_LIVE_SUBREG_IN (e->dest));
   last = get_last_insertion_point (from);
   if (! JUMP_P (last))
     return;
@@ -6787,7 +6828,7 @@ inherit_in_ebb (rtx_insn *head, rtx_insn *tail)
 	{
 	  /* We are at the end of BB.  Add qualified living
 	     pseudos for potential splitting.  */
-	  to_process = df_get_live_out (curr_bb);
+	  to_process = DF_LIVE_SUBREG_OUT (curr_bb);
 	  if (last_processed_bb != NULL)
 	    {
 	      /* We are somewhere in the middle of EBB.	 */
@@ -7159,7 +7200,7 @@ inherit_in_ebb (rtx_insn *head, rtx_insn *tail)
 	{
 	  /* We reached the beginning of the current block -- do
 	     rest of spliting in the current BB.  */
-	  to_process = df_get_live_in (curr_bb);
+	  to_process = DF_LIVE_SUBREG_IN (curr_bb);
 	  if (BLOCK_FOR_INSN (head) != curr_bb)
 	    {
 	      /* We are somewhere in the middle of EBB.	 */
@@ -7236,7 +7277,7 @@ lra_inheritance (void)
 	fprintf (lra_dump_file, "EBB");
       /* Form a EBB starting with BB.  */
       bitmap_clear (&ebb_global_regs);
-      bitmap_ior_into (&ebb_global_regs, df_get_live_in (bb));
+      bitmap_ior_into (&ebb_global_regs, DF_LIVE_SUBREG_IN (bb));
       for (;;)
 	{
 	  if (lra_dump_file != NULL)
@@ -7252,7 +7293,7 @@ lra_inheritance (void)
 	    break;
 	  bb = bb->next_bb;
 	}
-      bitmap_ior_into (&ebb_global_regs, df_get_live_out (bb));
+      bitmap_ior_into (&ebb_global_regs, DF_LIVE_SUBREG_OUT (bb));
       if (lra_dump_file != NULL)
 	fprintf (lra_dump_file, "\n");
       if (inherit_in_ebb (BB_HEAD (start_bb), BB_END (bb)))
@@ -7281,15 +7322,23 @@ int lra_undo_inheritance_iter;
 /* Fix BB live info LIVE after removing pseudos created on pass doing
    inheritance/split which are REMOVED_PSEUDOS.	 */
 static void
-fix_bb_live_info (bitmap live, bitmap removed_pseudos)
+fix_bb_live_info (bitmap all, bitmap full, bitmap partial,
+		  bitmap removed_pseudos)
 {
   unsigned int regno;
   bitmap_iterator bi;
 
   EXECUTE_IF_SET_IN_BITMAP (removed_pseudos, 0, regno, bi)
-    if (bitmap_clear_bit (live, regno)
-	&& REG_P (lra_reg_info[regno].restore_rtx))
-      bitmap_set_bit (live, REGNO (lra_reg_info[regno].restore_rtx));
+    {
+      if (bitmap_clear_bit (all, regno)
+	  && REG_P (lra_reg_info[regno].restore_rtx))
+	{
+	  bitmap_set_bit (all, REGNO (lra_reg_info[regno].restore_rtx));
+	  bitmap_clear_bit (full, regno);
+	  bitmap_set_bit (full, REGNO (lra_reg_info[regno].restore_rtx));
+	  gcc_assert (!bitmap_bit_p (partial, regno));
+	}
+    }
 }
 
 /* Return regno of the (subreg of) REG. Otherwise, return a negative
@@ -7355,8 +7404,10 @@ remove_inheritance_pseudos (bitmap remove_pseudos)
      constraint pass.  */
   FOR_EACH_BB_FN (bb, cfun)
     {
-      fix_bb_live_info (df_get_live_in (bb), remove_pseudos);
-      fix_bb_live_info (df_get_live_out (bb), remove_pseudos);
+      fix_bb_live_info (DF_LIVE_SUBREG_IN (bb), DF_LIVE_SUBREG_FULL_IN (bb),
+			DF_LIVE_SUBREG_PARTIAL_IN (bb), remove_pseudos);
+      fix_bb_live_info (DF_LIVE_SUBREG_OUT (bb), DF_LIVE_SUBREG_FULL_OUT (bb),
+			DF_LIVE_SUBREG_PARTIAL_OUT (bb), remove_pseudos);
       FOR_BB_INSNS_REVERSE (bb, curr_insn)
 	{
 	  if (! INSN_P (curr_insn))
diff --git a/gcc/lra-int.h b/gcc/lra-int.h
index d0752c2ae50..678377d9ec6 100644
--- a/gcc/lra-int.h
+++ b/gcc/lra-int.h
@@ -21,6 +21,8 @@ along with GCC; see the file COPYING3.	If not see
 #ifndef GCC_LRA_INT_H
 #define GCC_LRA_INT_H
 
+#include "subreg-live-range.h"
+
 #define lra_assert(c) gcc_checking_assert (c)
 
 /* The parameter used to prevent infinite reloading for an insn.  Each
diff --git a/gcc/lra-lives.cc b/gcc/lra-lives.cc
index f60e564da82..d93921ad302 100644
--- a/gcc/lra-lives.cc
+++ b/gcc/lra-lives.cc
@@ -272,8 +272,26 @@ update_pseudo_point (int regno, int point, enum point_type type)
     }
 }
 
-/* The corresponding bitmaps of BB currently being processed.  */
-static bitmap bb_killed_pseudos, bb_gen_pseudos;
+/* Structure describing local BB data used for pseudo
+   live-analysis.  */
+class bb_data_pseudos : public basic_block_subreg_live_info
+{
+public:
+  /* Basic block about which the below data are.  */
+  basic_block bb;
+};
+
+/* Array for all BB data.  Indexed by the corresponding BB index.  */
+typedef class bb_data_pseudos *bb_data_t;
+
+/* All basic block data are referred through the following array.  */
+static bb_data_t bb_data;
+
+/* The basic block info of the BB currently being processed.  */
+static bb_data_t curr_bb_info;
+
+/* Flag: the current function has subreg references to be tracked.  */
+static bool has_subreg_live_p;
 
 /* Record hard register REGNO as now being live.  It updates
    living hard regs and START_LIVING.  */
@@ -287,7 +305,7 @@ make_hard_regno_live (int regno)
   SET_HARD_REG_BIT (hard_regs_live, regno);
   sparseset_set_bit (start_living, regno);
   if (fixed_regs[regno] || TEST_HARD_REG_BIT (hard_regs_spilled_into, regno))
-    bitmap_set_bit (bb_gen_pseudos, regno);
+    bitmap_set_bit (&curr_bb_info->full_use, regno);
 }
 
 /* Process the definition of hard register REGNO.  This updates
@@ -310,8 +328,8 @@ make_hard_regno_dead (int regno)
   sparseset_set_bit (start_dying, regno);
   if (fixed_regs[regno] || TEST_HARD_REG_BIT (hard_regs_spilled_into, regno))
     {
-      bitmap_clear_bit (bb_gen_pseudos, regno);
-      bitmap_set_bit (bb_killed_pseudos, regno);
+      bitmap_clear_bit (&curr_bb_info->full_use, regno);
+      bitmap_set_bit (&curr_bb_info->full_def, regno);
     }
 }
 
@@ -355,7 +373,9 @@ mark_regno_live (int regno, machine_mode mode)
   else
     {
       mark_pseudo_live (regno);
-      bitmap_set_bit (bb_gen_pseudos, regno);
+      bitmap_set_bit (&curr_bb_info->full_use, regno);
+      gcc_assert (!bitmap_bit_p (&curr_bb_info->partial_use, regno));
+      gcc_assert (!bitmap_bit_p (&curr_bb_info->partial_def, regno));
     }
 }
 
@@ -375,8 +395,10 @@ mark_regno_dead (int regno, machine_mode mode)
   else
     {
       mark_pseudo_dead (regno);
-      bitmap_clear_bit (bb_gen_pseudos, regno);
-      bitmap_set_bit (bb_killed_pseudos, regno);
+      bitmap_clear_bit (&curr_bb_info->full_use, regno);
+      bitmap_set_bit (&curr_bb_info->full_def, regno);
+      gcc_assert (!bitmap_bit_p (&curr_bb_info->partial_use, regno));
+      gcc_assert (!bitmap_bit_p (&curr_bb_info->partial_def, regno));
     }
 }
 
@@ -387,23 +409,6 @@ mark_regno_dead (int regno, machine_mode mode)
    border.  That might be a consequence of some global transformations
    in LRA, e.g. PIC pseudo reuse or rematerialization.  */
 
-/* Structure describing local BB data used for pseudo
-   live-analysis.  */
-class bb_data_pseudos
-{
-public:
-  /* Basic block about which the below data are.  */
-  basic_block bb;
-  bitmap_head killed_pseudos; /* pseudos killed in the BB.  */
-  bitmap_head gen_pseudos; /* pseudos generated in the BB.  */
-};
-
-/* Array for all BB data.  Indexed by the corresponding BB index.  */
-typedef class bb_data_pseudos *bb_data_t;
-
-/* All basic block data are referred through the following array.  */
-static bb_data_t bb_data;
-
 /* Two small functions for access to the bb data.  */
 static inline bb_data_t
 get_bb_data (basic_block bb)
@@ -430,13 +435,93 @@ static bool
 live_trans_fun (int bb_index)
 {
   basic_block bb = get_bb_data_by_index (bb_index)->bb;
-  bitmap bb_liveout = df_get_live_out (bb);
-  bitmap bb_livein = df_get_live_in (bb);
+  bitmap full_out = DF_LIVE_SUBREG_FULL_OUT (bb);
+  bitmap full_in = DF_LIVE_SUBREG_FULL_IN (bb);
+  bitmap partial_out = DF_LIVE_SUBREG_PARTIAL_OUT (bb);
+  bitmap partial_in = DF_LIVE_SUBREG_PARTIAL_IN (bb);
+  subregs_live *range_out = DF_LIVE_SUBREG_RANGE_OUT (bb);
+  subregs_live *range_in = DF_LIVE_SUBREG_RANGE_IN (bb);
   bb_data_t bb_info = get_bb_data (bb);
 
-  bitmap_and_compl (&temp_bitmap, bb_liveout, &all_hard_regs_bitmap);
-  return bitmap_ior_and_compl (bb_livein, &bb_info->gen_pseudos,
-			       &temp_bitmap, &bb_info->killed_pseudos);
+  if (!has_subreg_live_p)
+    {
+      bitmap_and_compl (&temp_bitmap, full_out, &all_hard_regs_bitmap);
+      return bitmap_ior_and_compl (full_in, &bb_info->full_use, &temp_bitmap,
+				   &bb_info->full_def);
+    }
+
+  /* Handle the case where subreg liveness needs to be tracked.  */
+  unsigned int regno;
+  bitmap_iterator bi;
+  bool changed = false;
+  bitmap_head temp_full_out;
+  bitmap_head temp_partial_out;
+  bitmap_head temp_partial_be_full_out;
+  bitmap_head all_def;
+  subregs_live temp_range_out;
+  bitmap_initialize (&temp_full_out, &reg_obstack);
+  bitmap_initialize (&temp_partial_out, &reg_obstack);
+  bitmap_initialize (&temp_partial_be_full_out, &reg_obstack);
+  bitmap_initialize (&all_def, &reg_obstack);
+
+  bitmap_and_compl (&temp_full_out, full_out, &all_hard_regs_bitmap);
+
+  bitmap_ior (&all_def, &bb_info->full_def, &bb_info->partial_def);
+
+  bitmap_and (&temp_partial_out, &temp_full_out, &bb_info->partial_def);
+  EXECUTE_IF_SET_IN_BITMAP (&temp_partial_out, FIRST_PSEUDO_REGISTER, regno, bi)
+    {
+      subreg_ranges temp (bb_info->range_def->lives.at (regno).max);
+      temp.make_full ();
+      temp.remove_ranges (bb_info->range_def->lives.at (regno));
+      temp_range_out.add_ranges (regno, temp);
+    }
+  bitmap_ior_and_compl_into (&temp_partial_out, partial_out, &all_def);
+  EXECUTE_IF_AND_COMPL_IN_BITMAP (partial_out, &all_def, FIRST_PSEUDO_REGISTER,
+				  regno, bi)
+    {
+      temp_range_out.add_ranges (regno, range_out->lives.at (regno));
+    }
+  EXECUTE_IF_AND_IN_BITMAP (partial_out, &bb_info->partial_def, 0, regno, bi)
+    {
+      subreg_ranges temp = range_out->lives.at (regno);
+      temp.remove_ranges (bb_info->range_def->lives.at (regno));
+      if (!temp.empty_p ())
+	{
+	  bitmap_set_bit (&temp_partial_out, regno);
+	  temp_range_out.add_ranges (regno, temp);
+	}
+    }
+
+  temp_range_out.add_lives (*bb_info->range_use);
+  EXECUTE_IF_AND_IN_BITMAP (&temp_partial_out, &bb_info->partial_use, 0, regno,
+			    bi)
+    {
+      subreg_ranges temp = temp_range_out.lives.at (regno);
+      temp.add_ranges (bb_info->range_use->lives.at (regno));
+      if (temp.full_p ())
+	{
+	  bitmap_set_bit (&temp_partial_be_full_out, regno);
+	  temp_range_out.remove_live (regno);
+	}
+    }
+
+  bitmap_ior_and_compl_into (&temp_partial_be_full_out, &temp_full_out,
+			     &all_def);
+  changed
+    |= bitmap_ior (full_in, &temp_partial_be_full_out, &bb_info->full_use);
+
+  bitmap_ior_into (&temp_partial_out, &bb_info->partial_use);
+  changed |= bitmap_and_compl (partial_in, &temp_partial_out,
+			       &temp_partial_be_full_out);
+  changed |= range_in->copy_lives (temp_range_out);
+
+  bitmap_clear (&temp_full_out);
+  bitmap_clear (&temp_partial_out);
+  bitmap_clear (&temp_partial_be_full_out);
+  bitmap_clear (&all_def);
+
+  return changed;
 }
 
 /* The confluence function used by the DF equation solver to set up
@@ -444,7 +529,9 @@ live_trans_fun (int bb_index)
 static void
 live_con_fun_0 (basic_block bb)
 {
-  bitmap_and_into (df_get_live_out (bb), &all_hard_regs_bitmap);
+  bitmap_and_into (DF_LIVE_SUBREG_OUT (bb), &all_hard_regs_bitmap);
+  bitmap_and_into (DF_LIVE_SUBREG_FULL_OUT (bb), &all_hard_regs_bitmap);
+  bitmap_and_into (DF_LIVE_SUBREG_PARTIAL_OUT (bb), &all_hard_regs_bitmap);
 }
 
 /* The confluence function used by the DF equation solver to propagate
@@ -456,13 +543,77 @@ live_con_fun_0 (basic_block bb)
 static bool
 live_con_fun_n (edge e)
 {
-  basic_block bb = e->src;
-  basic_block dest = e->dest;
-  bitmap bb_liveout = df_get_live_out (bb);
-  bitmap dest_livein = df_get_live_in (dest);
+  class df_live_subreg_bb_info *src_bb_info
+    = df_live_subreg_get_bb_info (e->src->index);
+  class df_live_subreg_bb_info *dest_bb_info
+    = df_live_subreg_get_bb_info (e->dest->index);
+
+  if (!has_subreg_live_p)
+    {
+      return bitmap_ior_and_compl_into (&src_bb_info->full_out,
+					&dest_bb_info->full_in,
+					&all_hard_regs_bitmap);
+    }
+
+  /* Handle the case where subreg liveness needs to be tracked.  Calculation:
+       temp_full means regnos that become fully live, i.e. regnos that are
+	 1. partial in one of out/in but full in the other, or
+	 2. partial in both out and in with a merged range that is full.
+       temp_range means the ranges of the regnos that stay partially live.
+       The results are then:
+       src_bb_info->partial_out
+	 = (src_bb_info->partial_out | dest_bb_info->partial_in) & ~temp_full
+       src_bb_info->range_out = copy (temp_range)
+       src_bb_info->full_out |= dest_bb_info->full_in | temp_full  */
+  subregs_live temp_range;
+  temp_range.add_lives (*src_bb_info->range_out);
+  temp_range.add_lives (*dest_bb_info->range_in);
+
+  bitmap_head temp_partial_all;
+  bitmap_initialize (&temp_partial_all, &bitmap_default_obstack);
+  bitmap_ior (&temp_partial_all, &src_bb_info->partial_out,
+	      &dest_bb_info->partial_in);
+
+  bitmap_head temp_full;
+  bitmap_initialize (&temp_full, &bitmap_default_obstack);
+
+  /* Collect the regnos that become full after merging src_bb_info->partial_out
+     and dest_bb_info->partial_in.  */
+  unsigned int regno;
+  bitmap_iterator bi;
+  EXECUTE_IF_SET_IN_BITMAP (&temp_partial_all, FIRST_PSEUDO_REGISTER, regno, bi)
+    {
+      if (bitmap_bit_p (&src_bb_info->full_out, regno)
+	  || bitmap_bit_p (&dest_bb_info->full_in, regno))
+	{
+	  bitmap_set_bit (&temp_full, regno);
+	  temp_range.remove_live (regno);
+	  continue;
+	}
+      else if (!bitmap_bit_p (&src_bb_info->partial_out, regno)
+	       || !bitmap_bit_p (&dest_bb_info->partial_in, regno))
+	continue;
+
+      subreg_ranges temp = src_bb_info->range_out->lives.at (regno);
+      temp.add_ranges (dest_bb_info->range_in->lives.at (regno));
+      if (temp.full_p ())
+	{
+	  bitmap_set_bit (&temp_full, regno);
+	  temp_range.remove_live (regno);
+	}
+    }
+
+  /* Calculate src_bb_info->partial_out and src_bb_info->range_out.  */
+  bool changed = bitmap_and_compl (&src_bb_info->partial_out, &temp_partial_all,
+				   &temp_full);
+  changed |= src_bb_info->range_out->copy_lives (temp_range);
 
-  return bitmap_ior_and_compl_into (bb_liveout,
-				    dest_livein, &all_hard_regs_bitmap);
+  /* Calculate src_bb_info->full_out.  */
+  bitmap_ior_and_compl_into (&temp_full, &dest_bb_info->full_in,
+			     &all_hard_regs_bitmap);
+  changed |= bitmap_ior_into (&src_bb_info->full_out, &temp_full);
+
+  return changed;
 }
 
 /* Indexes of all function blocks.  */
@@ -483,8 +634,12 @@ initiate_live_solver (void)
     {
       bb_data_t bb_info = get_bb_data (bb);
       bb_info->bb = bb;
-      bitmap_initialize (&bb_info->killed_pseudos, &reg_obstack);
-      bitmap_initialize (&bb_info->gen_pseudos, &reg_obstack);
+      bitmap_initialize (&bb_info->full_def, &reg_obstack);
+      bitmap_initialize (&bb_info->partial_def, &reg_obstack);
+      bitmap_initialize (&bb_info->full_use, &reg_obstack);
+      bitmap_initialize (&bb_info->partial_use, &reg_obstack);
+      bb_info->range_def = new subregs_live ();
+      bb_info->range_use = new subregs_live ();
       bitmap_set_bit (&all_blocks, bb->index);
     }
 }
@@ -499,8 +654,12 @@ finish_live_solver (void)
   FOR_ALL_BB_FN (bb, cfun)
     {
       bb_data_t bb_info = get_bb_data (bb);
-      bitmap_clear (&bb_info->killed_pseudos);
-      bitmap_clear (&bb_info->gen_pseudos);
+      bitmap_clear (&bb_info->full_def);
+      bitmap_clear (&bb_info->partial_def);
+      bitmap_clear (&bb_info->full_use);
+      bitmap_clear (&bb_info->partial_use);
+      delete bb_info->range_def;
+      delete bb_info->range_use;
     }
   free (bb_data);
   bitmap_clear (&all_hard_regs_bitmap);
@@ -663,7 +822,7 @@ process_bb_lives (basic_block bb, int &curr_point, bool dead_insn_p)
   /* Only has a meaningful value once we've seen a call.  */
   function_abi last_call_abi = default_function_abi;
 
-  reg_live_out = df_get_live_out (bb);
+  reg_live_out = DF_LIVE_SUBREG_OUT (bb);
   sparseset_clear (pseudos_live);
   sparseset_clear (pseudos_live_through_calls);
   sparseset_clear (pseudos_live_through_setjumps);
@@ -675,10 +834,13 @@ process_bb_lives (basic_block bb, int &curr_point, bool dead_insn_p)
       mark_pseudo_live (j);
     }
 
-  bb_gen_pseudos = &get_bb_data (bb)->gen_pseudos;
-  bb_killed_pseudos = &get_bb_data (bb)->killed_pseudos;
-  bitmap_clear (bb_gen_pseudos);
-  bitmap_clear (bb_killed_pseudos);
+  curr_bb_info = get_bb_data (bb);
+  bitmap_clear (&curr_bb_info->full_use);
+  bitmap_clear (&curr_bb_info->partial_use);
+  bitmap_clear (&curr_bb_info->full_def);
+  bitmap_clear (&curr_bb_info->partial_def);
+  curr_bb_info->range_use->clear ();
+  curr_bb_info->range_def->clear ();
   freq = REG_FREQ_FROM_BB (bb);
 
   if (lra_dump_file != NULL)
@@ -1101,16 +1263,16 @@ process_bb_lives (basic_block bb, int &curr_point, bool dead_insn_p)
   bool live_change_p = false;
   /* Check if bb border live info was changed.  */
   unsigned int live_pseudos_num = 0;
-  EXECUTE_IF_SET_IN_BITMAP (df_get_live_in (bb),
-			    FIRST_PSEUDO_REGISTER, j, bi)
+  EXECUTE_IF_SET_IN_BITMAP (DF_LIVE_SUBREG_IN (bb), FIRST_PSEUDO_REGISTER, j,
+			    bi)
     {
       live_pseudos_num++;
-      if (! sparseset_bit_p (pseudos_live, j))
+      if (!sparseset_bit_p (pseudos_live, j))
 	{
 	  live_change_p = true;
 	  if (lra_dump_file != NULL)
-	    fprintf (lra_dump_file,
-		     "  r%d is removed as live at bb%d start\n", j, bb->index);
+	    fprintf (lra_dump_file, "  r%d is removed as live at bb%d start\n",
+		     j, bb->index);
 	  break;
 	}
     }
@@ -1120,9 +1282,9 @@ process_bb_lives (basic_block bb, int &curr_point, bool dead_insn_p)
       live_change_p = true;
       if (lra_dump_file != NULL)
 	EXECUTE_IF_SET_IN_SPARSESET (pseudos_live, j)
-	  if (! bitmap_bit_p (df_get_live_in (bb), j))
-	    fprintf (lra_dump_file,
-		     "  r%d is added to live at bb%d start\n", j, bb->index);
+      if (!bitmap_bit_p (DF_LIVE_SUBREG_IN (bb), j))
+	fprintf (lra_dump_file, "  r%d is added to live at bb%d start\n", j,
+		 bb->index);
     }
   /* See if we'll need an increment at the end of this basic block.
      An increment is needed if the PSEUDOS_LIVE set is not empty,
@@ -1135,8 +1297,9 @@ process_bb_lives (basic_block bb, int &curr_point, bool dead_insn_p)
       mark_pseudo_dead (i);
     }
 
-  EXECUTE_IF_SET_IN_BITMAP (df_get_live_in (bb), FIRST_PSEUDO_REGISTER, j, bi)
-    {
+    EXECUTE_IF_SET_IN_BITMAP (DF_LIVE_SUBREG_IN (bb), FIRST_PSEUDO_REGISTER, j,
+			      bi)
+      {
       if (sparseset_cardinality (pseudos_live_through_calls) == 0)
 	break;
       if (sparseset_bit_p (pseudos_live_through_calls, j))
@@ -1151,7 +1314,7 @@ process_bb_lives (basic_block bb, int &curr_point, bool dead_insn_p)
       if (!TEST_HARD_REG_BIT (hard_regs_spilled_into, i))
 	continue;
 
-      if (bitmap_bit_p (df_get_live_in (bb), i))
+      if (bitmap_bit_p (DF_LIVE_SUBREG_IN (bb), i))
 	continue;
 
       live_change_p = true;
@@ -1159,7 +1322,8 @@ process_bb_lives (basic_block bb, int &curr_point, bool dead_insn_p)
 	fprintf (lra_dump_file,
 		 "  hard reg r%d is added to live at bb%d start\n", i,
 		 bb->index);
-      bitmap_set_bit (df_get_live_in (bb), i);
+      bitmap_set_bit (DF_LIVE_SUBREG_IN (bb), i);
+      bitmap_set_bit (DF_LIVE_SUBREG_FULL_IN (bb), i);
     }
 
   if (need_curr_point_incr)
@@ -1425,10 +1589,24 @@ lra_create_live_ranges_1 (bool all_p, bool dead_insn_p)
 	 disappear, e.g. pseudos with used equivalences.  */
       FOR_EACH_BB_FN (bb, cfun)
 	{
-	  bitmap_clear_range (df_get_live_in (bb), FIRST_PSEUDO_REGISTER,
+	  bitmap_clear_range (DF_LIVE_SUBREG_IN (bb), FIRST_PSEUDO_REGISTER,
+			      max_regno - FIRST_PSEUDO_REGISTER);
+	  bitmap_clear_range (DF_LIVE_SUBREG_FULL_IN (bb),
+			      FIRST_PSEUDO_REGISTER,
 			      max_regno - FIRST_PSEUDO_REGISTER);
-	  bitmap_clear_range (df_get_live_out (bb), FIRST_PSEUDO_REGISTER,
+	  bitmap_clear_range (DF_LIVE_SUBREG_PARTIAL_IN (bb),
+			      FIRST_PSEUDO_REGISTER,
 			      max_regno - FIRST_PSEUDO_REGISTER);
+	  bitmap_clear_range (DF_LIVE_SUBREG_OUT (bb), FIRST_PSEUDO_REGISTER,
+			      max_regno - FIRST_PSEUDO_REGISTER);
+	  bitmap_clear_range (DF_LIVE_SUBREG_FULL_OUT (bb),
+			      FIRST_PSEUDO_REGISTER,
+			      max_regno - FIRST_PSEUDO_REGISTER);
+	  bitmap_clear_range (DF_LIVE_SUBREG_PARTIAL_OUT (bb),
+			      FIRST_PSEUDO_REGISTER,
+			      max_regno - FIRST_PSEUDO_REGISTER);
+	  DF_LIVE_SUBREG_RANGE_IN (bb)->clear ();
+	  DF_LIVE_SUBREG_RANGE_OUT (bb)->clear ();
 	}
       /* As we did not change CFG since LRA start we can use
 	 DF-infrastructure solver to solve live data flow problem.  */
@@ -1441,6 +1619,8 @@ lra_create_live_ranges_1 (bool all_p, bool dead_insn_p)
 	(DF_BACKWARD, NULL, live_con_fun_0, live_con_fun_n,
 	 live_trans_fun, &all_blocks,
 	 df_get_postorder (DF_BACKWARD), df_get_n_blocks (DF_BACKWARD));
+      df_live_subreg_finalize (&all_blocks);
+
       if (lra_dump_file != NULL)
 	{
 	  fprintf (lra_dump_file,
@@ -1449,16 +1629,28 @@ lra_create_live_ranges_1 (bool all_p, bool dead_insn_p)
 	  FOR_EACH_BB_FN (bb, cfun)
 	    {
 	      bb_data_t bb_info = get_bb_data (bb);
-	      bitmap bb_livein = df_get_live_in (bb);
-	      bitmap bb_liveout = df_get_live_out (bb);
 
 	      fprintf (lra_dump_file, "\nBB %d:\n", bb->index);
-	      lra_dump_bitmap_with_title ("  gen:",
-					  &bb_info->gen_pseudos, bb->index);
-	      lra_dump_bitmap_with_title ("  killed:",
-					  &bb_info->killed_pseudos, bb->index);
-	      lra_dump_bitmap_with_title ("  livein:", bb_livein, bb->index);
-	      lra_dump_bitmap_with_title ("  liveout:", bb_liveout, bb->index);
+	      lra_dump_bitmap_with_title ("  full use", &bb_info->full_use,
+					  bb->index);
+	      lra_dump_bitmap_with_title ("  partial use",
+					  &bb_info->partial_use, bb->index);
+	      lra_dump_bitmap_with_title ("  full def", &bb_info->full_def,
+					  bb->index);
+	      lra_dump_bitmap_with_title ("  partial def",
+					  &bb_info->partial_def, bb->index);
+	      lra_dump_bitmap_with_title ("  live in full",
+					  DF_LIVE_SUBREG_FULL_IN (bb),
+					  bb->index);
+	      lra_dump_bitmap_with_title ("  live in partial",
+					  DF_LIVE_SUBREG_PARTIAL_IN (bb),
+					  bb->index);
+	      lra_dump_bitmap_with_title ("  live out full",
+					  DF_LIVE_SUBREG_FULL_OUT (bb),
+					  bb->index);
+	      lra_dump_bitmap_with_title ("  live out partial",
+					  DF_LIVE_SUBREG_PARTIAL_OUT (bb),
+					  bb->index);
 	    }
 	}
     }
diff --git a/gcc/lra-remat.cc b/gcc/lra-remat.cc
index 681dcf36331..26d3da07b00 100644
--- a/gcc/lra-remat.cc
+++ b/gcc/lra-remat.cc
@@ -556,11 +556,11 @@ dump_candidates_and_remat_bb_data (void)
       fprintf (lra_dump_file, "\nBB %d:\n", bb->index);
       /* Livein */
       fprintf (lra_dump_file, "  register live in:");
-      dump_regset (df_get_live_in (bb), lra_dump_file);
+      dump_regset (DF_LIVE_SUBREG_IN (bb), lra_dump_file);
       putc ('\n', lra_dump_file);
       /* Liveout */
       fprintf (lra_dump_file, "  register live out:");
-      dump_regset (df_get_live_out (bb), lra_dump_file);
+      dump_regset (DF_LIVE_SUBREG_OUT (bb), lra_dump_file);
       putc ('\n', lra_dump_file);
       /* Changed/dead regs: */
       fprintf (lra_dump_file, "  changed regs:");
@@ -727,7 +727,7 @@ calculate_livein_cands (void)
 
   FOR_EACH_BB_FN (bb, cfun)
     {
-      bitmap livein_regs = df_get_live_in (bb);
+      bitmap livein_regs = DF_LIVE_SUBREG_IN (bb);
       bitmap livein_cands = &get_remat_bb_data (bb)->livein_cands;
       for (unsigned int i = 0; i < cands_num; i++)
 	{
@@ -1064,11 +1064,10 @@ do_remat (void)
   FOR_EACH_BB_FN (bb, cfun)
     {
       CLEAR_HARD_REG_SET (live_hard_regs);
-      EXECUTE_IF_SET_IN_BITMAP (df_get_live_in (bb), 0, regno, bi)
+      EXECUTE_IF_SET_IN_BITMAP (DF_LIVE_SUBREG_IN (bb), 0, regno, bi)
 	{
-	  int hard_regno = regno < FIRST_PSEUDO_REGISTER
-			   ? regno
-			   : reg_renumber[regno];
+	  int hard_regno
+	    = regno < FIRST_PSEUDO_REGISTER ? regno : reg_renumber[regno];
 	  if (hard_regno >= 0)
 	    SET_HARD_REG_BIT (live_hard_regs, hard_regno);
 	}
diff --git a/gcc/lra-spills.cc b/gcc/lra-spills.cc
index a663a1931e3..d38a2ffe2a7 100644
--- a/gcc/lra-spills.cc
+++ b/gcc/lra-spills.cc
@@ -566,8 +566,26 @@ spill_pseudos (void)
 			 "Debug insn #%u is reset because it referenced "
 			 "removed pseudo\n", INSN_UID (insn));
 	    }
-	  bitmap_and_compl_into (df_get_live_in (bb), spilled_pseudos);
-	  bitmap_and_compl_into (df_get_live_out (bb), spilled_pseudos);
+	  unsigned int regno;
+	  bitmap_iterator bi;
+
+	  bitmap_and_compl_into (DF_LIVE_SUBREG_IN (bb), spilled_pseudos);
+	  bitmap_and_compl_into (DF_LIVE_SUBREG_FULL_IN (bb), spilled_pseudos);
+	  bitmap partial_in = DF_LIVE_SUBREG_PARTIAL_IN (bb);
+	  subregs_live *range_in = DF_LIVE_SUBREG_RANGE_IN (bb);
+	  EXECUTE_IF_AND_IN_BITMAP (partial_in, spilled_pseudos,
+				    FIRST_PSEUDO_REGISTER, regno, bi)
+	    range_in->remove_live (regno);
+	  bitmap_and_compl_into (partial_in, spilled_pseudos);
+
+	  bitmap_and_compl_into (DF_LIVE_SUBREG_OUT (bb), spilled_pseudos);
+	  bitmap_and_compl_into (DF_LIVE_SUBREG_FULL_OUT (bb), spilled_pseudos);
+	  bitmap partial_out = DF_LIVE_SUBREG_PARTIAL_OUT (bb);
+	  subregs_live *range_out = DF_LIVE_SUBREG_RANGE_OUT (bb);
+	  EXECUTE_IF_AND_IN_BITMAP (partial_out, spilled_pseudos,
+				    FIRST_PSEUDO_REGISTER, regno, bi)
+	    range_out->remove_live (regno);
+	  bitmap_and_compl_into (partial_out, spilled_pseudos);
 	}
     }
 }
-- 
2.36.3


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH V2 7/7] lra: Support subreg live range track and conflict detect
  2023-11-12  9:58 [PATCH V2 0/7] ira/lra: Support subreg coalesce Lehua Ding
                   ` (5 preceding siblings ...)
  2023-11-12  9:58 ` [PATCH V2 6/7] lra: Switch to live_subreg data flow Lehua Ding
@ 2023-11-12  9:58 ` Lehua Ding
  2023-11-12 12:08 ` [PATCH V2 0/7] ira/lra: Support subreg coalesce Lehua Ding
  7 siblings, 0 replies; 9+ messages in thread
From: Lehua Ding @ 2023-11-12  9:58 UTC (permalink / raw)
  To: gcc-patches; +Cc: vmakarov, richard.sandiford, juzhe.zhong, lehua.ding

This patch supports tracking the liveness of subregs in the LRA pass, with the
goal of making it agree with IRA's register allocation scheme.  There is some
duplication; this part of the code logic could perhaps be unified in the future.
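
As a rough sketch of the idea behind subreg-aware conflict detection
(illustrative only; subreg_chunk and chunks_conflict_p below are invented for
this example and are not the functions added by this patch), two references
conflict only if they overlap both in program points and in the hard
registers they touch within the pseudo:

```
#include <cassert>

/* Hypothetical description of one live chunk of a pseudo: program points
   [start_point, finish_point] plus the hard-reg offsets
   [start_reg, end_reg) it occupies within the pseudo.  */
struct subreg_chunk
{
  int start_point, finish_point;
  int start_reg, end_reg;
};

/* Two chunks conflict only if they overlap both in time and in the
   registers they touch.  With whole-pseudo tracking the register test is
   implicitly always true, so disjoint halves of the same pseudo would
   still be treated as conflicting.  */
static bool
chunks_conflict_p (const subreg_chunk &a, const subreg_chunk &b)
{
  bool time_overlap_p = (a.start_point <= b.finish_point
			 && b.start_point <= a.finish_point);
  bool reg_overlap_p = (a.start_reg < b.end_reg && b.start_reg < a.end_reg);
  return time_overlap_p && reg_overlap_p;
}

int main ()
{
  subreg_chunk lo = { 0, 10, 0, 1 };	/* low half, live over points [0, 10] */
  subreg_chunk hi = { 5, 20, 1, 2 };	/* high half, live over points [5, 20] */
  /* Overlapping in time but touching disjoint halves: no conflict.  */
  assert (!chunks_conflict_p (lo, hi));
  return 0;
}
```

The patch records this extra information in lra_live_range::subreg and in the
new start/end (and start_reg/end_reg) fields of lra_insn_reg, and
set_offset_conflicts then records only the offsets that actually overlap.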

gcc/ChangeLog:

	* ira-build.cc (setup_pseudos_has_subreg_object):
	Collect new data for lra to use.
	(ira_build): Ditto.
	* lra-assigns.cc (set_offset_conflicts): New function.
	(setup_live_pseudos_and_spill_after_risky_transforms): Adjust.
	(lra_assign): Ditto.
	* lra-constraints.cc (process_alt_operands): Ditto.
	* lra-int.h (GCC_LRA_INT_H): Ditto.
	(struct lra_live_range): Ditto.
	(struct lra_insn_reg): Ditto.
	(get_range_hard_regs): New.
	(get_nregs): New.
	(has_subreg_object_p): New.
	* lra-lives.cc (INCLUDE_VECTOR): Adjust.
	(lra_live_range_pool): Ditto.
	(create_live_range): Ditto.
	(lra_merge_live_ranges): Ditto.
	(update_pseudo_point): Ditto.
	(mark_regno_live): Ditto.
	(mark_regno_dead): Ditto.
	(process_bb_lives): Ditto.
	(remove_some_program_points_and_update_live_ranges): Ditto.
	(lra_print_live_range_list): Ditto.
	(class subreg_live_item): New.
	(create_subregs_live_ranges): New.
	(lra_create_live_ranges_1): Ditto.
	* lra.cc (get_range_blocks): Ditto.
	(get_range_hard_regs): Ditto.
	(new_insn_reg): Ditto.
	(collect_non_operand_hard_regs): Ditto.
	(initialize_lra_reg_info_element): Ditto.
	(reg_same_range_p): New.
	(add_regs_to_insn_regno_info): Adjust.

---
 gcc/ira-build.cc       |  31 ++++
 gcc/lra-assigns.cc     | 111 ++++++++++++--
 gcc/lra-constraints.cc |  18 ++-
 gcc/lra-int.h          |  31 ++++
 gcc/lra-lives.cc       | 340 ++++++++++++++++++++++++++++++++++-------
 gcc/lra.cc             | 139 +++++++++++++++--
 6 files changed, 585 insertions(+), 85 deletions(-)

diff --git a/gcc/ira-build.cc b/gcc/ira-build.cc
index f88aeeeeaef..bb29627d375 100644
--- a/gcc/ira-build.cc
+++ b/gcc/ira-build.cc
@@ -95,6 +95,9 @@ int ira_copies_num;
    basic block.  */
 static int last_basic_block_before_change;
 
+/* Record the pseudos which have subreg objects.  Used by the LRA pass.  */
+bitmap_head pseudos_has_subreg_object;
+
 /* Initialize some members in loop tree node NODE.  Use LOOP_NUM for
    the member loop_num.  */
 static void
@@ -3711,6 +3714,33 @@ update_conflict_hard_reg_costs (void)
     }
 }
 
+/* Set up pseudos_has_subreg_object.  */
+static void
+setup_pseudos_has_subreg_object ()
+{
+  bitmap_initialize (&pseudos_has_subreg_object, &reg_obstack);
+  ira_allocno_t a;
+  ira_allocno_iterator ai;
+  FOR_EACH_ALLOCNO (a, ai)
+    if (has_subreg_object_p (a))
+      {
+	bitmap_set_bit (&pseudos_has_subreg_object, ALLOCNO_REGNO (a));
+	if (ira_dump_file != NULL)
+	  {
+	    fprintf (ira_dump_file,
+		     "  a%d(r%d, nregs: %d) has subreg objects:\n",
+		     ALLOCNO_NUM (a), ALLOCNO_REGNO (a), ALLOCNO_NREGS (a));
+	    ira_allocno_object_iterator oi;
+	    ira_object_t obj;
+	    FOR_EACH_ALLOCNO_OBJECT (a, obj, oi)
+	      fprintf (ira_dump_file, "    object %d: start: %d, nregs: %d\n",
+		       OBJECT_INDEX (obj), OBJECT_START (obj),
+		       OBJECT_NREGS (obj));
+	    fprintf (ira_dump_file, "\n");
+	  }
+      }
+}
+
 /* Create a internal representation (IR) for IRA (allocnos, copies,
    loop tree nodes).  The function returns TRUE if we generate loop
    structure (besides nodes representing all function and the basic
@@ -3731,6 +3761,7 @@ ira_build (void)
   create_allocnos ();
   ira_costs ();
   create_allocno_objects ();
+  setup_pseudos_has_subreg_object ();
   ira_create_allocno_live_ranges ();
   remove_unnecessary_regions (false);
   ira_compress_allocno_live_ranges ();
diff --git a/gcc/lra-assigns.cc b/gcc/lra-assigns.cc
index d2ebcfd5056..6588a740162 100644
--- a/gcc/lra-assigns.cc
+++ b/gcc/lra-assigns.cc
@@ -1131,6 +1131,52 @@ assign_hard_regno (int hard_regno, int regno)
 /* Array used for sorting different pseudos.  */
 static int *sorted_pseudos;
 
+/* The detailed conflict offsets when two live ranges conflict.  Used to
+   record partial conflicts.  */
+static bitmap_head live_range_conflicts;
+
+/* Set the conflict offsets of the two registers REGNO1 and REGNO2.  Use the
+   regno with the bigger nregs as the base.  */
+static void
+set_offset_conflicts (int regno1, int regno2)
+{
+  gcc_assert (reg_renumber[regno1] >= 0 && reg_renumber[regno2] >= 0);
+  int nregs1 = get_nregs (regno1);
+  int nregs2 = get_nregs (regno2);
+  if (nregs1 < nregs2)
+    {
+      std::swap (nregs1, nregs2);
+      std::swap (regno1, regno2);
+    }
+
+  lra_live_range_t r1 = lra_reg_info[regno1].live_ranges;
+  lra_live_range_t r2 = lra_reg_info[regno2].live_ranges;
+  int total = nregs1;
+
+  bitmap_clear (&live_range_conflicts);
+  while (r1 != NULL && r2 != NULL)
+    {
+      if (r1->start > r2->finish)
+	r1 = r1->next;
+      else if (r2->start > r1->finish)
+	r2 = r2->next;
+      else
+	{
+	  for (const subreg_range &range1 : r1->subreg.ranges)
+	    for (const subreg_range &range2 : r2->subreg.ranges)
+	      /* Record all overlapping offsets.  */
+	      for (int i = range1.start - (range2.end - range2.start) + 1;
+		   i < range1.end; i++)
+		if (i >= 0 && i < total)
+		  bitmap_set_bit (&live_range_conflicts, i);
+	  if (r1->finish < r2->finish)
+	    r1 = r1->next;
+	  else
+	    r2 = r2->next;
+	}
+    }
+}
+
 /* The constraints pass is allowed to create equivalences between
    pseudos that make the current allocation "incorrect" (in the sense
    that pseudos are assigned to hard registers from their own conflict
@@ -1226,19 +1272,56 @@ setup_live_pseudos_and_spill_after_risky_transforms (bitmap
 	       the same hard register.	*/
 	    || hard_regno != reg_renumber[conflict_regno])
 	  {
-	    int conflict_hard_regno = reg_renumber[conflict_regno];
-	    
-	    biggest_mode = lra_reg_info[conflict_regno].biggest_mode;
-	    biggest_nregs = hard_regno_nregs (conflict_hard_regno,
-					      biggest_mode);
-	    nregs_diff
-	      = (biggest_nregs
-		 - hard_regno_nregs (conflict_hard_regno,
-				     PSEUDO_REGNO_MODE (conflict_regno)));
-	    add_to_hard_reg_set (&conflict_set,
-				 biggest_mode,
-				 conflict_hard_regno
-				 - (WORDS_BIG_ENDIAN ? nregs_diff : 0));
+	  if (hard_regno >= 0 && reg_renumber[conflict_regno] >= 0
+	      && (has_subreg_object_p (regno)
+		  || has_subreg_object_p (conflict_regno)))
+	    {
+	      int nregs1 = get_nregs (regno);
+	      int nregs2 = get_nregs (conflict_regno);
+	      /* Quick check that there is no overlap at all between them.  */
+	      if (hard_regno + nregs1 <= reg_renumber[conflict_regno]
+		  || reg_renumber[conflict_regno] + nregs2 <= hard_regno)
+		continue;
+
+	      /* Check whether the overlap is OK when they partially overlap.  */
+	      set_offset_conflicts (regno, conflict_regno);
+	      if (nregs1 >= nregs2)
+		EXECUTE_IF_SET_IN_BITMAP (&live_range_conflicts, 0, k, bi)
+		  {
+		    int start_regno
+		      = WORDS_BIG_ENDIAN
+			  ? reg_renumber[conflict_regno] + nregs2 + k - nregs1
+			  : reg_renumber[conflict_regno] - k;
+		    if (start_regno >= 0 && hard_regno == start_regno)
+		      SET_HARD_REG_BIT (conflict_set, start_regno);
+		  }
+	      else
+		EXECUTE_IF_SET_IN_BITMAP (&live_range_conflicts, 0, k, bi)
+		  {
+		    int start_regno
+		      = WORDS_BIG_ENDIAN
+			  ? reg_renumber[conflict_regno] + nregs2 - k - nregs1
+			  : reg_renumber[conflict_regno] + k;
+		    if (start_regno < FIRST_PSEUDO_REGISTER
+			&& hard_regno == start_regno)
+		      SET_HARD_REG_BIT (conflict_set, start_regno);
+		  }
+	    }
+	  else
+	    {
+	      int conflict_hard_regno = reg_renumber[conflict_regno];
+
+	      biggest_mode = lra_reg_info[conflict_regno].biggest_mode;
+	      biggest_nregs
+		= hard_regno_nregs (conflict_hard_regno, biggest_mode);
+	      nregs_diff
+		= (biggest_nregs
+		   - hard_regno_nregs (conflict_hard_regno,
+				       PSEUDO_REGNO_MODE (conflict_regno)));
+	      add_to_hard_reg_set (&conflict_set, biggest_mode,
+				   conflict_hard_regno
+				     - (WORDS_BIG_ENDIAN ? nregs_diff : 0));
+	    }
 	  }
       if (! overlaps_hard_reg_set_p (conflict_set, mode, hard_regno))
 	{
@@ -1637,7 +1720,9 @@ lra_assign (bool &fails_p)
   init_regno_assign_info ();
   bitmap_initialize (&all_spilled_pseudos, &reg_obstack);
   create_live_range_start_chains ();
+  bitmap_initialize (&live_range_conflicts, &reg_obstack);
   setup_live_pseudos_and_spill_after_risky_transforms (&all_spilled_pseudos);
+  bitmap_clear (&live_range_conflicts);
   if (! lra_hard_reg_split_p && ! lra_asm_error_p && flag_checking)
     /* Check correctness of allocation but only when there are no hard reg
        splits and asm errors as in the case of errors explicit insns involving
diff --git a/gcc/lra-constraints.cc b/gcc/lra-constraints.cc
index c3ad846b97b..912d0c3feec 100644
--- a/gcc/lra-constraints.cc
+++ b/gcc/lra-constraints.cc
@@ -2363,13 +2363,19 @@ process_alt_operands (int only_alternative)
 		      {
 			/* We should reject matching of an early
 			   clobber operand if the matching operand is
-			   not dying in the insn.  */
-			if (!TEST_BIT (curr_static_id->operand[m]
-				       .early_clobber_alts, nalt)
+			   not dying in the insn.  But a subreg of a pseudo
+			   whose subreg liveness is tracked in IRA has no
+			   REG_DEAD note; in that case we consider the
+			   matching OK.  */
+			if (!TEST_BIT (
+			      curr_static_id->operand[m].early_clobber_alts,
+			      nalt)
 			    || operand_reg[nop] == NULL_RTX
-			    || (find_regno_note (curr_insn, REG_DEAD,
-						 REGNO (op))
-				|| REGNO (op) == REGNO (operand_reg[m])))
+			    || find_regno_note (curr_insn, REG_DEAD, REGNO (op))
+			    || (read_modify_subreg_p (
+				  *curr_id->operand_loc[nop])
+				&& has_subreg_object_p (REGNO (op)))
+			    || REGNO (op) == REGNO (operand_reg[m]))
 			  match_p = true;
 		      }
 		    if (match_p)
diff --git a/gcc/lra-int.h b/gcc/lra-int.h
index 678377d9ec6..5a97bd61475 100644
--- a/gcc/lra-int.h
+++ b/gcc/lra-int.h
@@ -21,6 +21,7 @@ along with GCC; see the file COPYING3.	If not see
 #ifndef GCC_LRA_INT_H
 #define GCC_LRA_INT_H
 
+#include "lra.h"
 #include "subreg-live-range.h"
 
 #define lra_assert(c) gcc_checking_assert (c)
@@ -48,6 +49,8 @@ struct lra_live_range
   lra_live_range_t next;
   /* Pointer to structures with the same start.	 */
   lra_live_range_t start_next;
+  /* Object whose live range is described by given structure.  */
+  subreg_ranges subreg;
 };
 
 typedef struct lra_copy *lra_copy_t;
@@ -110,6 +113,8 @@ public:
   /* The biggest size mode in which each pseudo reg is referred in
      whole function (possibly via subreg).  */
   machine_mode biggest_mode;
+  /* The real reg MODE.  */
+  machine_mode reg_mode;
   /* Live ranges of the pseudo.	 */
   lra_live_range_t live_ranges;
   /* This member is set up in lra-lives.cc for subsequent
@@ -161,6 +166,12 @@ struct lra_insn_reg
   unsigned int subreg_p : 1;
   /* The corresponding regno of the register.  */
   int regno;
+  /* The start and end of the current reference, in blocks; remember that
+     the use/def can be a normal subreg.  */
+  int start, end;
+  /* The start and end of the current reference, in hard regs; remember that
+     the use/def can be a normal subreg.  */
+  int start_reg, end_reg;
   /* Next reg info of the same insn.  */
   struct lra_insn_reg *next;
 };
@@ -332,6 +343,8 @@ extern struct lra_insn_reg *lra_get_insn_regs (int);
 extern void lra_free_copies (void);
 extern void lra_create_copy (int, int, int);
 extern lra_copy_t lra_get_copy (int);
+extern subreg_range
+get_range_hard_regs (int regno, const subreg_range &r);
 
 extern int lra_new_regno_start;
 extern int lra_constraint_new_regno_start;
@@ -533,4 +546,22 @@ lra_assign_reg_val (int from, int to)
   lra_reg_info[to].offset = lra_reg_info[from].offset;
 }
 
+/* Return the number of hard registers that REGNO needs.  */
+inline int
+get_nregs (int regno)
+{
+  enum reg_class aclass = lra_get_allocno_class (regno);
+  gcc_assert (aclass != NO_REGS);
+  int nregs = ira_reg_class_max_nregs[aclass][lra_reg_info[regno].reg_mode];
+  return nregs;
+}
+
+extern bitmap_head pseudos_has_subreg_object;
+/* Return true if pseudo REGNO has a subreg live range.  */
+inline bool
+has_subreg_object_p (int regno)
+{
+  return bitmap_bit_p (&pseudos_has_subreg_object, regno);
+}
+
 #endif /* GCC_LRA_INT_H */
diff --git a/gcc/lra-lives.cc b/gcc/lra-lives.cc
index d93921ad302..8a7c653fb09 100644
--- a/gcc/lra-lives.cc
+++ b/gcc/lra-lives.cc
@@ -26,6 +26,7 @@ along with GCC; see the file COPYING3.	If not see
    stack memory slots to spilled pseudos.  */
 
 #include "config.h"
+#define INCLUDE_VECTOR
 #include "system.h"
 #include "coretypes.h"
 #include "backend.h"
@@ -97,6 +98,9 @@ static bitmap_head temp_bitmap;
 /* Pool for pseudo live ranges.	 */
 static object_allocator<lra_live_range> lra_live_range_pool ("live ranges");
 
+/* Store the def/use points of has_subreg_object_p registers.  */
+static class subregs_live_points *live_points;
+
 /* Free live range list LR.  */
 static void
 free_live_range_list (lra_live_range_t lr)
@@ -113,16 +117,26 @@ free_live_range_list (lra_live_range_t lr)
 
 /* Create and return pseudo live range with given attributes.  */
 static lra_live_range_t
-create_live_range (int regno, int start, int finish, lra_live_range_t next)
+create_live_range (int regno, const subreg_ranges &sr, int start, int finish,
+		   lra_live_range_t next)
 {
   lra_live_range_t p = lra_live_range_pool.allocate ();
   p->regno = regno;
   p->start = start;
   p->finish = finish;
   p->next = next;
+  p->subreg = sr;
   return p;
 }
 
+static lra_live_range_t
+create_live_range (int regno, int start, int finish, lra_live_range_t next)
+{
+  subreg_ranges sr = subreg_ranges (1);
+  sr.add_range (1, subreg_range (0, 1));
+  return create_live_range (regno, sr, start, finish, next);
+}
+
 /* Copy live range R and return the result.  */
 static lra_live_range_t
 copy_live_range (lra_live_range_t r)
@@ -164,7 +178,8 @@ lra_merge_live_ranges (lra_live_range_t r1, lra_live_range_t r2)
       if (r1->start < r2->start)
 	std::swap (r1, r2);
 
-      if (r1->start == r2->finish + 1)
+      if (r1->start == r2->finish + 1
+	  && (r1->regno != r2->regno || r1->subreg.same_p (r2->subreg)))
 	{
 	  /* Joint ranges: merge r1 and r2 into r1.  */
 	  r1->start = r2->start;
@@ -174,7 +189,8 @@ lra_merge_live_ranges (lra_live_range_t r1, lra_live_range_t r2)
 	}
       else
 	{
-	  gcc_assert (r2->finish + 1 < r1->start);
+	  gcc_assert (r2->finish + 1 < r1->start
+		      || !r1->subreg.same_p (r2->subreg));
 	  /* Add r1 to the result.  */
 	  if (first == NULL)
 	    first = last = r1;
@@ -237,6 +253,10 @@ sparseset_contains_pseudos_p (sparseset a)
   return false;
 }
 
+static void
+update_pseudo_point (int regno, const subreg_range &range, int point,
+		     enum point_type type);
+
 /* Mark pseudo REGNO as living or dying at program point POINT, depending on
    whether TYPE is a definition or a use.  If this is the first reference to
    REGNO that we've encountered, then create a new live range for it.  */
@@ -249,27 +269,78 @@ update_pseudo_point (int regno, int point, enum point_type type)
   /* Don't compute points for hard registers.  */
   if (HARD_REGISTER_NUM_P (regno))
     return;
+  if (!complete_info_p && lra_get_regno_hard_regno (regno) >= 0)
+    return;
 
-  if (complete_info_p || lra_get_regno_hard_regno (regno) < 0)
+  if (has_subreg_object_p (regno))
     {
-      if (type == DEF_POINT)
-	{
-	  if (sparseset_bit_p (pseudos_live, regno))
-	    {
-	      p = lra_reg_info[regno].live_ranges;
-	      lra_assert (p != NULL);
-	      p->finish = point;
-	    }
-	}
-      else /* USE_POINT */
+      update_pseudo_point (regno, subreg_range (0, get_nregs (regno)), point,
+			   type);
+      return;
+    }
+
+  if (type == DEF_POINT)
+    {
+      if (sparseset_bit_p (pseudos_live, regno))
 	{
-	  if (!sparseset_bit_p (pseudos_live, regno)
-	      && ((p = lra_reg_info[regno].live_ranges) == NULL
-		  || (p->finish != point && p->finish + 1 != point)))
-	    lra_reg_info[regno].live_ranges
-	      = create_live_range (regno, point, -1, p);
+	  p = lra_reg_info[regno].live_ranges;
+	  lra_assert (p != NULL);
+	  p->finish = point;
 	}
     }
+  else /* USE_POINT */
+    {
+      if (!sparseset_bit_p (pseudos_live, regno)
+	  && ((p = lra_reg_info[regno].live_ranges) == NULL
+	      || (p->finish != point && p->finish + 1 != point)))
+	lra_reg_info[regno].live_ranges
+	  = create_live_range (regno, point, -1, p);
+    }
+}
+
+/* Like the above update_pseudo_point but for a has_subreg_object_p REGNO.  */
+static void
+update_pseudo_point (int regno, const subreg_range &range, int point,
+		     enum point_type type)
+{
+  /* Don't compute points for hard registers.  */
+  if (HARD_REGISTER_NUM_P (regno))
+    return;
+
+  if (!complete_info_p && lra_get_regno_hard_regno (regno) >= 0)
+    {
+      if (has_subreg_object_p (regno))
+	live_points->add_range (regno, get_nregs (regno), range,
+				type == DEF_POINT);
+      return;
+    }
+
+  if (!has_subreg_object_p (regno))
+    {
+      update_pseudo_point (regno, point, type);
+      return;
+    }
+
+  if (lra_dump_file != NULL)
+    {
+      fprintf (lra_dump_file, "       %s r%d",
+	       type == DEF_POINT ? "def" : "use", regno);
+      fprintf (lra_dump_file, "[subreg: start %d, nregs: %d]", range.start,
+	       range.end - range.start);
+      fprintf (lra_dump_file, " at point %d\n", point);
+    }
+
+  live_points->add_point (regno, get_nregs (regno), range, type == DEF_POINT,
+			  point);
+}
+
+/* Update each range in SR.  */
+static void
+update_pseudo_point (int regno, const subreg_ranges sr, int point,
+		     enum point_type type)
+{
+  for (const subreg_range &range : sr.ranges)
+    update_pseudo_point (regno, range, point, type);
 }
 
 /* Structure describing local BB data used for pseudo
@@ -354,12 +425,18 @@ mark_pseudo_dead (int regno)
   if (!sparseset_bit_p (pseudos_live, regno))
     return;
 
+  /* Just return if REGNO still has partial subreg liveness.  */
+  if (has_subreg_object_p (regno) && !live_points->empty_live_p (regno))
+    return;
+
   sparseset_clear_bit (pseudos_live, regno);
   sparseset_set_bit (start_dying, regno);
 }
 
+static void
+mark_regno_live (int regno, const subreg_range &range, machine_mode mode);
 /* Mark register REGNO (pseudo or hard register) in MODE as being live
-   and update BB_GEN_PSEUDOS.  */
+   and update CURR_BB_INFO.  */
 static void
 mark_regno_live (int regno, machine_mode mode)
 {
@@ -370,6 +447,11 @@ mark_regno_live (int regno, machine_mode mode)
       for (last = end_hard_regno (mode, regno); regno < last; regno++)
 	make_hard_regno_live (regno);
     }
+  else if (has_subreg_object_p (regno))
+    {
+      machine_mode mode = lra_reg_info[regno].reg_mode;
+      mark_regno_live (regno, subreg_range (0, get_nregs (regno)), mode);
+    }
   else
     {
       mark_pseudo_live (regno);
@@ -379,9 +461,26 @@ mark_regno_live (int regno, machine_mode mode)
     }
 }
 
+/* Like the above mark_regno_live but for a has_subreg_object_p REGNO.  */
+static void
+mark_regno_live (int regno, const subreg_range &range, machine_mode mode)
+{
+  if (HARD_REGISTER_NUM_P (regno) || !has_subreg_object_p (regno))
+    mark_regno_live (regno, mode);
+  else
+    {
+      mark_pseudo_live (regno);
+      machine_mode mode = lra_reg_info[regno].reg_mode;
+      if (!range.full_p (get_nregs (regno)))
+	has_subreg_live_p = true;
+      add_subreg_range (curr_bb_info, regno, mode, range, false);
+    }
+}
 
+static void
+mark_regno_dead (int regno, const subreg_range &range, machine_mode mode);
 /* Mark register REGNO (pseudo or hard register) in MODE as being dead
-   and update BB_GEN_PSEUDOS and BB_KILLED_PSEUDOS.  */
+   and update CURR_BB_INFO.  */
 static void
 mark_regno_dead (int regno, machine_mode mode)
 {
@@ -392,6 +491,12 @@ mark_regno_dead (int regno, machine_mode mode)
       for (last = end_hard_regno (mode, regno); regno < last; regno++)
 	make_hard_regno_dead (regno);
     }
+  else if (has_subreg_object_p (regno))
+    {
+      machine_mode mode = lra_reg_info[regno].reg_mode;
+      subreg_range range = subreg_range (0, get_nregs (regno));
+      mark_regno_dead (regno, range, mode);
+    }
   else
     {
       mark_pseudo_dead (regno);
@@ -402,7 +507,22 @@ mark_regno_dead (int regno, machine_mode mode)
     }
 }
 
-\f
+/* Like the above mark_regno_dead but for a has_subreg_object_p REGNO.  */
+static void
+mark_regno_dead (int regno, const subreg_range &range, machine_mode mode)
+{
+  if (HARD_REGISTER_NUM_P (regno) || !has_subreg_object_p (regno))
+    mark_regno_dead (regno, mode);
+  else
+    {
+      mark_pseudo_dead (regno);
+      machine_mode mode = lra_reg_info[regno].reg_mode;
+      if (!range.full_p (get_nregs (regno)))
+	has_subreg_live_p = true;
+      remove_subreg_range (curr_bb_info, regno, mode, range);
+      add_subreg_range (curr_bb_info, regno, mode, range, true);
+    }
+}
 
 /* This page contains code for making global live analysis of pseudos.
    The code works only when pseudo live info is changed on a BB
@@ -823,6 +943,8 @@ process_bb_lives (basic_block bb, int &curr_point, bool dead_insn_p)
   function_abi last_call_abi = default_function_abi;
 
   reg_live_out = DF_LIVE_SUBREG_OUT (bb);
+  bitmap reg_live_partial_out = DF_LIVE_SUBREG_PARTIAL_OUT (bb);
+  subregs_live *range_out = DF_LIVE_SUBREG_RANGE_OUT (bb);
   sparseset_clear (pseudos_live);
   sparseset_clear (pseudos_live_through_calls);
   sparseset_clear (pseudos_live_through_setjumps);
@@ -830,7 +952,12 @@ process_bb_lives (basic_block bb, int &curr_point, bool dead_insn_p)
   hard_regs_live &= ~eliminable_regset;
   EXECUTE_IF_SET_IN_BITMAP (reg_live_out, FIRST_PSEUDO_REGISTER, j, bi)
     {
-      update_pseudo_point (j, curr_point, USE_POINT);
+      if (bitmap_bit_p (reg_live_partial_out, j) && has_subreg_object_p (j))
+	for (const subreg_range &r : range_out->lives.at (j).ranges)
+	  update_pseudo_point (j, get_range_hard_regs (j, r), curr_point,
+			       USE_POINT);
+      else
+	update_pseudo_point (j, curr_point, USE_POINT);
       mark_pseudo_live (j);
     }
 
@@ -1023,8 +1150,11 @@ process_bb_lives (basic_block bb, int &curr_point, bool dead_insn_p)
       for (reg = curr_id->regs; reg != NULL; reg = reg->next)
 	if (reg->type != OP_IN)
 	  {
-	    update_pseudo_point (reg->regno, curr_point, USE_POINT);
-	    mark_regno_live (reg->regno, reg->biggest_mode);
+	    const subreg_range &range = subreg_range (reg->start, reg->end);
+	    update_pseudo_point (reg->regno,
+				 get_range_hard_regs (reg->regno, range),
+				 curr_point, USE_POINT);
+	    mark_regno_live (reg->regno, range, reg->biggest_mode);
 	    /* ??? Should be a no-op for unused registers.  */
 	    check_pseudos_live_through_calls (reg->regno, last_call_abi);
 	  }
@@ -1045,17 +1175,20 @@ process_bb_lives (basic_block bb, int &curr_point, bool dead_insn_p)
 
       /* See which defined values die here.  */
       for (reg = curr_id->regs; reg != NULL; reg = reg->next)
-	if (reg->type != OP_IN
-	    && ! reg_early_clobber_p (reg, n_alt) && ! reg->subreg_p)
+	if (reg->type != OP_IN && !reg_early_clobber_p (reg, n_alt)
+	    && (!reg->subreg_p || has_subreg_object_p (reg->regno)))
 	  {
+	    const subreg_range &range = subreg_range (reg->start, reg->end);
 	    if (reg->type == OP_OUT)
-	      update_pseudo_point (reg->regno, curr_point, DEF_POINT);
-	    mark_regno_dead (reg->regno, reg->biggest_mode);
+	      update_pseudo_point (reg->regno,
+				   get_range_hard_regs (reg->regno, range),
+				   curr_point, DEF_POINT);
+	    mark_regno_dead (reg->regno, range, reg->biggest_mode);
 	  }
 
       for (reg = curr_static_id->hard_regs; reg != NULL; reg = reg->next)
-	if (reg->type != OP_IN
-	    && ! reg_early_clobber_p (reg, n_alt) && ! reg->subreg_p)
+	if (reg->type != OP_IN && !reg_early_clobber_p (reg, n_alt)
+	    && !reg->subreg_p)
 	  make_hard_regno_dead (reg->regno);
 
       if (curr_id->arg_hard_regs != NULL)
@@ -1086,7 +1219,7 @@ process_bb_lives (basic_block bb, int &curr_point, bool dead_insn_p)
 
       /* Increment the current program point if we must.  */
       if (sparseset_contains_pseudos_p (unused_set)
-	  || sparseset_contains_pseudos_p (start_dying))
+	  || sparseset_contains_pseudos_p (start_dying) || has_subreg_live_p)
 	next_program_point (curr_point, freq);
 
       /* If we removed the source reg from a simple register copy from the
@@ -1107,9 +1240,12 @@ process_bb_lives (basic_block bb, int &curr_point, bool dead_insn_p)
       for (reg = curr_id->regs; reg != NULL; reg = reg->next)
 	if (reg->type != OP_OUT)
 	  {
+	    const subreg_range &range = subreg_range (reg->start, reg->end);
 	    if (reg->type == OP_IN)
-	      update_pseudo_point (reg->regno, curr_point, USE_POINT);
-	    mark_regno_live (reg->regno, reg->biggest_mode);
+	      update_pseudo_point (reg->regno,
+				   get_range_hard_regs (reg->regno, range),
+				   curr_point, USE_POINT);
+	    mark_regno_live (reg->regno, range, reg->biggest_mode);
 	    check_pseudos_live_through_calls (reg->regno, last_call_abi);
 	  }
 
@@ -1129,22 +1265,25 @@ process_bb_lives (basic_block bb, int &curr_point, bool dead_insn_p)
 
       /* Mark early clobber outputs dead.  */
       for (reg = curr_id->regs; reg != NULL; reg = reg->next)
-	if (reg->type != OP_IN
-	    && reg_early_clobber_p (reg, n_alt) && ! reg->subreg_p)
+	if (reg->type != OP_IN && reg_early_clobber_p (reg, n_alt)
+	    && (!reg->subreg_p || has_subreg_object_p (reg->regno)))
 	  {
+	    const subreg_range &range = subreg_range (reg->start, reg->end);
 	    if (reg->type == OP_OUT)
-	      update_pseudo_point (reg->regno, curr_point, DEF_POINT);
-	    mark_regno_dead (reg->regno, reg->biggest_mode);
+	      update_pseudo_point (reg->regno,
+				   get_range_hard_regs (reg->regno, range),
+				   curr_point, DEF_POINT);
+	    mark_regno_dead (reg->regno, range, reg->biggest_mode);
 
 	    /* We're done processing inputs, so make sure early clobber
 	       operands that are both inputs and outputs are still live.  */
 	    if (reg->type == OP_INOUT)
-	      mark_regno_live (reg->regno, reg->biggest_mode);
+	      mark_regno_live (reg->regno, range, reg->biggest_mode);
 	  }
 
       for (reg = curr_static_id->hard_regs; reg != NULL; reg = reg->next)
-	if (reg->type != OP_IN
-	    && reg_early_clobber_p (reg, n_alt) && ! reg->subreg_p)
+	if (reg->type != OP_IN && reg_early_clobber_p (reg, n_alt)
+	    && !reg->subreg_p)
 	  {
 	    struct lra_insn_reg *reg2;
 
@@ -1160,7 +1299,7 @@ process_bb_lives (basic_block bb, int &curr_point, bool dead_insn_p)
 
       /* Increment the current program point if we must.  */
       if (sparseset_contains_pseudos_p (dead_set)
-	  || sparseset_contains_pseudos_p (start_dying))
+	  || sparseset_contains_pseudos_p (start_dying) || has_subreg_live_p)
 	next_program_point (curr_point, freq);
 
       /* Update notes.	*/
@@ -1293,13 +1432,17 @@ process_bb_lives (basic_block bb, int &curr_point, bool dead_insn_p)
 
   EXECUTE_IF_SET_IN_SPARSESET (pseudos_live, i)
     {
-      update_pseudo_point (i, curr_point, DEF_POINT);
+      if (has_subreg_object_p (i))
+	update_pseudo_point (i, live_points->subreg_live_ranges.at (i),
+			     curr_point, DEF_POINT);
+      else
+	update_pseudo_point (i, curr_point, DEF_POINT);
       mark_pseudo_dead (i);
     }
 
-    EXECUTE_IF_SET_IN_BITMAP (DF_LIVE_SUBREG_IN (bb), FIRST_PSEUDO_REGISTER, j,
-			      bi)
-      {
+  EXECUTE_IF_SET_IN_BITMAP (DF_LIVE_SUBREG_IN (bb), FIRST_PSEUDO_REGISTER, j,
+			    bi)
+    {
       if (sparseset_cardinality (pseudos_live_through_calls) == 0)
 	break;
       if (sparseset_bit_p (pseudos_live_through_calls, j))
@@ -1400,7 +1543,8 @@ remove_some_program_points_and_update_live_ranges (void)
 	      next_r = r->next;
 	      r->start = map[r->start];
 	      r->finish = map[r->finish];
-	      if (prev_r == NULL || prev_r->start > r->finish + 1)
+	      if (prev_r == NULL || prev_r->start > r->finish + 1
+		  || !prev_r->subreg.same_p (r->subreg))
 		{
 		  prev_r = r;
 		  continue;
@@ -1418,8 +1562,18 @@ remove_some_program_points_and_update_live_ranges (void)
 void
 lra_print_live_range_list (FILE *f, lra_live_range_t r)
 {
-  for (; r != NULL; r = r->next)
-    fprintf (f, " [%d..%d]", r->start, r->finish);
+  if (r != NULL && has_subreg_object_p (r->regno))
+    {
+      for (; r != NULL; r = r->next)
+	{
+	  fprintf (f, " [%d..%d]{", r->start, r->finish);
+	  r->subreg.dump (f);
+	  fprintf (f, "}");
+	}
+    }
+  else
+    for (; r != NULL; r = r->next)
+      fprintf (f, " [%d..%d]", r->start, r->finish);
   fprintf (f, "\n");
 }
 
@@ -1492,7 +1646,84 @@ compress_live_ranges (void)
     }
 }
 
-\f
+/* Used to temporarily record subreg live ranges in the
+   create_subregs_live_ranges function.  */
+class subreg_live_item
+{
+public:
+  subreg_ranges subreg;
+  int start, finish;
+};
+
+/* Create subreg live ranges from the recorded def/use point info.  */
+static void
+create_subregs_live_ranges ()
+{
+  for (const auto &subreg_point_it : live_points->subreg_points)
+    {
+      unsigned int regno = subreg_point_it.first;
+      const class live_points &points = subreg_point_it.second;
+      class lra_reg *reg_info = &lra_reg_info[regno];
+      std::vector<subreg_live_item> temps;
+      gcc_assert (has_subreg_object_p (regno));
+      for (const auto &point_it : points.points)
+	{
+	  int point = point_it.first;
+	  const live_point &regs = point_it.second;
+	  gcc_assert (temps.empty () || temps.back ().finish <= point);
+	  if (!regs.use_reg.empty_p ())
+	    {
+	      if (temps.empty ())
+		temps.push_back ({regs.use_reg, point, -1});
+	      else if (temps.back ().finish == -1)
+		{
+		  if (!temps.back ().subreg.same_p (regs.use_reg))
+		    {
+		      if (temps.back ().start == point)
+			temps.back ().subreg.add_ranges (regs.use_reg);
+		      else
+			{
+			  temps.back ().finish = point - 1;
+
+			  subreg_ranges temp = regs.use_reg;
+			  temp.add_ranges (temps.back ().subreg);
+			  temps.push_back ({temp, point, -1});
+			}
+		    }
+		}
+	      else if (temps.back ().subreg.same_p (regs.use_reg)
+		       && (temps.back ().finish == point
+			   || temps.back ().finish + 1 == point))
+		temps.back ().finish = -1;
+	      else
+		temps.push_back ({regs.use_reg, point, -1});
+	    }
+	  if (!regs.def_reg.empty_p ())
+	    {
+	      gcc_assert (!temps.empty ());
+	      if (regs.def_reg.include_ranges_p (temps.back ().subreg))
+		temps.back ().finish = point;
+	      else if (temps.back ().subreg.include_ranges_p (regs.def_reg))
+		{
+		  temps.back ().finish = point;
+
+		  subreg_ranges diff = temps.back ().subreg;
+		  diff.remove_ranges (regs.def_reg);
+		  temps.push_back ({diff, point + 1, -1});
+		}
+	      else
+		gcc_unreachable ();
+	    }
+	}
+
+      gcc_assert (reg_info->live_ranges == NULL);
+
+      for (const subreg_live_item &item : temps)
+	reg_info->live_ranges
+	  = create_live_range (regno, item.subreg, item.start, item.finish,
+			       reg_info->live_ranges);
+    }
+}
 
 /* The number of the current live range pass.  */
 int lra_live_range_iter;
@@ -1573,6 +1804,8 @@ lra_create_live_ranges_1 (bool all_p, bool dead_insn_p)
   int n = inverted_rev_post_order_compute (cfun, rpo);
   lra_assert (n == n_basic_blocks_for_fn (cfun));
   bb_live_change_p = false;
+  has_subreg_live_p = false;
+  live_points = new subregs_live_points ();
   for (i = 0; i < n; ++i)
     {
       bb = BASIC_BLOCK_FOR_FN (cfun, rpo[i]);
@@ -1655,9 +1888,14 @@ lra_create_live_ranges_1 (bool all_p, bool dead_insn_p)
 	}
     }
   lra_live_max_point = curr_point;
+  create_subregs_live_ranges ();
   if (lra_dump_file != NULL)
-    print_live_ranges (lra_dump_file);
+    {
+      live_points->dump (lra_dump_file);
+      print_live_ranges (lra_dump_file);
+    }
   /* Clean up.	*/
+  delete live_points;
   sparseset_free (unused_set);
   sparseset_free (dead_set);
   sparseset_free (start_dying);
diff --git a/gcc/lra.cc b/gcc/lra.cc
index bcc00ff7d6b..23fc0daf1ed 100644
--- a/gcc/lra.cc
+++ b/gcc/lra.cc
@@ -566,6 +566,54 @@ lra_asm_insn_error (rtx_insn *insn)
 /* Pools for insn reg info.  */
 object_allocator<lra_insn_reg> lra_insn_reg_pool ("insn regs");
 
+/* Return the block range of REGNO covered by SIZE bytes at OFFSET.  */
+static subreg_range
+get_range_blocks (int regno, bool subreg_p, machine_mode reg_mode,
+		  poly_int64 offset, poly_int64 size)
+{
+  gcc_assert (has_subreg_object_p (regno));
+  int nblocks = get_nblocks (reg_mode);
+  if (!subreg_p)
+    return subreg_range (0, nblocks);
+
+  poly_int64 unit_size = REGMODE_NATURAL_SIZE (reg_mode);
+  poly_int64 left = offset + size;
+
+  int subreg_start = -1;
+  int subreg_nregs = -1;
+  for (int i = 0; i < nblocks; i += 1)
+    {
+      poly_int64 right = unit_size * (i + 1);
+      if (subreg_start < 0 && maybe_lt (offset, right))
+	subreg_start = i;
+      if (subreg_nregs < 0 && maybe_le (left, right))
+	{
+	  subreg_nregs = i + 1 - subreg_start;
+	  break;
+	}
+    }
+  gcc_assert (subreg_start >= 0 && subreg_nregs > 0);
+  return subreg_range (subreg_start, subreg_start + subreg_nregs);
+}
+
+/* Convert the block range R of REGNO into a range of hard regs.  */
+subreg_range
+get_range_hard_regs (int regno, const subreg_range &r)
+{
+  if (!has_subreg_object_p (regno) || lra_reg_info[regno].reg_mode == VOIDmode)
+    return subreg_range (0, 1);
+  enum reg_class aclass = lra_get_allocno_class (regno);
+  gcc_assert (aclass != NO_REGS);
+  int nregs = ira_reg_class_max_nregs[aclass][lra_reg_info[regno].reg_mode];
+  int nblocks = get_nblocks (lra_reg_info[regno].reg_mode);
+  int times = nblocks / nregs;
+  gcc_assert (nblocks >= nregs && times * nregs == nblocks);
+  int start = r.start / times;
+  int end = CEIL (r.end, times);
+
+  return subreg_range (start, end);
+}
+
 /* Create LRA insn related info about a reference to REGNO in INSN
    with TYPE (in/out/inout), biggest reference mode MODE, flag that it
    is reference through subreg (SUBREG_P), and reference to the next
@@ -573,21 +621,49 @@ object_allocator<lra_insn_reg> lra_insn_reg_pool ("insn regs");
    alternatives in which it can be early clobbered are given by
    EARLY_CLOBBER_ALTS.  */
 static struct lra_insn_reg *
-new_insn_reg (rtx_insn *insn, int regno, enum op_type type,
-	      machine_mode mode, bool subreg_p,
-	      alternative_mask early_clobber_alts,
+new_insn_reg (rtx_insn *insn, int regno, enum op_type type, poly_int64 size,
+	      poly_int64 offset, machine_mode mode, machine_mode reg_mode,
+	      bool subreg_p, alternative_mask early_clobber_alts,
 	      struct lra_insn_reg *next)
 {
   lra_insn_reg *ir = lra_insn_reg_pool.allocate ();
   ir->type = type;
   ir->biggest_mode = mode;
-  if (NONDEBUG_INSN_P (insn)
-      && partial_subreg_p (lra_reg_info[regno].biggest_mode, mode))
-    lra_reg_info[regno].biggest_mode = mode;
+  if (NONDEBUG_INSN_P (insn))
+    {
+      if (partial_subreg_p (lra_reg_info[regno].biggest_mode, mode))
+	{
+	  lra_reg_info[regno].biggest_mode = mode;
+	}
+
+      if (regno >= FIRST_PSEUDO_REGISTER)
+	{
+	  if (lra_reg_info[regno].reg_mode == VOIDmode)
+	    lra_reg_info[regno].reg_mode = reg_mode;
+	  else
+	    gcc_assert (maybe_eq (GET_MODE_SIZE (lra_reg_info[regno].reg_mode),
+				  GET_MODE_SIZE (reg_mode)));
+	}
+    }
   ir->subreg_p = subreg_p;
   ir->early_clobber_alts = early_clobber_alts;
   ir->regno = regno;
   ir->next = next;
+  if (has_subreg_object_p (regno))
+    {
+      const subreg_range &r
+	= get_range_blocks (regno, subreg_p, reg_mode, offset, size);
+      ir->start = r.start;
+      ir->end = r.end;
+      const subreg_range &r_hard = get_range_hard_regs (regno, r);
+      ir->start_reg = r_hard.start;
+      ir->end_reg = r_hard.end;
+    }
+  else
+    {
+      ir->start = 0;
+      ir->end = 1;
+    }
   return ir;
 }
 
@@ -887,11 +963,18 @@ collect_non_operand_hard_regs (rtx_insn *insn, rtx *x,
       return list;
   mode = GET_MODE (op);
   subreg_p = false;
+  poly_int64 size = GET_MODE_SIZE (mode);
+  poly_int64 offset = 0;
   if (code == SUBREG)
     {
       mode = wider_subreg_mode (op);
       if (read_modify_subreg_p (op))
-	subreg_p = true;
+	{
+	  offset = SUBREG_BYTE (op);
+	  subreg_p = true;
+	}
+      else
+	size = GET_MODE_SIZE (GET_MODE (SUBREG_REG (op)));
       op = SUBREG_REG (op);
       code = GET_CODE (op);
     }
@@ -925,7 +1008,8 @@ collect_non_operand_hard_regs (rtx_insn *insn, rtx *x,
 		   && ! (FIRST_STACK_REG <= regno
 			 && regno <= LAST_STACK_REG));
 #endif
-	      list = new_insn_reg (data->insn, regno, type, mode, subreg_p,
+	      list = new_insn_reg (data->insn, regno, type, size, offset, mode,
+				   GET_MODE (op), subreg_p,
 				   early_clobber ? ALL_ALTERNATIVES : 0, list);
 	    }
 	}
@@ -1354,6 +1438,7 @@ initialize_lra_reg_info_element (int i)
   lra_reg_info[i].preferred_hard_regno_profit1 = 0;
   lra_reg_info[i].preferred_hard_regno_profit2 = 0;
   lra_reg_info[i].biggest_mode = VOIDmode;
+  lra_reg_info[i].reg_mode = VOIDmode;
   lra_reg_info[i].live_ranges = NULL;
   lra_reg_info[i].nrefs = lra_reg_info[i].freq = 0;
   lra_reg_info[i].last_reload = 0;
@@ -1459,7 +1544,21 @@ lra_get_copy (int n)
   return copy_vec[n];
 }
 
-\f
+/* Return true if REG occupies the same blocks as OFFSET + SIZE subreg.  */
+static bool
+reg_same_range_p (lra_insn_reg *reg, poly_int64 offset, poly_int64 size,
+		  bool subreg_p)
+{
+  if (has_subreg_object_p (reg->regno))
+    {
+      const subreg_range &r
+	= get_range_blocks (reg->regno, subreg_p,
+			    lra_reg_info[reg->regno].reg_mode, offset, size);
+      return r.start == reg->start && r.end == reg->end;
+    }
+  else
+    return true;
+}
 
 /* This page contains code dealing with info about registers in
    insns.  */
@@ -1483,11 +1582,18 @@ add_regs_to_insn_regno_info (lra_insn_recog_data_t data, rtx x,
   code = GET_CODE (x);
   mode = GET_MODE (x);
   subreg_p = false;
+  poly_int64 size = GET_MODE_SIZE (mode);
+  poly_int64 offset = 0;
   if (GET_CODE (x) == SUBREG)
     {
       mode = wider_subreg_mode (x);
       if (read_modify_subreg_p (x))
-	subreg_p = true;
+	{
+	  offset = SUBREG_BYTE (x);
+	  subreg_p = true;
+	}
+      else
+	size = GET_MODE_SIZE (GET_MODE (SUBREG_REG (x)));
       x = SUBREG_REG (x);
       code = GET_CODE (x);
     }
@@ -1499,7 +1605,8 @@ add_regs_to_insn_regno_info (lra_insn_recog_data_t data, rtx x,
       expand_reg_info ();
       if (bitmap_set_bit (&lra_reg_info[regno].insn_bitmap, INSN_UID (insn)))
 	{
-	  data->regs = new_insn_reg (data->insn, regno, type, mode, subreg_p,
+	  data->regs = new_insn_reg (data->insn, regno, type, size, offset,
+				     mode, GET_MODE (x), subreg_p,
 				     early_clobber_alts, data->regs);
 	  return;
 	}
@@ -1508,12 +1615,14 @@ add_regs_to_insn_regno_info (lra_insn_recog_data_t data, rtx x,
 	  for (curr = data->regs; curr != NULL; curr = curr->next)
 	    if (curr->regno == regno)
 	      {
-		if (curr->subreg_p != subreg_p || curr->biggest_mode != mode)
+		if (!reg_same_range_p (curr, offset, size, subreg_p)
+		    || curr->biggest_mode != mode)
 		  /* The info cannot be integrated into the found
 		     structure.  */
-		  data->regs = new_insn_reg (data->insn, regno, type, mode,
-					     subreg_p, early_clobber_alts,
-					     data->regs);
+		  data->regs
+		    = new_insn_reg (data->insn, regno, type, size, offset, mode,
+				    GET_MODE (x), subreg_p, early_clobber_alts,
+				    data->regs);
 		else
 		  {
 		    if (curr->type != type)
-- 
2.36.3


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH V2 0/7] ira/lra: Support subreg coalesce
  2023-11-12  9:58 [PATCH V2 0/7] ira/lra: Support subreg coalesce Lehua Ding
                   ` (6 preceding siblings ...)
  2023-11-12  9:58 ` [PATCH V2 7/7] lra: Support subreg live range track and conflict detect Lehua Ding
@ 2023-11-12 12:08 ` Lehua Ding
  7 siblings, 0 replies; 9+ messages in thread
From: Lehua Ding @ 2023-11-12 12:08 UTC (permalink / raw)
  To: gcc-patches; +Cc: vmakarov, richard.sandiford, juzhe.zhong

I found a new bug in these patches, so I have sent a V3 version; I'm sorry
about this.

V3: https://gcc.gnu.org/pipermail/gcc-patches/2023-November/636178.html

On 2023/11/12 17:58, Lehua Ding wrote:
> Hi,
> 
> These patchs try to support subreg coalesce feature in
> register allocation passes (ira and lra).
> 
> Let's consider a RISC-V program (https://godbolt.org/z/ec51d91aT):
> 
> ```
> #include <riscv_vector.h>
> 
> void
> foo (int32_t *in, int32_t *out, size_t m)
> {
>    vint32m2_t result = __riscv_vle32_v_i32m2 (in, 32);
>    vint32m1_t v0 = __riscv_vget_v_i32m2_i32m1 (result, 0);
>    vint32m1_t v1 = __riscv_vget_v_i32m2_i32m1 (result, 1);
>    for (size_t i = 0; i < m; i++)
>      {
>        v0 = __riscv_vadd_vv_i32m1(v0, v0, 4);
>        v1 = __riscv_vmul_vv_i32m1(v1, v1, 4);
>      }
>    *(vint32m1_t*)(out+4*0) = v0;
>    *(vint32m1_t*)(out+4*1) = v1;
> }
> ```
> 
> Before these patchs:
> 
> ```
> foo:
> 	li	a5,32
> 	vsetvli	zero,a5,e32,m2,ta,ma
> 	vle32.v	v4,0(a0)
> 	vmv1r.v	v2,v4
> 	vmv1r.v	v1,v5
> 	beq	a2,zero,.L2
> 	li	a5,0
> 	vsetivli	zero,4,e32,m1,ta,ma
> .L3:
> 	addi	a5,a5,1
> 	vadd.vv	v2,v2,v2
> 	vmul.vv	v1,v1,v1
> 	bne	a2,a5,.L3
> .L2:
> 	vs1r.v	v2,0(a1)
> 	addi	a1,a1,16
> 	vs1r.v	v1,0(a1)
> 	ret
> ```
> 
> After these patchs:
> 
> ```
> foo:
> 	li	a5,32
> 	vsetvli	zero,a5,e32,m2,ta,ma
> 	vle32.v	v2,0(a0)
> 	beq	a2,zero,.L2
> 	li	a5,0
> 	vsetivli	zero,4,e32,m1,ta,ma
> .L3:
> 	addi	a5,a5,1
> 	vadd.vv	v2,v2,v2
> 	vmul.vv	v3,v3,v3
> 	bne	a2,a5,.L3
> .L2:
> 	vs1r.v	v2,0(a1)
> 	addi	a1,a1,16
> 	vs1r.v	v3,0(a1)
> 	ret
> ```
> 
> As you can see, the two redundant vmv1r.v instructions were removed.
> The reason for the two redundant vmv1r.v instructions is because
> the current ira pass is being conservative in calculating the live
> range of pseduo registers that occupy multil hardregs. As in the
> following two RTL instructions. Where r134 occupies two physical
> registers and r135 and r136 occupy one physical register.
> At insn 12 point, ira considers the entire r134 pseudo register
> to be live, so r135 is in conflict with r134, as shown in the ira
> dump info. Then when the physical registers are allocated, r135 and
> r134 are allocated first because they are inside the loop body and
> have higher priority. This makes it difficult to assign r136 to
> overlap with r134, i.e., to assign r136 to hr100, thus eliminating
> the need for the vmv1r.v instruction. Thus two vmv1r.v instructions
> appear.
> 
> If we refine the live information of r134 to the case of each subreg,
> we can remove this conflict. We can then create copies of the set
> with subreg reference, thus increasing the priority of the r134 allocation,
> which allow registers with bigger alignment requirements to prioritize
> the allocation of physical registers. In RVV, pseudo registers occupying
> two physical registers need to be time-2 aligned.
> 
> ```
> (insn 11 10 12 2 (set (reg/v:RVVM1SI 135 [ v0 ])
>          (subreg:RVVM1SI (reg/v:RVVM2SI 134 [ result ]) 0)) "/app/example.c":7:19 998 {*movrvvm1si_whole}
>       (nil))
> (insn 12 11 13 2 (set (reg/v:RVVM1SI 136 [ v1 ])
>          (subreg:RVVM1SI (reg/v:RVVM2SI 134 [ result ]) [16, 16])) "/app/example.c":8:19 998 {*movrvvm1si_whole}
>       (expr_list:REG_DEAD (reg/v:RVVM2SI 134 [ result ])
>          (nil)))
> ```
> 
> ira dump:
> 
> ;; a1(r136,l0) conflicts: a3(r135,l0)
> ;;     total conflict hard regs:
> ;;     conflict hard regs:
> ;; a3(r135,l0) conflicts: a1(r136,l0) a6(r134,l0)
> ;;     total conflict hard regs:
> ;;     conflict hard regs:
> ;; a6(r134,l0) conflicts: a3(r135,l0)
> ;;     total conflict hard regs:
> ;;     conflict hard regs:
> ;;
> ;; ...
>        Popping a1(r135,l0)  --         assign reg 97
>        Popping a3(r136,l0)  --         assign reg 98
>        Popping a4(r137,l0)  --         assign reg 15
>        Popping a5(r140,l0)  --         assign reg 12
>        Popping a10(r145,l0)  --         assign reg 12
>        Popping a2(r139,l0)  --         assign reg 11
>        Popping a9(r144,l0)  --         assign reg 11
>        Popping a0(r142,l0)  --         assign reg 11
>        Popping a6(r134,l0)  --         assign reg 100
>        Popping a7(r143,l0)  --         assign reg 10
>        Popping a8(r141,l0)  --         assign reg 15
> 
> The AArch64 SVE has the same problem. Consider the following
> code (https://godbolt.org/z/MYrK7Ghaj):
> 
> ```
> #include <arm_sve.h>
> 
> int bar (svbool_t pg, int64_t* base, int n, int64_t *in1, int64_t *in2, int64_t*out)
> {
>    svint64x4_t result = svld4_s64 (pg, base);
>    svint64_t v0 = svget4_s64(result, 0);
>    svint64_t v1 = svget4_s64(result, 1);
>    svint64_t v2 = svget4_s64(result, 2);
>    svint64_t v3 = svget4_s64(result, 3);
> 
>    for (int i = 0; i < n; i += 1)
>      {
>          svint64_t v18 = svld1_s64(pg, in1);
>          svint64_t v19 = svld1_s64(pg, in2);
>          v0 = svmad_s64_z(pg, v0, v18, v19);
>          v1 = svmad_s64_z(pg, v1, v18, v19);
>          v2 = svmad_s64_z(pg, v2, v18, v19);
>          v3 = svmad_s64_z(pg, v3, v18, v19);
>      }
>    svst1_s64(pg, out+0,v0);
>    svst1_s64(pg, out+1,v1);
>    svst1_s64(pg, out+2,v2);
>    svst1_s64(pg, out+3,v3);
> }
> ```
> 
> Before these patchs:
> 
> ```
> bar:
> 	ld4d	{z4.d - z7.d}, p0/z, [x0]
> 	mov	z26.d, z4.d
> 	mov	z27.d, z5.d
> 	mov	z28.d, z6.d
> 	mov	z29.d, z7.d
> 	cmp	w1, 0
> 	...
> ```
> 
> After these patchs:
> 
> ```
> bar:
> 	ld4d	{z28.d - z31.d}, p0/z, [x0]
> 	cmp	w1, 0
> 	...
> ```
> 
> Lehua Ding (7):
>    df: Add DF_LIVE_SUBREG problem
>    ira: Switch to live_subreg data
>    ira: Support subreg live range track
>    ira: Support subreg copy
>    ira: Add all nregs >= 2 pseudos to track subreg list
>    lra: Switch to live_subreg data flow
>    lra: Support subreg live range track and conflict detect
> 
>   gcc/Makefile.in          |   1 +
>   gcc/df-problems.cc       | 889 ++++++++++++++++++++++++++++++++++++++-
>   gcc/df.h                 |  67 +++
>   gcc/hard-reg-set.h       |  33 ++
>   gcc/ira-build.cc         | 456 ++++++++++++++++----
>   gcc/ira-color.cc         | 851 ++++++++++++++++++++++++++-----------
>   gcc/ira-conflicts.cc     | 221 +++++++---
>   gcc/ira-emit.cc          |  24 +-
>   gcc/ira-int.h            |  67 ++-
>   gcc/ira-lives.cc         | 507 ++++++++++++++++------
>   gcc/ira.cc               |  73 ++--
>   gcc/lra-assigns.cc       | 111 ++++-
>   gcc/lra-coalesce.cc      |  20 +-
>   gcc/lra-constraints.cc   | 111 +++--
>   gcc/lra-int.h            |  33 ++
>   gcc/lra-lives.cc         | 660 ++++++++++++++++++++++++-----
>   gcc/lra-remat.cc         |  13 +-
>   gcc/lra-spills.cc        |  22 +-
>   gcc/lra.cc               | 139 +++++-
>   gcc/regs.h               |   7 +
>   gcc/subreg-live-range.cc | 628 +++++++++++++++++++++++++++
>   gcc/subreg-live-range.h  | 333 +++++++++++++++
>   gcc/timevar.def          |   1 +
>   23 files changed, 4490 insertions(+), 777 deletions(-)
>   create mode 100644 gcc/subreg-live-range.cc
>   create mode 100644 gcc/subreg-live-range.h
> 

-- 
Best,
Lehua (RiVAI)
lehua.ding@rivai.ai

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2023-11-12 12:09 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-11-12  9:58 [PATCH V2 0/7] ira/lra: Support subreg coalesce Lehua Ding
2023-11-12  9:58 ` [PATCH V2 1/7] df: Add DF_LIVE_SUBREG problem Lehua Ding
2023-11-12  9:58 ` [PATCH V2 2/7] ira: Switch to live_subreg data Lehua Ding
2023-11-12  9:58 ` [PATCH V2 3/7] ira: Support subreg live range track Lehua Ding
2023-11-12  9:58 ` [PATCH V2 4/7] ira: Support subreg copy Lehua Ding
2023-11-12  9:58 ` [PATCH V2 5/7] ira: Add all nregs >= 2 pseudos to track subreg list Lehua Ding
2023-11-12  9:58 ` [PATCH V2 6/7] lra: Switch to live_subreg data flow Lehua Ding
2023-11-12  9:58 ` [PATCH V2 7/7] lra: Support subreg live range track and conflict detect Lehua Ding
2023-11-12 12:08 ` [PATCH V2 0/7] ira/lra: Support subreg coalesce Lehua Ding

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).