public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH, rtl-optimization]: Fix PR46829, ICE in spill_failure with -fschedule-insns [was: Fixing instability of -fschedule-insns for x86]
@ 2012-08-17 16:36 Uros Bizjak
  2012-08-17 18:34 ` Uros Bizjak
  0 siblings, 1 reply; 10+ messages in thread
From: Uros Bizjak @ 2012-08-17 16:36 UTC (permalink / raw)
  To: gcc-patches
  Cc: Jakub Jelinek, Richard Guenther, Yuri Rumyantsev, Igor Zamyatin

[-- Attachment #1: Type: text/plain, Size: 1515 bytes --]

Hello!

Attached patch fixes one of the critical problems in combine.c:
combine pass blindly propagates hard registers into insn patterns and
this way creates partially invalid instructions. Most of the times,
reload is able to fix these inconsistencies, but operands with invalid
hard registers block IRA/reload to allocate most appropriate registers
from critically limited register sets, leading to spill failures.

To counter these problems, x86 port invented operand predicates like
"reg_not_xmm0_operand" and derivatives that prevented combine pass
from propagating invalid hard regs. This approach in fact papered over
real problem in the gcc infrastructure.

Attached patch introduces operand constraint checks directly into
recog_for_combine. Operands of valid instructions are checked against
their constraints and combined insn is rejected if they doesn't fit.

IMO, this patch is prerequisite for enabling scheduling pass on x86.
Proposed priority scheduling of function operands patch will hopefully
fix other issues with limited register set hard register allocations.

2012-08-17  Uros Bizjak  <ubizjak@gmail.com>

	PR rtl-optimization/46829
	* combine.c (recog_for_combine): Check operand constraints
	in case hard registers were propagater into insn pattern.

testsuite/ChangeLog:

2012-08-17  Uros Bizjak  <ubizjak@gmail.com>

	PR rtl-optimization/46829
	* gcc.target/i386/pr46829.c: New test.

Patch was bootstrapped and regression tested on x86_64-pc-linux-gnu {,-m32}.

OK for mainline?

Uros.

[-- Attachment #2: p.diff.txt --]
[-- Type: text/plain, Size: 3478 bytes --]

Index: combine.c
===================================================================
--- combine.c	(revision 190480)
+++ combine.c	(working copy)
@@ -10507,6 +10507,7 @@ recog_for_combine (rtx *pnewpat, rtx insn, rtx *pn
   int i;
   rtx notes = 0;
   rtx old_notes, old_pat;
+  int old_icode;
 
   /* If PAT is a PARALLEL, check to see if it contains the CLOBBER
      we use to indicate that something didn't match.  If we find such a
@@ -10566,6 +10567,7 @@ recog_for_combine (rtx *pnewpat, rtx insn, rtx *pn
 	  print_rtl_single (dump_file, pat);
 	}
     }
+
   PATTERN (insn) = old_pat;
   REG_NOTES (insn) = old_notes;
 
@@ -10607,6 +10609,86 @@ recog_for_combine (rtx *pnewpat, rtx insn, rtx *pn
       pat = newpat;
     }
 
+  old_pat = PATTERN (insn);
+  old_notes = REG_NOTES (insn);
+  old_icode = INSN_CODE (insn);
+  PATTERN (insn) = pat;
+  REG_NOTES (insn) = notes;
+
+  /* Check operand constraints in case hard registers were propagated
+     into insn pattern.  This check prevents combine pass from
+     generating insn patterns with invalid hard register operands.
+     These invalid insns can eventually confuse reload to error out
+     with a spill failure.  See also PR 46829.  */
+  if (insn_code_number >= 0
+      && insn_code_number != NOOP_MOVE_INSN_CODE
+      && (INSN_CODE (insn) = recog (PATTERN (insn), insn, 0)) >= 0)
+    {
+      extract_insn (insn);
+      preprocess_constraints ();
+
+      for (i = 0; i < recog_data.n_operands; i++)
+	{
+	  rtx op = recog_data.operand[i];
+	  enum machine_mode mode = GET_MODE (op);
+	  struct operand_alternative *op_alt;
+	  int offset = 0;
+	  bool win;
+	  int j;
+
+	  /* A unary operator may be accepted by the predicate, but it
+	     is irrelevant for matching constraints.  */
+	  if (UNARY_P (op))
+	    op = XEXP (op, 0);
+
+	  if (GET_CODE (op) == SUBREG)
+	    {
+	      if (REG_P (SUBREG_REG (op))
+		  && REGNO (SUBREG_REG (op)) < FIRST_PSEUDO_REGISTER)
+		offset = subreg_regno_offset (REGNO (SUBREG_REG (op)),
+					      GET_MODE (SUBREG_REG (op)),
+					      SUBREG_BYTE (op),
+					      GET_MODE (op));
+	      op = SUBREG_REG (op);
+	    }
+
+	  if (!(REG_P (op) && HARD_REGISTER_P (op)))
+	    continue;
+
+	  op_alt = recog_op_alt[i];
+
+	  win = false;
+	  for (j = 0; j < recog_data.n_alternatives; j++)
+	    {
+	      if (op_alt[j].anything_ok
+		  || (op_alt[j].matches != -1
+		      && rtx_equal_p (recog_data.operand[j],
+				      recog_data.operand[op_alt[j].matches]))
+		  || (reg_fits_class_p (op, op_alt[j].cl, offset, mode)))
+		{
+		  win = true;
+		  break;
+		}
+	    }
+
+	  if (!win)
+	    {
+	      if (dump_file && (dump_flags & TDF_DETAILS))
+		{
+		  fputs ("Operand failed to match constraints:\n",
+			 dump_file);
+		  print_rtl_single (dump_file, op);
+		}
+	      insn_code_number = -1;
+	      break;
+	    }
+	}
+    }
+
+  PATTERN (insn) = old_pat;
+  REG_NOTES (insn) = old_notes;
+  INSN_CODE (insn) = old_icode;
+
   *pnewpat = pat;
   *pnotes = notes;
 
Index: testsuite/gcc.target/i386/pr46829.c
===================================================================
--- testsuite/gcc.target/i386/pr46829.c	(revision 0)
+++ testsuite/gcc.target/i386/pr46829.c	(working copy)
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fschedule-insns" } */
+
+struct S
+{
+  int i, j;
+};
+
+extern struct S s[];
+
+extern void bar (int, ...);
+
+void
+foo (int n)
+{
+  while (s[n].i)
+    bar (0, n, s[n].j, s, s[n].i / s[n].j);
+}

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH, rtl-optimization]: Fix PR46829, ICE in spill_failure with -fschedule-insns [was: Fixing instability of -fschedule-insns for x86]
  2012-08-17 16:36 [PATCH, rtl-optimization]: Fix PR46829, ICE in spill_failure with -fschedule-insns [was: Fixing instability of -fschedule-insns for x86] Uros Bizjak
@ 2012-08-17 18:34 ` Uros Bizjak
  2012-08-17 19:59   ` Oleg Endo
  0 siblings, 1 reply; 10+ messages in thread
From: Uros Bizjak @ 2012-08-17 18:34 UTC (permalink / raw)
  To: gcc-patches
  Cc: Jakub Jelinek, Richard Guenther, Yuri Rumyantsev, Igor Zamyatin

[-- Attachment #1: Type: text/plain, Size: 1005 bytes --]

On Fri, Aug 17, 2012 at 6:36 PM, Uros Bizjak <ubizjak@gmail.com> wrote:

> 2012-08-17  Uros Bizjak  <ubizjak@gmail.com>
>
>         PR rtl-optimization/46829
>         * combine.c (recog_for_combine): Check operand constraints
>         in case hard registers were propagater into insn pattern.
>
> testsuite/ChangeLog:
>
> 2012-08-17  Uros Bizjak  <ubizjak@gmail.com>
>
>         PR rtl-optimization/46829
>         * gcc.target/i386/pr46829.c: New test.
>
> Patch was bootstrapped and regression tested on x86_64-pc-linux-gnu {,-m32}.

Oh ...

This part:

+                     && rtx_equal_p (recog_data.operand[j],
+                                     recog_data.operand[op_alt[j].matches]))

should read:

+                     && rtx_equal_p (recog_data.operand[i],
+                                     recog_data.operand[op_alt[j].matches]))

Note the "j" vs "i" array index difference in the first line.

Correct patch attached, bootstrapped and re-tested on
x86_64-pc-linux-gnu {,-m32}.

Uros.

[-- Attachment #2: r.diff.txt --]
[-- Type: text/plain, Size: 3478 bytes --]

Index: combine.c
===================================================================
--- combine.c	(revision 190480)
+++ combine.c	(working copy)
@@ -10507,6 +10507,7 @@ recog_for_combine (rtx *pnewpat, rtx insn, rtx *pn
   int i;
   rtx notes = 0;
   rtx old_notes, old_pat;
+  int old_icode;
 
   /* If PAT is a PARALLEL, check to see if it contains the CLOBBER
      we use to indicate that something didn't match.  If we find such a
@@ -10566,6 +10567,7 @@ recog_for_combine (rtx *pnewpat, rtx insn, rtx *pn
 	  print_rtl_single (dump_file, pat);
 	}
     }
+
   PATTERN (insn) = old_pat;
   REG_NOTES (insn) = old_notes;
 
@@ -10607,6 +10609,86 @@ recog_for_combine (rtx *pnewpat, rtx insn, rtx *pn
       pat = newpat;
     }
 
+  old_pat = PATTERN (insn);
+  old_notes = REG_NOTES (insn);
+  old_icode = INSN_CODE (insn);
+  PATTERN (insn) = pat;
+  REG_NOTES (insn) = notes;
+
+  /* Check operand constraints in case hard registers were propagated
+     into insn pattern.  This check prevents combine pass from
+     generating insn patterns with invalid hard register operands.
+     These invalid insns can eventually confuse reload to error out
+     with a spill failure.  See also PR 46829.  */
+  if (insn_code_number >= 0
+      && insn_code_number != NOOP_MOVE_INSN_CODE
+      && (INSN_CODE (insn) = recog (PATTERN (insn), insn, 0)) >= 0)
+    {
+      extract_insn (insn);
+      preprocess_constraints ();
+
+      for (i = 0; i < recog_data.n_operands; i++)
+	{
+	  rtx op = recog_data.operand[i];
+	  enum machine_mode mode = GET_MODE (op);
+	  struct operand_alternative *op_alt;
+	  int offset = 0;
+	  bool win;
+	  int j;
+
+	  /* A unary operator may be accepted by the predicate, but it
+	     is irrelevant for matching constraints.  */
+	  if (UNARY_P (op))
+	    op = XEXP (op, 0);
+
+	  if (GET_CODE (op) == SUBREG)
+	    {
+	      if (REG_P (SUBREG_REG (op))
+		  && REGNO (SUBREG_REG (op)) < FIRST_PSEUDO_REGISTER)
+		offset = subreg_regno_offset (REGNO (SUBREG_REG (op)),
+					      GET_MODE (SUBREG_REG (op)),
+					      SUBREG_BYTE (op),
+					      GET_MODE (op));
+	      op = SUBREG_REG (op);
+	    }
+
+	  if (!(REG_P (op) && HARD_REGISTER_P (op)))
+	    continue;
+
+	  op_alt = recog_op_alt[i];
+
+	  win = false;
+	  for (j = 0; j < recog_data.n_alternatives; j++)
+	    {
+	      if (op_alt[j].anything_ok
+		  || (op_alt[j].matches != -1
+		      && rtx_equal_p (recog_data.operand[i],
+				      recog_data.operand[op_alt[j].matches]))
+		  || (reg_fits_class_p (op, op_alt[j].cl, offset, mode)))
+		{
+		  win = true;
+		  break;
+		}
+	    }
+
+	  if (!win)
+	    {
+	      if (dump_file && (dump_flags & TDF_DETAILS))
+		{
+		  fputs ("Operand failed to match constraints:\n",
+			 dump_file);
+		  print_rtl_single (dump_file, op);
+		}
+	      insn_code_number = -1;
+	      break;
+	    }
+	}
+    }
+
+  PATTERN (insn) = old_pat;
+  REG_NOTES (insn) = old_notes;
+  INSN_CODE (insn) = old_icode;
+
   *pnewpat = pat;
   *pnotes = notes;
 
Index: testsuite/gcc.target/i386/pr46829.c
===================================================================
--- testsuite/gcc.target/i386/pr46829.c	(revision 0)
+++ testsuite/gcc.target/i386/pr46829.c	(working copy)
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fschedule-insns" } */
+
+struct S
+{
+  int i, j;
+};
+
+extern struct S s[];
+
+extern void bar (int, ...);
+
+void
+foo (int n)
+{
+  while (s[n].i)
+    bar (0, n, s[n].j, s, s[n].i / s[n].j);
+}

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH, rtl-optimization]: Fix PR46829, ICE in spill_failure with -fschedule-insns [was: Fixing instability of -fschedule-insns for x86]
  2012-08-17 18:34 ` Uros Bizjak
@ 2012-08-17 19:59   ` Oleg Endo
  2012-08-17 20:32     ` Uros Bizjak
  0 siblings, 1 reply; 10+ messages in thread
From: Oleg Endo @ 2012-08-17 19:59 UTC (permalink / raw)
  To: Uros Bizjak
  Cc: gcc-patches, Jakub Jelinek, Richard Guenther, Yuri Rumyantsev,
	Igor Zamyatin

Hi,

On Fri, 2012-08-17 at 20:34 +0200, Uros Bizjak wrote:
> On Fri, Aug 17, 2012 at 6:36 PM, Uros Bizjak <ubizjak@gmail.com> wrote:
> 
> > 2012-08-17  Uros Bizjak  <ubizjak@gmail.com>
> >
> >         PR rtl-optimization/46829
> >         * combine.c (recog_for_combine): Check operand constraints
> >         in case hard registers were propagater into insn pattern.
> >
> > testsuite/ChangeLog:
> >
> > 2012-08-17  Uros Bizjak  <ubizjak@gmail.com>
> >
> >         PR rtl-optimization/46829
> >         * gcc.target/i386/pr46829.c: New test.
> >
> > Patch was bootstrapped and regression tested on x86_64-pc-linux-gnu {,-m32}.
> 
> Oh ...
> 
> This part:
> 
> +                     && rtx_equal_p (recog_data.operand[j],
> +                                     recog_data.operand[op_alt[j].matches]))
> 
> should read:
> 
> +                     && rtx_equal_p (recog_data.operand[i],
> +                                     recog_data.operand[op_alt[j].matches]))
> 
> Note the "j" vs "i" array index difference in the first line.
> 
> Correct patch attached, bootstrapped and re-tested on
> x86_64-pc-linux-gnu {,-m32}.
> 

I've briefly checked your patch against rev 190459 for the SH target.
The CSiBE result-size for '-m4 -ml -O2 -mpretend-cmove' shows quite some
movements and improvements.  However, the additional restrictions you've
added effectively disable some of the combine patterns I've been adding
recently to improve code quality on SH and I see some SH specific
failures (haven't run the full test suite though):
FAIL: gcc.target/sh/pr49263.c scan-assembler-not and
FAIL: gcc.target/sh/pr51244-1.c scan-assembler-not movt|tst|negc|extu
FAIL: gcc.target/sh/pr51244-10.c scan-assembler-not shll|subc|and
FAIL: gcc.target/sh/pr51244-7.c scan-assembler-not cmp/hi
FAIL: gcc.target/sh/pr51244-7.c scan-assembler-not mov\t#0
FAIL: gcc.target/sh/pr51244-8.c scan-assembler-not shad|neg
FAIL: gcc.target/sh/pr51244-9.c scan-assembler-not mov\t#0
FAIL: gcc.target/sh/pr52933-1.c scan-assembler-times div0s 25
FAIL: gcc.target/sh/pr52933-2.c scan-assembler-times div0s 25
FAIL: gcc.target/sh/pr54236-1.c scan-assembler-times addc 3
FAIL: gcc.target/sh/pr54236-1.c scan-assembler-times subc 3
FAIL: gcc.target/sh/pr54236-1.c scan-assembler-times sett 4
FAIL: gcc.target/sh/pr54236-1.c scan-assembler-not movt

I've not checked every single case, but the first two seem to be the
core issues.

1) FAIL: gcc.target/sh/pr49263.c scan-assembler-not and

Test function:

int test (uint8_t x)
{
  return x & 15 ? -40 : -9;
}

Pattern in sh.md that would originally match:

(define_insn "tstsi_t_zero_extract_eq"
  [(set (reg:SI T_REG)
	(eq:SI (zero_extract:SI (match_operand 0 "logical_operand" "z")
		(match_operand:SI 1 "const_int_operand")
		(match_operand:SI 2 "const_int_operand"))
         (const_int 0)))]
  ...)

("z" is a constraint for the r0 register on SH)

Combine now says:

Trying 11 -> 12:
Successfully matched this instruction:
(set (reg:SI 147 t)
    (eq:SI (zero_extract:SI (reg:SI 4 r4 [ x ])
            (const_int 4 [0x4])
            (const_int 0 [0]))
        (const_int 0 [0])))
Operand failed to match constraints:
(reg:SI 4 r4 [ x ])

The problem in this case is that on SH (and other targets, too) function
args are passed in regs.  In this case it's the hard reg r4, which is
sort of pre-allocated already.  Before your patch, the pattern would
match and reload would stuff in a 'mov r4,r0' to satisfy the "z"
constraint.

2) FAIL: gcc.target/sh/pr51244-1.c scan-assembler-not movt|tst|negc|extu

Test function:

int
testfunc_01 (int a, int b, int c, int d)
{
  return (a == b || a == d) ? b : c;
}

Pattern in sh.md that would originally match:

(define_insn_and_split "nott"
  [(set (reg:SI T_REG)
	(xor:SI (match_operand:SI 0 "t_reg_operand" "") (const_int 1)))]
  "TARGET_SH1"
  ...)

Combine now says:

Trying 14 -> 15:
Successfully matched this instruction:
(set (reg:SI 147 t)
    (xor:SI (reg:SI 147 t)
        (const_int 1 [0x1])))
Operand failed to match constraints:
(reg:SI 147 t)

In this case the T_REG matching is already in the predicate (similar to
the thing that's being done on x86 you've mentioned before).  I've
started using this kind of matching pattern a while ago to get simpler
and better matching combine patterns around SH's T_REG.
(T_REG is a fixed 1 bit hard reg on SH, treated as SImode throughout the
MD.  It is used to hold comparison results, carry/borrow bits etc).

I'm not sure what would be the possible solutions for those cases.
For 1) maybe pre-allocated regs of args should be moved to pseudos
before combine?
For 2) I could probably try to come up with some matching that would
satisfy the new restrictions in combine and fix up the affected patterns
throughout the MD.

In the worst case, would it be an option to make this restriction in
combine optional (target hook or something)?

Cheers,
Oleg



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH, rtl-optimization]: Fix PR46829, ICE in spill_failure with -fschedule-insns [was: Fixing instability of -fschedule-insns for x86]
  2012-08-17 19:59   ` Oleg Endo
@ 2012-08-17 20:32     ` Uros Bizjak
  2012-08-17 21:26       ` Oleg Endo
  0 siblings, 1 reply; 10+ messages in thread
From: Uros Bizjak @ 2012-08-17 20:32 UTC (permalink / raw)
  To: Oleg Endo
  Cc: gcc-patches, Jakub Jelinek, Richard Guenther, Yuri Rumyantsev,
	Igor Zamyatin

On Fri, Aug 17, 2012 at 9:58 PM, Oleg Endo <oleg.endo@t-online.de> wrote:

>> > 2012-08-17  Uros Bizjak  <ubizjak@gmail.com>
>> >
>> >         PR rtl-optimization/46829
>> >         * combine.c (recog_for_combine): Check operand constraints
>> >         in case hard registers were propagater into insn pattern.
>> >
>> > testsuite/ChangeLog:
>> >
>> > 2012-08-17  Uros Bizjak  <ubizjak@gmail.com>
>> >
>> >         PR rtl-optimization/46829
>> >         * gcc.target/i386/pr46829.c: New test.
>> >
>> > Patch was bootstrapped and regression tested on x86_64-pc-linux-gnu {,-m32}.
>>
>> Oh ...
>>
>> This part:
>>
>> +                     && rtx_equal_p (recog_data.operand[j],
>> +                                     recog_data.operand[op_alt[j].matches]))
>>
>> should read:
>>
>> +                     && rtx_equal_p (recog_data.operand[i],
>> +                                     recog_data.operand[op_alt[j].matches]))
>>
>> Note the "j" vs "i" array index difference in the first line.
>>
>> Correct patch attached, bootstrapped and re-tested on
>> x86_64-pc-linux-gnu {,-m32}.
>>
>
> I've briefly checked your patch against rev 190459 for the SH target.
> The CSiBE result-size for '-m4 -ml -O2 -mpretend-cmove' shows quite some
> movements and improvements.  However, the additional restrictions you've
> added effectively disable some of the combine patterns I've been adding
> recently to improve code quality on SH and I see some SH specific
> failures (haven't run the full test suite though):
> FAIL: gcc.target/sh/pr49263.c scan-assembler-not and
> FAIL: gcc.target/sh/pr51244-1.c scan-assembler-not movt|tst|negc|extu
> FAIL: gcc.target/sh/pr51244-10.c scan-assembler-not shll|subc|and
> FAIL: gcc.target/sh/pr51244-7.c scan-assembler-not cmp/hi
> FAIL: gcc.target/sh/pr51244-7.c scan-assembler-not mov\t#0
> FAIL: gcc.target/sh/pr51244-8.c scan-assembler-not shad|neg
> FAIL: gcc.target/sh/pr51244-9.c scan-assembler-not mov\t#0
> FAIL: gcc.target/sh/pr52933-1.c scan-assembler-times div0s 25
> FAIL: gcc.target/sh/pr52933-2.c scan-assembler-times div0s 25
> FAIL: gcc.target/sh/pr54236-1.c scan-assembler-times addc 3
> FAIL: gcc.target/sh/pr54236-1.c scan-assembler-times subc 3
> FAIL: gcc.target/sh/pr54236-1.c scan-assembler-times sett 4
> FAIL: gcc.target/sh/pr54236-1.c scan-assembler-not movt
>
> I've not checked every single case, but the first two seem to be the
> core issues.
>
> 1) FAIL: gcc.target/sh/pr49263.c scan-assembler-not and
>
> Test function:
>
> int test (uint8_t x)
> {
>   return x & 15 ? -40 : -9;
> }
>
> Pattern in sh.md that would originally match:
>
> (define_insn "tstsi_t_zero_extract_eq"
>   [(set (reg:SI T_REG)
>         (eq:SI (zero_extract:SI (match_operand 0 "logical_operand" "z")
>                 (match_operand:SI 1 "const_int_operand")
>                 (match_operand:SI 2 "const_int_operand"))
>          (const_int 0)))]
>   ...)
>
> ("z" is a constraint for the r0 register on SH)
>
> Combine now says:
>
> Trying 11 -> 12:
> Successfully matched this instruction:
> (set (reg:SI 147 t)
>     (eq:SI (zero_extract:SI (reg:SI 4 r4 [ x ])
>             (const_int 4 [0x4])
>             (const_int 0 [0]))
>         (const_int 0 [0])))
> Operand failed to match constraints:
> (reg:SI 4 r4 [ x ])
>
> The problem in this case is that on SH (and other targets, too) function
> args are passed in regs.  In this case it's the hard reg r4, which is
> sort of pre-allocated already.  Before your patch, the pattern would
> match and reload would stuff in a 'mov r4,r0' to satisfy the "z"
> constraint.

Yes, x86_64 also has register passing convention. So, i.e.:

--cut here--
int foo (int a, int b, int c)
{
        return a + b + c;
}
--cut here--

expands to:

(note 6 1 2 2 [bb 2] NOTE_INSN_BASIC_BLOCK)
(insn 2 6 3 2 (set (reg/v:SI 62 [ a ])
        (reg:SI 5 di [ a ])) t.c:2 -1
     (nil))
(insn 3 2 4 2 (set (reg/v:SI 63 [ b ])
        (reg:SI 4 si [ b ])) t.c:2 -1
     (nil))
(insn 4 3 5 2 (set (reg/v:SI 64 [ c ])
        (reg:SI 1 dx [ c ])) t.c:2 -1
     (nil))
(note 5 4 8 2 NOTE_INSN_FUNCTION_BEG)
(insn 8 5 9 2 (parallel [
            (set (reg:SI 66 [ D.1722 ])
                (plus:SI (reg/v:SI 62 [ a ])
                    (reg/v:SI 63 [ b ])))
            (clobber (reg:CC 17 flags))
        ]) t.c:3 -1
     (nil))
(...)

So, we expand function arguments to pseudos first, and these survive
up to the combine pass. If sh expanded func arguments to pseudos, then
invalid hard register won't be propagated to insn pattern, leaving
reload with much more freedom to choose correct register. I believe
that code improvements you are seeing come from this added reload
freedom.


> 2) FAIL: gcc.target/sh/pr51244-1.c scan-assembler-not movt|tst|negc|extu
>
> Test function:
>
> int
> testfunc_01 (int a, int b, int c, int d)
> {
>   return (a == b || a == d) ? b : c;
> }
>
> Pattern in sh.md that would originally match:
>
> (define_insn_and_split "nott"
>   [(set (reg:SI T_REG)
>         (xor:SI (match_operand:SI 0 "t_reg_operand" "") (const_int 1)))]
>   "TARGET_SH1"
>   ...)
>
> Combine now says:
>
> Trying 14 -> 15:
> Successfully matched this instruction:
> (set (reg:SI 147 t)
>     (xor:SI (reg:SI 147 t)
>         (const_int 1 [0x1])))
> Operand failed to match constraints:
> (reg:SI 147 t)
>
> In this case the T_REG matching is already in the predicate (similar to
> the thing that's being done on x86 you've mentioned before).  I've
> started using this kind of matching pattern a while ago to get simpler
> and better matching combine patterns around SH's T_REG.
> (T_REG is a fixed 1 bit hard reg on SH, treated as SImode throughout the
> MD.  It is used to hold comparison results, carry/borrow bits etc).

Hm, I'd write this instruction as:

(define_insn_and_split "nott"
  [(set (match_operand:SI 0 "t_reg_operand" "t")
        (xor:SI (match_operand:SI 1 "t_reg_operand" "0") (const_int 1)))]
  "TARGET_SH1"
  ...)

This would match new combine limitation without problems, although
empty constraint should be matched as well. Can you perhaps
investigate a bit, why op_alt[j].anything.ok in

+             if (op_alt[j].anything_ok
+                 || (op_alt[j].matches != -1
+                     && rtx_equal_p (recog_data.operand[i],
+                                     recog_data.operand[op_alt[j].matches]))
+                 || (reg_fits_class_p (op, op_alt[j].cl, offset, mode)))

does not trigger for this operation? Maybe we need to add another
condition to allow empty constraint.

> I'm not sure what would be the possible solutions for those cases.
> For 1) maybe pre-allocated regs of args should be moved to pseudos
> before combine?

x86_64 implement this approach.

> For 2) I could probably try to come up with some matching that would
> satisfy the new restrictions in combine and fix up the affected patterns
> throughout the MD.

No, empty operand constraints should be universally matched.

> In the worst case, would it be an option to make this restriction in
> combine optional (target hook or something)?

IMO, a patch like this that touches many targets won't be applied over
night. Rest assured that there will be a discussion about it.

Thanks for testing the patch, I look forward to comments.

BR,
Uros.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH, rtl-optimization]: Fix PR46829, ICE in spill_failure with -fschedule-insns [was: Fixing instability of -fschedule-insns for x86]
  2012-08-17 20:32     ` Uros Bizjak
@ 2012-08-17 21:26       ` Oleg Endo
  2012-08-18  8:14         ` [PATCH v2, " Uros Bizjak
  0 siblings, 1 reply; 10+ messages in thread
From: Oleg Endo @ 2012-08-17 21:26 UTC (permalink / raw)
  To: Uros Bizjak
  Cc: gcc-patches, Jakub Jelinek, Richard Guenther, Yuri Rumyantsev,
	Igor Zamyatin

On Fri, 2012-08-17 at 22:32 +0200, Uros Bizjak wrote:

> Yes, x86_64 also has register passing convention. So, i.e.:
> 
> --cut here--
> int foo (int a, int b, int c)
> {
>         return a + b + c;
> }
> --cut here--
> 
> expands to:
> 
> (note 6 1 2 2 [bb 2] NOTE_INSN_BASIC_BLOCK)
> (insn 2 6 3 2 (set (reg/v:SI 62 [ a ])
>         (reg:SI 5 di [ a ])) t.c:2 -1
>      (nil))
> (insn 3 2 4 2 (set (reg/v:SI 63 [ b ])
>         (reg:SI 4 si [ b ])) t.c:2 -1
>      (nil))
> (insn 4 3 5 2 (set (reg/v:SI 64 [ c ])
>         (reg:SI 1 dx [ c ])) t.c:2 -1
>      (nil))
> (note 5 4 8 2 NOTE_INSN_FUNCTION_BEG)
> (insn 8 5 9 2 (parallel [
>             (set (reg:SI 66 [ D.1722 ])
>                 (plus:SI (reg/v:SI 62 [ a ])
>                     (reg/v:SI 63 [ b ])))
>             (clobber (reg:CC 17 flags))
>         ]) t.c:3 -1
>      (nil))
> (...)
> 
> So, we expand function arguments to pseudos first, and these survive
> up to the combine pass. If sh expanded func arguments to pseudos, then
> invalid hard register won't be propagated to insn pattern, leaving
> reload with much more freedom to choose correct register. I believe
> that code improvements you are seeing come from this added reload
> freedom.

It seems on SH func args are expanded to pseudos, sorry for not checking
this properly.

int test (uint8_t x)
{
  return x & 15 ? -40 : -9;
}

expands to:
(note 1 0 7 NOTE_INSN_DELETED)
(note 7 1 2 2 [bb 2] NOTE_INSN_BASIC_BLOCK)
(insn 2 7 3 2 (set (reg:SI 164)
        (reg:SI 4 r4 [ x ])) sh_tmp.cpp:9 -1
     (nil))
(insn 3 2 4 2 (set (reg/v:SI 163 [ x ])
        (zero_extend:SI (subreg:QI (reg:SI 164) 0))) sh_tmp.cpp:9 -1
     (nil))
(note 4 3 9 2 NOTE_INSN_FUNCTION_BEG)
(insn 9 4 10 2 (set (reg:SI 165)
        (and:SI (reg/v:SI 163 [ x ])
            (const_int 15 [0xf]))) sh_tmp.cpp:10 -1
     (nil))
 ...

Then in combine it goes like:

Successfully matched this instruction:
(set (reg/v:SI 163 [ x ])
    (zero_extend:SI (reg:QI 4 r4 [ x ])))
deferring deletion of insn with uid = 2.
modifying insn i3     3 r163:SI=zero_extend(r4:QI)
      REG_DEAD: r4:SI
deferring rescan insn with uid = 3.

[[note: the matched pattern above is "*zero_extendqisi2_compact" ]]

Trying 3 -> 9:
Successfully matched this instruction:
(set (reg:SI 165)
    (and:SI (reg:SI 4 r4 [ x ])
        (const_int 15 [0xf])))
deferring deletion of insn with uid = 3.
modifying insn i3     9 r165:SI=r4:SI&0xf
      REG_DEAD: r4:SI
deferring rescan insn with uid = 9.

Trying 9 -> 11:
Successfully matched this instruction:
(set (reg:SI 167)
    (and:SI (reg:SI 4 r4 [ x ])
        (const_int 15 [0xf])))
deferring deletion of insn with uid = 9.
modifying insn i3    11 r167:SI=r4:SI&0xf
      REG_DEAD: r4:SI
deferring rescan insn with uid = 11.

Trying 11 -> 12:
Successfully matched this instruction:
(set (reg:SI 147 t)
    (eq:SI (zero_extract:SI (reg:SI 4 r4 [ x ])
            (const_int 4 [0x4])
            (const_int 0 [0]))
        (const_int 0 [0])))
Operand failed to match constraints:
(reg:SI 4 r4 [ x ])


> Hm, I'd write this instruction as:
> 
> (define_insn_and_split "nott"
>   [(set (match_operand:SI 0 "t_reg_operand" "t")
>         (xor:SI (match_operand:SI 1 "t_reg_operand" "0") (const_int 1)))]
>   "TARGET_SH1"
>   ...)
> 
> This would match new combine limitation without problems, although
> empty constraint should be matched as well. Can you perhaps
> investigate a bit, why op_alt[j].anything.ok in
> 
> +             if (op_alt[j].anything_ok
> +                 || (op_alt[j].matches != -1
> +                     && rtx_equal_p (recog_data.operand[i],
> +                                     recog_data.operand[op_alt[j].matches]))
> +                 || (reg_fits_class_p (op, op_alt[j].cl, offset, mode)))
> 
> does not trigger for this operation? Maybe we need to add another
> condition to allow empty constraint.

It seems that in this case there are no alternatives in recog_data at
all.  The 'for' never runs, 'win' remains 'false'.
Doing

- if (!win)
+ if (!win && recog_data.n_alternatives > 0)

seems to help.

> IMO, a patch like this that touches many targets won't be applied over
> night. Rest assured that there will be a discussion about it.
> 
> Thanks for testing the patch, I look forward to comments.

Sure, no problem.

Cheers,
Oleg

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH v2, rtl-optimization]: Fix PR46829, ICE in spill_failure with -fschedule-insns [was: Fixing instability of -fschedule-insns for x86]
  2012-08-17 21:26       ` Oleg Endo
@ 2012-08-18  8:14         ` Uros Bizjak
  2012-08-18  8:23           ` Uros Bizjak
  0 siblings, 1 reply; 10+ messages in thread
From: Uros Bizjak @ 2012-08-18  8:14 UTC (permalink / raw)
  To: Oleg Endo
  Cc: gcc-patches, Jakub Jelinek, Richard Guenther, Yuri Rumyantsev,
	Igor Zamyatin

[-- Attachment #1: Type: text/plain, Size: 950 bytes --]

Hello!

After discussion with Oleg, it looks that it is enough to prevent
wrong registers in the output of the (multi-output) insn pattern. As
far as inputs are concerned, combine already handles limited reload
classes in the right way. The problem with x86 is, that reload tried
to fix output operand of the multi-output ins, where scheduler already
moved load of ax register before this insn.

Version 2 of the patch now handles only output operands. Also,
handling of empty constraints was fixed.

2012-08-18  Uros Bizjak  <ubizjak@gmail.com>

	PR rtl-optimization/46829
	* combine.c (recog_for_combine): Check operand constraints
	to reject instructions where wrong hard registers were propagated
	into output operands.

testsuite/ChangeLog:

2012-08-18  Uros Bizjak  <ubizjak@gmail.com>

	PR rtl-optimization/46829
	* gcc.target/i386/pr46829.c: New test.

Patch was bootstrapped and regression tested on
x86_64-unknown-linux-gnu {,-m32}.

Uros.

[-- Attachment #2: p-v2.diff.txt --]
[-- Type: text/plain, Size: 3090 bytes --]

Index: combine.c
===================================================================
--- combine.c	(revision 190500)
+++ combine.c	(working copy)
@@ -10507,6 +10507,7 @@ recog_for_combine (rtx *pnewpat, rtx insn, rtx *pn
   int i;
   rtx notes = 0;
   rtx old_notes, old_pat;
+  int old_icode;
 
   /* If PAT is a PARALLEL, check to see if it contains the CLOBBER
      we use to indicate that something didn't match.  If we find such a
@@ -10566,6 +10567,7 @@ recog_for_combine (rtx *pnewpat, rtx insn, rtx *pn
 	  print_rtl_single (dump_file, pat);
 	}
     }
+
   PATTERN (insn) = old_pat;
   REG_NOTES (insn) = old_notes;
 
@@ -10607,6 +10609,93 @@ recog_for_combine (rtx *pnewpat, rtx insn, rtx *pn
       pat = newpat;
     }
 
+  old_pat = PATTERN (insn);
+  old_notes = REG_NOTES (insn);
+  old_icode = INSN_CODE (insn);
+  PATTERN (insn) = pat;
+  REG_NOTES (insn) = notes;
+
+  /* Check operand constraints in case wrong hard registers were
+     propagated into output operands of insn pattern.  These invalid
+     insns can eventually confuse reload to error out with a
+     spill failure.  See also PR 46829.  */
+  if (insn_code_number >= 0
+      && insn_code_number != NOOP_MOVE_INSN_CODE
+      && (INSN_CODE (insn) = recog (PATTERN (insn), insn, 0)) >= 0)
+    {
+      extract_insn (insn);
+      preprocess_constraints ();
+
+      for (i = 0; i < recog_data.n_operands; i++)
+	{
+	  rtx op;
+	  enum machine_mode mode;
+	  struct operand_alternative *op_alt;
+	  int offset = 0;
+	  bool win;
+	  int j;
+
+	  if (recog_data.operand_type[i] == OP_IN)
+	    continue;
+
+	  op = recog_data.operand[i];
+	  mode = GET_MODE (op);
+
+	  /* A unary operator may be accepted by the predicate, but it
+	     is irrelevant for matching constraints.  */
+	  if (UNARY_P (op))
+	    op = XEXP (op, 0);
+
+	  if (GET_CODE (op) == SUBREG)
+	    {
+	      if (REG_P (SUBREG_REG (op))
+		  && REGNO (SUBREG_REG (op)) < FIRST_PSEUDO_REGISTER)
+		offset = subreg_regno_offset (REGNO (SUBREG_REG (op)),
+					      GET_MODE (SUBREG_REG (op)),
+					      SUBREG_BYTE (op),
+					      GET_MODE (op));
+	      op = SUBREG_REG (op);
+	    }
+
+	  if (!(REG_P (op) && HARD_REGISTER_P (op)))
+	    continue;
+
+	  op_alt = recog_op_alt[i];
+
+	  /* Operand has no constraints, anything is OK.  */
+	  win = !recog_data.n_alternatives;
+
+	  for (j = 0; j < recog_data.n_alternatives; j++)
+	    {
+	      if (op_alt[j].anything_ok
+		  || (op_alt[j].matches != -1
+		      && reg_fits_class_p (op, recog_op_alt[op_alt[j].matches][j].cl,
+					   offset, mode))
+		  || reg_fits_class_p (op, op_alt[j].cl, offset, mode))
+		{
+		  win = true;
+		  break;
+		}
+	    }
+
+	  if (!win)
+	    {
+	      if (dump_file && (dump_flags & TDF_DETAILS))
+		{
+		  fputs ("Operand failed to match constraints:\n",
+			 dump_file);
+		  print_rtl_single (dump_file, recog_data.operand[i]);
+		}
+	      insn_code_number = -1;
+	      break;
+	    }
+	}
+    }
+
+  PATTERN (insn) = old_pat;
+  REG_NOTES (insn) = old_notes;
+  INSN_CODE (insn) = old_icode;
+
   *pnewpat = pat;
   *pnotes = notes;
 

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v2, rtl-optimization]: Fix PR46829, ICE in spill_failure with -fschedule-insns [was: Fixing instability of -fschedule-insns for x86]
  2012-08-18  8:14         ` [PATCH v2, " Uros Bizjak
@ 2012-08-18  8:23           ` Uros Bizjak
  2012-08-18 13:54             ` H.J. Lu
  2012-08-20 23:34             ` Oleg Endo
  0 siblings, 2 replies; 10+ messages in thread
From: Uros Bizjak @ 2012-08-18  8:23 UTC (permalink / raw)
  To: Oleg Endo
  Cc: gcc-patches, Jakub Jelinek, Richard Guenther, Yuri Rumyantsev,
	Igor Zamyatin

On Sat, Aug 18, 2012 at 10:14 AM, Uros Bizjak <ubizjak@gmail.com> wrote:

> After discussion with Oleg, it looks that it is enough to prevent
> wrong registers in the output of the (multi-output) insn pattern. As
> far as inputs are concerned, combine already handles limited reload
> classes in the right way. The problem with x86 is, that reload tried
> to fix output operand of the multi-output ins, where scheduler already
> moved load of ax register before this insn.
>
> Version 2 of the patch now handles only output operands. Also,
> handling of empty constraints was fixed.

... but here we fail testcase from PR46843... so we HAVE to check
input operands.

Uros.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v2, rtl-optimization]: Fix PR46829, ICE in spill_failure with -fschedule-insns [was: Fixing instability of -fschedule-insns for x86]
  2012-08-18  8:23           ` Uros Bizjak
@ 2012-08-18 13:54             ` H.J. Lu
  2012-08-20 23:34             ` Oleg Endo
  1 sibling, 0 replies; 10+ messages in thread
From: H.J. Lu @ 2012-08-18 13:54 UTC (permalink / raw)
  To: Uros Bizjak
  Cc: Oleg Endo, gcc-patches, Jakub Jelinek, Richard Guenther,
	Yuri Rumyantsev, Igor Zamyatin

On Sat, Aug 18, 2012 at 1:23 AM, Uros Bizjak <ubizjak@gmail.com> wrote:
> On Sat, Aug 18, 2012 at 10:14 AM, Uros Bizjak <ubizjak@gmail.com> wrote:
>
>> After discussion with Oleg, it looks that it is enough to prevent
>> wrong registers in the output of the (multi-output) insn pattern. As
>> far as inputs are concerned, combine already handles limited reload
>> classes in the right way. The problem with x86 is, that reload tried
>> to fix output operand of the multi-output ins, where scheduler already
>> moved load of ax register before this insn.
>>
>> Version 2 of the patch now handles only output operands. Also,
>> handling of empty constraints was fixed.
>
> ... but here we fail testcase from PR46843... so we HAVE to check
> input operands.
>

If this is very critical, can you add a target hook to decide
whether to check input/output operands?

-- 
H.J.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v2, rtl-optimization]: Fix PR46829, ICE in spill_failure with -fschedule-insns [was: Fixing instability of -fschedule-insns for x86]
  2012-08-18  8:23           ` Uros Bizjak
  2012-08-18 13:54             ` H.J. Lu
@ 2012-08-20 23:34             ` Oleg Endo
  2012-08-22 19:19               ` Uros Bizjak
  1 sibling, 1 reply; 10+ messages in thread
From: Oleg Endo @ 2012-08-20 23:34 UTC (permalink / raw)
  To: Uros Bizjak
  Cc: gcc-patches, Jakub Jelinek, Richard Guenther, Yuri Rumyantsev,
	Igor Zamyatin

On Sat, 2012-08-18 at 10:23 +0200, Uros Bizjak wrote:
> On Sat, Aug 18, 2012 at 10:14 AM, Uros Bizjak <ubizjak@gmail.com> wrote:
> 
> > After discussion with Oleg, it looks that it is enough to prevent
> > wrong registers in the output of the (multi-output) insn pattern. As
> > far as inputs are concerned, combine already handles limited reload
> > classes in the right way. The problem with x86 is, that reload tried
> > to fix output operand of the multi-output ins, where scheduler already
> > moved load of ax register before this insn.
> >
> > Version 2 of the patch now handles only output operands. Also,
> > handling of empty constraints was fixed.
> 
> ... but here we fail testcase from PR46843... so we HAVE to check
> input operands.

Hm, I'm curious ... how would that work for patterns such as

(define_insn "*addc"
  [(set (match_operand:SI 0 "arith_reg_dest" "=r")
        (plus:SI (plus:SI
                    (match_operand:SI 1 "arith_reg_operand" "%0")
                    (match_operand:SI 2 "arith_reg_or_0_operand" "r"))
		 (match_operand:SI 3 "t_reg_operand" "")))
   (clobber (reg:SI T_REG))]
  "TARGET_SH1"
  "addc	%2,%0"
  [(set_attr "type" "arith")])

... where the predicate "arith_reg_or_0_operand" allows either
"const_int 0" or an "arith_reg_operand", but the constraint "r" tells
reload to load the constant into a register for this insn.
Probably those kind of patterns that rely on this behavior would need to
be rewritten as insn_and_split to do the constant loading 'manually'?

Cheers,
Oleg

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v2, rtl-optimization]: Fix PR46829, ICE in spill_failure with -fschedule-insns [was: Fixing instability of -fschedule-insns for x86]
  2012-08-20 23:34             ` Oleg Endo
@ 2012-08-22 19:19               ` Uros Bizjak
  0 siblings, 0 replies; 10+ messages in thread
From: Uros Bizjak @ 2012-08-22 19:19 UTC (permalink / raw)
  To: Oleg Endo
  Cc: gcc-patches, Jakub Jelinek, Richard Guenther, Yuri Rumyantsev,
	Igor Zamyatin

On Tue, Aug 21, 2012 at 1:34 AM, Oleg Endo <oleg.endo@t-online.de> wrote:
> On Sat, 2012-08-18 at 10:23 +0200, Uros Bizjak wrote:
>> On Sat, Aug 18, 2012 at 10:14 AM, Uros Bizjak <ubizjak@gmail.com> wrote:
>>
>> > After discussion with Oleg, it looks that it is enough to prevent
>> > wrong registers in the output of the (multi-output) insn pattern. As
>> > far as inputs are concerned, combine already handles limited reload
>> > classes in the right way. The problem with x86 is, that reload tried
>> > to fix output operand of the multi-output ins, where scheduler already
>> > moved load of ax register before this insn.
>> >
>> > Version 2 of the patch now handles only output operands. Also,
>> > handling of empty constraints was fixed.
>>
>> ... but here we fail testcase from PR46843... so we HAVE to check
>> input operands.
>
> Hm, I'm curious ... how would that work for patterns such as
>
> (define_insn "*addc"
>   [(set (match_operand:SI 0 "arith_reg_dest" "=r")
>         (plus:SI (plus:SI
>                     (match_operand:SI 1 "arith_reg_operand" "%0")
>                     (match_operand:SI 2 "arith_reg_or_0_operand" "r"))
>                  (match_operand:SI 3 "t_reg_operand" "")))
>    (clobber (reg:SI T_REG))]
>   "TARGET_SH1"
>   "addc %2,%0"
>   [(set_attr "type" "arith")])
>
> ... where the predicate "arith_reg_or_0_operand" allows either
> "const_int 0" or an "arith_reg_operand", but the constraint "r" tells
> reload to load the constant into a register for this insn.
> Probably those kind of patterns that rely on this behavior would need to
> be rewritten as insn_and_split to do the constant loading 'manually'?

I think that we have to introduce a target hook that would "approve"
the combined insn. This way, every target can analyse the combined
insn and handle it in the most appropriate way.

Uros.

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2012-08-22 19:19 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-08-17 16:36 [PATCH, rtl-optimization]: Fix PR46829, ICE in spill_failure with -fschedule-insns [was: Fixing instability of -fschedule-insns for x86] Uros Bizjak
2012-08-17 18:34 ` Uros Bizjak
2012-08-17 19:59   ` Oleg Endo
2012-08-17 20:32     ` Uros Bizjak
2012-08-17 21:26       ` Oleg Endo
2012-08-18  8:14         ` [PATCH v2, " Uros Bizjak
2012-08-18  8:23           ` Uros Bizjak
2012-08-18 13:54             ` H.J. Lu
2012-08-20 23:34             ` Oleg Endo
2012-08-22 19:19               ` Uros Bizjak

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).