public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH] Fix PR 61225
@ 2014-05-21  9:58 Zhenqiang Chen
  2014-05-21 12:44 ` Steven Bosscher
  0 siblings, 1 reply; 27+ messages in thread
From: Zhenqiang Chen @ 2014-05-21  9:58 UTC (permalink / raw)
  To: gcc-patches; +Cc: Jakub Jelinek

Hi,

The patch fixes the gcc.target/i386/pr49095.c FAIL in PR61225. The
test case tends to check a peephole2 optimization, which optimizes the
following sequence

    2: bx:SI=ax:SI
    25: ax:SI=[bx:SI]
    7: {ax:SI=ax:SI-0x1;clobber flags:CC;}
    8: [bx:SI]=ax:SI
    9: flags:CCZ=cmp(ax:SI,0)
to
   2: bx:SI=ax:SI
   41: {flags:CCZ=cmp([bx:SI]-0x1,0);[bx:SI]=[bx:SI]-0x1;}

The enhanced shrink-wrapping, which calls copyprop_hardreg_forward
changes the INSN 25 to

    25: ax:SI=[ax:SI]

Then peephole2 can not optimize it since two memory_operands look like
different.

To fix it, the patch adds another peephole2 rule to read one more
insn. From the register copy, it knows the address is the same.

Bootstrap and no make check regression on X86-64.

OK for trunk?

Thanks!
-Zhenqiang

ChangeLog:
2014-05-21  Zhenqiang Chen  <zhenqiang.chen@linaro.org>

        Part of PR rtl-optimization/61225
        * config/i386/i386.md: New peephole2 rule.
        * rtl.c (rtx_equal_p_if_reg_equal): New function.
        * rtl.h (rtx_equal_p_if_reg_equal): New prototype.

testsuite/ChangeLog:
2014-05-21  Zhenqiang Chen  <zhenqiang.chen@linaro.org>

        * gcc.dg/pr61225.c: New test.

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 44e80ec..40639a5 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -17021,6 +17021,49 @@
                                 operands[5], const0_rtx);
 })

+;; Attempt to use arith or logical operations with memory outputs with
+;; setting of flags.  The difference from previous pattern is that it includes
+;; one more register copy insn, which is copy-forwarded to the address.
+(define_peephole2
+  [(set (match_operand:SWI 6 "register_operand")
+       (match_operand:SWI 0 "register_operand"))
+   (set (match_dup 0)
+       (match_operand:SWI 1 "memory_operand"))
+   (parallel [(set (match_dup 0)
+                  (match_operator:SWI 3 "plusminuslogic_operator"
+                    [(match_dup 0)
+                     (match_operand:SWI 2 "<nonmemory_operand>")]))
+             (clobber (reg:CC FLAGS_REG))])
+   (set (match_operand:SWI 7 "memory_operand") (match_dup 0))
+   (set (reg FLAGS_REG) (compare (match_dup 0) (const_int 0)))]
+  "(TARGET_READ_MODIFY_WRITE || optimize_insn_for_size_p ())
+   && peep2_reg_dead_p (5, operands[0])
+   && !reg_overlap_mentioned_p (operands[0], operands[2])
+   && !reg_overlap_mentioned_p (operands[0], operands[6])
+   && (GET_MODE (operands[0]) == GET_MODE (operands[6]))
+   && (MEM_ADDR_SPACE (operands[1]) == MEM_ADDR_SPACE (operands[7]))
+   && rtx_equal_p_if_reg_equal (operands[0], operands[6],
+                               XEXP (operands[1], 0), XEXP (operands[7], 0))
+   && (<MODE>mode != QImode
+       || immediate_operand (operands[2], QImode)
+       || q_regs_operand (operands[2], QImode))
+   && ix86_match_ccmode (peep2_next_insn (4),
+                        (GET_CODE (operands[3]) == PLUS
+                         || GET_CODE (operands[3]) == MINUS)
+                        ? CCGOCmode : CCNOmode)"
+  [(set (match_dup 6) (match_dup 0))
+   (parallel [(set (match_dup 4) (match_dup 5))
+             (set (match_dup 1) (match_op_dup 3 [(match_dup 1)
+                                                 (match_dup 2)]))])]
+{
+  operands[4] = SET_DEST (PATTERN (peep2_next_insn (4)));
+  operands[5] = gen_rtx_fmt_ee (GET_CODE (operands[3]), <MODE>mode,
+                               copy_rtx (operands[1]),
+                               copy_rtx (operands[2]));
+  operands[5] = gen_rtx_COMPARE (GET_MODE (operands[4]),
+                                operands[5], const0_rtx);
+})
+
(define_peephole2
   [(parallel [(set (match_operand:SWI 0 "register_operand")
                   (match_operator:SWI 2 "plusminuslogic_operator"
diff --git a/gcc/rtl.c b/gcc/rtl.c
index 520f9a8..99418fc 100644
--- a/gcc/rtl.c
+++ b/gcc/rtl.c
@@ -654,6 +654,82 @@ rtx_equal_p (const_rtx x, const_rtx y)
   return 1;
 }

+/* Return 1 if X and Y are equal rtx's if RX used in X and RY used
+   Y are equal.  */
+
+int
+rtx_equal_p_if_reg_equal (const_rtx rx, const_rtx ry,
+                         const_rtx x, const_rtx y)
+{
+  int i;
+  enum rtx_code code;
+  const char *fmt;
+
+  gcc_assert (REG_P (rx) && REG_P (ry) && x && y);
+
+  code = GET_CODE (x);
+  /* Rtx's of different codes cannot be equal.  */
+  if (code != GET_CODE (y))
+    return 0;
+
+  if (rtx_equal_p (x, y) == 1)
+    return 1;
+
+  /* Since rx == ry, if x == rx && y == ry, x == y.  */
+  if (REG_P (x) && REG_P (y)
+      && GET_MODE (x) == GET_MODE (rx)
+      && REGNO (x) == REGNO (rx)
+      && GET_MODE (y) == GET_MODE (ry)
+      && REGNO (y) == REGNO (ry))
+    return 1;
+
+  /* Compare the elements.  If any pair of corresponding elements
+     fail to match, return 0 for the whole thing.  */
+
+  fmt = GET_RTX_FORMAT (code);
+  for (i = GET_RTX_LENGTH (code) - 1; i >= 0; i--)
+    {
+      switch (fmt[i])
+       {
+       case 'e':
+         if (rtx_equal_p (XEXP (x, i), XEXP (y, i)) == 0)
+           {
+             rtx xi = XEXP (x, i);
+             rtx yi = XEXP (y, i);
+
+             if (!(REG_P (xi) && REG_P (yi)
+                   && GET_MODE (xi) == GET_MODE (rx)
+                   && REGNO (xi) == REGNO (rx)
+                   && GET_MODE (yi) == GET_MODE (ry)
+                   && REGNO (yi) == REGNO (ry)))
+               return 0;
+           }
+         break;
+       case 'w':
+         if (XWINT (x, i) != XWINT (y, i))
+           return 0;
+         break;
+
+       case 'n':
+       case 'i':
+         if (XINT (x, i) != XINT (y, i))
+           return 0;
+         break;
+
+       case 'u':
+       case '0':
+       case 't':
+         break;
+
+       default:
+         if (rtx_equal_p (XEXP (x, i), XEXP (y, i)) == 0)
+           return 0;
+         break;
+       }
+    }
+  return 1;
+}
+
 /* Iteratively hash rtx X.  */

 hashval_t
diff --git a/gcc/rtl.h b/gcc/rtl.h
index 10ae1e9..7f9e9e7 100644
--- a/gcc/rtl.h
+++ b/gcc/rtl.h
@@ -1983,6 +1983,8 @@ extern unsigned int rtx_size (const_rtx);
 extern rtx shallow_copy_rtx_stat (const_rtx MEM_STAT_DECL);
 #define shallow_copy_rtx(a) shallow_copy_rtx_stat (a MEM_STAT_INFO)
 extern int rtx_equal_p (const_rtx, const_rtx);
+extern int rtx_equal_p_if_reg_equal (const_rtx, const_rtx,
+                                    const_rtx, const_rtx);
 extern hashval_t iterative_hash_rtx (const_rtx, hashval_t);

 /* In emit-rtl.c */
diff --git a/gcc/testsuite/gcc.target/i386/pr61225.c
b/gcc/testsuite/gcc.target/i386/pr61225.c
new file mode 100644
index 0000000..1b33e9c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr61225.c
@@ -0,0 +1,23 @@
+/* { dg-do compile } */
+/* { dg-options "-Os" } */
+/* { dg-options "-Os -mregparm=2" { target ia32 } } */
+
+void foo (void *);
+
+int *
+f1 (int *x)
+{
+  if (!--*(x))
+    foo (x);
+  return x;
+}
+
+int *
+f2 (int *x)
+{
+  if (!--*(x + 4))
+    foo (x);
+  return x;
+}
+
+/* { dg-final { scan-assembler-not "test\[lq\]" } } */

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] Fix PR 61225
  2014-05-21  9:58 [PATCH] Fix PR 61225 Zhenqiang Chen
@ 2014-05-21 12:44 ` Steven Bosscher
  2014-05-22  2:01   ` Zhenqiang Chen
  2014-05-22  9:52   ` Zhenqiang Chen
  0 siblings, 2 replies; 27+ messages in thread
From: Steven Bosscher @ 2014-05-21 12:44 UTC (permalink / raw)
  To: Zhenqiang Chen; +Cc: gcc-patches, Jakub Jelinek

On Wed, May 21, 2014 at 11:58 AM, Zhenqiang Chen wrote:
> Hi,
>
> The patch fixes the gcc.target/i386/pr49095.c FAIL in PR61225. The
> test case tends to check a peephole2 optimization, which optimizes the
> following sequence
>
>     2: bx:SI=ax:SI
>     25: ax:SI=[bx:SI]
>     7: {ax:SI=ax:SI-0x1;clobber flags:CC;}
>     8: [bx:SI]=ax:SI
>     9: flags:CCZ=cmp(ax:SI,0)
> to
>    2: bx:SI=ax:SI
>    41: {flags:CCZ=cmp([bx:SI]-0x1,0);[bx:SI]=[bx:SI]-0x1;}
>
> The enhanced shrink-wrapping, which calls copyprop_hardreg_forward
> changes the INSN 25 to
>
>     25: ax:SI=[ax:SI]
>
> Then peephole2 can not optimize it since two memory_operands look like
> different.
>
> To fix it, the patch adds another peephole2 rule to read one more
> insn. From the register copy, it knows the address is the same.

That is one complex peephole2 to deal with a transformation like this.
It seems to be like it's a too specific solution for a bigger problem.

Could you please try one of the following solutions instead:

1. Track register values for peephole2 and try different alternatives
based on known register equivalences? E.g. in your example, perhaps
there is already a REG_EQUAL/REG_EQUIV note available on insn 25 after
copyprop_hardreg_forward, to annotate that [ax:SI] is equivalent to
[bx:SI] at that point (or if that information is not available, it is
not very difficult to make it available). Then you could try applying
peephole2 on the original pattern but also on patterns modified with
the known equivalences (i.e. try peephole2 on multiple equivalent
patterns for the same insn). This may expose other peephole2
opportunities, not just the specific one your patch addresses.

2. Avoid copy-prop'ing ax:SI into [bx:SI] to begin with. At insn 7,
both ax:SI and bx:SI are live, so insn 2 is not dead (i.e. cannot be
eliminated) and there is no benefit in this transformation. It only
hides (or at least makes it harder to see) that [ax:SI] in insn 25 is
the same memory reference as [bx:SI] in insn 8. So perhaps the copy
propagation should simply not be done unless it turns at least one
instruction into dead code.


Any reason why this transformation isn't done much earlier, e.g. in combine?

Ciao!
Steven

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] Fix PR 61225
  2014-05-21 12:44 ` Steven Bosscher
@ 2014-05-22  2:01   ` Zhenqiang Chen
  2014-05-22  9:52   ` Zhenqiang Chen
  1 sibling, 0 replies; 27+ messages in thread
From: Zhenqiang Chen @ 2014-05-22  2:01 UTC (permalink / raw)
  To: Steven Bosscher; +Cc: gcc-patches, Jakub Jelinek

On 21 May 2014 20:43, Steven Bosscher <stevenb.gcc@gmail.com> wrote:
> On Wed, May 21, 2014 at 11:58 AM, Zhenqiang Chen wrote:
>> Hi,
>>
>> The patch fixes the gcc.target/i386/pr49095.c FAIL in PR61225. The
>> test case tends to check a peephole2 optimization, which optimizes the
>> following sequence
>>
>>     2: bx:SI=ax:SI
>>     25: ax:SI=[bx:SI]
>>     7: {ax:SI=ax:SI-0x1;clobber flags:CC;}
>>     8: [bx:SI]=ax:SI
>>     9: flags:CCZ=cmp(ax:SI,0)
>> to
>>    2: bx:SI=ax:SI
>>    41: {flags:CCZ=cmp([bx:SI]-0x1,0);[bx:SI]=[bx:SI]-0x1;}
>>
>> The enhanced shrink-wrapping, which calls copyprop_hardreg_forward
>> changes the INSN 25 to
>>
>>     25: ax:SI=[ax:SI]
>>
>> Then peephole2 can not optimize it since two memory_operands look like
>> different.
>>
>> To fix it, the patch adds another peephole2 rule to read one more
>> insn. From the register copy, it knows the address is the same.
>
> That is one complex peephole2 to deal with a transformation like this.
> It seems to be like it's a too specific solution for a bigger problem.
>
> Could you please try one of the following solutions instead:

Thanks for the comments.

> 1. Track register values for peephole2 and try different alternatives
> based on known register equivalences? E.g. in your example, perhaps
> there is already a REG_EQUAL/REG_EQUIV note available on insn 25 after
> copyprop_hardreg_forward, to annotate that [ax:SI] is equivalent to
> [bx:SI] at that point (or if that information is not available, it is
> not very difficult to make it available). Then you could try applying
> peephole2 on the original pattern but also on patterns modified with
> the known equivalences (i.e. try peephole2 on multiple equivalent
> patterns for the same insn). This may expose other peephole2
> opportunities, not just the specific one your patch addresses.

I will try this one.

> 2. Avoid copy-prop'ing ax:SI into [bx:SI] to begin with. At insn 7,
> both ax:SI and bx:SI are live, so insn 2 is not dead (i.e. cannot be
> eliminated) and there is no benefit in this transformation. It only
> hides (or at least makes it harder to see) that [ax:SI] in insn 25 is
> the same memory reference as [bx:SI] in insn 8. So perhaps the copy
> propagation should simply not be done unless it turns at least one
> instruction into dead code.

This is a good heuristics. But it will lead copy-prop much more
complexity. copy-prop pass scans INSN one by one to do the
propagation. If there have multi reference INSNs, you can not make the
decision until changing the last reference.

> Any reason why this transformation isn't done much earlier, e.g. in combine?

I do not know. The similar peephole2 rule was added to fix pr49095.
Will try it if Solution 1 does not work.

Thanks!
-Zhenqiang

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] Fix PR 61225
  2014-05-21 12:44 ` Steven Bosscher
  2014-05-22  2:01   ` Zhenqiang Chen
@ 2014-05-22  9:52   ` Zhenqiang Chen
  2014-07-18  5:05     ` Jeff Law
  1 sibling, 1 reply; 27+ messages in thread
From: Zhenqiang Chen @ 2014-05-22  9:52 UTC (permalink / raw)
  To: Steven Bosscher; +Cc: gcc-patches, Jakub Jelinek

On 21 May 2014 20:43, Steven Bosscher <stevenb.gcc@gmail.com> wrote:
> On Wed, May 21, 2014 at 11:58 AM, Zhenqiang Chen wrote:
>> Hi,
>>
>> The patch fixes the gcc.target/i386/pr49095.c FAIL in PR61225. The
>> test case tends to check a peephole2 optimization, which optimizes the
>> following sequence
>>
>>     2: bx:SI=ax:SI
>>     25: ax:SI=[bx:SI]
>>     7: {ax:SI=ax:SI-0x1;clobber flags:CC;}
>>     8: [bx:SI]=ax:SI
>>     9: flags:CCZ=cmp(ax:SI,0)
>> to
>>    2: bx:SI=ax:SI
>>    41: {flags:CCZ=cmp([bx:SI]-0x1,0);[bx:SI]=[bx:SI]-0x1;}
>>
>> The enhanced shrink-wrapping, which calls copyprop_hardreg_forward
>> changes the INSN 25 to
>>
>>     25: ax:SI=[ax:SI]
>>
>> Then peephole2 can not optimize it since two memory_operands look like
>> different.
>>
>> To fix it, the patch adds another peephole2 rule to read one more
>> insn. From the register copy, it knows the address is the same.
>
> That is one complex peephole2 to deal with a transformation like this.
> It seems to be like it's a too specific solution for a bigger problem.
>
> Could you please try one of the following solutions instead:
>
> 1. Track register values for peephole2 and try different alternatives
> based on known register equivalences? E.g. in your example, perhaps
> there is already a REG_EQUAL/REG_EQUIV note available on insn 25 after
> copyprop_hardreg_forward, to annotate that [ax:SI] is equivalent to
> [bx:SI] at that point (or if that information is not available, it is
> not very difficult to make it available). Then you could try applying
> peephole2 on the original pattern but also on patterns modified with
> the known equivalences (i.e. try peephole2 on multiple equivalent
> patterns for the same insn). This may expose other peephole2
> opportunities, not just the specific one your patch addresses.

Patch is updated according to the comment. There is no REG_EQUAL. So I
add it when replace_oldest_value_reg.

ChangeLog:
2014-05-22  Zhenqiang Chen  <zhenqiang.chen@linaro.org>

        Part of PR rtl-optimization/61225
        * config/i386/i386-protos.h (ix86_peephole2_rtx_equal_p): New proto.
        * config/i386/i386.c (ix86_peephole2_rtx_equal_p): New function.
        * regcprop.c (replace_oldest_value_reg): Add REG_EQUAL note when
        propagating to SET.

diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h
index 39462bd..0c4a2b9 100644
--- a/gcc/config/i386/i386-protos.h
+++ b/gcc/config/i386/i386-protos.h
@@ -42,6 +42,7 @@ extern enum calling_abi ix86_function_type_abi (const_tree);

 extern void ix86_reset_previous_fndecl (void);

+extern bool ix86_peephole2_rtx_equal_p (rtx, rtx, rtx, rtx);
 #ifdef RTX_CODE
 extern int standard_80387_constant_p (rtx);
 extern const char *standard_80387_constant_opcode (rtx);
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 6ffb788..583ebe8 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -46856,6 +46856,29 @@ ix86_atomic_assign_expand_fenv (tree *hold,
tree *clear, tree *update)
                    atomic_feraiseexcept_call);
 }

+/* OP0 is the SET_DEST of INSN and OP1 is the SET_SRC of INSN.
+   Check whether OP1 and OP6 is equal.  */
+
+bool
+ix86_peephole2_rtx_equal_p (rtx insn, rtx op0, rtx op1, rtx op6)
+{
+  rtx note;
+
+  if (!reg_overlap_mentioned_p (op0, op1) && rtx_equal_p (op1, op6))
+    return true;
+
+  gcc_assert (single_set (insn)
+             && op0 == SET_DEST (single_set (insn))
+             && op1 == SET_SRC (single_set (insn)));
+
+  note = find_reg_note (insn, REG_EQUAL, NULL_RTX);
+  if (note
+      && !reg_overlap_mentioned_p (op0, XEXP (note, 0))
+      && rtx_equal_p (XEXP (note, 0), op6))
+    return true;
+
+  return false;
+}
 /* Initialize the GCC target structure.  */
 #undef TARGET_RETURN_IN_MEMORY
 #define TARGET_RETURN_IN_MEMORY ix86_return_in_memory
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 44e80ec..b57fc86 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -16996,11 +16996,12 @@
                     [(match_dup 0)
                      (match_operand:SWI 2 "<nonmemory_operand>")]))
              (clobber (reg:CC FLAGS_REG))])
-   (set (match_dup 1) (match_dup 0))
+   (set (match_operand:SWI 6 "memory_operand") (match_dup 0))
    (set (reg FLAGS_REG) (compare (match_dup 0) (const_int 0)))]
   "(TARGET_READ_MODIFY_WRITE || optimize_insn_for_size_p ())
    && peep2_reg_dead_p (4, operands[0])
-   && !reg_overlap_mentioned_p (operands[0], operands[1])
+   && ix86_peephole2_rtx_equal_p (peep2_next_insn (0), operands[0],
+                                 operands[1], operands[6])
    && !reg_overlap_mentioned_p (operands[0], operands[2])
    && (<MODE>mode != QImode
        || immediate_operand (operands[2], QImode)
diff --git a/gcc/regcprop.c b/gcc/regcprop.c
index 7a5a4f6..4e09724 100644
--- a/gcc/regcprop.c
+++ b/gcc/regcprop.c
@@ -510,6 +510,22 @@ replace_oldest_value_reg (rtx *loc, enum
reg_class cl, rtx insn,
        fprintf (dump_file, "insn %u: replaced reg %u with %u\n",
                 INSN_UID (insn), REGNO (*loc), REGNO (new_rtx));

+      if (single_set (insn) && GET_CODE (PATTERN (insn)) == SET
+         && GET_MODE_CLASS (GET_MODE (SET_DEST (insn))) != MODE_CC
+         && GET_CODE (SET_SRC (single_set (insn))) != COMPARE)
+       {
+         rtx set = single_set (insn);
+         rtx dest = SET_DEST (set);
+
+         if (REG_P (dest) && REG_P (new_rtx)
+             && REGNO (dest) == REGNO (new_rtx))
+           /* REGNO of the NEW_RTX is modified by INSN.  No way to forwarded
+              it any more.  So add REG_EQUAL note to record its previous
+              value.  This note can be used to check whether two RTXs
+              equal or not.  */
+           add_reg_note (insn, REG_EQUAL, copy_rtx (SET_SRC (set)));
+       }
+
       validate_change (insn, loc, new_rtx, 1);
       return true;
     }

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] Fix PR 61225
  2014-05-22  9:52   ` Zhenqiang Chen
@ 2014-07-18  5:05     ` Jeff Law
  2014-08-04  8:24       ` Zhenqiang Chen
  0 siblings, 1 reply; 27+ messages in thread
From: Jeff Law @ 2014-07-18  5:05 UTC (permalink / raw)
  To: Zhenqiang Chen, Steven Bosscher; +Cc: gcc-patches, Jakub Jelinek

On 05/22/14 03:52, Zhenqiang Chen wrote:
> On 21 May 2014 20:43, Steven Bosscher <stevenb.gcc@gmail.com> wrote:
>> On Wed, May 21, 2014 at 11:58 AM, Zhenqiang Chen wrote:
>>> Hi,
>>>
>>> The patch fixes the gcc.target/i386/pr49095.c FAIL in PR61225. The
>>> test case tends to check a peephole2 optimization, which optimizes the
>>> following sequence
>>>
>>>      2: bx:SI=ax:SI
>>>      25: ax:SI=[bx:SI]
>>>      7: {ax:SI=ax:SI-0x1;clobber flags:CC;}
>>>      8: [bx:SI]=ax:SI
>>>      9: flags:CCZ=cmp(ax:SI,0)
>>> to
>>>     2: bx:SI=ax:SI
>>>     41: {flags:CCZ=cmp([bx:SI]-0x1,0);[bx:SI]=[bx:SI]-0x1;}
>>>
>>> The enhanced shrink-wrapping, which calls copyprop_hardreg_forward
>>> changes the INSN 25 to
>>>
>>>      25: ax:SI=[ax:SI]
>>>
>>> Then peephole2 can not optimize it since two memory_operands look like
>>> different.
>>>
>>> To fix it, the patch adds another peephole2 rule to read one more
>>> insn. From the register copy, it knows the address is the same.
>>
>> That is one complex peephole2 to deal with a transformation like this.
>> It seems to be like it's a too specific solution for a bigger problem.
>>
>> Could you please try one of the following solutions instead:
>>
>> 1. Track register values for peephole2 and try different alternatives
>> based on known register equivalences? E.g. in your example, perhaps
>> there is already a REG_EQUAL/REG_EQUIV note available on insn 25 after
>> copyprop_hardreg_forward, to annotate that [ax:SI] is equivalent to
>> [bx:SI] at that point (or if that information is not available, it is
>> not very difficult to make it available). Then you could try applying
>> peephole2 on the original pattern but also on patterns modified with
>> the known equivalences (i.e. try peephole2 on multiple equivalent
>> patterns for the same insn). This may expose other peephole2
>> opportunities, not just the specific one your patch addresses.
>
> Patch is updated according to the comment. There is no REG_EQUAL. So I
> add it when replace_oldest_value_reg.
>
> ChangeLog:
> 2014-05-22  Zhenqiang Chen  <zhenqiang.chen@linaro.org>
>
>          Part of PR rtl-optimization/61225
>          * config/i386/i386-protos.h (ix86_peephole2_rtx_equal_p): New proto.
>          * config/i386/i386.c (ix86_peephole2_rtx_equal_p): New function.
>          * regcprop.c (replace_oldest_value_reg): Add REG_EQUAL note when
>          propagating to SET.
I can't help but wonder why the new 4 insn combination code isn't 
presenting this as a nice big fat insn to the x86 backend which would 
eliminate the need for the peep2.

But, assuming there's a fundamental reason why that's not kicking in...

In replace_oldest_value_reg, why not use reg_overlap_mentioned_p to 
determine if the REGNO of NEW_RTX is modified by INSN?  I'd look to 
avoid some of those calls to single_set (insn).  Just call it once and 
reuse the value.

Shouldn't you be ensuring the REG_EQUAL note is unique?  I think we have 
a routine to avoid creating a note that already exists.

Don't you have to ensure that the value in the REG_EQUAL note has not 
changed?  A REG_EQUAL note denotes an equivalence that holds at the 
single insn where it appears.  If you want to use the value elsewhere 
you'd have to ensure the value hasn't been changed.  If RTX referred to 
by the REG_EQUAL note is a MEM, this can be relatively difficult due to 
aliasing issues.

Jeff




^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] Fix PR 61225
  2014-07-18  5:05     ` Jeff Law
@ 2014-08-04  8:24       ` Zhenqiang Chen
  2014-12-01 22:10         ` Jeff Law
  0 siblings, 1 reply; 27+ messages in thread
From: Zhenqiang Chen @ 2014-08-04  8:24 UTC (permalink / raw)
  To: Jeff Law; +Cc: Steven Bosscher, gcc-patches, Jakub Jelinek

On 17 July 2014 11:10, Jeff Law <law@redhat.com> wrote:
> On 05/22/14 03:52, Zhenqiang Chen wrote:
>>
>> On 21 May 2014 20:43, Steven Bosscher <stevenb.gcc@gmail.com> wrote:
>>>
>>> On Wed, May 21, 2014 at 11:58 AM, Zhenqiang Chen wrote:
>>>>
>>>> Hi,
>>>>
>>>> The patch fixes the gcc.target/i386/pr49095.c FAIL in PR61225. The
>>>> test case tends to check a peephole2 optimization, which optimizes the
>>>> following sequence
>>>>
>>>>      2: bx:SI=ax:SI
>>>>      25: ax:SI=[bx:SI]
>>>>      7: {ax:SI=ax:SI-0x1;clobber flags:CC;}
>>>>      8: [bx:SI]=ax:SI
>>>>      9: flags:CCZ=cmp(ax:SI,0)
>>>> to
>>>>     2: bx:SI=ax:SI
>>>>     41: {flags:CCZ=cmp([bx:SI]-0x1,0);[bx:SI]=[bx:SI]-0x1;}
>>>>
>>>> The enhanced shrink-wrapping, which calls copyprop_hardreg_forward
>>>> changes the INSN 25 to
>>>>
>>>>      25: ax:SI=[ax:SI]
>>>>
>>>> Then peephole2 can not optimize it since two memory_operands look like
>>>> different.
>>>>
>>>> To fix it, the patch adds another peephole2 rule to read one more
>>>> insn. From the register copy, it knows the address is the same.
>>>
>>>
>>> That is one complex peephole2 to deal with a transformation like this.
>>> It seems to be like it's a too specific solution for a bigger problem.
>>>
>>> Could you please try one of the following solutions instead:
>>>
>>> 1. Track register values for peephole2 and try different alternatives
>>> based on known register equivalences? E.g. in your example, perhaps
>>> there is already a REG_EQUAL/REG_EQUIV note available on insn 25 after
>>> copyprop_hardreg_forward, to annotate that [ax:SI] is equivalent to
>>> [bx:SI] at that point (or if that information is not available, it is
>>> not very difficult to make it available). Then you could try applying
>>> peephole2 on the original pattern but also on patterns modified with
>>> the known equivalences (i.e. try peephole2 on multiple equivalent
>>> patterns for the same insn). This may expose other peephole2
>>> opportunities, not just the specific one your patch addresses.
>>
>>
>> Patch is updated according to the comment. There is no REG_EQUAL. So I
>> add it when replace_oldest_value_reg.
>>
>> ChangeLog:
>> 2014-05-22  Zhenqiang Chen  <zhenqiang.chen@linaro.org>
>>
>>          Part of PR rtl-optimization/61225
>>          * config/i386/i386-protos.h (ix86_peephole2_rtx_equal_p): New
>> proto.
>>          * config/i386/i386.c (ix86_peephole2_rtx_equal_p): New function.
>>          * regcprop.c (replace_oldest_value_reg): Add REG_EQUAL note when
>>          propagating to SET.
>
> I can't help but wonder why the new 4 insn combination code isn't presenting
> this as a nice big fat insn to the x86 backend which would eliminate the
> need for the peep2.
>
> But, assuming there's a fundamental reason why that's not kicking in...

Current combine pass can only handle

I0 -> I1 -> I2 -> I3.
I0, I1 -> I2, I2 -> I3.
I0 -> I2; I1, I2 -> I3.
I0 -> I1; I1, I2 -> I3.

For the case, it is
I1 -> I2 -> I3; I2 -> INSN

I3 and INSN looks like not related. But INSN is a COMPARE to set CC
and I3 can also set CC. I3 and INSN can be combined together as one
instruction to set CC.

The following patch enhances combine pass to handle the case.

A new parameter is added for try_combine to accept INSN as
TO_COMBINED_INSN. It reuses the 3-insn combine method to combine I1 ->
I2 -> I3. If there is TO_COMBINED_INSN, combine I2 ->
TO_COMBINED_INSN. Then create an new insn "parallel (TO_COMBINED_INSN,
I3)". refer_same_reg_p is some check to make sure the change is safe.

Bootstrap and no make check regression on X86-64 and i686.

X86-64 bootstrap logs show 358 cases were combined by the patch.

Ok for trunk?

Thanks!
-Zhenqiang

ChangeLog
2014-08-04  Zhenqiang Chen  <zhenqiang.chen@linaro.org>

        Part of PR rtl-optimization/61225
        * combine.c (refer_same_reg_p): New function.
        (combine_instructions): Handle I1 -> I2 -> I3; I2 -> insn.
        (try_combine): Add one more parameter TO_COMBINED_INSN, which is
        used to create a new insn parallel (TO_COMBINED_INSN, I3).

testsuite/ChangeLog:
2014-08-04  Zhenqiang Chen  <zhenqiang.chen@linaro.org>

        * gcc.target/i386/pr61225.c: New test.

diff --git a/gcc/combine.c b/gcc/combine.c
index 53ac1d6..42098ab 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -412,7 +412,7 @@ static int cant_combine_insn_p (rtx);
 static int can_combine_p (rtx, rtx, rtx, rtx, rtx, rtx, rtx *, rtx *);
 static int combinable_i3pat (rtx, rtx *, rtx, rtx, rtx, int, int, rtx *);
 static int contains_muldiv (rtx);
-static rtx try_combine (rtx, rtx, rtx, rtx, int *, rtx);
+static rtx try_combine (rtx, rtx, rtx, rtx, int *, rtx, rtx);
 static void undo_all (void);
 static void undo_commit (void);
 static rtx *find_split_point (rtx *, rtx, bool);
@@ -1099,6 +1099,46 @@ insn_a_feeds_b (rtx a, rtx b)
 #endif
   return false;
 }
+
+/* A is a compare (reg1, 0) and B is SINGLE_SET which SET_SRC is reg2.
+   It returns TRUE, if reg1 == reg2, and no other refer of reg1
+   except A and B.  */
+
+static bool
+refer_same_reg_p (rtx a, rtx b)
+{
+  rtx seta = single_set (a);
+  rtx setb = single_set (b);
+
+  if (BLOCK_FOR_INSN (a) != BLOCK_FOR_INSN (b)
+     || !seta || !setb)
+    return false;
+
+  if (GET_CODE (SET_SRC (seta)) != COMPARE
+      || GET_MODE_CLASS (GET_MODE (SET_DEST (seta))) != MODE_CC
+      || !REG_P (XEXP (SET_SRC (seta), 0))
+      || !const0_rtx
+      || !REG_P (SET_SRC (setb))
+      || REGNO (SET_SRC (setb)) != REGNO (XEXP (SET_SRC (seta), 0)))
+    return false;
+
+  if (DF_REG_USE_COUNT (REGNO (SET_SRC (setb))) > 2)
+    {
+      df_ref use;
+      rtx insn;
+      unsigned int i = REGNO (SET_SRC (setb));
+
+      for (use = DF_REG_USE_CHAIN (i); use; use = DF_REF_NEXT_REG (use))
+        {
+         insn = DF_REF_INSN (use);
+         if (insn != a && insn != b && !(NOTE_P (insn) || DEBUG_INSN_P (insn)))
+           return false;
+       }
+    }
+
+  return true;
+}
+
 ^L
 /* Main entry point for combiner.  F is the first insn of the function.
    NREGS is the first unused pseudo-reg number.
@@ -1108,10 +1148,7 @@ insn_a_feeds_b (rtx a, rtx b)
 static int
 combine_instructions (rtx f, unsigned int nregs)
 {
-  rtx insn, next;
-#ifdef HAVE_cc0
-  rtx prev;
-#endif
+  rtx insn, next, prev;
   struct insn_link *links, *nextlinks;
   rtx first;
   basic_block last_bb;
@@ -1258,7 +1295,7 @@ combine_instructions (rtx f, unsigned int nregs)
          FOR_EACH_LOG_LINK (links, insn)
            if ((next = try_combine (insn, links->insn, NULL_RTX,
                                     NULL_RTX, &new_direct_jump_p,
-                                    last_combined_insn)) != 0)
+                                    last_combined_insn, NULL_RTX)) != 0)
              {
                statistics_counter_event (cfun, "two-insn combine", 1);
                goto retry;
@@ -1279,7 +1316,7 @@ combine_instructions (rtx f, unsigned int nregs)
                FOR_EACH_LOG_LINK (nextlinks, link)
                  if ((next = try_combine (insn, link, nextlinks->insn,
                                           NULL_RTX, &new_direct_jump_p,
-                                          last_combined_insn)) != 0)
+                                          last_combined_insn, NULL_RTX)) != 0)
                    {
                      statistics_counter_event (cfun, "three-insn combine", 1);
                      goto retry;
@@ -1301,13 +1338,13 @@ combine_instructions (rtx f, unsigned int nregs)
            {
              if ((next = try_combine (insn, prev, NULL_RTX, NULL_RTX,
                                       &new_direct_jump_p,
-                                      last_combined_insn)) != 0)
+                                      last_combined_insn, NULL_RTX)) != 0)
                goto retry;

              FOR_EACH_LOG_LINK (nextlinks, prev)
                  if ((next = try_combine (insn, prev, nextlinks->insn,
                                           NULL_RTX, &new_direct_jump_p,
-                                          last_combined_insn)) != 0)
+                                          last_combined_insn, NULL_RTX)) != 0)
                    goto retry;
            }

@@ -1321,13 +1358,13 @@ combine_instructions (rtx f, unsigned int nregs)
            {
              if ((next = try_combine (insn, prev, NULL_RTX, NULL_RTX,
                                       &new_direct_jump_p,
-                                      last_combined_insn)) != 0)
+                                      last_combined_insn, NULL_RTX)) != 0)
                goto retry;

              FOR_EACH_LOG_LINK (nextlinks, prev)
                  if ((next = try_combine (insn, prev, nextlinks->insn,
                                           NULL_RTX, &new_direct_jump_p,
-                                          last_combined_insn)) != 0)
+                                          last_combined_insn, NULL_RTX)) != 0)
                    goto retry;
            }

@@ -1343,7 +1380,7 @@ combine_instructions (rtx f, unsigned int nregs)
                  && sets_cc0_p (PATTERN (prev))
                  && (next = try_combine (insn, links->insn,
                                          prev, NULL_RTX, &new_direct_jump_p,
-                                         last_combined_insn)) != 0)
+                                         last_combined_insn, NULL_RTX)) != 0)
                goto retry;
 #endif

@@ -1356,7 +1393,7 @@ combine_instructions (rtx f, unsigned int nregs)
                if ((next = try_combine (insn, links->insn,
                                         nextlinks->insn, NULL_RTX,
                                         &new_direct_jump_p,
-                                        last_combined_insn)) != 0)
+                                        last_combined_insn, NULL_RTX)) != 0)

                  {
                    statistics_counter_event (cfun, "three-insn combine", 1);
@@ -1385,7 +1422,7 @@ combine_instructions (rtx f, unsigned int nregs)
                      if ((next = try_combine (insn, link, link1,
                                               nextlinks->insn,
                                               &new_direct_jump_p,
-                                              last_combined_insn)) != 0)
+                                              last_combined_insn,
NULL_RTX)) != 0)
                        {
                          statistics_counter_event (cfun, "four-insn
combine", 1);
                          goto retry;
@@ -1396,7 +1433,7 @@ combine_instructions (rtx f, unsigned int nregs)
                      if ((next = try_combine (insn, link, link1,
                                               nextlinks->insn,
                                               &new_direct_jump_p,
-                                              last_combined_insn)) != 0)
+                                              last_combined_insn,
NULL_RTX)) != 0)
                        {
                          statistics_counter_event (cfun, "four-insn
combine", 1);
                          goto retry;
@@ -1413,7 +1450,7 @@ combine_instructions (rtx f, unsigned int nregs)
                      if ((next = try_combine (insn, link, link1,
                                               nextlinks->insn,
                                               &new_direct_jump_p,
-                                              last_combined_insn)) != 0)
+                                              last_combined_insn,
NULL_RTX)) != 0)
                        {
                          statistics_counter_event (cfun, "four-insn
combine", 1);
                          goto retry;
@@ -1423,7 +1460,7 @@ combine_instructions (rtx f, unsigned int nregs)
                      if ((next = try_combine (insn, link, link1,
                                               nextlinks->insn,
                                               &new_direct_jump_p,
-                                              last_combined_insn)) != 0)
+                                              last_combined_insn,
NULL_RTX)) != 0)
                        {
                          statistics_counter_event (cfun, "four-insn
combine", 1);
                          goto retry;
@@ -1431,6 +1468,50 @@ combine_instructions (rtx f, unsigned int nregs)
                  }
              }

+         /* Try to combine a compare insn that sets CC
+            with a preceding insn that can set CC, and maybe with its
+            logical predecessor as well.
+            We need this special code because data flow connections
+            do not get entered in LOG_LINKS.  */
+         if ((prev = prev_nonnote_nondebug_insn (insn)) != NULL_RTX
+             && refer_same_reg_p (insn, prev)
+             && max_combine >= 4)
+           {
+               struct insn_link *next1;
+               FOR_EACH_LOG_LINK (next1, prev)
+                 {
+                   rtx link1 = next1->insn;
+                   if (NOTE_P (link1))
+                     continue;
+                   /* I1 -> I2 -> I3; I2 -> insn;
+                      output parallel (insn, I3).  */
+                   FOR_EACH_LOG_LINK (nextlinks, link1)
+                     if ((next = try_combine (prev, link1,
+                                              nextlinks->insn, NULL_RTX,
+                                              &new_direct_jump_p,
+                                              last_combined_insn, insn)) != 0)
+
+                       {
+                         delete_insn (insn);
+                         insn = next;
+                         statistics_counter_event (cfun, "four-insn
combine", 1);
+                         goto retry;
+                       }
+                   /* I2 -> I3; I2 -> insn
+                      output next = parallel (insn, I3).  */
+                   if ((next = try_combine (prev, link1,
+                                            NULL_RTX, NULL_RTX,
+                                            &new_direct_jump_p,
+                                            last_combined_insn, insn)) != 0)
+
+                     {
+                       delete_insn (insn);
+                       insn = next;
+                       statistics_counter_event (cfun, "three-insn
combine", 1);
+                       goto retry;
+                     }
+                 }
+           }
          /* Try this insn with each REG_EQUAL note it links back to.  */
          FOR_EACH_LOG_LINK (links, insn)
            {
@@ -1456,7 +1537,7 @@ combine_instructions (rtx f, unsigned int nregs)
                  i2mod_new_rhs = copy_rtx (note);
                  next = try_combine (insn, i2mod, NULL_RTX, NULL_RTX,
                                      &new_direct_jump_p,
-                                     last_combined_insn);
+                                     last_combined_insn, NULL_RTX);
                  i2mod = NULL_RTX;
                  if (next)
                    {
@@ -2450,11 +2531,14 @@ update_cfg_for_uncondjump (rtx insn)

    LAST_COMBINED_INSN is either I3, or some insn after I3 that has
    been I3 passed to an earlier try_combine within the same basic
-   block.  */
+   block.
+
+   TO_COMBINED_INSN is an insn after I3 that sets CC flags.  It is used to
+   combine with I3 to make a new insn.  */

 static rtx
 try_combine (rtx i3, rtx i2, rtx i1, rtx i0, int *new_direct_jump_p,
-            rtx last_combined_insn)
+            rtx last_combined_insn, rtx to_combined_insn)
 {
   /* New patterns for I3 and I2, respectively.  */
   rtx newpat, newi2pat = 0;
@@ -2543,6 +2627,7 @@ try_combine (rtx i3, rtx i2, rtx i1, rtx i0, int
*new_direct_jump_p,
       || cant_combine_insn_p (i2)
       || (i1 && cant_combine_insn_p (i1))
       || (i0 && cant_combine_insn_p (i0))
+      || (to_combined_insn && cant_combine_insn_p (to_combined_insn))
       || likely_spilled_retval_p (i3))
     return 0;

@@ -3216,7 +3301,11 @@ try_combine (rtx i3, rtx i2, rtx i1, rtx i0,
int *new_direct_jump_p,
          rtx old = newpat;
          total_sets = 1 + extra_sets;
          newpat = gen_rtx_PARALLEL (VOIDmode, rtvec_alloc (total_sets));
-         XVECEXP (newpat, 0, 0) = old;
+
+         if (to_combined_insn)
+           XVECEXP (newpat, 0, --total_sets) = old;
+         else
+           XVECEXP (newpat, 0, 0) = old;
        }

       if (added_sets_0)
@@ -3239,6 +3328,21 @@ try_combine (rtx i3, rtx i2, rtx i1, rtx i0,
int *new_direct_jump_p,
          if ((i0_feeds_i1_n && i1_feeds_i2_n) || i0_feeds_i2_n)
            t = subst (t, i0dest, i0src_copy2 ? i0src_copy2 : i0src, 0, 0, 0);

+         if (to_combined_insn)
+           {
+             rtx todest = NULL_RTX, tosrc = NULL_RTX;
+             if (can_combine_p (i2, to_combined_insn, NULL_RTX, NULL_RTX,
+                                i3, NULL_RTX, &todest, &tosrc))
+               {
+                 rtx pat = copy_rtx (PATTERN (to_combined_insn));
+                 t = subst (pat, todest, tosrc, 0, 0, 0);
+               }
+             else
+               {
+                 undo_all ();
+                 return 0;
+               }
+           }
          XVECEXP (newpat, 0, --total_sets) = t;
        }
     }
diff --git a/gcc/testsuite/gcc.target/i386/pr61225.c
b/gcc/testsuite/gcc.target/i386/pr61225.c
new file mode 100644
index 0000000..9c08102
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr61225.c
@@ -0,0 +1,16 @@
+/* PR rtl-optimization/61225 */
+/* { dg-do compile } */
+/* { dg-options "-Os -fdump-rtl-combine-details" } */
+
+void foo (void *);
+
+int *
+f1 (int *x)
+{
+  if (!--*x)
+    foo (x);
+  return x;
+}
+
+/* { dg-final { scan-rtl-dump "Successfully matched this instruction"
"combine" } } */
+/* { dg-final { cleanup-rtl-dump "combine" } } */



> In replace_oldest_value_reg, why not use reg_overlap_mentioned_p to
> determine if the REGNO of NEW_RTX is modified by INSN?  I'd look to avoid
> some of those calls to single_set (insn).  Just call it once and reuse the
> value.
>
> Shouldn't you be ensuring the REG_EQUAL note is unique?  I think we have a
> routine to avoid creating a note that already exists.
>
> Don't you have to ensure that the value in the REG_EQUAL note has not
> changed?  A REG_EQUAL note denotes an equivalence that holds at the single
> insn where it appears.  If you want to use the value elsewhere you'd have to
> ensure the value hasn't been changed.  If RTX referred to by the REG_EQUAL
> note is a MEM, this can be relatively difficult due to aliasing issues.
>
> Jeff
>
>
>
>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] Fix PR 61225
  2014-08-04  8:24       ` Zhenqiang Chen
@ 2014-12-01 22:10         ` Jeff Law
  2014-12-01 23:30           ` Eric Botcazou
  2014-12-04  8:43           ` Zhenqiang Chen
  0 siblings, 2 replies; 27+ messages in thread
From: Jeff Law @ 2014-12-01 22:10 UTC (permalink / raw)
  To: Zhenqiang Chen; +Cc: Steven Bosscher, gcc-patches, Jakub Jelinek

On 08/04/14 02:24, Zhenqiang Chen wrote:

>>>
>>> ChangeLog:
>>> 2014-05-22  Zhenqiang Chen  <zhenqiang.chen@linaro.org>
>>>
>>>           Part of PR rtl-optimization/61225
>>>           * config/i386/i386-protos.h (ix86_peephole2_rtx_equal_p): New
>>> proto.
>>>           * config/i386/i386.c (ix86_peephole2_rtx_equal_p): New function.
>>>           * regcprop.c (replace_oldest_value_reg): Add REG_EQUAL note when
>>>           propagating to SET.
>>
>> I can't help but wonder why the new 4 insn combination code isn't presenting
>> this as a nice big fat insn to the x86 backend which would eliminate the
>> need for the peep2.
>>
>> But, assuming there's a fundamental reason why that's not kicking in...
>
> Current combine pass can only handle
>
> I0 -> I1 -> I2 -> I3.
> I0, I1 -> I2, I2 -> I3.
> I0 -> I2; I1, I2 -> I3.
> I0 -> I1; I1, I2 -> I3.
>
> For the case, it is
> I1 -> I2 -> I3; I2 -> INSN
>
> I3 and INSN looks like not related. But INSN is a COMPARE to set CC
> and I3 can also set CC. I3 and INSN can be combined together as one
> instruction to set CC.
Presumably there's no dataflow between I3 and INSN because they both set 
CC (doesn't that make them anti-dependent?

Can you show me the RTL corresponding to I1, I2, I3 and INSN, I simply 
find it easier to look at RTL rather than guess why we don't have the 
appropriate linkage and thus not attempting the combinations we want.

Pseudo code for the resulting I3 and INSN would help -- as I work 
through this there's some inconsistencies in how I'm interpreting a few 
things and RTL and pseudo-rtl for the desired output RTL would help a lot.


I
>
> ChangeLog
> 2014-08-04  Zhenqiang Chen  <zhenqiang.chen@linaro.org>
>
>          Part of PR rtl-optimization/61225
>          * combine.c (refer_same_reg_p): New function.
>          (combine_instructions): Handle I1 -> I2 -> I3; I2 -> insn.
>          (try_combine): Add one more parameter TO_COMBINED_INSN, which is
>          used to create a new insn parallel (TO_COMBINED_INSN, I3).
>
> testsuite/ChangeLog:
> 2014-08-04  Zhenqiang Chen  <zhenqiang.chen@linaro.org>
>
>          * gcc.target/i386/pr61225.c: New test.
>
> diff --git a/gcc/combine.c b/gcc/combine.c
> index 53ac1d6..42098ab 100644
> --- a/gcc/combine.c
> +++ b/gcc/combine.c
> @@ -412,7 +412,7 @@ static int cant_combine_insn_p (rtx);
>   static int can_combine_p (rtx, rtx, rtx, rtx, rtx, rtx, rtx *, rtx *);
>   static int combinable_i3pat (rtx, rtx *, rtx, rtx, rtx, int, int, rtx *);
>   static int contains_muldiv (rtx);
> -static rtx try_combine (rtx, rtx, rtx, rtx, int *, rtx);
> +static rtx try_combine (rtx, rtx, rtx, rtx, int *, rtx, rtx);
>   static void undo_all (void);
>   static void undo_commit (void);
>   static rtx *find_split_point (rtx *, rtx, bool);
> @@ -1099,6 +1099,46 @@ insn_a_feeds_b (rtx a, rtx b)
>   #endif
>     return false;
>   }
> +
> +/* A is a compare (reg1, 0) and B is SINGLE_SET which SET_SRC is reg2.
> +   It returns TRUE, if reg1 == reg2, and no other refer of reg1
> +   except A and B.  */
> +
> +static bool
> +refer_same_reg_p (rtx a, rtx b)
> +{
> +  rtx seta = single_set (a);
> +  rtx setb = single_set (b);
> +
> +  if (BLOCK_FOR_INSN (a) != BLOCK_FOR_INSN (b)
> +     || !seta || !setb)
> +    return false;
Go ahead and use
         || !setb
         || !setb

It's a bit more vertical space, but I believe closer in line with our 
coding standards.


> +
> +  if (GET_CODE (SET_SRC (seta)) != COMPARE
> +      || GET_MODE_CLASS (GET_MODE (SET_DEST (seta))) != MODE_CC
> +      || !REG_P (XEXP (SET_SRC (seta), 0))
> +      || !const0_rtx
> +      || !REG_P (SET_SRC (setb))
> +      || REGNO (SET_SRC (setb)) != REGNO (XEXP (SET_SRC (seta), 0)))
> +    return false;
What's the !const0_rtx test here?  Don't you want to test some object 
from SETA against const0_rtx?  Also note that you may need to test 
against CONST0_RTX (mode)

> @@ -1431,6 +1468,50 @@ combine_instructions (rtx f, unsigned int nregs)
>                    }
>                }
>
> +         /* Try to combine a compare insn that sets CC
> +            with a preceding insn that can set CC, and maybe with its
> +            logical predecessor as well.
> +            We need this special code because data flow connections
> +            do not get entered in LOG_LINKS.  */
So you'd want to be more specific about what dataflow connections are 
not in the LOG_LINKS that we want.

It feels to me like we're missing the anti-dependence links on CC and 
that there's a general aspect to combine missing here.  But I want to 
hold off on final judgement until I know more.

I also wonder if compare-elim ought to be helping here.  Isn't that the 
point here, to eliminate the comparison and instead get it for free as 
part of the arithmetic?  If so, is it the fact that we have memory 
references that prevents compare-elim from kicking in?

jeff

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] Fix PR 61225
  2014-12-01 22:10         ` Jeff Law
@ 2014-12-01 23:30           ` Eric Botcazou
  2014-12-04  8:43           ` Zhenqiang Chen
  1 sibling, 0 replies; 27+ messages in thread
From: Eric Botcazou @ 2014-12-01 23:30 UTC (permalink / raw)
  To: Jeff Law; +Cc: gcc-patches, Zhenqiang Chen, Steven Bosscher, Jakub Jelinek

> I also wonder if compare-elim ought to be helping here.  Isn't that the
> point here, to eliminate the comparison and instead get it for free as
> part of the arithmetic?  If so, is it the fact that we have memory
> references that prevents compare-elim from kicking in?

Yes, compare-elim doesn't work with memory references but, more radically, it 
is not enabled for x86 (it is only enabled for aarch64, mn10300 and rx).

-- 
Eric Botcazou

^ permalink raw reply	[flat|nested] 27+ messages in thread

* RE: [PATCH] Fix PR 61225
  2014-12-01 22:10         ` Jeff Law
  2014-12-01 23:30           ` Eric Botcazou
@ 2014-12-04  8:43           ` Zhenqiang Chen
  2014-12-04 20:50             ` Segher Boessenkool
  2014-12-08 21:29             ` Jeff Law
  1 sibling, 2 replies; 27+ messages in thread
From: Zhenqiang Chen @ 2014-12-04  8:43 UTC (permalink / raw)
  To: 'Jeff Law'; +Cc: Steven Bosscher, gcc-patches, Jakub Jelinek

[-- Attachment #1: Type: text/plain, Size: 6003 bytes --]


> -----Original Message-----
> From: gcc-patches-owner@gcc.gnu.org [mailto:gcc-patches-
> owner@gcc.gnu.org] On Behalf Of Jeff Law
> Sent: Tuesday, December 02, 2014 6:11 AM
> To: Zhenqiang Chen
> Cc: Steven Bosscher; gcc-patches@gcc.gnu.org; Jakub Jelinek
> Subject: Re: [PATCH] Fix PR 61225
> 
> On 08/04/14 02:24, Zhenqiang Chen wrote:
> 
> >>>
> >>> ChangeLog:
> >>> 2014-05-22  Zhenqiang Chen  <zhenqiang.chen@linaro.org>
> >>>
> >>>           Part of PR rtl-optimization/61225
> >>>           * config/i386/i386-protos.h (ix86_peephole2_rtx_equal_p):
> >>> New proto.
> >>>           * config/i386/i386.c (ix86_peephole2_rtx_equal_p): New
function.
> >>>           * regcprop.c (replace_oldest_value_reg): Add REG_EQUAL note
> when
> >>>           propagating to SET.
> >>
> >> I can't help but wonder why the new 4 insn combination code isn't
> >> presenting this as a nice big fat insn to the x86 backend which would
> >> eliminate the need for the peep2.
> >>
> >> But, assuming there's a fundamental reason why that's not kicking in...
> >
> > Current combine pass can only handle
> >
> > I0 -> I1 -> I2 -> I3.
> > I0, I1 -> I2, I2 -> I3.
> > I0 -> I2; I1, I2 -> I3.
> > I0 -> I1; I1, I2 -> I3.
> >
> > For the case, it is
> > I1 -> I2 -> I3; I2 -> INSN
> >
> > I3 and INSN looks like not related. But INSN is a COMPARE to set CC
> > and I3 can also set CC. I3 and INSN can be combined together as one
> > instruction to set CC.
> Presumably there's no dataflow between I3 and INSN because they both set
> CC (doesn't that make them anti-dependent?
> 
> Can you show me the RTL corresponding to I1, I2, I3 and INSN, I simply
find it
> easier to look at RTL rather than guess why we don't have the appropriate
> linkage and thus not attempting the combinations we want.
> 
> Pseudo code for the resulting I3 and INSN would help -- as I work through
> this there's some inconsistencies in how I'm interpreting a few things and
RTL
> and pseudo-rtl for the desired output RTL would help a lot.

C code:

    if (!--*p)

rtl code:

    6: r91:SI=[r90:SI]
    7: {r88:SI=r91:SI-0x1;clobber flags:CC;}
    8: [r90:SI]=r88:SI
    9: flags:CCZ=cmp(r88:SI,0)

expected output:

    8: {flags:CCZ=cmp([r90:SI]-0x1,0);[r90:SI]=[r90:SI]-0x1;}

in assemble, it is

  decl (%eax)

> >
> > ChangeLog
> > 2014-08-04  Zhenqiang Chen  <zhenqiang.chen@linaro.org>
> >
> >          Part of PR rtl-optimization/61225
> >          * combine.c (refer_same_reg_p): New function.
> >          (combine_instructions): Handle I1 -> I2 -> I3; I2 -> insn.
> >          (try_combine): Add one more parameter TO_COMBINED_INSN, which
> is
> >          used to create a new insn parallel (TO_COMBINED_INSN, I3).
> >
> > testsuite/ChangeLog:
> > 2014-08-04  Zhenqiang Chen  <zhenqiang.chen@linaro.org>
> >
> >          * gcc.target/i386/pr61225.c: New test.
> >
> > diff --git a/gcc/combine.c b/gcc/combine.c index 53ac1d6..42098ab
> > 100644
> > --- a/gcc/combine.c
> > +++ b/gcc/combine.c
> > @@ -412,7 +412,7 @@ static int cant_combine_insn_p (rtx);
> >   static int can_combine_p (rtx, rtx, rtx, rtx, rtx, rtx, rtx *, rtx *);
> >   static int combinable_i3pat (rtx, rtx *, rtx, rtx, rtx, int, int, rtx
*);
> >   static int contains_muldiv (rtx);
> > -static rtx try_combine (rtx, rtx, rtx, rtx, int *, rtx);
> > +static rtx try_combine (rtx, rtx, rtx, rtx, int *, rtx, rtx);
> >   static void undo_all (void);
> >   static void undo_commit (void);
> >   static rtx *find_split_point (rtx *, rtx, bool); @@ -1099,6 +1099,46
> > @@ insn_a_feeds_b (rtx a, rtx b)
> >   #endif
> >     return false;
> >   }
> > +
> > +/* A is a compare (reg1, 0) and B is SINGLE_SET which SET_SRC is reg2.
> > +   It returns TRUE, if reg1 == reg2, and no other refer of reg1
> > +   except A and B.  */
> > +
> > +static bool
> > +refer_same_reg_p (rtx a, rtx b)
> > +{
> > +  rtx seta = single_set (a);
> > +  rtx setb = single_set (b);
> > +
> > +  if (BLOCK_FOR_INSN (a) != BLOCK_FOR_INSN (b)
> > +     || !seta || !setb)
> > +    return false;
> Go ahead and use
>          || !setb
>          || !setb
> 
> It's a bit more vertical space, but I believe closer in line with our
coding
> standards.

Updated.
 
> > +
> > +  if (GET_CODE (SET_SRC (seta)) != COMPARE
> > +      || GET_MODE_CLASS (GET_MODE (SET_DEST (seta))) != MODE_CC
> > +      || !REG_P (XEXP (SET_SRC (seta), 0))
> > +      || !const0_rtx
> > +      || !REG_P (SET_SRC (setb))
> > +      || REGNO (SET_SRC (setb)) != REGNO (XEXP (SET_SRC (seta), 0)))
> > +    return false;
> What's the !const0_rtx test here?  Don't you want to test some object
> from SETA against const0_rtx?  Also note that you may need to test
> against CONST0_RTX (mode)

It's my fault. It should be

XEXP (SET_SRC (seta), 1) != const0_rtx

The updated patch is attached.

Thanks!
-Zhenqiang

 > > @@ -1431,6 +1468,50 @@ combine_instructions (rtx f, unsigned int nregs)
> >                    }
> >                }
> >
> > +         /* Try to combine a compare insn that sets CC
> > +            with a preceding insn that can set CC, and maybe with its
> > +            logical predecessor as well.
> > +            We need this special code because data flow connections
> > +            do not get entered in LOG_LINKS.  */
> So you'd want to be more specific about what dataflow connections are
> not in the LOG_LINKS that we want.
> 
> It feels to me like we're missing the anti-dependence links on CC and
> that there's a general aspect to combine missing here.  But I want to
> hold off on final judgement until I know more.
> 
> I also wonder if compare-elim ought to be helping here.  Isn't that the
> point here, to eliminate the comparison and instead get it for free as
> part of the arithmetic?  If so, is it the fact that we have memory
> references that prevents compare-elim from kicking in?
> 
> jeff
> 

[-- Attachment #2: pr61225.patch --]
[-- Type: application/octet-stream, Size: 10085 bytes --]

diff --git a/gcc/combine.c b/gcc/combine.c
index e6deb41..4ecfe9c 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -431,7 +431,7 @@ static int can_combine_p (rtx_insn *, rtx_insn *, rtx_insn *, rtx_insn *,
 static int combinable_i3pat (rtx_insn *, rtx *, rtx, rtx, rtx, int, int, rtx *);
 static int contains_muldiv (rtx);
 static rtx_insn *try_combine (rtx_insn *, rtx_insn *, rtx_insn *, rtx_insn *,
-			      int *, rtx_insn *);
+			      int *, rtx_insn *, rtx_insn *);
 static void undo_all (void);
 static void undo_commit (void);
 static rtx *find_split_point (rtx *, rtx_insn *, bool);
@@ -1120,6 +1120,47 @@ insn_a_feeds_b (rtx_insn *a, rtx_insn *b)
 #endif
   return false;
 }
+
+/* A is a compare (reg1, 0) and B is SINGLE_SET which SET_SRC is reg2.
+   It returns TRUE, if reg1 == reg2, and no other refer of reg1
+   except A and B.  */
+
+static bool
+refer_same_reg_p (rtx_insn *a, rtx_insn *b)
+{
+  rtx seta = single_set (a);
+  rtx setb = single_set (b);
+
+  if (BLOCK_FOR_INSN (a) != BLOCK_FOR_INSN (b)
+      || !seta
+      || !setb)
+    return false;
+
+  if (GET_CODE (SET_SRC (seta)) != COMPARE
+      || GET_MODE_CLASS (GET_MODE (SET_DEST (seta))) != MODE_CC
+      || !REG_P (XEXP (SET_SRC (seta), 0))
+      || XEXP (SET_SRC (seta), 1) != const0_rtx
+      || !REG_P (SET_SRC (setb))
+      || REGNO (SET_SRC (setb)) != REGNO (XEXP (SET_SRC (seta), 0)))
+    return false;
+
+  if (DF_REG_USE_COUNT (REGNO (SET_SRC (setb))) > 2)
+    {
+      df_ref use;
+      rtx insn;
+      unsigned int i = REGNO (SET_SRC (setb));
+
+      for (use = DF_REG_USE_CHAIN (i); use; use = DF_REF_NEXT_REG (use))
+        {
+	  insn = DF_REF_INSN (use);
+	  if (insn != a && insn != b && !(NOTE_P (insn) || DEBUG_INSN_P (insn)))
+	    return false;
+	}
+    }
+
+  return true;
+}
+
 \f
 /* Main entry point for combiner.  F is the first insn of the function.
    NREGS is the first unused pseudo-reg number.
@@ -1129,10 +1170,7 @@ insn_a_feeds_b (rtx_insn *a, rtx_insn *b)
 static int
 combine_instructions (rtx_insn *f, unsigned int nregs)
 {
-  rtx_insn *insn, *next;
-#ifdef HAVE_cc0
-  rtx_insn *prev;
-#endif
+  rtx_insn *insn, *next, *prev;
   struct insn_link *links, *nextlinks;
   rtx_insn *first;
   basic_block last_bb;
@@ -1279,7 +1317,7 @@ combine_instructions (rtx_insn *f, unsigned int nregs)
 	  FOR_EACH_LOG_LINK (links, insn)
 	    if ((next = try_combine (insn, links->insn, NULL,
 				     NULL, &new_direct_jump_p,
-				     last_combined_insn)) != 0)
+				     last_combined_insn, NULL)) != 0)
 	      {
 		statistics_counter_event (cfun, "two-insn combine", 1);
 		goto retry;
@@ -1300,7 +1338,7 @@ combine_instructions (rtx_insn *f, unsigned int nregs)
 		FOR_EACH_LOG_LINK (nextlinks, link)
 		  if ((next = try_combine (insn, link, nextlinks->insn,
 					   NULL, &new_direct_jump_p,
-					   last_combined_insn)) != 0)
+					   last_combined_insn, NULL)) != 0)
 		    {
 		      statistics_counter_event (cfun, "three-insn combine", 1);
 		      goto retry;
@@ -1322,13 +1360,13 @@ combine_instructions (rtx_insn *f, unsigned int nregs)
 	    {
 	      if ((next = try_combine (insn, prev, NULL, NULL,
 				       &new_direct_jump_p,
-				       last_combined_insn)) != 0)
+				       last_combined_insn, NULL)) != 0)
 		goto retry;
 
 	      FOR_EACH_LOG_LINK (nextlinks, prev)
 		  if ((next = try_combine (insn, prev, nextlinks->insn,
 					   NULL, &new_direct_jump_p,
-					   last_combined_insn)) != 0)
+					   last_combined_insn, NULL)) != 0)
 		    goto retry;
 	    }
 
@@ -1342,13 +1380,13 @@ combine_instructions (rtx_insn *f, unsigned int nregs)
 	    {
 	      if ((next = try_combine (insn, prev, NULL, NULL,
 				       &new_direct_jump_p,
-				       last_combined_insn)) != 0)
+				       last_combined_insn, NULL)) != 0)
 		goto retry;
 
 	      FOR_EACH_LOG_LINK (nextlinks, prev)
 		  if ((next = try_combine (insn, prev, nextlinks->insn,
 					   NULL, &new_direct_jump_p,
-					   last_combined_insn)) != 0)
+					   last_combined_insn, NULL)) != 0)
 		    goto retry;
 	    }
 
@@ -1364,7 +1402,7 @@ combine_instructions (rtx_insn *f, unsigned int nregs)
 		  && sets_cc0_p (PATTERN (prev))
 		  && (next = try_combine (insn, links->insn,
 					  prev, NULL, &new_direct_jump_p,
-					  last_combined_insn)) != 0)
+					  last_combined_insn, NULL)) != 0)
 		goto retry;
 #endif
 
@@ -1377,7 +1415,7 @@ combine_instructions (rtx_insn *f, unsigned int nregs)
 		if ((next = try_combine (insn, links->insn,
 					 nextlinks->insn, NULL,
 					 &new_direct_jump_p,
-					 last_combined_insn)) != 0)
+					 last_combined_insn, NULL)) != 0)
 
 		  {
 		    statistics_counter_event (cfun, "three-insn combine", 1);
@@ -1406,7 +1444,7 @@ combine_instructions (rtx_insn *f, unsigned int nregs)
 		      if ((next = try_combine (insn, link, link1,
 					       nextlinks->insn,
 					       &new_direct_jump_p,
-					       last_combined_insn)) != 0)
+					       last_combined_insn, NULL)) != 0)
 			{
 			  statistics_counter_event (cfun, "four-insn combine", 1);
 			  goto retry;
@@ -1417,7 +1455,7 @@ combine_instructions (rtx_insn *f, unsigned int nregs)
 		      if ((next = try_combine (insn, link, link1,
 					       nextlinks->insn,
 					       &new_direct_jump_p,
-					       last_combined_insn)) != 0)
+					       last_combined_insn, NULL)) != 0)
 			{
 			  statistics_counter_event (cfun, "four-insn combine", 1);
 			  goto retry;
@@ -1434,7 +1472,7 @@ combine_instructions (rtx_insn *f, unsigned int nregs)
 		      if ((next = try_combine (insn, link, link1,
 					       nextlinks->insn,
 					       &new_direct_jump_p,
-					       last_combined_insn)) != 0)
+					       last_combined_insn, NULL)) != 0)
 			{
 			  statistics_counter_event (cfun, "four-insn combine", 1);
 			  goto retry;
@@ -1444,7 +1482,7 @@ combine_instructions (rtx_insn *f, unsigned int nregs)
 		      if ((next = try_combine (insn, link, link1,
 					       nextlinks->insn,
 					       &new_direct_jump_p,
-					       last_combined_insn)) != 0)
+					       last_combined_insn, NULL)) != 0)
 			{
 			  statistics_counter_event (cfun, "four-insn combine", 1);
 			  goto retry;
@@ -1452,6 +1490,50 @@ combine_instructions (rtx_insn *f, unsigned int nregs)
 		  }
 	      }
 
+	  /* Try to combine a compare insn that sets CC
+	     with a preceding insn that can set CC, and maybe with its
+	     logical predecessor as well.
+	     We need this special code because data flow connections
+	     do not get entered in LOG_LINKS.  */
+	  if ((prev = prev_nonnote_nondebug_insn (insn)) != NULL_RTX
+	      && refer_same_reg_p (insn, prev)
+	      && max_combine >= 4)
+	    {
+		struct insn_link *next1;
+		FOR_EACH_LOG_LINK (next1, prev)
+		  {
+		    rtx_insn *link1 = next1->insn;
+		    if (NOTE_P (link1))
+		      continue;
+		    /* I1 -> I2 -> I3; I2 -> insn;
+		       output parallel (insn, I3).  */
+		    FOR_EACH_LOG_LINK (nextlinks, link1)
+		      if ((next = try_combine (prev, link1,
+					       nextlinks->insn, NULL,
+					       &new_direct_jump_p,
+					       last_combined_insn, insn)) != 0)
+
+			{
+			  delete_insn (insn);
+			  insn = next;
+			  statistics_counter_event (cfun, "four-insn combine", 1);
+			  goto retry;
+			}
+		    /* I2 -> I3; I2 -> insn
+		       output next = parallel (insn, I3).  */
+		    if ((next = try_combine (prev, link1,
+					     NULL, NULL,
+					     &new_direct_jump_p,
+					     last_combined_insn, insn)) != 0)
+
+		      {
+			delete_insn (insn);
+			insn = next;
+			statistics_counter_event (cfun, "three-insn combine", 1);
+			goto retry;
+		      }
+		  }
+	    }
 	  /* Try this insn with each REG_EQUAL note it links back to.  */
 	  FOR_EACH_LOG_LINK (links, insn)
 	    {
@@ -1477,7 +1559,7 @@ combine_instructions (rtx_insn *f, unsigned int nregs)
 		  i2mod_new_rhs = copy_rtx (note);
 		  next = try_combine (insn, i2mod, NULL, NULL,
 				      &new_direct_jump_p,
-				      last_combined_insn);
+				      last_combined_insn, NULL);
 		  i2mod = NULL;
 		  if (next)
 		    {
@@ -2533,11 +2615,15 @@ can_split_parallel_of_n_reg_sets (rtx_insn *insn, int n)
 
    LAST_COMBINED_INSN is either I3, or some insn after I3 that has
    been I3 passed to an earlier try_combine within the same basic
-   block.  */
+   block.
+
+   TO_COMBINED_INSN is an insn after I3 that sets CC flags.  It is used to
+   combine with I3 to make a new insn.  */
 
 static rtx_insn *
 try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn *i1, rtx_insn *i0,
-	     int *new_direct_jump_p, rtx_insn *last_combined_insn)
+	     int *new_direct_jump_p, rtx_insn *last_combined_insn,
+	     rtx_insn *to_combined_insn)
 {
   /* New patterns for I3 and I2, respectively.  */
   rtx newpat, newi2pat = 0;
@@ -2630,6 +2716,7 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn *i1, rtx_insn *i0,
       || cant_combine_insn_p (i2)
       || (i1 && cant_combine_insn_p (i1))
       || (i0 && cant_combine_insn_p (i0))
+      || (to_combined_insn && cant_combine_insn_p (to_combined_insn))
       || likely_spilled_retval_p (i3))
     return 0;
 
@@ -3323,7 +3410,11 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn *i1, rtx_insn *i0,
 	  rtx old = newpat;
 	  total_sets = 1 + extra_sets;
 	  newpat = gen_rtx_PARALLEL (VOIDmode, rtvec_alloc (total_sets));
-	  XVECEXP (newpat, 0, 0) = old;
+
+	  if (to_combined_insn)
+	    XVECEXP (newpat, 0, --total_sets) = old;
+	  else
+	    XVECEXP (newpat, 0, 0) = old;
 	}
 
       if (added_sets_0)
@@ -3346,6 +3437,21 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn *i1, rtx_insn *i0,
 	  if ((i0_feeds_i1_n && i1_feeds_i2_n) || i0_feeds_i2_n)
 	    t = subst (t, i0dest, i0src_copy2 ? i0src_copy2 : i0src, 0, 0, 0);
 
+	  if (to_combined_insn)
+	    {
+	      rtx todest = NULL_RTX, tosrc = NULL_RTX;
+	      if (can_combine_p (i2, to_combined_insn, NULL, NULL,
+				 i3, NULL, &todest, &tosrc))
+		{
+		  rtx pat = copy_rtx (PATTERN (to_combined_insn));
+		  t = subst (pat, todest, tosrc, 0, 0, 0);
+		}
+	      else
+		{
+		  undo_all ();
+		  return 0;
+		}
+	    }
 	  XVECEXP (newpat, 0, --total_sets) = t;
 	}
     }

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] Fix PR 61225
  2014-12-04  8:43           ` Zhenqiang Chen
@ 2014-12-04 20:50             ` Segher Boessenkool
  2014-12-04 20:58               ` Segher Boessenkool
  2014-12-05 22:32               ` Jeff Law
  2014-12-08 21:29             ` Jeff Law
  1 sibling, 2 replies; 27+ messages in thread
From: Segher Boessenkool @ 2014-12-04 20:50 UTC (permalink / raw)
  To: Zhenqiang Chen
  Cc: 'Jeff Law', Steven Bosscher, gcc-patches, Jakub Jelinek

On Thu, Dec 04, 2014 at 04:43:34PM +0800, Zhenqiang Chen wrote:
> C code:
> 
>     if (!--*p)
> 
> rtl code:
> 
>     6: r91:SI=[r90:SI]
>     7: {r88:SI=r91:SI-0x1;clobber flags:CC;}
>     8: [r90:SI]=r88:SI
>     9: flags:CCZ=cmp(r88:SI,0)
> 
> expected output:
> 
>     8: {flags:CCZ=cmp([r90:SI]-0x1,0);[r90:SI]=[r90:SI]-0x1;}
> 
> in assemble, it is
> 
>   decl (%eax)

Combine does not consider combining 9 into 7 because there is no LOG_LINK
between them (the link for r88 is between 8 and 7 already).


Segher

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] Fix PR 61225
  2014-12-04 20:50             ` Segher Boessenkool
@ 2014-12-04 20:58               ` Segher Boessenkool
  2014-12-05 22:36                 ` Jeff Law
  2014-12-06  0:22                 ` Segher Boessenkool
  2014-12-05 22:32               ` Jeff Law
  1 sibling, 2 replies; 27+ messages in thread
From: Segher Boessenkool @ 2014-12-04 20:58 UTC (permalink / raw)
  To: Zhenqiang Chen
  Cc: 'Jeff Law', Steven Bosscher, gcc-patches, Jakub Jelinek

On Thu, Dec 04, 2014 at 02:49:56PM -0600, Segher Boessenkool wrote:
> On Thu, Dec 04, 2014 at 04:43:34PM +0800, Zhenqiang Chen wrote:
> > C code:
> > 
> >     if (!--*p)
> > 
> > rtl code:
> > 
> >     6: r91:SI=[r90:SI]
> >     7: {r88:SI=r91:SI-0x1;clobber flags:CC;}
> >     8: [r90:SI]=r88:SI
> >     9: flags:CCZ=cmp(r88:SI,0)
> > 
> > expected output:
> > 
> >     8: {flags:CCZ=cmp([r90:SI]-0x1,0);[r90:SI]=[r90:SI]-0x1;}
> > 
> > in assemble, it is
> > 
> >   decl (%eax)
> 
> Combine does not consider combining 9 into 7 because there is no LOG_LINK
> between them (the link for r88 is between 8 and 7 already).

So combine tries to combine 6+7+8; the RTL it comes up with is a parallel
of the memory decrement (without cc clobber, but that is fine), and setting
r88 to the mem minus one.  There is no such pattern in the target, and
combine cannot break the parallel into two sets (because the first modifies
the mem used by the second), so 6+7+8 doesn't combine.

Adding a bridge pattern in the target would work; or you can enhance combine
so it can break up this parallel correctly.


Segher

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] Fix PR 61225
  2014-12-04 20:50             ` Segher Boessenkool
  2014-12-04 20:58               ` Segher Boessenkool
@ 2014-12-05 22:32               ` Jeff Law
  2014-12-06  0:16                 ` Segher Boessenkool
  1 sibling, 1 reply; 27+ messages in thread
From: Jeff Law @ 2014-12-05 22:32 UTC (permalink / raw)
  To: Segher Boessenkool, Zhenqiang Chen
  Cc: Steven Bosscher, gcc-patches, Jakub Jelinek

On 12/04/14 13:49, Segher Boessenkool wrote:
> On Thu, Dec 04, 2014 at 04:43:34PM +0800, Zhenqiang Chen wrote:
>> C code:
>>
>>      if (!--*p)
>>
>> rtl code:
>>
>>      6: r91:SI=[r90:SI]
>>      7: {r88:SI=r91:SI-0x1;clobber flags:CC;}
>>      8: [r90:SI]=r88:SI
>>      9: flags:CCZ=cmp(r88:SI,0)
>>
>> expected output:
>>
>>      8: {flags:CCZ=cmp([r90:SI]-0x1,0);[r90:SI]=[r90:SI]-0x1;}
>>
>> in assemble, it is
>>
>>    decl (%eax)
>
> Combine does not consider combining 9 into 7 because there is no LOG_LINK
> between them (the link for r88 is between 8 and 7 already).
OK, yea, that's a long standing design decision.  We don't feed a single 
def into multiple use sites.

jeff


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] Fix PR 61225
  2014-12-04 20:58               ` Segher Boessenkool
@ 2014-12-05 22:36                 ` Jeff Law
  2014-12-06  0:09                   ` Segher Boessenkool
  2014-12-06  0:22                 ` Segher Boessenkool
  1 sibling, 1 reply; 27+ messages in thread
From: Jeff Law @ 2014-12-05 22:36 UTC (permalink / raw)
  To: Segher Boessenkool, Zhenqiang Chen
  Cc: Steven Bosscher, gcc-patches, Jakub Jelinek

On 12/04/14 13:57, Segher Boessenkool wrote:
>
> So combine tries to combine 6+7+8; the RTL it comes up with is a parallel
> of the memory decrement (without cc clobber, but that is fine), and setting
> r88 to the mem minus one.  There is no such pattern in the target, and
> combine cannot break the parallel into two sets (because the first modifies
> the mem used by the second), so 6+7+8 doesn't combine.
>
> Adding a bridge pattern in the target would work; or you can enhance combine
> so it can break up this parallel correctly.
I think myself or someone suggested a bridge pattern in the past, but I 
can't find it, perhaps it was one of the other threads WRT limitations 
of the combiner.

Zhenqiang, can you look at what happens if you provide a pattern for 
6+7+8 (probably via a define_and_split)?

Jeff

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] Fix PR 61225
  2014-12-05 22:36                 ` Jeff Law
@ 2014-12-06  0:09                   ` Segher Boessenkool
  2014-12-07  5:03                     ` Segher Boessenkool
  0 siblings, 1 reply; 27+ messages in thread
From: Segher Boessenkool @ 2014-12-06  0:09 UTC (permalink / raw)
  To: Jeff Law; +Cc: Zhenqiang Chen, Steven Bosscher, gcc-patches, Jakub Jelinek

On Fri, Dec 05, 2014 at 03:36:01PM -0700, Jeff Law wrote:
> >So combine tries to combine 6+7+8; the RTL it comes up with is a parallel
> >of the memory decrement (without cc clobber, but that is fine), and setting
> >r88 to the mem minus one.  There is no such pattern in the target, and
> >combine cannot break the parallel into two sets (because the first modifies
> >the mem used by the second), so 6+7+8 doesn't combine.
> >
> >Adding a bridge pattern in the target would work; or you can enhance 
> >combine
> >so it can break up this parallel correctly.
> I think myself or someone suggested a bridge pattern in the past, but I 
> can't find it, perhaps it was one of the other threads WRT limitations 
> of the combiner.
> 
> Zhenqiang, can you look at what happens if you provide a pattern for 
> 6+7+8 (probably via a define_and_split)?

I tried this out yesterday.  There are a few options (a bridge pattern
for 6+7+8, or one for 7+8).  I went with 6+7+8.

So the code combine is asked to optimise is

6  A = M
7  T = A + B
8  M = T
9  C = cmp T, 0

and the bridge pattern I added is

M = M + B  ::  T = M + B

(I made it to split to  M = M + B ; T = M  which is probably not optimal,
but irrelevant for the rest here).

So combine happily combines 6+7+8 to the bridge pattern.  But then it
forgets to make a link from 9.  I suppose it just doesn't know how to
make a link to a parallel (it wouldn't ever be useful before my recent
patches).

Investigating...


Segher

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] Fix PR 61225
  2014-12-05 22:32               ` Jeff Law
@ 2014-12-06  0:16                 ` Segher Boessenkool
  2014-12-08 19:07                   ` Jeff Law
  0 siblings, 1 reply; 27+ messages in thread
From: Segher Boessenkool @ 2014-12-06  0:16 UTC (permalink / raw)
  To: Jeff Law; +Cc: Zhenqiang Chen, Steven Bosscher, gcc-patches, Jakub Jelinek

On Fri, Dec 05, 2014 at 03:31:54PM -0700, Jeff Law wrote:
> >Combine does not consider combining 9 into 7 because there is no LOG_LINK
> >between them (the link for r88 is between 8 and 7 already).
> OK, yea, that's a long standing design decision.  We don't feed a single 
> def into multiple use sites.

There is no real reason not to do that.  It doesn't increase computational
complexity, although it is of course more expensive than what combine does
today (it is more work, after all).  And combining with a later use does
not have too big a chance to succeed (since it has to keep the result of
the earlier insn around always).

GCC 6 or later ;-)


Segher

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] Fix PR 61225
  2014-12-04 20:58               ` Segher Boessenkool
  2014-12-05 22:36                 ` Jeff Law
@ 2014-12-06  0:22                 ` Segher Boessenkool
  1 sibling, 0 replies; 27+ messages in thread
From: Segher Boessenkool @ 2014-12-06  0:22 UTC (permalink / raw)
  To: Zhenqiang Chen
  Cc: 'Jeff Law', Steven Bosscher, gcc-patches, Jakub Jelinek

On Thu, Dec 04, 2014 at 02:57:56PM -0600, Segher Boessenkool wrote:
> Adding a bridge pattern in the target would work; or you can enhance combine
> so it can break up this parallel correctly.

I also investigated that second option.  The enhancement transforms
the combine result

M = XXX  ::  T = XXX

into

M = XXX
T = M

and then the set of T can combine with its later use (the compare), but
it won't ever combine that with the store to M: there is never a link
for memory, only for registers.

Never mind that this is unsuitable for many targets anyway (it creates
a read-after-write hazard).


Segher

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] Fix PR 61225
  2014-12-06  0:09                   ` Segher Boessenkool
@ 2014-12-07  5:03                     ` Segher Boessenkool
  0 siblings, 0 replies; 27+ messages in thread
From: Segher Boessenkool @ 2014-12-07  5:03 UTC (permalink / raw)
  To: Jeff Law; +Cc: Zhenqiang Chen, Steven Bosscher, gcc-patches, Jakub Jelinek

On Fri, Dec 05, 2014 at 06:09:11PM -0600, Segher Boessenkool wrote:
> On Fri, Dec 05, 2014 at 03:36:01PM -0700, Jeff Law wrote:
> > Zhenqiang, can you look at what happens if you provide a pattern for 
> > 6+7+8 (probably via a define_and_split)?
> 
> I tried this out yesterday.  There are a few options (a bridge pattern
> for 6+7+8, or one for 7+8).  I went with 6+7+8.
> 
> So the code combine is asked to optimise is
> 
> 6  A = M
> 7  T = A + B
> 8  M = T
> 9  C = cmp T, 0

... and combine will never combine a write to memory (8 here) into a
later insn (see can_combine_p).  So this won't ever fly.

I see no reasonably simple way combine can be convinced to do this.
There are various possible schemes to pull insn 9 to before 8, but when
does this help and when does it hurt?  It all depends on the target :-(


Segher

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] Fix PR 61225
  2014-12-06  0:16                 ` Segher Boessenkool
@ 2014-12-08 19:07                   ` Jeff Law
  0 siblings, 0 replies; 27+ messages in thread
From: Jeff Law @ 2014-12-08 19:07 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: Zhenqiang Chen, Steven Bosscher, gcc-patches, Jakub Jelinek

On 12/05/14 17:16, Segher Boessenkool wrote:
> On Fri, Dec 05, 2014 at 03:31:54PM -0700, Jeff Law wrote:
>>> Combine does not consider combining 9 into 7 because there is no LOG_LINK
>>> between them (the link for r88 is between 8 and 7 already).
>> OK, yea, that's a long standing design decision.  We don't feed a single
>> def into multiple use sites.
>
> There is no real reason not to do that.  It doesn't increase computational
> complexity, although it is of course more expensive than what combine does
> today (it is more work, after all).  And combining with a later use does
> not have too big a chance to succeed (since it has to keep the result of
> the earlier insn around always).
No fundamental reason, it's just always been that way.  One could argue 
that a bridge pattern often makes this unnecessary and bridges have been 
a well known way to work around combine's failings for a long time.

> GCC 6 or later ;-)
Yea, I think so :)

jeff

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] Fix PR 61225
  2014-12-04  8:43           ` Zhenqiang Chen
  2014-12-04 20:50             ` Segher Boessenkool
@ 2014-12-08 21:29             ` Jeff Law
  2014-12-09  9:49               ` Zhenqiang Chen
  1 sibling, 1 reply; 27+ messages in thread
From: Jeff Law @ 2014-12-08 21:29 UTC (permalink / raw)
  To: Zhenqiang Chen; +Cc: Steven Bosscher, gcc-patches, Jakub Jelinek

On 12/04/14 01:43, Zhenqiang Chen wrote:
>>> > >
>>> > >          Part of PR rtl-optimization/61225
>>> > >          * combine.c (refer_same_reg_p): New function.
>>> > >          (combine_instructions): Handle I1 -> I2 -> I3; I2 -> insn.
>>> > >          (try_combine): Add one more parameter TO_COMBINED_INSN, which
>> >is
>>> > >          used to create a new insn parallel (TO_COMBINED_INSN, I3).
>>> > >
>>> > >testsuite/ChangeLog:
>>> > >2014-08-04  Zhenqiang Chen<zhenqiang.chen@linaro.org>
>>> > >
>>> > >          * gcc.target/i386/pr61225.c: New test.
THanks for the updates and clarifications.  Just a few minor things and 
while it's a bit of a hack, I'll approve:



> +
> +/* A is a compare (reg1, 0) and B is SINGLE_SET which SET_SRC is reg2.
> +   It returns TRUE, if reg1 == reg2, and no other refer of reg1
> +   except A and B.  */
> +
> +static bool
> +refer_same_reg_p (rtx_insn *a, rtx_insn *b)
> +{
> +  rtx seta = single_set (a);
> +  rtx setb = single_set (b);
> +
> +  if (BLOCK_FOR_INSN (a) != BLOCK_FOR_INSN (b)
> +      || !seta
> +      || !setb)
> +    return false;
> +
> +  if (GET_CODE (SET_SRC (seta)) != COMPARE
> +      || GET_MODE_CLASS (GET_MODE (SET_DEST (seta))) != MODE_CC
> +      || !REG_P (XEXP (SET_SRC (seta), 0))
> +      || XEXP (SET_SRC (seta), 1) != const0_rtx
> +      || !REG_P (SET_SRC (setb))
> +      || REGNO (SET_SRC (setb)) != REGNO (XEXP (SET_SRC (seta), 0)))
> +    return false;
Do you need to verify SETA and SETB satisfy single_set?  Or has that 
already been done elsewhere?

The name refer_same_reg_p seems wrong -- your function is verifying the 
underlying RTL store as well as the existence of a a dependency between 
the insns.  Can you try to come up with a better name?

Please use CONST0_RTX (mode)  IIRC that'll allow this to work regardless 
of the size of the modes relative to the host word size.



> +
> +  if (DF_REG_USE_COUNT (REGNO (SET_SRC (setb))) > 2)
> +    {
> +      df_ref use;
> +      rtx insn;
> +      unsigned int i = REGNO (SET_SRC (setb));
> +
> +      for (use = DF_REG_USE_CHAIN (i); use; use = DF_REF_NEXT_REG (use))
> +        {
> +	  insn = DF_REF_INSN (use);
> +	  if (insn != a && insn != b && !(NOTE_P (insn) || DEBUG_INSN_P (insn)))
> +	    return false;
> +	}
> +    }
> +
> +  return true;
> +}
Is this fragment really needed?  Does it ever trigger?  I'd think that 
for > 2 uses punting would be fine.  Do we really commonly have cases 
with > 2 uses, but where they're all in SETA and SETB?

>   		  }
>   	      }
>
> +	  /* Try to combine a compare insn that sets CC
> +	     with a preceding insn that can set CC, and maybe with its
> +	     logical predecessor as well.
> +	     We need this special code because data flow connections
> +	     do not get entered in LOG_LINKS.  */
> +	  if ((prev = prev_nonnote_nondebug_insn (insn)) != NULL_RTX
> +	      && refer_same_reg_p (insn, prev)
> +	      && max_combine >= 4)
> +	    {
> +		struct insn_link *next1;
> +		FOR_EACH_LOG_LINK (next1, prev)
> +		  {
> +		    rtx_insn *link1 = next1->insn;
> +		    if (NOTE_P (link1))
> +		      continue;
> +		    /* I1 -> I2 -> I3; I2 -> insn;
> +		       output parallel (insn, I3).  */
> +		    FOR_EACH_LOG_LINK (nextlinks, link1)
> +		      if ((next = try_combine (prev, link1,
> +					       nextlinks->insn, NULL,
> +					       &new_direct_jump_p,
> +					       last_combined_insn, insn)) != 0)
> +
> +			{
> +			  delete_insn (insn);
> +			  insn = next;
> +			  statistics_counter_event (cfun, "four-insn combine", 1);
> +			  goto retry;
> +			}
> +		    /* I2 -> I3; I2 -> insn
> +		       output next = parallel (insn, I3).  */
> +		    if ((next = try_combine (prev, link1,
> +					     NULL, NULL,
> +					     &new_direct_jump_p,
> +					     last_combined_insn, insn)) != 0)
> +
> +		      {
> +			delete_insn (insn);
> +			insn = next;
> +			statistics_counter_event (cfun, "three-insn combine", 1);
> +			goto retry;
> +		      }
> +		  }
> +	    }
So you've got two new combine cases here, but I think the testcase only 
tests one of them.  Can you include a testcase for both of hte major 
paths above (I1->I2->I3; I2->insn and I2->I3; I2->INSN)

Please make those changes and repost for final approval.

jeff

^ permalink raw reply	[flat|nested] 27+ messages in thread

* RE: [PATCH] Fix PR 61225
  2014-12-08 21:29             ` Jeff Law
@ 2014-12-09  9:49               ` Zhenqiang Chen
  2014-12-09 18:52                 ` Jeff Law
  2014-12-09 19:07                 ` Segher Boessenkool
  0 siblings, 2 replies; 27+ messages in thread
From: Zhenqiang Chen @ 2014-12-09  9:49 UTC (permalink / raw)
  To: 'Jeff Law'; +Cc: gcc-patches


> -----Original Message-----
> From: Jeff Law [mailto:law@redhat.com]
> Sent: Tuesday, December 09, 2014 5:29 AM
> To: Zhenqiang Chen
> Cc: Steven Bosscher; gcc-patches@gcc.gnu.org; Jakub Jelinek
> Subject: Re: [PATCH] Fix PR 61225
> 
> On 12/04/14 01:43, Zhenqiang Chen wrote:
> >>> > >
> >>> > >          Part of PR rtl-optimization/61225
> >>> > >          * combine.c (refer_same_reg_p): New function.
> >>> > >          (combine_instructions): Handle I1 -> I2 -> I3; I2 ->
insn.
> >>> > >          (try_combine): Add one more parameter TO_COMBINED_INSN,
> >>> > > which
> >> >is
> >>> > >          used to create a new insn parallel (TO_COMBINED_INSN,
I3).
> >>> > >
> >>> > >testsuite/ChangeLog:
> >>> > >2014-08-04  Zhenqiang Chen<zhenqiang.chen@linaro.org>
> >>> > >
> >>> > >          * gcc.target/i386/pr61225.c: New test.
> THanks for the updates and clarifications.  Just a few minor things and
while
> it's a bit of a hack, I'll approve:
> 
> 
> 
> > +
> > +/* A is a compare (reg1, 0) and B is SINGLE_SET which SET_SRC is reg2.
> > +   It returns TRUE, if reg1 == reg2, and no other refer of reg1
> > +   except A and B.  */
> > +
> > +static bool
> > +refer_same_reg_p (rtx_insn *a, rtx_insn *b)
> > +{
> > +  rtx seta = single_set (a);
> > +  rtx setb = single_set (b);
> > +
> > +  if (BLOCK_FOR_INSN (a) != BLOCK_FOR_INSN (b)
> > +      || !seta
> > +      || !setb)
> > +    return false;
> > 
> > +  if (GET_CODE (SET_SRC (seta)) != COMPARE
> > +      || GET_MODE_CLASS (GET_MODE (SET_DEST (seta))) != MODE_CC
> > +      || !REG_P (XEXP (SET_SRC (seta), 0))
> > +      || XEXP (SET_SRC (seta), 1) != const0_rtx
> > +      || !REG_P (SET_SRC (setb))
> > +      || REGNO (SET_SRC (setb)) != REGNO (XEXP (SET_SRC (seta), 0)))
> > +    return false;
> Do you need to verify SETA and SETB satisfy single_set?  Or has that
> already been done elsewhere?

A is NEXT_INSN (insn)
B is prev_nonnote_nondebug_insn (insn),

For I1 -> I2 -> B; I2 -> A;
LOG_LINK can make sure I1 and I2 are single_set, but not A and B. And I did
found codes in function try_combine, which can make sure B (or I3) is
single_set. 

So I think the check can skip failed cases at early stage.
 
> The name refer_same_reg_p seems wrong -- your function is verifying the
> underlying RTL store as well as the existence of a a dependency between
> the insns.  Can you try to come up with a better name?

Change it to can_reuse_cc_set_p.

> Please use CONST0_RTX (mode)  IIRC that'll allow this to work regardless
> of the size of the modes relative to the host word size.
 
Updated. 
 
> > +
> > +  if (DF_REG_USE_COUNT (REGNO (SET_SRC (setb))) > 2)
> > +    {
> > +      df_ref use;
> > +      rtx insn;
> > +      unsigned int i = REGNO (SET_SRC (setb));
> > +
> > +      for (use = DF_REG_USE_CHAIN (i); use; use = DF_REF_NEXT_REG
(use))
> > +        {
> > +	  insn = DF_REF_INSN (use);
> > +	  if (insn != a && insn != b && !(NOTE_P (insn) || DEBUG_INSN_P
> (insn)))
> > +	    return false;
> > +	}
> > +    }
> > +
> > +  return true;
> > +}
> Is this fragment really needed?  Does it ever trigger?  I'd think that
> for > 2 uses punting would be fine.  Do we really commonly have cases
> with > 2 uses, but where they're all in SETA and SETB?

The check is to make sure the correctness.  Here is a case,

int 
f1 (int *x)
{
  int t = --*x;
  if (!t)
    foo (x);
  return t;
}

  _4 = *x_3(D);
  _5 = _4 + -1;
  *x_3(D) = _5;
  # DEBUG t => _5
  if (_5 == 0)
   ...
  <bb 4>:
  return _5;

"_5" is used in "return _5". So we can not remove "_5 = _4 + -1".
 
> >   		  }
> >   	      }
> >
> > +	  /* Try to combine a compare insn that sets CC
> > +	     with a preceding insn that can set CC, and maybe with its
> > +	     logical predecessor as well.
> > +	     We need this special code because data flow connections
> > +	     do not get entered in LOG_LINKS.  */
> > +	  if ((prev = prev_nonnote_nondebug_insn (insn)) != NULL_RTX
> > +	      && refer_same_reg_p (insn, prev)
> > +	      && max_combine >= 4)
> > +	    {
> > +		struct insn_link *next1;
> > +		FOR_EACH_LOG_LINK (next1, prev)
> > +		  {
> > +		    rtx_insn *link1 = next1->insn;
> > +		    if (NOTE_P (link1))
> > +		      continue;
> > +		    /* I1 -> I2 -> I3; I2 -> insn;
> > +		       output parallel (insn, I3).  */
> > +		    FOR_EACH_LOG_LINK (nextlinks, link1)
> > +		      if ((next = try_combine (prev, link1,
> > +					       nextlinks->insn, NULL,
> > +					       &new_direct_jump_p,
> > +					       last_combined_insn, insn)) !=
0)
> > +
> > +			{
> > +			  delete_insn (insn);
> > +			  insn = next;
> > +			  statistics_counter_event (cfun, "four-insn
combine",
> 1);
> > +			  goto retry;
> > +			}
> > +		    /* I2 -> I3; I2 -> insn
> > +		       output next = parallel (insn, I3).  */
> > +		    if ((next = try_combine (prev, link1,
> > +					     NULL, NULL,
> > +					     &new_direct_jump_p,
> > +					     last_combined_insn, insn)) !=
0)
> > +
> > +		      {
> > +			delete_insn (insn);
> > +			insn = next;
> > +			statistics_counter_event (cfun, "three-insn
combine",
> 1);
> > +			goto retry;
> > +		      }
> > +		  }
> > +	    }
> So you've got two new combine cases here, but I think the testcase only
> tests one of them.  Can you include a testcase for both of hte major
> paths above (I1->I2->I3; I2->insn and I2->I3; I2->INSN)

pr61225.c is the case to cover I1->I2->I3; I2->insn.

For I2 -> I3; I2 -> insn, I tried my test cases and found peephole2 can also
handle them. So I removed the code from the patch.

Here is the final patch.
Bootstrap and no make check regression on X86-64.

ChangeLog:
2014-11-09  Zhenqiang Chen  <zhenqiang.chen@linaro.org>

	Part of PR rtl-optimization/61225
	* combine.c (can_reuse_cc_set_p): New function.
	(combine_instructions): Handle I1 -> I2 -> I3; I2 -> insn.
	(try_combine): Add one more parameter TO_COMBINED_INSN, which
	is used to create a new insn parallel (TO_COMBINED_INSN, I3).

testsuite/ChangeLog:
2014-11-09  Zhenqiang Chen<zhenqiang.chen@linaro.org>

	* gcc.target/i386/pr61225.c: New test.

diff --git a/gcc/combine.c b/gcc/combine.c
index e6deb41..7360ca3 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -431,7 +431,7 @@ static int can_combine_p (rtx_insn *, rtx_insn *,
rtx_insn *, rtx_insn *,
 static int combinable_i3pat (rtx_insn *, rtx *, rtx, rtx, rtx, int, int,
rtx *);
 static int contains_muldiv (rtx);
 static rtx_insn *try_combine (rtx_insn *, rtx_insn *, rtx_insn *, rtx_insn
*,
-			      int *, rtx_insn *);
+			      int *, rtx_insn *, rtx_insn *);
 static void undo_all (void);
 static void undo_commit (void);
 static rtx *find_split_point (rtx *, rtx_insn *, bool);
@@ -1120,6 +1120,46 @@ insn_a_feeds_b (rtx_insn *a, rtx_insn *b)
 #endif
   return false;
 }
+
+/* A is a compare (reg1, 0) and B is SINGLE_SET which SET_SRC is reg2.
+   It returns TRUE, if reg1 == reg2, and no other refer of reg1
+   except A and B.  */
+
+static bool
+can_reuse_cc_set_p (rtx_insn *a, rtx_insn *b)
+{
+  rtx seta = single_set (a);
+  rtx setb = single_set (b);
+
+  if (BLOCK_FOR_INSN (a) != BLOCK_FOR_INSN (b)
+      || !seta
+      || !setb)
+    return false;
+
+  if (GET_CODE (SET_SRC (seta)) != COMPARE
+      || GET_MODE_CLASS (GET_MODE (SET_DEST (seta))) != MODE_CC
+      || !REG_P (XEXP (SET_SRC (seta), 0))
+      || XEXP (SET_SRC (seta), 1) != CONST0_RTX (GET_MODE (SET_SRC (seta)))
+      || !REG_P (SET_SRC (setb))
+      || REGNO (SET_SRC (setb)) != REGNO (XEXP (SET_SRC (seta), 0)))
+    return false;
+
+  if (DF_REG_USE_COUNT (REGNO (SET_SRC (setb))) > 2)
+    {
+      df_ref use;
+      rtx insn;
+      unsigned int i = REGNO (SET_SRC (setb));
+
+      for (use = DF_REG_USE_CHAIN (i); use; use = DF_REF_NEXT_REG (use))
+        {
+	  insn = DF_REF_INSN (use);
+	  if (insn != a && insn != b && !(NOTE_P (insn) || DEBUG_INSN_P
(insn)))
+	    return false;
+	}
+    }
+  return true;
+}
+
 

 /* Main entry point for combiner.  F is the first insn of the function.
    NREGS is the first unused pseudo-reg number.
@@ -1129,10 +1169,7 @@ insn_a_feeds_b (rtx_insn *a, rtx_insn *b)
 static int
 combine_instructions (rtx_insn *f, unsigned int nregs)
 {
-  rtx_insn *insn, *next;
-#ifdef HAVE_cc0
-  rtx_insn *prev;
-#endif
+  rtx_insn *insn, *next, *prev;
   struct insn_link *links, *nextlinks;
   rtx_insn *first;
   basic_block last_bb;
@@ -1279,7 +1316,7 @@ combine_instructions (rtx_insn *f, unsigned int nregs)
 	  FOR_EACH_LOG_LINK (links, insn)
 	    if ((next = try_combine (insn, links->insn, NULL,
 				     NULL, &new_direct_jump_p,
-				     last_combined_insn)) != 0)
+				     last_combined_insn, NULL)) != 0)
 	      {
 		statistics_counter_event (cfun, "two-insn combine", 1);
 		goto retry;
@@ -1300,7 +1337,7 @@ combine_instructions (rtx_insn *f, unsigned int nregs)
 		FOR_EACH_LOG_LINK (nextlinks, link)
 		  if ((next = try_combine (insn, link, nextlinks->insn,
 					   NULL, &new_direct_jump_p,
-					   last_combined_insn)) != 0)
+					   last_combined_insn, NULL)) != 0)
 		    {
 		      statistics_counter_event (cfun, "three-insn combine",
1);
 		      goto retry;
@@ -1322,13 +1359,13 @@ combine_instructions (rtx_insn *f, unsigned int
nregs)
 	    {
 	      if ((next = try_combine (insn, prev, NULL, NULL,
 				       &new_direct_jump_p,
-				       last_combined_insn)) != 0)
+				       last_combined_insn, NULL)) != 0)
 		goto retry;
 
 	      FOR_EACH_LOG_LINK (nextlinks, prev)
 		  if ((next = try_combine (insn, prev, nextlinks->insn,
 					   NULL, &new_direct_jump_p,
-					   last_combined_insn)) != 0)
+					   last_combined_insn, NULL)) != 0)
 		    goto retry;
 	    }
 
@@ -1342,13 +1379,13 @@ combine_instructions (rtx_insn *f, unsigned int
nregs)
 	    {
 	      if ((next = try_combine (insn, prev, NULL, NULL,
 				       &new_direct_jump_p,
-				       last_combined_insn)) != 0)
+				       last_combined_insn, NULL)) != 0)
 		goto retry;
 
 	      FOR_EACH_LOG_LINK (nextlinks, prev)
 		  if ((next = try_combine (insn, prev, nextlinks->insn,
 					   NULL, &new_direct_jump_p,
-					   last_combined_insn)) != 0)
+					   last_combined_insn, NULL)) != 0)
 		    goto retry;
 	    }
 
@@ -1364,7 +1401,7 @@ combine_instructions (rtx_insn *f, unsigned int nregs)
 		  && sets_cc0_p (PATTERN (prev))
 		  && (next = try_combine (insn, links->insn,
 					  prev, NULL, &new_direct_jump_p,
-					  last_combined_insn)) != 0)
+					  last_combined_insn, NULL)) != 0)
 		goto retry;
 #endif
 
@@ -1377,7 +1414,7 @@ combine_instructions (rtx_insn *f, unsigned int nregs)
 		if ((next = try_combine (insn, links->insn,
 					 nextlinks->insn, NULL,
 					 &new_direct_jump_p,
-					 last_combined_insn)) != 0)
+					 last_combined_insn, NULL)) != 0)
 
 		  {
 		    statistics_counter_event (cfun, "three-insn combine",
1);
@@ -1406,7 +1443,7 @@ combine_instructions (rtx_insn *f, unsigned int nregs)
 		      if ((next = try_combine (insn, link, link1,
 					       nextlinks->insn,
 					       &new_direct_jump_p,
-					       last_combined_insn)) != 0)
+					       last_combined_insn, NULL)) !=
0)
 			{
 			  statistics_counter_event (cfun, "four-insn
combine", 1);
 			  goto retry;
@@ -1417,7 +1454,7 @@ combine_instructions (rtx_insn *f, unsigned int nregs)
 		      if ((next = try_combine (insn, link, link1,
 					       nextlinks->insn,
 					       &new_direct_jump_p,
-					       last_combined_insn)) != 0)
+					       last_combined_insn, NULL)) !=
0)
 			{
 			  statistics_counter_event (cfun, "four-insn
combine", 1);
 			  goto retry;
@@ -1434,7 +1471,7 @@ combine_instructions (rtx_insn *f, unsigned int nregs)
 		      if ((next = try_combine (insn, link, link1,
 					       nextlinks->insn,
 					       &new_direct_jump_p,
-					       last_combined_insn)) != 0)
+					       last_combined_insn, NULL)) !=
0)
 			{
 			  statistics_counter_event (cfun, "four-insn
combine", 1);
 			  goto retry;
@@ -1444,7 +1481,7 @@ combine_instructions (rtx_insn *f, unsigned int nregs)
 		      if ((next = try_combine (insn, link, link1,
 					       nextlinks->insn,
 					       &new_direct_jump_p,
-					       last_combined_insn)) != 0)
+					       last_combined_insn, NULL)) !=
0)
 			{
 			  statistics_counter_event (cfun, "four-insn
combine", 1);
 			  goto retry;
@@ -1452,6 +1489,37 @@ combine_instructions (rtx_insn *f, unsigned int
nregs)
 		  }
 	      }
 
+	  /* Try to combine a compare insn that sets CC
+	     with a preceding insn that can set CC, and maybe with its
+	     logical predecessor as well.
+	     We need this special code because data flow connections
+	     do not get entered in LOG_LINKS.  */
+	  if ((prev = prev_nonnote_nondebug_insn (insn)) != NULL_RTX
+	      && can_reuse_cc_set_p (insn, prev)
+	      && max_combine >= 4)
+	    {
+		struct insn_link *next1;
+		FOR_EACH_LOG_LINK (next1, prev)
+		  {
+		    rtx_insn *link1 = next1->insn;
+		    if (NOTE_P (link1))
+		      continue;
+		    /* I1 -> I2 -> I3; I2 -> insn;
+		       output parallel (insn, I3).  */
+		    FOR_EACH_LOG_LINK (nextlinks, link1)
+		      if ((next = try_combine (prev, link1,
+					       nextlinks->insn, NULL,
+					       &new_direct_jump_p,
+					       last_combined_insn, insn)) !=
0)
+
+			{
+			  delete_insn (insn);
+			  insn = next;
+			  statistics_counter_event (cfun, "four-insn
combine", 1);
+			  goto retry;
+			}
+		  }
+	    }
 	  /* Try this insn with each REG_EQUAL note it links back to.  */
 	  FOR_EACH_LOG_LINK (links, insn)
 	    {
@@ -1477,7 +1545,7 @@ combine_instructions (rtx_insn *f, unsigned int nregs)
 		  i2mod_new_rhs = copy_rtx (note);
 		  next = try_combine (insn, i2mod, NULL, NULL,
 				      &new_direct_jump_p,
-				      last_combined_insn);
+				      last_combined_insn, NULL);
 		  i2mod = NULL;
 		  if (next)
 		    {
@@ -2533,11 +2601,15 @@ can_split_parallel_of_n_reg_sets (rtx_insn *insn,
int n)
 
    LAST_COMBINED_INSN is either I3, or some insn after I3 that has
    been I3 passed to an earlier try_combine within the same basic
-   block.  */
+   block.
+
+   TO_COMBINED_INSN is an insn after I3 that sets CC flags.  It is used to
+   combine with I3 to make a new insn.  */
 
 static rtx_insn *
 try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn *i1, rtx_insn *i0,
-	     int *new_direct_jump_p, rtx_insn *last_combined_insn)
+	     int *new_direct_jump_p, rtx_insn *last_combined_insn,
+	     rtx_insn *to_combined_insn)
 {
   /* New patterns for I3 and I2, respectively.  */
   rtx newpat, newi2pat = 0;
@@ -2630,6 +2702,7 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn *i1,
rtx_insn *i0,
       || cant_combine_insn_p (i2)
       || (i1 && cant_combine_insn_p (i1))
       || (i0 && cant_combine_insn_p (i0))
+      || (to_combined_insn && cant_combine_insn_p (to_combined_insn))
       || likely_spilled_retval_p (i3))
     return 0;
 
@@ -3323,7 +3396,11 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn
*i1, rtx_insn *i0,
 	  rtx old = newpat;
 	  total_sets = 1 + extra_sets;
 	  newpat = gen_rtx_PARALLEL (VOIDmode, rtvec_alloc (total_sets));
-	  XVECEXP (newpat, 0, 0) = old;
+
+	  if (to_combined_insn)
+	    XVECEXP (newpat, 0, --total_sets) = old;
+	  else
+	    XVECEXP (newpat, 0, 0) = old;
 	}
 
       if (added_sets_0)
@@ -3346,6 +3423,21 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn
*i1, rtx_insn *i0,
 	  if ((i0_feeds_i1_n && i1_feeds_i2_n) || i0_feeds_i2_n)
 	    t = subst (t, i0dest, i0src_copy2 ? i0src_copy2 : i0src, 0, 0,
0);
 
+	  if (to_combined_insn)
+	    {
+	      rtx todest = NULL_RTX, tosrc = NULL_RTX;
+	      if (can_combine_p (i2, to_combined_insn, NULL, NULL,
+				 i3, NULL, &todest, &tosrc))
+		{
+		  rtx pat = copy_rtx (PATTERN (to_combined_insn));
+		  t = subst (pat, todest, tosrc, 0, 0, 0);
+		}
+	      else
+		{
+		  undo_all ();
+		  return 0;
+		}
+	    }
 	  XVECEXP (newpat, 0, --total_sets) = t;
 	}
     }
diff --git a/gcc/testsuite/gcc.target/i386/pr61225.c
b/gcc/testsuite/gcc.target/i386/pr61225.c
new file mode 100644
index 0000000..9c08102
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr61225.c
@@ -0,0 +1,16 @@
+/* PR rtl-optimization/61225 */
+/* { dg-do compile } */
+/* { dg-options "-Os -fdump-rtl-combine-details" } */
+
+void foo (void *);
+
+int *
+f1 (int *x)
+{
+  if (!--*x)
+    foo (x);
+  return x;
+}
+
+/* { dg-final { scan-rtl-dump "Successfully matched this instruction"
"combine" } } */
+/* { dg-final { cleanup-rtl-dump "combine" } } */




^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] Fix PR 61225
  2014-12-09  9:49               ` Zhenqiang Chen
@ 2014-12-09 18:52                 ` Jeff Law
  2014-12-09 19:07                 ` Segher Boessenkool
  1 sibling, 0 replies; 27+ messages in thread
From: Jeff Law @ 2014-12-09 18:52 UTC (permalink / raw)
  To: Zhenqiang Chen; +Cc: gcc-patches

On 12/09/14 02:49, Zhenqiang Chen wrote:
>> Do you need to verify SETA and SETB satisfy single_set?  Or has that
>> already been done elsewhere?
>
> A is NEXT_INSN (insn)
> B is prev_nonnote_nondebug_insn (insn),
>
> For I1 -> I2 -> B; I2 -> A;
> LOG_LINK can make sure I1 and I2 are single_set, but not A and B. And I did
> found codes in function try_combine, which can make sure B (or I3) is
> single_set.
>
> So I think the check can skip failed cases at early stage.
Thanks for doing the research on this.

>
> The check is to make sure the correctness.  Here is a case,
>
> int
> f1 (int *x)
> {
>    int t = --*x;
>    if (!t)
>      foo (x);
>    return t;
> }
>
>    _4 = *x_3(D);
>    _5 = _4 + -1;
>    *x_3(D) = _5;
>    # DEBUG t => _5
>    if (_5 == 0)
>     ...
>    <bb 4>:
>    return _5;
>
> "_5" is used in "return _5". So we can not remove "_5 = _4 + -1".
Right, but ISTM that if the # uses > 2, then we could just return false 
rather than bothering to see if all the uses are consumed by A or B. 
It's not a big deal, I just have a hard time seeing that doing something 
more complex than "if (# uses > 2) return false;" makes sense.




>> So you've got two new combine cases here, but I think the testcase only
>> tests one of them.  Can you include a testcase for both of hte major
>> paths above (I1->I2->I3; I2->insn and I2->I3; I2->INSN)
>
> pr61225.c is the case to cover I1->I2->I3; I2->insn.
>
> For I2 -> I3; I2 -> insn, I tried my test cases and found peephole2 can also
> handle them. So I removed the code from the patch.
Seems like the reasonable thing to do.

>
> Here is the final patch.
> Bootstrap and no make check regression on X86-64.
>
> ChangeLog:
> 2014-11-09  Zhenqiang Chen  <zhenqiang.chen@linaro.org>
>
> 	Part of PR rtl-optimization/61225
> 	* combine.c (can_reuse_cc_set_p): New function.
> 	(combine_instructions): Handle I1 -> I2 -> I3; I2 -> insn.
> 	(try_combine): Add one more parameter TO_COMBINED_INSN, which
> 	is used to create a new insn parallel (TO_COMBINED_INSN, I3).
>
> testsuite/ChangeLog:
> 2014-11-09  Zhenqiang Chen<zhenqiang.chen@linaro.org>
>
> 	* gcc.target/i386/pr61225.c: New test.
>

OK for the trunk.

jeff

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] Fix PR 61225
  2014-12-09  9:49               ` Zhenqiang Chen
  2014-12-09 18:52                 ` Jeff Law
@ 2014-12-09 19:07                 ` Segher Boessenkool
  2014-12-09 19:15                   ` Jeff Law
  1 sibling, 1 reply; 27+ messages in thread
From: Segher Boessenkool @ 2014-12-09 19:07 UTC (permalink / raw)
  To: Zhenqiang Chen; +Cc: 'Jeff Law', gcc-patches

On Tue, Dec 09, 2014 at 05:49:18PM +0800, Zhenqiang Chen wrote:
> > Do you need to verify SETA and SETB satisfy single_set?  Or has that
> > already been done elsewhere?
> 
> A is NEXT_INSN (insn)
> B is prev_nonnote_nondebug_insn (insn),
> 
> For I1 -> I2 -> B; I2 -> A;
> LOG_LINK can make sure I1 and I2 are single_set,

It cannot, not anymore anyway.  LOG_LINKs can point to an insn with multiple
SETs; multiple LOG_LINKs can point to such an insn.

The only thing a LOG_LINK from Y to X tells you is that Y is the earliest
insn after X that uses some register set by X (and it knows which register,
too, nowadays).

> > > +  if (DF_REG_USE_COUNT (REGNO (SET_SRC (setb))) > 2)
> > > +    {
> > > +      df_ref use;
> > > +      rtx insn;
> > > +      unsigned int i = REGNO (SET_SRC (setb));
> > > +
> > > +      for (use = DF_REG_USE_CHAIN (i); use; use = DF_REF_NEXT_REG
> (use))
> > > +        {
> > > +	  insn = DF_REF_INSN (use);
> > > +	  if (insn != a && insn != b && !(NOTE_P (insn) || DEBUG_INSN_P
> > (insn)))
> > > +	    return false;
> > > +	}
> > > +    }
> > > +
> > > +  return true;
> > > +}
> > Is this fragment really needed?  Does it ever trigger?  I'd think that
> > for > 2 uses punting would be fine.  Do we really commonly have cases
> > with > 2 uses, but where they're all in SETA and SETB?

Can't you just check for a death note on the second insn?  Together with
reg_used_between_p?

> > > +	  /* Try to combine a compare insn that sets CC
> > > +	     with a preceding insn that can set CC, and maybe with its
> > > +	     logical predecessor as well.
> > > +	     We need this special code because data flow connections
> > > +	     do not get entered in LOG_LINKS.  */

I think you mean "not _all_ data flow connections"?

> > > +	  if ((prev = prev_nonnote_nondebug_insn (insn)) != NULL_RTX
> > > +	      && refer_same_reg_p (insn, prev)
> > > +	      && max_combine >= 4)
> > > +	    {
> > > +		struct insn_link *next1;
> > > +		FOR_EACH_LOG_LINK (next1, prev)
> > > +		  {
> > > +		    rtx_insn *link1 = next1->insn;
> > > +		    if (NOTE_P (link1))
> > > +		      continue;
> > > +		    /* I1 -> I2 -> I3; I2 -> insn;
> > > +		       output parallel (insn, I3).  */
> > > +		    FOR_EACH_LOG_LINK (nextlinks, link1)
> > > +		      if ((next = try_combine (prev, link1,
> > > +					       nextlinks->insn, NULL,
> > > +					       &new_direct_jump_p,
> > > +					       last_combined_insn, insn)) !=
> 0)
> > > +
> > > +			{
> > > +			  delete_insn (insn);
> > > +			  insn = next;
> > > +			  statistics_counter_event (cfun, "four-insn
> combine",
> > 1);
> > > +			  goto retry;
> > > +			}
> > > +		    /* I2 -> I3; I2 -> insn
> > > +		       output next = parallel (insn, I3).  */
> > > +		    if ((next = try_combine (prev, link1,
> > > +					     NULL, NULL,
> > > +					     &new_direct_jump_p,
> > > +					     last_combined_insn, insn)) !=
> 0)
> > > +
> > > +		      {
> > > +			delete_insn (insn);
> > > +			insn = next;
> > > +			statistics_counter_event (cfun, "three-insn
> combine",
> > 1);
> > > +			goto retry;
> > > +		      }
> > > +		  }
> > > +	    }
> > So you've got two new combine cases here, but I think the testcase only
> > tests one of them.  Can you include a testcase for both of hte major
> > paths above (I1->I2->I3; I2->insn and I2->I3; I2->INSN)
> 
> pr61225.c is the case to cover I1->I2->I3; I2->insn.
> 
> For I2 -> I3; I2 -> insn, I tried my test cases and found peephole2 can also
> handle them. So I removed the code from the patch.

Why?  The simpler case has much better chances of being used.

In fact, there are many more cases you could handle:

You handle

I1 -> I2 -> I3; I2 -> insn
      I2 -> I3; I2 -> insn

but there are also

   I1,I2 -> I3; I2 -> insn

and the many 4-insn combos, too.
But that's not all: instead of just dealing with I2->insn, you can just as
well have I1->insn or I0->insn, and if you could handle the SET not dying
in the resulting insn, I3->insn.

In fact, in that case you really only need to handle I3->insn (no other
instructions involved), as a simple 2-insn combination that combines into
the earlier insn instead of the later, to get the effect you want.

Just like your patch, that would pull "insn" earlier, but it would do it
much more explicitly.


Some comments on the patch...

> +/* A is a compare (reg1, 0) and B is SINGLE_SET which SET_SRC is reg2.
> +   It returns TRUE, if reg1 == reg2, and no other refer of reg1
> +   except A and B.  */

That sound like the only correct inputs are such a compare etc., but the
routine tests whether that is true.

> +static bool
> +can_reuse_cc_set_p (rtx_insn *a, rtx_insn *b)
> +{
> +  rtx seta = single_set (a);
> +  rtx setb = single_set (b);
> +
> +  if (BLOCK_FOR_INSN (a) != BLOCK_FOR_INSN (b)

Neither the comment nor the function name mention this.  This test is
better placed in the caller of this function, anyway.

> +	  /* Try to combine a compare insn that sets CC
> +	     with a preceding insn that can set CC, and maybe with its
> +	     logical predecessor as well.

If you do remove the I2->I3 case this comment needs updating.  If you
don't remove it, it should be tried before the I1->I2->I3 case.

> +   TO_COMBINED_INSN is an insn after I3 that sets CC flags.  It is used to
> +   combine with I3 to make a new insn.  */

"combine into I3", to make clear the resulting insn is placed at I3?

> @@ -3323,7 +3396,11 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn
> *i1, rtx_insn *i0,
>  	  rtx old = newpat;
>  	  total_sets = 1 + extra_sets;
>  	  newpat = gen_rtx_PARALLEL (VOIDmode, rtvec_alloc (total_sets));
> -	  XVECEXP (newpat, 0, 0) = old;
> +
> +	  if (to_combined_insn)
> +	    XVECEXP (newpat, 0, --total_sets) = old;
> +	  else
> +	    XVECEXP (newpat, 0, 0) = old;
>  	}

Is this correct?  If so, it needs a big fat comment, because it is
not exactly obvious :-)

Also, it doesn't handle at all the case where the new pattern already is
a PARALLEL; can that never happen?

Cheers,


Segher

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] Fix PR 61225
  2014-12-09 19:07                 ` Segher Boessenkool
@ 2014-12-09 19:15                   ` Jeff Law
  2014-12-10 13:47                     ` Segher Boessenkool
  2014-12-12  7:27                     ` Zhenqiang Chen
  0 siblings, 2 replies; 27+ messages in thread
From: Jeff Law @ 2014-12-09 19:15 UTC (permalink / raw)
  To: Segher Boessenkool, Zhenqiang Chen; +Cc: gcc-patches

On 12/09/14 12:07, Segher Boessenkool wrote:
> On Tue, Dec 09, 2014 at 05:49:18PM +0800, Zhenqiang Chen wrote:
>>> Do you need to verify SETA and SETB satisfy single_set?  Or has that
>>> already been done elsewhere?
>>
>> A is NEXT_INSN (insn)
>> B is prev_nonnote_nondebug_insn (insn),
>>
>> For I1 -> I2 -> B; I2 -> A;
>> LOG_LINK can make sure I1 and I2 are single_set,
>
> It cannot, not anymore anyway.  LOG_LINKs can point to an insn with multiple
> SETs; multiple LOG_LINKs can point to such an insn.
So let's go ahead and put a single_set test in this function.

>>>> Is this fragment really needed?  Does it ever trigger?  I'd think that
>>> for > 2 uses punting would be fine.  Do we really commonly have cases
>>> with > 2 uses, but where they're all in SETA and SETB?
>
> Can't you just check for a death note on the second insn?  Together with
> reg_used_between_p?
Yea, that'd accomplish the same thing I think Zhenqiang is trying to 
catch and is simpler than walking the lists.

>
>>>> +	  /* Try to combine a compare insn that sets CC
>>>> +	     with a preceding insn that can set CC, and maybe with its
>>>> +	     logical predecessor as well.
>>>> +	     We need this special code because data flow connections
>>>> +	     do not get entered in LOG_LINKS.  */
>
> I think you mean "not _all_ data flow connections"?
I almost said something about this comment, but figured I was nitpicking 
too much :-)

>>> So you've got two new combine cases here, but I think the testcase only
>>> tests one of them.  Can you include a testcase for both of hte major
>>> paths above (I1->I2->I3; I2->insn and I2->I3; I2->INSN)
>>
>> pr61225.c is the case to cover I1->I2->I3; I2->insn.
>>
>> For I2 -> I3; I2 -> insn, I tried my test cases and found peephole2 can also
>> handle them. So I removed the code from the patch.
>
> Why?  The simpler case has much better chances of being used.
The question does it actually catch anything not already handled?  I 
guess you could argue that doing it in combine is better than peep2 and 
I'd agree with that.

>
> In fact, there are many more cases you could handle:
>
> You handle
>
> I1 -> I2 -> I3; I2 -> insn
>        I2 -> I3; I2 -> insn
>
> but there are also
>
>     I1,I2 -> I3; I2 -> insn
>
> and the many 4-insn combos, too.
Yes, but I wonder how much of this is really necessary in practice.  We 
could do exhaustive testing here, but I suspect the payoff isn't all 
that great.  Thus I'm comfortable with faulting in the cases we actually 
find are useful in practice.

>
>> +/* A is a compare (reg1, 0) and B is SINGLE_SET which SET_SRC is reg2.
>> +   It returns TRUE, if reg1 == reg2, and no other refer of reg1
>> +   except A and B.  */
>
> That sound like the only correct inputs are such a compare etc., but the
> routine tests whether that is true.
Correct, the RTL has to have a specific form and that is tested for. 
Comment updates can't hurt.


>
>> +static bool
>> +can_reuse_cc_set_p (rtx_insn *a, rtx_insn *b)
>> +{
>> +  rtx seta = single_set (a);
>> +  rtx setb = single_set (b);
>> +
>> +  if (BLOCK_FOR_INSN (a) != BLOCK_FOR_INSN (b)
>
> Neither the comment nor the function name mention this.  This test is
> better placed in the caller of this function, anyway.
Didn't consider it terribly important.  Moving it to the caller doesn't 
change anything significantly, though I would agree it's martinally cleaner.

>
>> @@ -3323,7 +3396,11 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn
>> *i1, rtx_insn *i0,
>>   	  rtx old = newpat;
>>   	  total_sets = 1 + extra_sets;
>>   	  newpat = gen_rtx_PARALLEL (VOIDmode, rtvec_alloc (total_sets));
>> -	  XVECEXP (newpat, 0, 0) = old;
>> +
>> +	  if (to_combined_insn)
>> +	    XVECEXP (newpat, 0, --total_sets) = old;
>> +	  else
>> +	    XVECEXP (newpat, 0, 0) = old;
>>   	}
>
> Is this correct?  If so, it needs a big fat comment, because it is
> not exactly obvious :-)
>
> Also, it doesn't handle at all the case where the new pattern already is
> a PARALLEL; can that never happen?
I'd convinced myself it was.  But yes, a comment here would be good.

Presumably you're thinking about a PARALLEL that satisfies single_set_p?

jeff

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] Fix PR 61225
  2014-12-09 19:15                   ` Jeff Law
@ 2014-12-10 13:47                     ` Segher Boessenkool
  2015-01-14 22:05                       ` Jeff Law
  2014-12-12  7:27                     ` Zhenqiang Chen
  1 sibling, 1 reply; 27+ messages in thread
From: Segher Boessenkool @ 2014-12-10 13:47 UTC (permalink / raw)
  To: Jeff Law; +Cc: Zhenqiang Chen, gcc-patches

On Tue, Dec 09, 2014 at 12:15:30PM -0700, Jeff Law wrote:
> >>@@ -3323,7 +3396,11 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn
> >>*i1, rtx_insn *i0,
> >>  	  rtx old = newpat;
> >>  	  total_sets = 1 + extra_sets;
> >>  	  newpat = gen_rtx_PARALLEL (VOIDmode, rtvec_alloc (total_sets));
> >>-	  XVECEXP (newpat, 0, 0) = old;
> >>+
> >>+	  if (to_combined_insn)
> >>+	    XVECEXP (newpat, 0, --total_sets) = old;
> >>+	  else
> >>+	    XVECEXP (newpat, 0, 0) = old;
> >>  	}
> >
> >Is this correct?  If so, it needs a big fat comment, because it is
> >not exactly obvious :-)
> >
> >Also, it doesn't handle at all the case where the new pattern already is
> >a PARALLEL; can that never happen?
> I'd convinced myself it was.  But yes, a comment here would be good.
> 
> Presumably you're thinking about a PARALLEL that satisfies single_set_p?

I wasn't thinking about anything in particular; this code does not handle
a PARALLEL newpat with to_combined_insn correctly, and it doesn't say it
cannot happen.

But yes, I don't see why it could not happen?  E.g. a parallel of multiple
sets with all but one of those dead?

Why should it be single_set here anyway?  (Maybe I need more coffee, sorry
if so).


Segher

^ permalink raw reply	[flat|nested] 27+ messages in thread

* RE: [PATCH] Fix PR 61225
  2014-12-09 19:15                   ` Jeff Law
  2014-12-10 13:47                     ` Segher Boessenkool
@ 2014-12-12  7:27                     ` Zhenqiang Chen
  2014-12-12 11:08                       ` Segher Boessenkool
  1 sibling, 1 reply; 27+ messages in thread
From: Zhenqiang Chen @ 2014-12-12  7:27 UTC (permalink / raw)
  To: 'Jeff Law', Segher Boessenkool; +Cc: gcc-patches



> -----Original Message-----
> From: Jeff Law [mailto:law@redhat.com]
> Sent: Wednesday, December 10, 2014 3:16 AM
> To: Segher Boessenkool; Zhenqiang Chen
> Cc: gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH] Fix PR 61225
> 
> On 12/09/14 12:07, Segher Boessenkool wrote:
> > On Tue, Dec 09, 2014 at 05:49:18PM +0800, Zhenqiang Chen wrote:
> >>> Do you need to verify SETA and SETB satisfy single_set?  Or has that
> >>> already been done elsewhere?
> >>
> >> A is NEXT_INSN (insn)
> >> B is prev_nonnote_nondebug_insn (insn),
> >>
> >> For I1 -> I2 -> B; I2 -> A;
> >> LOG_LINK can make sure I1 and I2 are single_set,
> >
> > It cannot, not anymore anyway.  LOG_LINKs can point to an insn with
> > multiple SETs; multiple LOG_LINKs can point to such an insn.
> So let's go ahead and put a single_set test in this function.
> 
> >>>> Is this fragment really needed?  Does it ever trigger?  I'd think
> >>>> that
> >>> for > 2 uses punting would be fine.  Do we really commonly have
> >>> cases with > 2 uses, but where they're all in SETA and SETB?
> >
> > Can't you just check for a death note on the second insn?  Together
> > with reg_used_between_p?
> Yea, that'd accomplish the same thing I think Zhenqiang is trying to catch
and
> is simpler than walking the lists.

Updated. Check for a death note is enough since b is
prev_nonnote_nondebug_insn (a). 
 
> >
> >>>> +	  /* Try to combine a compare insn that sets CC
> >>>> +	     with a preceding insn that can set CC, and maybe with
its
> >>>> +	     logical predecessor as well.
> >>>> +	     We need this special code because data flow connections
> >>>> +	     do not get entered in LOG_LINKS.  */
> >
> > I think you mean "not _all_ data flow connections"?
> I almost said something about this comment, but figured I was nitpicking
too
> much :-)

Updated. 

> >>> So you've got two new combine cases here, but I think the testcase
> >>> only tests one of them.  Can you include a testcase for both of hte
> >>> major paths above (I1->I2->I3; I2->insn and I2->I3; I2->INSN)
> >>
> >> pr61225.c is the case to cover I1->I2->I3; I2->insn.
> >>
> >> For I2 -> I3; I2 -> insn, I tried my test cases and found peephole2
> >> can also handle them. So I removed the code from the patch.
> >
> > Why?  The simpler case has much better chances of being used.
> The question does it actually catch anything not already handled?  I guess
you
> could argue that doing it in combine is better than peep2 and I'd agree
with
> that.
> 
> >
> > In fact, there are many more cases you could handle:
> >
> > You handle
> >
> > I1 -> I2 -> I3; I2 -> insn
> >        I2 -> I3; I2 -> insn
> >
> > but there are also
> >
> >     I1,I2 -> I3; I2 -> insn
> >
> > and the many 4-insn combos, too.
> Yes, but I wonder how much of this is really necessary in practice.  We
> could do exhaustive testing here, but I suspect the payoff isn't all
> that great.  Thus I'm comfortable with faulting in the cases we actually
> find are useful in practice.
> 
> >
> >> +/* A is a compare (reg1, 0) and B is SINGLE_SET which SET_SRC is reg2.
> >> +   It returns TRUE, if reg1 == reg2, and no other refer of reg1
> >> +   except A and B.  */
> >
> > That sound like the only correct inputs are such a compare etc., but the
> > routine tests whether that is true.
> Correct, the RTL has to have a specific form and that is tested for.
> Comment updates can't hurt.
 
Updated.
 
> >
> >> +static bool
> >> +can_reuse_cc_set_p (rtx_insn *a, rtx_insn *b)
> >> +{
> >> +  rtx seta = single_set (a);
> >> +  rtx setb = single_set (b);
> >> +
> >> +  if (BLOCK_FOR_INSN (a) != BLOCK_FOR_INSN (b)
> >
> > Neither the comment nor the function name mention this.  This test is
> > better placed in the caller of this function, anyway.
> Didn't consider it terribly important.  Moving it to the caller doesn't
> change anything significantly, though I would agree it's martinally
cleaner.

Updated.
 
> >
> >> @@ -3323,7 +3396,11 @@ try_combine (rtx_insn *i3, rtx_insn *i2,
> rtx_insn
> >> *i1, rtx_insn *i0,
> >>   	  rtx old = newpat;
> >>   	  total_sets = 1 + extra_sets;
> >>   	  newpat = gen_rtx_PARALLEL (VOIDmode, rtvec_alloc (total_sets));
> >> -	  XVECEXP (newpat, 0, 0) = old;
> >> +
> >> +	  if (to_combined_insn)
> >> +	    XVECEXP (newpat, 0, --total_sets) = old;
> >> +	  else
> >> +	    XVECEXP (newpat, 0, 0) = old;
> >>   	}
> >
> > Is this correct?  If so, it needs a big fat comment, because it is
> > not exactly obvious :-)
> >
> > Also, it doesn't handle at all the case where the new pattern already is
> > a PARALLEL; can that never happen?
> I'd convinced myself it was.  But yes, a comment here would be good.

The following comments are added.

+	    /* This is a hack to match i386 instruction pattern, which
+		is like
+		    (parallel [
+			(set (reg:CCZ 17 flags)
+				...)
+			(set ...)})
+		we have to swap the newpat order of I3 and TO_COMBINED_INSN.
*/

> Presumably you're thinking about a PARALLEL that satisfies single_set_p?

No. It has nothing to do with single_set_p. I just want to reuse the code to
match the instruction pattern.

In common, the new PARALLEL is like

  Parallel
    newpat from I3
    newpat from I2 // if have
    newpat from I1 // if have
    newpat from I0 // if have

For to_combined_insn, i0 is NULL and there should have no

    newpat from I1

When handling I1->I2->I3, with normal order, it will get
  Parallel
    newpat from I3

After I2-> to_combined_insn, the parallel will be
  Parallel
    newpat from I3
    newpat from to_combined_insn.

But this can not match the insn pattern. So I swap the order to.
  Parallel
    newpat from to_combined_insn.
    newpat from I3

Here is the updated patch.

diff --git a/gcc/combine.c b/gcc/combine.c
index 9ed03be..bbdddfb 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -431,7 +431,7 @@ static int can_combine_p (rtx_insn *, rtx_insn *,
rtx_insn *, rtx_insn *,
 static int combinable_i3pat (rtx_insn *, rtx *, rtx, rtx, rtx, int, int,
rtx *);
 static int contains_muldiv (rtx);
 static rtx_insn *try_combine (rtx_insn *, rtx_insn *, rtx_insn *, rtx_insn
*,
-			      int *, rtx_insn *);
+			      int *, rtx_insn *, rtx_insn *);
 static void undo_all (void);
 static void undo_commit (void);
 static rtx *find_split_point (rtx *, rtx_insn *, bool);
@@ -1120,6 +1120,32 @@ insn_a_feeds_b (rtx_insn *a, rtx_insn *b)
 #endif
   return false;
 }
+
+/* The function checks whether it is possible to use B to set the CC flag
+   in A.  It returns TRUE, if A is a compare (reg1, 0), B is a SINGLE_SET
+   which SET_SRC is reg2, reg1 == reg2, and no other refer of reg1
+   except A and B.  */
+
+static bool
+can_reuse_cc_set_p (rtx_insn *a, rtx_insn *b)
+{
+  rtx seta = single_set (a);
+  rtx setb = single_set (b);
+
+  if (!seta || !setb)
+    return false;
+
+  if (GET_CODE (SET_SRC (seta)) != COMPARE
+      || GET_MODE_CLASS (GET_MODE (SET_DEST (seta))) != MODE_CC
+      || !REG_P (XEXP (SET_SRC (seta), 0))
+      || XEXP (SET_SRC (seta), 1) != CONST0_RTX (GET_MODE (SET_SRC (seta)))
+      || !REG_P (SET_SRC (setb))
+      || REGNO (SET_SRC (setb)) != REGNO (XEXP (SET_SRC (seta), 0)))
+    return false;
+
+  return find_reg_note (a, REG_DEAD, XEXP (SET_SRC (seta), 0));
+}
+
 

 /* Main entry point for combiner.  F is the first insn of the function.
    NREGS is the first unused pseudo-reg number.
@@ -1129,10 +1155,7 @@ insn_a_feeds_b (rtx_insn *a, rtx_insn *b)
 static int
 combine_instructions (rtx_insn *f, unsigned int nregs)
 {
-  rtx_insn *insn, *next;
-#ifdef HAVE_cc0
-  rtx_insn *prev;
-#endif
+  rtx_insn *insn, *next, *prev;
   struct insn_link *links, *nextlinks;
   rtx_insn *first;
   basic_block last_bb;
@@ -1279,7 +1302,7 @@ combine_instructions (rtx_insn *f, unsigned int nregs)
 	  FOR_EACH_LOG_LINK (links, insn)
 	    if ((next = try_combine (insn, links->insn, NULL,
 				     NULL, &new_direct_jump_p,
-				     last_combined_insn)) != 0)
+				     last_combined_insn, NULL)) != 0)
 	      {
 		statistics_counter_event (cfun, "two-insn combine", 1);
 		goto retry;
@@ -1300,7 +1323,7 @@ combine_instructions (rtx_insn *f, unsigned int nregs)
 		FOR_EACH_LOG_LINK (nextlinks, link)
 		  if ((next = try_combine (insn, link, nextlinks->insn,
 					   NULL, &new_direct_jump_p,
-					   last_combined_insn)) != 0)
+					   last_combined_insn, NULL)) != 0)
 		    {
 		      statistics_counter_event (cfun, "three-insn combine",
1);
 		      goto retry;
@@ -1322,13 +1345,13 @@ combine_instructions (rtx_insn *f, unsigned int
nregs)
 	    {
 	      if ((next = try_combine (insn, prev, NULL, NULL,
 				       &new_direct_jump_p,
-				       last_combined_insn)) != 0)
+				       last_combined_insn, NULL)) != 0)
 		goto retry;
 
 	      FOR_EACH_LOG_LINK (nextlinks, prev)
 		  if ((next = try_combine (insn, prev, nextlinks->insn,
 					   NULL, &new_direct_jump_p,
-					   last_combined_insn)) != 0)
+					   last_combined_insn, NULL)) != 0)
 		    goto retry;
 	    }
 
@@ -1342,13 +1365,13 @@ combine_instructions (rtx_insn *f, unsigned int
nregs)
 	    {
 	      if ((next = try_combine (insn, prev, NULL, NULL,
 				       &new_direct_jump_p,
-				       last_combined_insn)) != 0)
+				       last_combined_insn, NULL)) != 0)
 		goto retry;
 
 	      FOR_EACH_LOG_LINK (nextlinks, prev)
 		  if ((next = try_combine (insn, prev, nextlinks->insn,
 					   NULL, &new_direct_jump_p,
-					   last_combined_insn)) != 0)
+					   last_combined_insn, NULL)) != 0)
 		    goto retry;
 	    }
 
@@ -1364,7 +1387,7 @@ combine_instructions (rtx_insn *f, unsigned int nregs)
 		  && sets_cc0_p (PATTERN (prev))
 		  && (next = try_combine (insn, links->insn,
 					  prev, NULL, &new_direct_jump_p,
-					  last_combined_insn)) != 0)
+					  last_combined_insn, NULL)) != 0)
 		goto retry;
 #endif
 
@@ -1377,7 +1400,7 @@ combine_instructions (rtx_insn *f, unsigned int nregs)
 		if ((next = try_combine (insn, links->insn,
 					 nextlinks->insn, NULL,
 					 &new_direct_jump_p,
-					 last_combined_insn)) != 0)
+					 last_combined_insn, NULL)) != 0)
 
 		  {
 		    statistics_counter_event (cfun, "three-insn combine",
1);
@@ -1406,7 +1429,7 @@ combine_instructions (rtx_insn *f, unsigned int nregs)
 		      if ((next = try_combine (insn, link, link1,
 					       nextlinks->insn,
 					       &new_direct_jump_p,
-					       last_combined_insn)) != 0)
+					       last_combined_insn, NULL)) !=
0)
 			{
 			  statistics_counter_event (cfun, "four-insn
combine", 1);
 			  goto retry;
@@ -1417,7 +1440,7 @@ combine_instructions (rtx_insn *f, unsigned int nregs)
 		      if ((next = try_combine (insn, link, link1,
 					       nextlinks->insn,
 					       &new_direct_jump_p,
-					       last_combined_insn)) != 0)
+					       last_combined_insn, NULL)) !=
0)
 			{
 			  statistics_counter_event (cfun, "four-insn
combine", 1);
 			  goto retry;
@@ -1434,7 +1457,7 @@ combine_instructions (rtx_insn *f, unsigned int nregs)
 		      if ((next = try_combine (insn, link, link1,
 					       nextlinks->insn,
 					       &new_direct_jump_p,
-					       last_combined_insn)) != 0)
+					       last_combined_insn, NULL)) !=
0)
 			{
 			  statistics_counter_event (cfun, "four-insn
combine", 1);
 			  goto retry;
@@ -1444,7 +1467,7 @@ combine_instructions (rtx_insn *f, unsigned int nregs)
 		      if ((next = try_combine (insn, link, link1,
 					       nextlinks->insn,
 					       &new_direct_jump_p,
-					       last_combined_insn)) != 0)
+					       last_combined_insn, NULL)) !=
0)
 			{
 			  statistics_counter_event (cfun, "four-insn
combine", 1);
 			  goto retry;
@@ -1452,6 +1475,38 @@ combine_instructions (rtx_insn *f, unsigned int
nregs)
 		  }
 	      }
 
+	  /* Try to combine a compare insn that sets CC
+	     with a preceding insn that can set CC, and maybe with its
+	     logical predecessor as well.
+	     We need this special code because not all data flow connections
+	     get entered in LOG_LINKS.  */
+	  if ((prev = prev_nonnote_nondebug_insn (insn)) != NULL_RTX
+	      && BLOCK_FOR_INSN (insn) == BLOCK_FOR_INSN (prev)
+	      && can_reuse_cc_set_p (insn, prev)
+	      && max_combine >= 4)
+	    {
+		struct insn_link *next1;
+		FOR_EACH_LOG_LINK (next1, prev)
+		  {
+		    rtx_insn *link1 = next1->insn;
+		    if (NOTE_P (link1))
+		      continue;
+		    /* I1 -> I2 -> I3; I2 -> insn;
+		       output parallel (insn, I3).  */
+		    FOR_EACH_LOG_LINK (nextlinks, link1)
+		      if ((next = try_combine (prev, link1,
+					       nextlinks->insn, NULL,
+					       &new_direct_jump_p,
+					       last_combined_insn, insn)) !=
0)
+
+			{
+			  delete_insn (insn);
+			  insn = next;
+			  statistics_counter_event (cfun, "four-insn
combine", 1);
+			  goto retry;
+			}
+		  }
+	    }
 	  /* Try this insn with each REG_EQUAL note it links back to.  */
 	  FOR_EACH_LOG_LINK (links, insn)
 	    {
@@ -1477,7 +1532,7 @@ combine_instructions (rtx_insn *f, unsigned int nregs)
 		  i2mod_new_rhs = copy_rtx (note);
 		  next = try_combine (insn, i2mod, NULL, NULL,
 				      &new_direct_jump_p,
-				      last_combined_insn);
+				      last_combined_insn, NULL);
 		  i2mod = NULL;
 		  if (next)
 		    {
@@ -2535,11 +2590,15 @@ can_split_parallel_of_n_reg_sets (rtx_insn *insn,
int n)
 
    LAST_COMBINED_INSN is either I3, or some insn after I3 that has
    been I3 passed to an earlier try_combine within the same basic
-   block.  */
+   block.
+
+   TO_COMBINED_INSN is an insn after I3 that sets CC flags.  It is used to
+   combine with I3 to make a new insn.  */
 
 static rtx_insn *
 try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn *i1, rtx_insn *i0,
-	     int *new_direct_jump_p, rtx_insn *last_combined_insn)
+	     int *new_direct_jump_p, rtx_insn *last_combined_insn,
+	     rtx_insn *to_combined_insn)
 {
   /* New patterns for I3 and I2, respectively.  */
   rtx newpat, newi2pat = 0;
@@ -2632,6 +2691,7 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn *i1,
rtx_insn *i0,
       || cant_combine_insn_p (i2)
       || (i1 && cant_combine_insn_p (i1))
       || (i0 && cant_combine_insn_p (i0))
+      || (to_combined_insn && cant_combine_insn_p (to_combined_insn))
       || likely_spilled_retval_p (i3))
     return 0;
 
@@ -3325,7 +3385,18 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn
*i1, rtx_insn *i0,
 	  rtx old = newpat;
 	  total_sets = 1 + extra_sets;
 	  newpat = gen_rtx_PARALLEL (VOIDmode, rtvec_alloc (total_sets));
-	  XVECEXP (newpat, 0, 0) = old;
+
+	  if (to_combined_insn)
+	    /* This is a hack to match i386 instruction pattern, which
+		is like
+		    (parallel [
+			(set (reg:CCZ 17 flags)
+				...)
+			(set ...)})
+		we have to swap the newpat order of I3 and TO_COMBINED_INSN.
*/
+	    XVECEXP (newpat, 0, --total_sets) = old;
+	  else
+	    XVECEXP (newpat, 0, 0) = old;
 	}
 
       if (added_sets_0)
@@ -3348,6 +3419,21 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn
*i1, rtx_insn *i0,
 	  if ((i0_feeds_i1_n && i1_feeds_i2_n) || i0_feeds_i2_n)
 	    t = subst (t, i0dest, i0src_copy2 ? i0src_copy2 : i0src, 0, 0,
0);
 
+	  if (to_combined_insn)
+	    {
+	      rtx todest = NULL_RTX, tosrc = NULL_RTX;
+	      if (can_combine_p (i2, to_combined_insn, NULL, NULL,
+				 i3, NULL, &todest, &tosrc))
+		{
+		  rtx pat = copy_rtx (PATTERN (to_combined_insn));
+		  t = subst (pat, todest, tosrc, 0, 0, 0);
+		}
+	      else
+		{
+		  undo_all ();
+		  return 0;
+		}
+	    }
 	  XVECEXP (newpat, 0, --total_sets) = t;
 	}
     }
diff --git a/gcc/testsuite/gcc.target/i386/pr61225.c
b/gcc/testsuite/gcc.target/i386/pr61225.c
new file mode 100644
index 0000000..9c08102
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr61225.c
@@ -0,0 +1,16 @@
+/* PR rtl-optimization/61225 */
+/* { dg-do compile } */
+/* { dg-options "-Os -fdump-rtl-combine-details" } */
+
+void foo (void *);
+
+int *
+f1 (int *x)
+{
+  if (!--*x)
+    foo (x);
+  return x;
+}
+
+/* { dg-final { scan-rtl-dump "Successfully matched this instruction"
"combine" } } */
+/* { dg-final { cleanup-rtl-dump "combine" } } */




^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] Fix PR 61225
  2014-12-12  7:27                     ` Zhenqiang Chen
@ 2014-12-12 11:08                       ` Segher Boessenkool
  0 siblings, 0 replies; 27+ messages in thread
From: Segher Boessenkool @ 2014-12-12 11:08 UTC (permalink / raw)
  To: Zhenqiang Chen; +Cc: 'Jeff Law', gcc-patches

On Fri, Dec 12, 2014 at 03:27:17PM +0800, Zhenqiang Chen wrote:
> > Presumably you're thinking about a PARALLEL that satisfies single_set_p?
> 
> No. It has nothing to do with single_set_p. I just want to reuse the code to
> match the instruction pattern.
> 
> In common, the new PARALLEL is like
> 
>   Parallel
>     newpat from I3
>     newpat from I2 // if have
>     newpat from I1 // if have
>     newpat from I0 // if have
> 
> For to_combined_insn, i0 is NULL and there should have no
> 
>     newpat from I1
> 
> When handling I1->I2->I3, with normal order, it will get
>   Parallel
>     newpat from I3
> 
> After I2-> to_combined_insn, the parallel will be
>   Parallel
>     newpat from I3
>     newpat from to_combined_insn.
> 
> But this can not match the insn pattern. So I swap the order to.
>   Parallel
>     newpat from to_combined_insn.
>     newpat from I3

Maybe I wasn't clear, sorry.  My concern is you only handle a SET as
newpat, not a PARALLEL.  It can be a PARALLEL just fine, even if it
satisfies single_set (it can have a clobber, it can have multiple sets,
all but one dead).


Thanks for the other changes, much appreciated.


Segher

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] Fix PR 61225
  2014-12-10 13:47                     ` Segher Boessenkool
@ 2015-01-14 22:05                       ` Jeff Law
  0 siblings, 0 replies; 27+ messages in thread
From: Jeff Law @ 2015-01-14 22:05 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: Zhenqiang Chen, gcc-patches

On 12/10/14 06:47, Segher Boessenkool wrote:
> On Tue, Dec 09, 2014 at 12:15:30PM -0700, Jeff Law wrote:
>>>> @@ -3323,7 +3396,11 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn
>>>> *i1, rtx_insn *i0,
>>>>   	  rtx old = newpat;
>>>>   	  total_sets = 1 + extra_sets;
>>>>   	  newpat = gen_rtx_PARALLEL (VOIDmode, rtvec_alloc (total_sets));
>>>> -	  XVECEXP (newpat, 0, 0) = old;
>>>> +
>>>> +	  if (to_combined_insn)
>>>> +	    XVECEXP (newpat, 0, --total_sets) = old;
>>>> +	  else
>>>> +	    XVECEXP (newpat, 0, 0) = old;
>>>>   	}
>>>
>>> Is this correct?  If so, it needs a big fat comment, because it is
>>> not exactly obvious :-)
>>>
>>> Also, it doesn't handle at all the case where the new pattern already is
>>> a PARALLEL; can that never happen?
>> I'd convinced myself it was.  But yes, a comment here would be good.
>>
>> Presumably you're thinking about a PARALLEL that satisfies single_set_p?
>
> I wasn't thinking about anything in particular; this code does not handle
> a PARALLEL newpat with to_combined_insn correctly, and it doesn't say it
> cannot happen.
It situations like this where I really need to just put the damn patch 
into my tree and fire up the debugger and poke at it for a while.

Regardless, I got mail from Zhenqiang that he left ARM at the start of 
the year for other opportunities and won't be doing GCC work.

My initial thought is to attach his work to date to the BZ, we can use 
it as a starting point if we want to pursue this missed optimization 
further (it's a regression and thus suitable for stage4 if we're so 
inclined).

Thoughts?

jeff

^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2015-01-14 21:51 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-05-21  9:58 [PATCH] Fix PR 61225 Zhenqiang Chen
2014-05-21 12:44 ` Steven Bosscher
2014-05-22  2:01   ` Zhenqiang Chen
2014-05-22  9:52   ` Zhenqiang Chen
2014-07-18  5:05     ` Jeff Law
2014-08-04  8:24       ` Zhenqiang Chen
2014-12-01 22:10         ` Jeff Law
2014-12-01 23:30           ` Eric Botcazou
2014-12-04  8:43           ` Zhenqiang Chen
2014-12-04 20:50             ` Segher Boessenkool
2014-12-04 20:58               ` Segher Boessenkool
2014-12-05 22:36                 ` Jeff Law
2014-12-06  0:09                   ` Segher Boessenkool
2014-12-07  5:03                     ` Segher Boessenkool
2014-12-06  0:22                 ` Segher Boessenkool
2014-12-05 22:32               ` Jeff Law
2014-12-06  0:16                 ` Segher Boessenkool
2014-12-08 19:07                   ` Jeff Law
2014-12-08 21:29             ` Jeff Law
2014-12-09  9:49               ` Zhenqiang Chen
2014-12-09 18:52                 ` Jeff Law
2014-12-09 19:07                 ` Segher Boessenkool
2014-12-09 19:15                   ` Jeff Law
2014-12-10 13:47                     ` Segher Boessenkool
2015-01-14 22:05                       ` Jeff Law
2014-12-12  7:27                     ` Zhenqiang Chen
2014-12-12 11:08                       ` Segher Boessenkool

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).