public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH][RTL-ifcvt] PR rtl-optimization/68841: Make sure one basic block doesn't clobber CC reg usage of the other
@ 2015-12-18 13:07 Kyrill Tkachov
  2015-12-18 13:16 ` Bernd Schmidt
  0 siblings, 1 reply; 20+ messages in thread
From: Kyrill Tkachov @ 2015-12-18 13:07 UTC (permalink / raw)
  To: GCC Patches

[-- Attachment #1: Type: text/plain, Size: 2725 bytes --]

Hi all,

In this PR we have a THEN and an ELSE block where one uses the condition reg from the preceeding comparison
but the other block has an arithmetic operation that clobbers the CC reg.
ifcvt emits the latter first and dead code elimination later sees this and eliminates the first comparison
because it sees that the CC reg is clobbered between the comparison and its usage.
That is it generates something like:
(insn 22 21 73 7 (set (reg:CCGC 17 flags)
         (compare:CCGC (reg/v:SI 88 [ i ])
             (const_int 3 [0x3]))) file.c:19 7 {*cmpsi_1}
      (nil))
(insn 73 22 74 7 (parallel [
             (set (reg:SI 94)
                 (ashift:SI (reg/v:SI 88 [ i ])
                     (const_int 1 [0x1])))
             (clobber (reg:CC 17 flags))
         ]) file.c:20 525 {*ashlsi3_1}
      (nil))
(insn 74 73 75 7 (set (reg/v:SI 93 [ k ])
         (eq:SI (reg:CCGC 17 flags)
             (const_int 0 [0]))) file.c:20 626 {*setcc_si_1_movzbl}
      (nil))
(insn 75 74 76 7 (set (reg:SI 95)
         (plus:SI (ashift:SI (reg/v:SI 93 [ k ])
                 (const_int 2 [0x2]))
             (const_int 8 [0x8]))) file.c:20 212 {*leasi}
      (nil))
(insn 76 75 77 7 (set (reg:CCGC 17 flags)
         (compare:CCGC (reg/v:SI 88 [ i ])
             (const_int 4 [0x4]))) file.c:20 7 {*cmpsi_1}
      (nil))
(insn 77 76 33 7 (set (reg/v:SI 87 [ k ])
         (if_then_else:SI (lt (reg:CCGC 17 flags)
                 (const_int 0 [0]))
             (reg:SI 95)
             (reg:SI 94))) file.c:20 966 {*movsicc_noc}
      (nil))

Here insn 74 uses the condition register that the comparison at insn 22 generates but
insn 73 is inserted in between them and DCE later deletes insn 22.

The proposed solution in this patch is to look at the two basic blocks and record whether
they use or clobber the condition reg and then emit the block that uses the CC register first
(if possible).

With this patch the testcase works fine for me on x86_64 and I see no codegen differences
otherwise for SPEC2006 on aarch64 and x86_64.

Bootstrapped and tested on arm, aarch64, x86_64.

Ok for trunk?

Thanks,
Kyrill


2015-12-18  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

     PR rtl-optimization/68841
     * ifcvt.c (cond_exec_get_condition): Rename to...
     (get_condition_from_jump): ... This.
     (cond_exec_process_if_block): Update callsites.
     (dead_or_predicable): Likewise.
     (noce_record_cc_use_in_bb): New function.
     (noce_try_cmove_arith): Check CC reg usage in both blocks
     and emit them in such an order so as not to clobber the CC reg
     before its use, if possible.

2015-12-18  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

     PR rtl-optimization/68841
     * gcc.dg/pr68841.c: New test.

[-- Attachment #2: ifcvt-cc-reg.patch --]
[-- Type: text/x-patch, Size: 5096 bytes --]

diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c
index 2e1fe90e28c4a03b8c1287a45e7f12868aa7684e..e4d695ab72d40f4020b21d9982a9bb5a9b2bae83 100644
--- a/gcc/ifcvt.c
+++ b/gcc/ifcvt.c
@@ -84,7 +84,7 @@ static rtx_insn *find_active_insn_after (basic_block, rtx_insn *);
 static basic_block block_fallthru (basic_block);
 static int cond_exec_process_insns (ce_if_block *, rtx_insn *, rtx, rtx, int,
 				    int);
-static rtx cond_exec_get_condition (rtx_insn *);
+static rtx get_condition_from_jump (rtx_insn *);
 static rtx noce_get_condition (rtx_insn *, rtx_insn **, bool);
 static int noce_operand_ok (const_rtx);
 static void merge_if_block (ce_if_block *);
@@ -421,7 +421,7 @@ cond_exec_process_insns (ce_if_block *ce_info ATTRIBUTE_UNUSED,
 /* Return the condition for a jump.  Do not do any special processing.  */
 
 static rtx
-cond_exec_get_condition (rtx_insn *jump)
+get_condition_from_jump (rtx_insn *jump)
 {
   rtx test_if, cond;
 
@@ -493,7 +493,7 @@ cond_exec_process_if_block (ce_if_block * ce_info,
 
   /* Find the conditional jump to the ELSE or JOIN part, and isolate
      the test.  */
-  test_expr = cond_exec_get_condition (BB_END (test_bb));
+  test_expr = get_condition_from_jump (BB_END (test_bb));
   if (! test_expr)
     return FALSE;
 
@@ -652,7 +652,7 @@ cond_exec_process_if_block (ce_if_block * ce_info,
 	    goto fail;
 
 	  /* Find the conditional jump and isolate the test.  */
-	  t = cond_exec_get_condition (BB_END (bb));
+	  t = get_condition_from_jump (BB_END (bb));
 	  if (! t)
 	    goto fail;
 
@@ -2017,6 +2017,29 @@ noce_emit_bb (rtx last_insn, basic_block bb, bool simple)
   return true;
 }
 
+/* For basic block BB record in USES_CC and CLOBBERS_CC whether it
+   uses and/or clobbers the condition code register CC.  */
+
+static void
+noce_record_cc_use_in_bb (basic_block bb, rtx cc,
+			  bool *uses_cc, bool *clobbers_cc)
+{
+  rtx_insn *insn = NULL;
+  *uses_cc = false;
+  *clobbers_cc = false;
+  if (!cc)
+    return;
+  FOR_BB_INSNS (bb, insn)
+    {
+      if (!active_insn_p (insn))
+	continue;
+
+      *clobbers_cc |= set_of (cc, PATTERN (insn)) != NULL_RTX;
+      *uses_cc |= reg_mentioned_p (cc, SET_SRC (single_set (insn)));
+    }
+}
+
+
 /* Try more complex cases involving conditional_move.  */
 
 static int
@@ -2202,6 +2225,26 @@ noce_try_cmove_arith (struct noce_if_info *if_info)
 	}
     }
 
+  bool then_uses_cc = false;
+  bool then_clobbers_cc = false;
+  bool else_uses_cc = false;
+  bool else_clobbers_cc = false;
+
+  rtx cc = cc_in_cond (if_info->cond);
+  /* If there is no condition code register in cond
+     (noce_get_condition could have replaced it) look into the original
+     jump condition.  */
+  if (!cc)
+    {
+      rtx jmp_cond = get_condition_from_jump (if_info->jump);
+      if (jmp_cond)
+	cc = cc_in_cond (jmp_cond);
+    }
+  if (then_bb)
+    noce_record_cc_use_in_bb (then_bb, cc, &then_uses_cc, &then_clobbers_cc);
+  if (else_bb)
+    noce_record_cc_use_in_bb (else_bb, cc, &else_uses_cc, &else_clobbers_cc);
+
   modified_in_a = emit_a != NULL_RTX && modified_in_p (orig_b, emit_a);
   if (tmp_b && then_bb)
     {
@@ -2233,8 +2276,10 @@ noce_try_cmove_arith (struct noce_if_info *if_info)
 
   /* If insn to set up A clobbers any registers B depends on, try to
      swap insn that sets up A with the one that sets up B.  If even
-     that doesn't help, punt.  */
-  if (modified_in_a && !modified_in_b)
+     that doesn't help, punt.  Make sure one basic block doesn't clobber
+     the condition code register before the other uses it.  */
+  if (modified_in_a && !modified_in_b
+      && !(then_uses_cc && else_clobbers_cc))
     {
       if (!noce_emit_bb (emit_b, else_bb, b_simple))
 	goto end_seq_and_fail;
@@ -2242,7 +2287,8 @@ noce_try_cmove_arith (struct noce_if_info *if_info)
       if (!noce_emit_bb (emit_a, then_bb, a_simple))
 	goto end_seq_and_fail;
     }
-  else if (!modified_in_a)
+  else if (!modified_in_a
+	   && !(else_uses_cc && then_clobbers_cc))
     {
       if (!noce_emit_bb (emit_a, then_bb, a_simple))
 	goto end_seq_and_fail;
@@ -5045,7 +5091,7 @@ dead_or_predicable (basic_block test_bb, basic_block merge_bb,
 
       rtx cond;
 
-      cond = cond_exec_get_condition (jump);
+      cond = get_condition_from_jump (jump);
       if (! cond)
 	return FALSE;
 
diff --git a/gcc/testsuite/gcc.dg/pr68841.c b/gcc/testsuite/gcc.dg/pr68841.c
new file mode 100644
index 0000000000000000000000000000000000000000..470048cc24f0d7150ed1e3141181bc1e8472ae12
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr68841.c
@@ -0,0 +1,34 @@
+/* { dg-do run } */
+/* { dg-options "-Og -fif-conversion -flive-range-shrinkage -fpeel-loops -frerun-cse-after-loop" } */
+
+static inline int
+foo (int *x, int y)
+{
+  int z = *x;
+  while (y > z)
+    z *= 2;
+  return z;
+}
+
+int
+main ()
+{
+  int i;
+  for (i = 1; i < 17; i++)
+    {
+      int j;
+      int k;
+      j = foo (&i, 7);
+      if (i >= 7)
+	k = i;
+      else if (i >= 4)
+	k = 8 + (i - 4) * 2;
+      else if (i == 3)
+	k = 12;
+      else
+	k = 8;
+      if (j != k)
+	__builtin_abort ();
+    }
+  return 0;
+}

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH][RTL-ifcvt] PR rtl-optimization/68841: Make sure one basic block doesn't clobber CC reg usage of the other
  2015-12-18 13:07 [PATCH][RTL-ifcvt] PR rtl-optimization/68841: Make sure one basic block doesn't clobber CC reg usage of the other Kyrill Tkachov
@ 2015-12-18 13:16 ` Bernd Schmidt
  2015-12-18 13:57   ` Kyrill Tkachov
  2015-12-21  9:23   ` Kyrill Tkachov
  0 siblings, 2 replies; 20+ messages in thread
From: Bernd Schmidt @ 2015-12-18 13:16 UTC (permalink / raw)
  To: Kyrill Tkachov, GCC Patches

On 12/18/2015 02:07 PM, Kyrill Tkachov wrote:
> In this PR we have a THEN and an ELSE block where one uses the condition
> reg from the preceeding comparison
> but the other block has an arithmetic operation that clobbers the CC reg.
> ifcvt emits the latter first and dead code elimination later sees this
> and eliminates the first comparison
> because it sees that the CC reg is clobbered between the comparison and
> its usage.

>      (noce_try_cmove_arith): Check CC reg usage in both blocks
>      and emit them in such an order so as not to clobber the CC reg
>      before its use, if possible.

Why is this done here? It looks to me like bbs_ok_for_cmove_arith is the 
function that already tries to sort out issues like this. Does it maybe 
just need to be extended to see clobbers?


Bernd

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH][RTL-ifcvt] PR rtl-optimization/68841: Make sure one basic block doesn't clobber CC reg usage of the other
  2015-12-18 13:16 ` Bernd Schmidt
@ 2015-12-18 13:57   ` Kyrill Tkachov
  2015-12-21  9:23   ` Kyrill Tkachov
  1 sibling, 0 replies; 20+ messages in thread
From: Kyrill Tkachov @ 2015-12-18 13:57 UTC (permalink / raw)
  To: Bernd Schmidt, GCC Patches


On 18/12/15 13:16, Bernd Schmidt wrote:
> On 12/18/2015 02:07 PM, Kyrill Tkachov wrote:
>> In this PR we have a THEN and an ELSE block where one uses the condition
>> reg from the preceeding comparison
>> but the other block has an arithmetic operation that clobbers the CC reg.
>> ifcvt emits the latter first and dead code elimination later sees this
>> and eliminates the first comparison
>> because it sees that the CC reg is clobbered between the comparison and
>> its usage.
>
>>      (noce_try_cmove_arith): Check CC reg usage in both blocks
>>      and emit them in such an order so as not to clobber the CC reg
>>      before its use, if possible.
>
> Why is this done here? It looks to me like bbs_ok_for_cmove_arith is the function that already tries to sort out issues like this. Does it maybe just need to be extended to see clobbers?
>

I can do that. A prototype of that approach works for this testacase.
I'll test more extensively...

Thanks,
Kyrill

>
> Bernd
>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH][RTL-ifcvt] PR rtl-optimization/68841: Make sure one basic block doesn't clobber CC reg usage of the other
  2015-12-18 13:16 ` Bernd Schmidt
  2015-12-18 13:57   ` Kyrill Tkachov
@ 2015-12-21  9:23   ` Kyrill Tkachov
  2016-01-05 12:04     ` Kyrill Tkachov
  2016-01-05 13:42     ` Bernd Schmidt
  1 sibling, 2 replies; 20+ messages in thread
From: Kyrill Tkachov @ 2015-12-21  9:23 UTC (permalink / raw)
  To: Bernd Schmidt, GCC Patches

[-- Attachment #1: Type: text/plain, Size: 1821 bytes --]


On 18/12/15 13:16, Bernd Schmidt wrote:
> On 12/18/2015 02:07 PM, Kyrill Tkachov wrote:
>> In this PR we have a THEN and an ELSE block where one uses the condition
>> reg from the preceeding comparison
>> but the other block has an arithmetic operation that clobbers the CC reg.
>> ifcvt emits the latter first and dead code elimination later sees this
>> and eliminates the first comparison
>> because it sees that the CC reg is clobbered between the comparison and
>> its usage.
>
>>      (noce_try_cmove_arith): Check CC reg usage in both blocks
>>      and emit them in such an order so as not to clobber the CC reg
>>      before its use, if possible.
>
> Why is this done here? It looks to me like bbs_ok_for_cmove_arith is the function that already tries to sort out issues like this. Does it maybe just need to be extended to see clobbers?
>

Here is a version integrating the new checks into bbs_ok_for_cmove_arith.
We might as well do it there since it already iterates over all the instructions in the basic blocks.

Bootstrapped and tested on arm, aarch64, x86_64 and checked that there are no codegen effects on SPEC2006.
How does this look?

Thanks,
Kyrill

2015-12-21  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

     PR rtl-optimization/68841
     * ifcvt.c (cond_exec_get_condition): Rename to...
     (get_condition_from_jump): ... This.
     (cond_exec_process_if_block): Update callsites.
     (dead_or_predicable): Likewise.
     (bbs_ok_for_cmove_arith): Update to record use and clobbers
     of the CC register.
     (noce_try_cmove_arith): Check CC reg usage in both blocks
     and emit them in such an order so as not to clobber the CC reg
     before its use, if possible.

2015-12-21  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

     PR rtl-optimization/68841
     * gcc.dg/pr68841.c: New test.

[-- Attachment #2: ifcvt-cc-reg.patch --]
[-- Type: text/x-patch, Size: 6533 bytes --]

diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c
index 2e1fe90e28c4a03b8c1287a45e7f12868aa7684e..c71767da3c76b5e8d4ec58cee934e5d064cfbee4 100644
--- a/gcc/ifcvt.c
+++ b/gcc/ifcvt.c
@@ -84,7 +84,7 @@ static rtx_insn *find_active_insn_after (basic_block, rtx_insn *);
 static basic_block block_fallthru (basic_block);
 static int cond_exec_process_insns (ce_if_block *, rtx_insn *, rtx, rtx, int,
 				    int);
-static rtx cond_exec_get_condition (rtx_insn *);
+static rtx get_condition_from_jump (rtx_insn *);
 static rtx noce_get_condition (rtx_insn *, rtx_insn **, bool);
 static int noce_operand_ok (const_rtx);
 static void merge_if_block (ce_if_block *);
@@ -421,7 +421,7 @@ cond_exec_process_insns (ce_if_block *ce_info ATTRIBUTE_UNUSED,
 /* Return the condition for a jump.  Do not do any special processing.  */
 
 static rtx
-cond_exec_get_condition (rtx_insn *jump)
+get_condition_from_jump (rtx_insn *jump)
 {
   rtx test_if, cond;
 
@@ -493,7 +493,7 @@ cond_exec_process_if_block (ce_if_block * ce_info,
 
   /* Find the conditional jump to the ELSE or JOIN part, and isolate
      the test.  */
-  test_expr = cond_exec_get_condition (BB_END (test_bb));
+  test_expr = get_condition_from_jump (BB_END (test_bb));
   if (! test_expr)
     return FALSE;
 
@@ -652,7 +652,7 @@ cond_exec_process_if_block (ce_if_block * ce_info,
 	    goto fail;
 
 	  /* Find the conditional jump and isolate the test.  */
-	  t = cond_exec_get_condition (BB_END (bb));
+	  t = get_condition_from_jump (BB_END (bb));
 	  if (! t)
 	    goto fail;
 
@@ -1897,10 +1897,13 @@ insn_valid_noce_process_p (rtx_insn *insn, rtx cc)
 
 
 /* Return true iff the registers that the insns in BB_A set do not
-   get used in BB_B.  */
+   get used in BB_B.  Set A_CLOBBERS_CC_FOR_B to true if basic block BB_A
+   clobbers condition the code register CC and bb_b uses CC.  Set it
+   to false otherwise.  */
 
 static bool
-bbs_ok_for_cmove_arith (basic_block bb_a, basic_block bb_b)
+bbs_ok_for_cmove_arith (basic_block bb_a, basic_block bb_b,
+			 rtx cc, bool *a_clobbers_cc_for_b)
 {
   rtx_insn *a_insn;
   bitmap bba_sets = BITMAP_ALLOC (&reg_obstack);
@@ -1908,6 +1911,10 @@ bbs_ok_for_cmove_arith (basic_block bb_a, basic_block bb_b)
   df_ref def;
   df_ref use;
 
+  bool a_clobbers_cc = false;
+  bool b_uses_cc = false;
+  *a_clobbers_cc_for_b = false;
+
   FOR_BB_INSNS (bb_a, a_insn)
     {
       if (!active_insn_p (a_insn))
@@ -1921,6 +1928,9 @@ bbs_ok_for_cmove_arith (basic_block bb_a, basic_block bb_b)
 	  return false;
 	}
 
+      if (cc)
+	a_clobbers_cc |= set_of (cc, PATTERN (a_insn)) != NULL_RTX;
+
       /* Record all registers that BB_A sets.  */
       FOR_EACH_INSN_DEF (def, a_insn)
 	bitmap_set_bit (bba_sets, DF_REF_REGNO (def));
@@ -1949,11 +1959,15 @@ bbs_ok_for_cmove_arith (basic_block bb_a, basic_block bb_b)
 	  return false;
 	}
 
+      if (cc)
+	b_uses_cc |= reg_mentioned_p (cc, SET_SRC (sset_b));
+
       /* If the insn uses a reg set in BB_A return false.  */
       FOR_EACH_INSN_USE (use, b_insn)
 	{
 	  if (bitmap_bit_p (bba_sets, DF_REF_REGNO (use)))
 	    {
+	     *a_clobbers_cc_for_b = a_clobbers_cc && b_uses_cc;
 	      BITMAP_FREE (bba_sets);
 	      return false;
 	    }
@@ -1961,6 +1975,7 @@ bbs_ok_for_cmove_arith (basic_block bb_a, basic_block bb_b)
 
     }
 
+  *a_clobbers_cc_for_b = a_clobbers_cc && b_uses_cc;
   BITMAP_FREE (bba_sets);
   return true;
 }
@@ -2112,10 +2127,31 @@ noce_try_cmove_arith (struct noce_if_info *if_info)
 	}
     }
 
-  if (then_bb && else_bb && !a_simple && !b_simple
-      && (!bbs_ok_for_cmove_arith (then_bb, else_bb)
-	  || !bbs_ok_for_cmove_arith (else_bb, then_bb)))
-    return FALSE;
+  rtx cc = cc_in_cond (if_info->cond);
+  /* If there is no condition code register in cond
+     (noce_get_condition could have replaced it) look into the original
+     jump condition.  */
+  if (!cc)
+    {
+      rtx jmp_cond = get_condition_from_jump (if_info->jump);
+      if (jmp_cond)
+	cc = cc_in_cond (jmp_cond);
+    }
+
+  bool then_clobbers_cc_use_in_else = false;
+  bool else_clobbers_cc_use_in_then = false;
+
+  if (then_bb && else_bb)
+    {
+      bool bbs_ok = bbs_ok_for_cmove_arith (then_bb, else_bb, cc,
+				    &then_clobbers_cc_use_in_else);
+      bbs_ok &= bbs_ok_for_cmove_arith (else_bb, then_bb, cc,
+				      &else_clobbers_cc_use_in_then);
+      /* If one or more of the blocks is simple then the conflict is
+	  on X, which will be replaced with a temp, so we allow it.  */
+      if (!bbs_ok && !a_simple && !b_simple)
+	return FALSE;
+    }
 
   start_sequence ();
 
@@ -2233,8 +2269,10 @@ noce_try_cmove_arith (struct noce_if_info *if_info)
 
   /* If insn to set up A clobbers any registers B depends on, try to
      swap insn that sets up A with the one that sets up B.  If even
-     that doesn't help, punt.  */
-  if (modified_in_a && !modified_in_b)
+     that doesn't help, punt.  Make sure one basic block doesn't clobber
+     the condition code register before the other uses it.  */
+  if (modified_in_a && !modified_in_b
+      && !else_clobbers_cc_use_in_then)
     {
       if (!noce_emit_bb (emit_b, else_bb, b_simple))
 	goto end_seq_and_fail;
@@ -2242,7 +2280,8 @@ noce_try_cmove_arith (struct noce_if_info *if_info)
       if (!noce_emit_bb (emit_a, then_bb, a_simple))
 	goto end_seq_and_fail;
     }
-  else if (!modified_in_a)
+  else if (!modified_in_a
+	   && !then_clobbers_cc_use_in_else)
     {
       if (!noce_emit_bb (emit_a, then_bb, a_simple))
 	goto end_seq_and_fail;
@@ -5045,7 +5084,7 @@ dead_or_predicable (basic_block test_bb, basic_block merge_bb,
 
       rtx cond;
 
-      cond = cond_exec_get_condition (jump);
+      cond = get_condition_from_jump (jump);
       if (! cond)
 	return FALSE;
 
diff --git a/gcc/testsuite/gcc.dg/pr68841.c b/gcc/testsuite/gcc.dg/pr68841.c
new file mode 100644
index 0000000000000000000000000000000000000000..470048cc24f0d7150ed1e3141181bc1e8472ae12
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr68841.c
@@ -0,0 +1,34 @@
+/* { dg-do run } */
+/* { dg-options "-Og -fif-conversion -flive-range-shrinkage -fpeel-loops -frerun-cse-after-loop" } */
+
+static inline int
+foo (int *x, int y)
+{
+  int z = *x;
+  while (y > z)
+    z *= 2;
+  return z;
+}
+
+int
+main ()
+{
+  int i;
+  for (i = 1; i < 17; i++)
+    {
+      int j;
+      int k;
+      j = foo (&i, 7);
+      if (i >= 7)
+	k = i;
+      else if (i >= 4)
+	k = 8 + (i - 4) * 2;
+      else if (i == 3)
+	k = 12;
+      else
+	k = 8;
+      if (j != k)
+	__builtin_abort ();
+    }
+  return 0;
+}

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH][RTL-ifcvt] PR rtl-optimization/68841: Make sure one basic block doesn't clobber CC reg usage of the other
  2015-12-21  9:23   ` Kyrill Tkachov
@ 2016-01-05 12:04     ` Kyrill Tkachov
  2016-01-05 13:42     ` Bernd Schmidt
  1 sibling, 0 replies; 20+ messages in thread
From: Kyrill Tkachov @ 2016-01-05 12:04 UTC (permalink / raw)
  To: Bernd Schmidt, GCC Patches

Ping.
https://gcc.gnu.org/ml/gcc-patches/2015-12/msg01966.html

Thanks,
Kyrill

On 21/12/15 09:23, Kyrill Tkachov wrote:
>
> On 18/12/15 13:16, Bernd Schmidt wrote:
>> On 12/18/2015 02:07 PM, Kyrill Tkachov wrote:
>>> In this PR we have a THEN and an ELSE block where one uses the condition
>>> reg from the preceeding comparison
>>> but the other block has an arithmetic operation that clobbers the CC reg.
>>> ifcvt emits the latter first and dead code elimination later sees this
>>> and eliminates the first comparison
>>> because it sees that the CC reg is clobbered between the comparison and
>>> its usage.
>>
>>>      (noce_try_cmove_arith): Check CC reg usage in both blocks
>>>      and emit them in such an order so as not to clobber the CC reg
>>>      before its use, if possible.
>>
>> Why is this done here? It looks to me like bbs_ok_for_cmove_arith is the function that already tries to sort out issues like this. Does it maybe just need to be extended to see clobbers?
>>
>
> Here is a version integrating the new checks into bbs_ok_for_cmove_arith.
> We might as well do it there since it already iterates over all the instructions in the basic blocks.
>
> Bootstrapped and tested on arm, aarch64, x86_64 and checked that there are no codegen effects on SPEC2006.
> How does this look?
>
> Thanks,
> Kyrill
>
> 2015-12-21  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
>
>     PR rtl-optimization/68841
>     * ifcvt.c (cond_exec_get_condition): Rename to...
>     (get_condition_from_jump): ... This.
>     (cond_exec_process_if_block): Update callsites.
>     (dead_or_predicable): Likewise.
>     (bbs_ok_for_cmove_arith): Update to record use and clobbers
>     of the CC register.
>     (noce_try_cmove_arith): Check CC reg usage in both blocks
>     and emit them in such an order so as not to clobber the CC reg
>     before its use, if possible.
>
> 2015-12-21  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
>
>     PR rtl-optimization/68841
>     * gcc.dg/pr68841.c: New test.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH][RTL-ifcvt] PR rtl-optimization/68841: Make sure one basic block doesn't clobber CC reg usage of the other
  2015-12-21  9:23   ` Kyrill Tkachov
  2016-01-05 12:04     ` Kyrill Tkachov
@ 2016-01-05 13:42     ` Bernd Schmidt
  2016-01-05 14:22       ` Kyrill Tkachov
  1 sibling, 1 reply; 20+ messages in thread
From: Bernd Schmidt @ 2016-01-05 13:42 UTC (permalink / raw)
  To: Kyrill Tkachov, GCC Patches

On 12/21/2015 10:23 AM, Kyrill Tkachov wrote:
>
> Here is a version integrating the new checks into bbs_ok_for_cmove_arith.
> We might as well do it there since it already iterates over all the
> instructions in the basic blocks.

The patch looks somewhat complicated and asymmetrical to me. I tried to 
debug the issue, and found that bbs_ok_for_cmove_arith wasn't called at 
all. After changing that, like this:

@@ -2082,7 +2082,7 @@ noce_try_cmove_arith (struct noce_if_inf
  	}
      }

-  if (then_bb && else_bb && !a_simple && !b_simple
+  if (then_bb && else_bb
        && (!bbs_ok_for_cmove_arith (then_bb, else_bb)
  	  || !bbs_ok_for_cmove_arith (else_bb, then_bb)))
      return FALSE;

the testcase passes. Could you investigate whether this is sufficient?

>      * ifcvt.c (cond_exec_get_condition): Rename to...
>      (get_condition_from_jump): ... This.

Please don't.

> +/* { dg-options "-Og -fif-conversion -flive-range-shrinkage -fpeel-loops -frerun-cse-after-loop" } */

That's a very specific set of options there, I wonder how Zdenek even 
found that. Maybe we also want a copy in torture/ to test it more broadly?


Bernd

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH][RTL-ifcvt] PR rtl-optimization/68841: Make sure one basic block doesn't clobber CC reg usage of the other
  2016-01-05 13:42     ` Bernd Schmidt
@ 2016-01-05 14:22       ` Kyrill Tkachov
  2016-01-05 16:34         ` Bernd Schmidt
  0 siblings, 1 reply; 20+ messages in thread
From: Kyrill Tkachov @ 2016-01-05 14:22 UTC (permalink / raw)
  To: Bernd Schmidt, GCC Patches


On 05/01/16 13:42, Bernd Schmidt wrote:
> On 12/21/2015 10:23 AM, Kyrill Tkachov wrote:
>>
>> Here is a version integrating the new checks into bbs_ok_for_cmove_arith.
>> We might as well do it there since it already iterates over all the
>> instructions in the basic blocks.
>
> The patch looks somewhat complicated and asymmetrical to me. I tried to debug the issue, and found that bbs_ok_for_cmove_arith wasn't called at all. After changing that, like this:
>
> @@ -2082,7 +2082,7 @@ noce_try_cmove_arith (struct noce_if_inf
>      }
>      }
>
> -  if (then_bb && else_bb && !a_simple && !b_simple
> +  if (then_bb && else_bb
>        && (!bbs_ok_for_cmove_arith (then_bb, else_bb)
>        || !bbs_ok_for_cmove_arith (else_bb, then_bb)))
>      return FALSE;
>
> the testcase passes. Could you investigate whether this is sufficient?
>

This works around the issue but we don't want to do perform the check for pairs of
simple basic blocks because then we'll end up rejecting code that does things like:
x = cond ? x + 1 : x - 1
i.e. source of the set in both blocks reads and writes the same register.
We can deal with this safely later on in the function since we rename the destinations
of the two sets, so we don't want to reject this case here.
So your proposal rejects this case on that basis, which is overly restrictive and doesn't
deal with the underlying issue of the condition code being clobbered.
I had tried this approach initially but it caused code quality regressions on SPEC.

>>      * ifcvt.c (cond_exec_get_condition): Rename to...
>>      (get_condition_from_jump): ... This.
>
> Please don't.
>

Ok, I'll leave it as cond_exec_get_condition

>> +/* { dg-options "-Og -fif-conversion -flive-range-shrinkage -fpeel-loops -frerun-cse-after-loop" } */
>
> That's a very specific set of options there, I wonder how Zdenek even found that. Maybe we also want a copy in torture/ to test it more broadly?
>

Judging from some other reports from Zdenek that I've seen I think he tries a matrix of options, getting some
of the exotic combinations.
So should we have a copy in gcc.dg/ with the explicit options and a clean copy in torture?

Thanks,
Kyrill

>
> Bernd

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH][RTL-ifcvt] PR rtl-optimization/68841: Make sure one basic block doesn't clobber CC reg usage of the other
  2016-01-05 14:22       ` Kyrill Tkachov
@ 2016-01-05 16:34         ` Bernd Schmidt
  2016-01-05 17:06           ` Kyrill Tkachov
  0 siblings, 1 reply; 20+ messages in thread
From: Bernd Schmidt @ 2016-01-05 16:34 UTC (permalink / raw)
  To: Kyrill Tkachov, GCC Patches

[-- Attachment #1: Type: text/plain, Size: 725 bytes --]

On 01/05/2016 03:22 PM, Kyrill Tkachov wrote:
>
> This works around the issue but we don't want to do perform the check
> for pairs of
> simple basic blocks because then we'll end up rejecting code that does
> things like:
> x = cond ? x + 1 : x - 1
> i.e. source of the set in both blocks reads and writes the same register.
> We can deal with this safely later on in the function since we rename
> the destinations
> of the two sets, so we don't want to reject this case here.

So we need to teach bbs_ok_for_cmove_arith that this is going to happen. 
How about the approach below? Still seems to fix the issue, and it looks 
like the CC set is present in the df info so everything should work as 
intended. Right?


Bernd

[-- Attachment #2: 66841a.diff --]
[-- Type: text/x-patch, Size: 1678 bytes --]

Index: gcc/ifcvt.c
===================================================================
--- gcc/ifcvt.c	(revision 232056)
+++ gcc/ifcvt.c	(working copy)
@@ -1866,11 +1866,13 @@ insn_valid_noce_process_p (rtx_insn *ins
 }
 
 
-/* Return true iff the registers that the insns in BB_A set do not
-   get used in BB_B.  */
+/* Return true iff the registers that the insns in BB_A set do not get
+   used in BB_B.  WILL_RENAME is true if we expect that BB_A consists
+   of a single single_set insn with a destination that the caller will
+   rename to a new pseudo.  */
 
 static bool
-bbs_ok_for_cmove_arith (basic_block bb_a, basic_block bb_b)
+bbs_ok_for_cmove_arith (basic_block bb_a, basic_block bb_b, bool will_rename)
 {
   rtx_insn *a_insn;
   bitmap bba_sets = BITMAP_ALLOC (&reg_obstack);
@@ -1887,13 +1889,15 @@ bbs_ok_for_cmove_arith (basic_block bb_a
 
       if (!sset_a)
 	{
+	  gcc_assert (!will_rename);
 	  BITMAP_FREE (bba_sets);
 	  return false;
 	}
 
       /* Record all registers that BB_A sets.  */
       FOR_EACH_INSN_DEF (def, a_insn)
-	bitmap_set_bit (bba_sets, DF_REF_REGNO (def));
+	if (!will_rename || DF_REF_REG (def) != SET_DEST (sset_a))
+	  bitmap_set_bit (bba_sets, DF_REF_REGNO (def));
     }
 
   rtx_insn *b_insn;
@@ -2082,9 +2086,9 @@ noce_try_cmove_arith (struct noce_if_inf
 	}
     }
 
-  if (then_bb && else_bb && !a_simple && !b_simple
-      && (!bbs_ok_for_cmove_arith (then_bb, else_bb)
-	  || !bbs_ok_for_cmove_arith (else_bb, then_bb)))
+  if (then_bb && else_bb
+      && (!bbs_ok_for_cmove_arith (then_bb, else_bb, a_simple)
+	  || !bbs_ok_for_cmove_arith (else_bb, then_bb, b_simple)))
     return FALSE;
 
   start_sequence ();

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH][RTL-ifcvt] PR rtl-optimization/68841: Make sure one basic block doesn't clobber CC reg usage of the other
  2016-01-05 16:34         ` Bernd Schmidt
@ 2016-01-05 17:06           ` Kyrill Tkachov
  2016-01-05 18:19             ` Kyrill Tkachov
  0 siblings, 1 reply; 20+ messages in thread
From: Kyrill Tkachov @ 2016-01-05 17:06 UTC (permalink / raw)
  To: Bernd Schmidt, GCC Patches


On 05/01/16 16:34, Bernd Schmidt wrote:
> On 01/05/2016 03:22 PM, Kyrill Tkachov wrote:
>>
>> This works around the issue but we don't want to do perform the check
>> for pairs of
>> simple basic blocks because then we'll end up rejecting code that does
>> things like:
>> x = cond ? x + 1 : x - 1
>> i.e. source of the set in both blocks reads and writes the same register.
>> We can deal with this safely later on in the function since we rename
>> the destinations
>> of the two sets, so we don't want to reject this case here.
>
> So we need to teach bbs_ok_for_cmove_arith that this is going to happen. How about the approach below? Still seems to fix the issue, and it looks like the CC set is present in the df info so everything should work as intended. Right?
>

Yeah, this looks like it works.
However, now we reject if-conversion whereas with my patch we still tried switching the order in which
the blocks were emitted, which allowed for a bit more aggressive if-conversion.
I don't know if this approach is overly restrictive yet.
I'll try its effects on codegen quality on SPEC as soon as I get some cycles.
But this approach does look appealing to me.

Thanks for the help,
Kyrill

>
> Bernd

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH][RTL-ifcvt] PR rtl-optimization/68841: Make sure one basic block doesn't clobber CC reg usage of the other
  2016-01-05 17:06           ` Kyrill Tkachov
@ 2016-01-05 18:19             ` Kyrill Tkachov
  2016-01-06 10:36               ` Kyrill Tkachov
  0 siblings, 1 reply; 20+ messages in thread
From: Kyrill Tkachov @ 2016-01-05 18:19 UTC (permalink / raw)
  To: Bernd Schmidt, GCC Patches


On 05/01/16 17:06, Kyrill Tkachov wrote:
>
> On 05/01/16 16:34, Bernd Schmidt wrote:
>> On 01/05/2016 03:22 PM, Kyrill Tkachov wrote:
>>>
>>> This works around the issue but we don't want to do perform the check
>>> for pairs of
>>> simple basic blocks because then we'll end up rejecting code that does
>>> things like:
>>> x = cond ? x + 1 : x - 1
>>> i.e. source of the set in both blocks reads and writes the same register.
>>> We can deal with this safely later on in the function since we rename
>>> the destinations
>>> of the two sets, so we don't want to reject this case here.
>>
>> So we need to teach bbs_ok_for_cmove_arith that this is going to happen. How about the approach below? Still seems to fix the issue, and it looks like the CC set is present in the df info so everything should work as intended. Right?
>>
>
> Yeah, this looks like it works.
> However, now we reject if-conversion whereas with my patch we still tried switching the order in which
> the blocks were emitted, which allowed for a bit more aggressive if-conversion.
> I don't know if this approach is overly restrictive yet.
> I'll try its effects on codegen quality on SPEC as soon as I get some cycles.
> But this approach does look appealing to me.
>

Hmm, from a first look at SPEC, it seems to still overly restrict ifconversion in the
x = cond ? x + 1 : x - 1 case.
I'll look deeper tomorrow as to what's going on there.

Kyrill

> Thanks for the help,
> Kyrill
>
>>
>> Bernd
>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH][RTL-ifcvt] PR rtl-optimization/68841: Make sure one basic block doesn't clobber CC reg usage of the other
  2016-01-05 18:19             ` Kyrill Tkachov
@ 2016-01-06 10:36               ` Kyrill Tkachov
  2016-01-07  9:22                 ` Kyrill Tkachov
  0 siblings, 1 reply; 20+ messages in thread
From: Kyrill Tkachov @ 2016-01-06 10:36 UTC (permalink / raw)
  To: Bernd Schmidt, GCC Patches

Hi Bernd,

On 05/01/16 18:19, Kyrill Tkachov wrote:
>
> On 05/01/16 17:06, Kyrill Tkachov wrote:
>>
>> On 05/01/16 16:34, Bernd Schmidt wrote:
>>> On 01/05/2016 03:22 PM, Kyrill Tkachov wrote:
>>>>
>>>> This works around the issue but we don't want to do perform the check
>>>> for pairs of
>>>> simple basic blocks because then we'll end up rejecting code that does
>>>> things like:
>>>> x = cond ? x + 1 : x - 1
>>>> i.e. source of the set in both blocks reads and writes the same register.
>>>> We can deal with this safely later on in the function since we rename
>>>> the destinations
>>>> of the two sets, so we don't want to reject this case here.
>>>
>>> So we need to teach bbs_ok_for_cmove_arith that this is going to happen. How about the approach below? Still seems to fix the issue, and it looks like the CC set is present in the df info so everything should work as intended. Right?
>>>
>>
>> Yeah, this looks like it works.
>> However, now we reject if-conversion whereas with my patch we still tried switching the order in which
>> the blocks were emitted, which allowed for a bit more aggressive if-conversion.
>> I don't know if this approach is overly restrictive yet.
>> I'll try its effects on codegen quality on SPEC as soon as I get some cycles.
>> But this approach does look appealing to me.
>>
>
> Hmm, from a first look at SPEC, it seems to still overly restrict ifconversion in the
> x = cond ? x + 1 : x - 1 case.
> I'll look deeper tomorrow as to what's going on there.
>

Ok, found the problem.
bbs_ok_for_cmove_arith also checks that we don't perform any stores in the basic block, which kills opportunities
to convert operations of the form [addr] = c ? a : b. Since with your patch we now call this even for
simple basic blocks we miss these opportunities.

bb_valid_for_noce_process_p called earlier should have already ensured that for complex
blocks we allow only the last set to be a store, so we should be able to relax the restriction in
bbs_ok_for_cmove_arith. That change in combination with your patch fixes the testcase and has no code quality
fallout that I can see (it even slightly increases if-conversion opportunities). I'll test more thoroughly
and post a patch in due time.

Thanks,
Kyrill

> Kyrill
>
>> Thanks for the help,
>> Kyrill
>>
>>>
>>> Bernd
>>
>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH][RTL-ifcvt] PR rtl-optimization/68841: Make sure one basic block doesn't clobber CC reg usage of the other
  2016-01-06 10:36               ` Kyrill Tkachov
@ 2016-01-07  9:22                 ` Kyrill Tkachov
  2016-01-07 11:52                   ` Bernd Schmidt
  0 siblings, 1 reply; 20+ messages in thread
From: Kyrill Tkachov @ 2016-01-07  9:22 UTC (permalink / raw)
  To: Bernd Schmidt, GCC Patches

[-- Attachment #1: Type: text/plain, Size: 3660 bytes --]

Hi Bernd,

On 06/01/16 10:36, Kyrill Tkachov wrote:
> Hi Bernd,
>
> On 05/01/16 18:19, Kyrill Tkachov wrote:
>>
>> On 05/01/16 17:06, Kyrill Tkachov wrote:
>>>
>>> On 05/01/16 16:34, Bernd Schmidt wrote:
>>>> On 01/05/2016 03:22 PM, Kyrill Tkachov wrote:
>>>>>
>>>>> This works around the issue but we don't want to do perform the check
>>>>> for pairs of
>>>>> simple basic blocks because then we'll end up rejecting code that does
>>>>> things like:
>>>>> x = cond ? x + 1 : x - 1
>>>>> i.e. source of the set in both blocks reads and writes the same register.
>>>>> We can deal with this safely later on in the function since we rename
>>>>> the destinations
>>>>> of the two sets, so we don't want to reject this case here.
>>>>
>>>> So we need to teach bbs_ok_for_cmove_arith that this is going to happen. How about the approach below? Still seems to fix the issue, and it looks like the CC set is present in the df info so everything should work as intended. Right?
>>>>
>>>
>>> Yeah, this looks like it works.
>>> However, now we reject if-conversion whereas with my patch we still tried switching the order in which
>>> the blocks were emitted, which allowed for a bit more aggressive if-conversion.
>>> I don't know if this approach is overly restrictive yet.
>>> I'll try its effects on codegen quality on SPEC as soon as I get some cycles.
>>> But this approach does look appealing to me.
>>>
>>
>> Hmm, from a first look at SPEC, it seems to still overly restrict ifconversion in the
>> x = cond ? x + 1 : x - 1 case.
>> I'll look deeper tomorrow as to what's going on there.
>>
>
> Ok, found the problem.
> bbs_ok_for_cmove_arith also checks that we don't perform any stores in the basic block, which kills opportunities
> to convert operations of the form [addr] = c ? a : b. Since with your patch we now call this even for
> simple basic blocks we miss these opportunities.
>
> bb_valid_for_noce_process_p called earlier should have already ensured that for complex
> blocks we allow only the last set to be a store, so we should be able to relax the restriction in
> bbs_ok_for_cmove_arith. That change in combination with your patch fixes the testcase and has no code quality
> fallout that I can see (it even slightly increases if-conversion opportunities). I'll test more thoroughly
> and post a patch in due time.
>


And here is the updated patch.
It is a modified version of the patch with the restriction on stores in the basic block lifted.
Also I chose instead to pass the register 'x' on which we should not be recording conflicts
rather than a boolean will_remain.

I found if we don't do this we will pessimise cases where
one basic block is something like:
t1 = x + y;
x = t1 + 2;

and the other is something like:
x = x + 1.

We want to ignore the conflict on x in both invocations of bbs_ok_for_cmove_arith.

With this patch the testcase passes and there is no codegen fallout on SPEC.
Bootstrapped and tested on arm, aarch64, x86_64.

How does this look?

Thanks,
Kyrill

2016-01-06  Bernd Schmidt  <bschmidt@redhat.com>
             Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

     PR rtl-optimization/68841
     * ifcvt.c (bbs_ok_for_cmove_arith): Add to_rename parameter.
     Don't record conflicts on to_rename if it's present.
     Allow memory destinations in sets.
     (noce_try_cmove_arith): Call bbs_ok_for_cmove_arith even on simple
     blocks.

2016-01-06  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

     PR rtl-optimization/68841
     * gcc.dg/pr68841.c: New test.
     * gcc.c-torture/execute/pr68841.c: New test.

> Thanks,
> Kyrill
>
>> Kyrill
>>
>>> Thanks for the help,
>>> Kyrill
>>>
>>>>
>>>> Bernd
>>>
>>
>


[-- Attachment #2: ifcvt-tmp.patch --]
[-- Type: text/x-patch, Size: 3812 bytes --]

diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c
index 5812ce30151ed7425780890c66e7763f5758df7e..96957aac5481acbc7ac5f186a653fe63ad3e5f51 100644
--- a/gcc/ifcvt.c
+++ b/gcc/ifcvt.c
@@ -1896,11 +1896,13 @@ insn_valid_noce_process_p (rtx_insn *insn, rtx cc)
 }
 
 
-/* Return true iff the registers that the insns in BB_A set do not
-   get used in BB_B.  */
+/* Return true iff the registers that the insns in BB_A set do not get
+   used in BB_B.  If TO_RENAME is non NULL then it is a REG that will be
+   renamed later by the caller and so conflicts on it should be ignored
+   in this function.  */
 
 static bool
-bbs_ok_for_cmove_arith (basic_block bb_a, basic_block bb_b)
+bbs_ok_for_cmove_arith (basic_block bb_a, basic_block bb_b, rtx to_rename)
 {
   rtx_insn *a_insn;
   bitmap bba_sets = BITMAP_ALLOC (&reg_obstack);
@@ -1920,10 +1922,10 @@ bbs_ok_for_cmove_arith (basic_block bb_a, basic_block bb_b)
 	  BITMAP_FREE (bba_sets);
 	  return false;
 	}
-
       /* Record all registers that BB_A sets.  */
       FOR_EACH_INSN_DEF (def, a_insn)
-	bitmap_set_bit (bba_sets, DF_REF_REGNO (def));
+	if (!(to_rename && DF_REF_REG (def) == to_rename))
+	  bitmap_set_bit (bba_sets, DF_REF_REGNO (def));
     }
 
   rtx_insn *b_insn;
@@ -1942,8 +1944,12 @@ bbs_ok_for_cmove_arith (basic_block bb_a, basic_block bb_b)
 	}
 
       /* Make sure this is a REG and not some instance
-	 of ZERO_EXTRACT or SUBREG or other dangerous stuff.  */
-      if (!REG_P (SET_DEST (sset_b)))
+	 of ZERO_EXTRACT or SUBREG or other dangerous stuff.
+	 If we have a memory destination then we have a pair of simple
+	 basic blocks performing an operation of the form [addr] = c ? a : b.
+	 bb_valid_for_noce_process_p will have ensured that these are
+	 the only stores present.  */
+      if (!REG_P (SET_DEST (sset_b)) && !MEM_P (SET_DEST (sset_b)))
 	{
 	  BITMAP_FREE (bba_sets);
 	  return false;
@@ -2112,9 +2118,9 @@ noce_try_cmove_arith (struct noce_if_info *if_info)
 	}
     }
 
-  if (then_bb && else_bb && !a_simple && !b_simple
-      && (!bbs_ok_for_cmove_arith (then_bb, else_bb)
-	  || !bbs_ok_for_cmove_arith (else_bb, then_bb)))
+  if (then_bb && else_bb
+      && (!bbs_ok_for_cmove_arith (then_bb, else_bb, x)
+	  || !bbs_ok_for_cmove_arith (else_bb, then_bb, x)))
     return FALSE;
 
   start_sequence ();
diff --git a/gcc/testsuite/gcc.c-torture/execute/pr68841.c b/gcc/testsuite/gcc.c-torture/execute/pr68841.c
new file mode 100644
index 0000000000000000000000000000000000000000..15a27e7dc382d97398ca05427f431f5ecd3b89da
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/execute/pr68841.c
@@ -0,0 +1,31 @@
+static inline int
+foo (int *x, int y)
+{
+  int z = *x;
+  while (y > z)
+    z *= 2;
+  return z;
+}
+
+int
+main ()
+{
+  int i;
+  for (i = 1; i < 17; i++)
+    {
+      int j;
+      int k;
+      j = foo (&i, 7);
+      if (i >= 7)
+	k = i;
+      else if (i >= 4)
+	k = 8 + (i - 4) * 2;
+      else if (i == 3)
+	k = 12;
+      else
+	k = 8;
+      if (j != k)
+	__builtin_abort ();
+    }
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.dg/pr68841.c b/gcc/testsuite/gcc.dg/pr68841.c
new file mode 100644
index 0000000000000000000000000000000000000000..470048cc24f0d7150ed1e3141181bc1e8472ae12
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr68841.c
@@ -0,0 +1,34 @@
+/* { dg-do run } */
+/* { dg-options "-Og -fif-conversion -flive-range-shrinkage -fpeel-loops -frerun-cse-after-loop" } */
+
+static inline int
+foo (int *x, int y)
+{
+  int z = *x;
+  while (y > z)
+    z *= 2;
+  return z;
+}
+
+int
+main ()
+{
+  int i;
+  for (i = 1; i < 17; i++)
+    {
+      int j;
+      int k;
+      j = foo (&i, 7);
+      if (i >= 7)
+	k = i;
+      else if (i >= 4)
+	k = 8 + (i - 4) * 2;
+      else if (i == 3)
+	k = 12;
+      else
+	k = 8;
+      if (j != k)
+	__builtin_abort ();
+    }
+  return 0;
+}

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH][RTL-ifcvt] PR rtl-optimization/68841: Make sure one basic block doesn't clobber CC reg usage of the other
  2016-01-07  9:22                 ` Kyrill Tkachov
@ 2016-01-07 11:52                   ` Bernd Schmidt
  2016-01-07 13:45                     ` Kyrill Tkachov
  0 siblings, 1 reply; 20+ messages in thread
From: Bernd Schmidt @ 2016-01-07 11:52 UTC (permalink / raw)
  To: Kyrill Tkachov, GCC Patches

On 01/07/2016 10:22 AM, Kyrill Tkachov wrote:
> +/* Return true iff the registers that the insns in BB_A set do not get
> +   used in BB_B.  If TO_RENAME is non NULL then it is a REG that will be
> +   renamed later by the caller and so conflicts on it should be ignored
> +   in this function.  */

Typical spelling is "non-NULL".

> @@ -1942,8 +1944,12 @@ bbs_ok_for_cmove_arith (basic_block bb_a, basic_block bb_b)
>   	}
>
>         /* Make sure this is a REG and not some instance
> -	 of ZERO_EXTRACT or SUBREG or other dangerous stuff.  */
> -      if (!REG_P (SET_DEST (sset_b)))
> +	 of ZERO_EXTRACT or SUBREG or other dangerous stuff.
> +	 If we have a memory destination then we have a pair of simple
> +	 basic blocks performing an operation of the form [addr] = c ? a : b.
> +	 bb_valid_for_noce_process_p will have ensured that these are
> +	 the only stores present.  */

I looked and this is indeed the case. Still slightly scary. Should the 
MEM be rtx_equal_p to TO_RENAME? It would be good to test or at least 
assert this.


Bernd

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH][RTL-ifcvt] PR rtl-optimization/68841: Make sure one basic block doesn't clobber CC reg usage of the other
  2016-01-07 11:52                   ` Bernd Schmidt
@ 2016-01-07 13:45                     ` Kyrill Tkachov
  2016-01-07 13:47                       ` Bernd Schmidt
  0 siblings, 1 reply; 20+ messages in thread
From: Kyrill Tkachov @ 2016-01-07 13:45 UTC (permalink / raw)
  To: Bernd Schmidt, GCC Patches


On 07/01/16 11:52, Bernd Schmidt wrote:
> On 01/07/2016 10:22 AM, Kyrill Tkachov wrote:
>> +/* Return true iff the registers that the insns in BB_A set do not get
>> +   used in BB_B.  If TO_RENAME is non NULL then it is a REG that will be
>> +   renamed later by the caller and so conflicts on it should be ignored
>> +   in this function.  */
>
> Typical spelling is "non-NULL".
>
>> @@ -1942,8 +1944,12 @@ bbs_ok_for_cmove_arith (basic_block bb_a, basic_block bb_b)
>>       }
>>
>>         /* Make sure this is a REG and not some instance
>> -     of ZERO_EXTRACT or SUBREG or other dangerous stuff.  */
>> -      if (!REG_P (SET_DEST (sset_b)))
>> +     of ZERO_EXTRACT or SUBREG or other dangerous stuff.
>> +     If we have a memory destination then we have a pair of simple
>> +     basic blocks performing an operation of the form [addr] = c ? a : b.
>> +     bb_valid_for_noce_process_p will have ensured that these are
>> +     the only stores present.  */
>
> I looked and this is indeed the case. Still slightly scary. Should the MEM be rtx_equal_p to TO_RENAME? It would be good to test or at least assert this.
>

I tried asserting that and it caused trouble because a bit further up in noce_process_if_block it does:
   /* Only operate on register destinations, and even then avoid extending
      the lifetime of hard registers on small register class machines.  */
   orig_x = x;
   if (!REG_P (x)
       || (HARD_REGISTER_P (x)
       && targetm.small_register_classes_for_mode_p (GET_MODE (x))))
     {
       if (GET_MODE (x) == BLKmode)
         return FALSE;

       if (GET_CODE (x) == ZERO_EXTRACT
           && (!CONST_INT_P (XEXP (x, 1))
               || !CONST_INT_P (XEXP (x, 2))))
         return FALSE;

       x = gen_reg_rtx (GET_MODE (GET_CODE (x) == STRICT_LOW_PART
                  ? XEXP (x, 0) : x));
     }

It changes X to a register and after noce_try_cmove_arith it emits the store.
So, while the basic blocks that we inspect in bbs_ok_for_cmove_arith contain memory
destinations, the move to x will be a register move.
move will be performed t

Kyrill

>
> Bernd

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH][RTL-ifcvt] PR rtl-optimization/68841: Make sure one basic block doesn't clobber CC reg usage of the other
  2016-01-07 13:45                     ` Kyrill Tkachov
@ 2016-01-07 13:47                       ` Bernd Schmidt
  2016-01-07 14:00                         ` Kyrill Tkachov
  0 siblings, 1 reply; 20+ messages in thread
From: Bernd Schmidt @ 2016-01-07 13:47 UTC (permalink / raw)
  To: Kyrill Tkachov, GCC Patches

On 01/07/2016 02:45 PM, Kyrill Tkachov wrote:
>
> On 07/01/16 11:52, Bernd Schmidt wrote:
>> I looked and this is indeed the case. Still slightly scary. Should the
>> MEM be rtx_equal_p to TO_RENAME? It would be good to test or at least
>> assert this.
>>
>
> I tried asserting that and it caused trouble because a bit further up in
> noce_process_if_block it does:
>    /* Only operate on register destinations, and even then avoid extending
>       the lifetime of hard registers on small register class machines.  */
>    orig_x = x;
>    if (!REG_P (x)
>        || (HARD_REGISTER_P (x)
>        && targetm.small_register_classes_for_mode_p (GET_MODE (x))))
>      {
>        if (GET_MODE (x) == BLKmode)
>          return FALSE;
>
>        if (GET_CODE (x) == ZERO_EXTRACT
>            && (!CONST_INT_P (XEXP (x, 1))
>                || !CONST_INT_P (XEXP (x, 2))))
>          return FALSE;
>
>        x = gen_reg_rtx (GET_MODE (GET_CODE (x) == STRICT_LOW_PART
>                   ? XEXP (x, 0) : x));
>      }
>
> It changes X to a register and after noce_try_cmove_arith it emits the
> store.

This suggests that orig_x needs to be passed to bbs_ok_for_cmove_arith, 
even disregarding the question of using it to assert that only expected 
MEMs are being modified.


Bernd

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH][RTL-ifcvt] PR rtl-optimization/68841: Make sure one basic block doesn't clobber CC reg usage of the other
  2016-01-07 13:47                       ` Bernd Schmidt
@ 2016-01-07 14:00                         ` Kyrill Tkachov
  2016-01-08 13:12                           ` Kyrill Tkachov
  0 siblings, 1 reply; 20+ messages in thread
From: Kyrill Tkachov @ 2016-01-07 14:00 UTC (permalink / raw)
  To: Bernd Schmidt, GCC Patches


On 07/01/16 13:47, Bernd Schmidt wrote:
> On 01/07/2016 02:45 PM, Kyrill Tkachov wrote:
>>
>> On 07/01/16 11:52, Bernd Schmidt wrote:
>>> I looked and this is indeed the case. Still slightly scary. Should the
>>> MEM be rtx_equal_p to TO_RENAME? It would be good to test or at least
>>> assert this.
>>>
>>
>> I tried asserting that and it caused trouble because a bit further up in
>> noce_process_if_block it does:
>>    /* Only operate on register destinations, and even then avoid extending
>>       the lifetime of hard registers on small register class machines.  */
>>    orig_x = x;
>>    if (!REG_P (x)
>>        || (HARD_REGISTER_P (x)
>>        && targetm.small_register_classes_for_mode_p (GET_MODE (x))))
>>      {
>>        if (GET_MODE (x) == BLKmode)
>>          return FALSE;
>>
>>        if (GET_CODE (x) == ZERO_EXTRACT
>>            && (!CONST_INT_P (XEXP (x, 1))
>>                || !CONST_INT_P (XEXP (x, 2))))
>>          return FALSE;
>>
>>        x = gen_reg_rtx (GET_MODE (GET_CODE (x) == STRICT_LOW_PART
>>                   ? XEXP (x, 0) : x));
>>      }
>>
>> It changes X to a register and after noce_try_cmove_arith it emits the
>> store.
>
> This suggests that orig_x needs to be passed to bbs_ok_for_cmove_arith, even disregarding the question of using it to assert that only expected MEMs are being modified.
>

Yes, which should be the original destinations of insn_a and insn_b of if_info.
I'll work on that. Though if we execute the above path then x will be a fresh new pseudo and so
will not cause any conflicts anyway.

Kyrill

>
> Bernd

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH][RTL-ifcvt] PR rtl-optimization/68841: Make sure one basic block doesn't clobber CC reg usage of the other
  2016-01-07 14:00                         ` Kyrill Tkachov
@ 2016-01-08 13:12                           ` Kyrill Tkachov
  2016-01-08 13:23                             ` Bernd Schmidt
  0 siblings, 1 reply; 20+ messages in thread
From: Kyrill Tkachov @ 2016-01-08 13:12 UTC (permalink / raw)
  To: Bernd Schmidt, GCC Patches

[-- Attachment #1: Type: text/plain, Size: 2447 bytes --]


On 07/01/16 14:00, Kyrill Tkachov wrote:
>
> On 07/01/16 13:47, Bernd Schmidt wrote:
>> On 01/07/2016 02:45 PM, Kyrill Tkachov wrote:
>>>
>>> On 07/01/16 11:52, Bernd Schmidt wrote:
>>>> I looked and this is indeed the case. Still slightly scary. Should the
>>>> MEM be rtx_equal_p to TO_RENAME? It would be good to test or at least
>>>> assert this.
>>>>
>>>
>>> I tried asserting that and it caused trouble because a bit further up in
>>> noce_process_if_block it does:
>>>    /* Only operate on register destinations, and even then avoid extending
>>>       the lifetime of hard registers on small register class machines.  */
>>>    orig_x = x;
>>>    if (!REG_P (x)
>>>        || (HARD_REGISTER_P (x)
>>>        && targetm.small_register_classes_for_mode_p (GET_MODE (x))))
>>>      {
>>>        if (GET_MODE (x) == BLKmode)
>>>          return FALSE;
>>>
>>>        if (GET_CODE (x) == ZERO_EXTRACT
>>>            && (!CONST_INT_P (XEXP (x, 1))
>>>                || !CONST_INT_P (XEXP (x, 2))))
>>>          return FALSE;
>>>
>>>        x = gen_reg_rtx (GET_MODE (GET_CODE (x) == STRICT_LOW_PART
>>>                   ? XEXP (x, 0) : x));
>>>      }
>>>
>>> It changes X to a register and after noce_try_cmove_arith it emits the
>>> store.
>>
>> This suggests that orig_x needs to be passed to bbs_ok_for_cmove_arith, even disregarding the question of using it to assert that only expected MEMs are being modified.
>>
>
> Yes, which should be the original destinations of insn_a and insn_b of if_info.
> I'll work on that. Though if we execute the above path then x will be a fresh new pseudo and so
> will not cause any conflicts anyway.
>

Here is the latest version.
I passed down the original set destination to bbs_ok_for_cmove_arith and
asserted that it's the only memory store if a store occurs.

Bootstrapped and tested on arm, aarch64, x86_64.

How's this?

Thanks,
Kyrill

2016-01-08  Bernd Schmidt  <bschmidt@redhat.com>
             Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

     PR rtl-optimization/68841
     * ifcvt.c (bbs_ok_for_cmove_arith): Add to_rename parameter.
     Don't record conflicts on to_rename if it's present.
     Allow memory destinations in sets.
     (noce_try_cmove_arith): Call bbs_ok_for_cmove_arith even on simple
     blocks.

2016-01-08  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

     PR rtl-optimization/68841
     * gcc.dg/pr68841.c: New test.
     * gcc.c-torture/execute/pr68841.c: New test.


[-- Attachment #2: ifcvt-tmp.patch --]
[-- Type: text/x-patch, Size: 4193 bytes --]

diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c
index 5812ce30151ed7425780890c66e7763f5758df7e..50fe69d13294ca1ddbac17cccecf436f7f9d231a 100644
--- a/gcc/ifcvt.c
+++ b/gcc/ifcvt.c
@@ -1896,11 +1896,13 @@ insn_valid_noce_process_p (rtx_insn *insn, rtx cc)
 }
 
 
-/* Return true iff the registers that the insns in BB_A set do not
-   get used in BB_B.  */
+/* Return true iff the registers that the insns in BB_A set do not get
+   used in BB_B.  If TO_RENAME is non-NULL then it is a location that will be
+   renamed later by the caller and so conflicts on it should be ignored
+   in this function.  */
 
 static bool
-bbs_ok_for_cmove_arith (basic_block bb_a, basic_block bb_b)
+bbs_ok_for_cmove_arith (basic_block bb_a, basic_block bb_b, rtx to_rename)
 {
   rtx_insn *a_insn;
   bitmap bba_sets = BITMAP_ALLOC (&reg_obstack);
@@ -1920,10 +1922,10 @@ bbs_ok_for_cmove_arith (basic_block bb_a, basic_block bb_b)
 	  BITMAP_FREE (bba_sets);
 	  return false;
 	}
-
       /* Record all registers that BB_A sets.  */
       FOR_EACH_INSN_DEF (def, a_insn)
-	bitmap_set_bit (bba_sets, DF_REF_REGNO (def));
+	if (!(to_rename && DF_REF_REG (def) == to_rename))
+	  bitmap_set_bit (bba_sets, DF_REF_REGNO (def));
     }
 
   rtx_insn *b_insn;
@@ -1942,8 +1944,15 @@ bbs_ok_for_cmove_arith (basic_block bb_a, basic_block bb_b)
 	}
 
       /* Make sure this is a REG and not some instance
-	 of ZERO_EXTRACT or SUBREG or other dangerous stuff.  */
-      if (!REG_P (SET_DEST (sset_b)))
+	 of ZERO_EXTRACT or SUBREG or other dangerous stuff.
+	 If we have a memory destination then we have a pair of simple
+	 basic blocks performing an operation of the form [addr] = c ? a : b.
+	 bb_valid_for_noce_process_p will have ensured that these are
+	 the only stores present.  In that case [addr] should be the location
+	 to be renamed.  Assert that the callers set this up properly.  */
+      if (MEM_P (SET_DEST (sset_b)))
+	gcc_assert (rtx_equal_p (SET_DEST (sset_b), to_rename));
+      else if (!REG_P (SET_DEST (sset_b)))
 	{
 	  BITMAP_FREE (bba_sets);
 	  return false;
@@ -2112,9 +2121,16 @@ noce_try_cmove_arith (struct noce_if_info *if_info)
 	}
     }
 
-  if (then_bb && else_bb && !a_simple && !b_simple
-      && (!bbs_ok_for_cmove_arith (then_bb, else_bb)
-	  || !bbs_ok_for_cmove_arith (else_bb, then_bb)))
+  rtx orig_x = x;
+  if (then_bb && else_bb)
+    {
+      orig_x = SET_DEST (single_set (insn_a));
+      gcc_assert (rtx_equal_p (orig_x, SET_DEST (single_set (insn_b))));
+    }
+
+  if (then_bb && else_bb
+      && (!bbs_ok_for_cmove_arith (then_bb, else_bb, orig_x)
+	  || !bbs_ok_for_cmove_arith (else_bb, then_bb, orig_x)))
     return FALSE;
 
   start_sequence ();
diff --git a/gcc/testsuite/gcc.c-torture/execute/pr68841.c b/gcc/testsuite/gcc.c-torture/execute/pr68841.c
new file mode 100644
index 0000000000000000000000000000000000000000..15a27e7dc382d97398ca05427f431f5ecd3b89da
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/execute/pr68841.c
@@ -0,0 +1,31 @@
+static inline int
+foo (int *x, int y)
+{
+  int z = *x;
+  while (y > z)
+    z *= 2;
+  return z;
+}
+
+int
+main ()
+{
+  int i;
+  for (i = 1; i < 17; i++)
+    {
+      int j;
+      int k;
+      j = foo (&i, 7);
+      if (i >= 7)
+	k = i;
+      else if (i >= 4)
+	k = 8 + (i - 4) * 2;
+      else if (i == 3)
+	k = 12;
+      else
+	k = 8;
+      if (j != k)
+	__builtin_abort ();
+    }
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.dg/pr68841.c b/gcc/testsuite/gcc.dg/pr68841.c
new file mode 100644
index 0000000000000000000000000000000000000000..470048cc24f0d7150ed1e3141181bc1e8472ae12
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr68841.c
@@ -0,0 +1,34 @@
+/* { dg-do run } */
+/* { dg-options "-Og -fif-conversion -flive-range-shrinkage -fpeel-loops -frerun-cse-after-loop" } */
+
+static inline int
+foo (int *x, int y)
+{
+  int z = *x;
+  while (y > z)
+    z *= 2;
+  return z;
+}
+
+int
+main ()
+{
+  int i;
+  for (i = 1; i < 17; i++)
+    {
+      int j;
+      int k;
+      j = foo (&i, 7);
+      if (i >= 7)
+	k = i;
+      else if (i >= 4)
+	k = 8 + (i - 4) * 2;
+      else if (i == 3)
+	k = 12;
+      else
+	k = 8;
+      if (j != k)
+	__builtin_abort ();
+    }
+  return 0;
+}

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH][RTL-ifcvt] PR rtl-optimization/68841: Make sure one basic block doesn't clobber CC reg usage of the other
  2016-01-08 13:12                           ` Kyrill Tkachov
@ 2016-01-08 13:23                             ` Bernd Schmidt
  2016-01-11 11:10                               ` Kyrill Tkachov
  0 siblings, 1 reply; 20+ messages in thread
From: Bernd Schmidt @ 2016-01-08 13:23 UTC (permalink / raw)
  To: Kyrill Tkachov, GCC Patches

On 01/08/2016 02:11 PM, Kyrill Tkachov wrote:
> How's this?

Hmm. Almost there, but...

> -  if (then_bb && else_bb && !a_simple && !b_simple
> -      && (!bbs_ok_for_cmove_arith (then_bb, else_bb)
> -	  || !bbs_ok_for_cmove_arith (else_bb, then_bb)))
> +  rtx orig_x = x;
> +  if (then_bb && else_bb)
> +    {
> +      orig_x = SET_DEST (single_set (insn_a));
> +      gcc_assert (rtx_equal_p (orig_x, SET_DEST (single_set (insn_b))));
> +    }
> +
> +  if (then_bb && else_bb
> +      && (!bbs_ok_for_cmove_arith (then_bb, else_bb, orig_x)
> +	  || !bbs_ok_for_cmove_arith (else_bb, then_bb, orig_x)))
>      return FALSE;

This can be condensed to a single if statement (not repeating the 
then_bb && else_bb test), and orig_x can then be declared locally inside it.

The remaining question is whether it's safe to call single_set in this 
way. I thought it was only guaranteed to be safe if then_simple and 
else_simple. I had assumed that orig_x was available in this function 
but I now see that it's actually part of noce_process_if_block. I think 
it might be best to extend the if_info structure to also have an orig_x 
field. With that, I think we can even skip this particular assert 
(keeping the one in bbs_ok_for_cmove_arith however).


Bernd

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH][RTL-ifcvt] PR rtl-optimization/68841: Make sure one basic block doesn't clobber CC reg usage of the other
  2016-01-08 13:23                             ` Bernd Schmidt
@ 2016-01-11 11:10                               ` Kyrill Tkachov
  2016-01-11 11:58                                 ` Bernd Schmidt
  0 siblings, 1 reply; 20+ messages in thread
From: Kyrill Tkachov @ 2016-01-11 11:10 UTC (permalink / raw)
  To: Bernd Schmidt, GCC Patches

[-- Attachment #1: Type: text/plain, Size: 2227 bytes --]

Hi Bernd,

On 08/01/16 13:22, Bernd Schmidt wrote:
> On 01/08/2016 02:11 PM, Kyrill Tkachov wrote:
>> How's this?
>
> Hmm. Almost there, but...
>
>> -  if (then_bb && else_bb && !a_simple && !b_simple
>> -      && (!bbs_ok_for_cmove_arith (then_bb, else_bb)
>> -      || !bbs_ok_for_cmove_arith (else_bb, then_bb)))
>> +  rtx orig_x = x;
>> +  if (then_bb && else_bb)
>> +    {
>> +      orig_x = SET_DEST (single_set (insn_a));
>> +      gcc_assert (rtx_equal_p (orig_x, SET_DEST (single_set (insn_b))));
>> +    }
>> +
>> +  if (then_bb && else_bb
>> +      && (!bbs_ok_for_cmove_arith (then_bb, else_bb, orig_x)
>> +      || !bbs_ok_for_cmove_arith (else_bb, then_bb, orig_x)))
>>      return FALSE;
>
> This can be condensed to a single if statement (not repeating the then_bb && else_bb test), and orig_x can then be declared locally inside it.
>
> The remaining question is whether it's safe to call single_set in this way. I thought it was only guaranteed to be safe if then_simple and else_simple. I had assumed that orig_x was available in this function but I now see that it's 
> actually part of noce_process_if_block. I think it might be best to extend the if_info structure to also have an orig_x field. With that, I think we can even skip this particular assert (keeping the one in bbs_ok_for_cmove_arith however).
>

Ok, this works too. Here is a version that adds orig_x to the if_info struct and passes it down
to bbs_ok_for_cmove_arith.

This passes bootstrap and test on arm, aarch64, x86_64.

Thanks,
Kyrill

2016-01-11  Bernd Schmidt  <bschmidt@redhat.com>
             Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

     PR rtl-optimization/68841
     * ifcvt.c (struct noce_if_info): Add orig_x field.
     (bbs_ok_for_cmove_arith): Add to_rename parameter.
     Don't record conflicts on to_rename if it's present.
     Allow memory destinations in sets.
     (noce_try_cmove_arith): Call bbs_ok_for_cmove_arith even on simple
     blocks, passing orig_x to the checks.
     (noce_process_if_block): Set if_info->orig_x appropriately.

2016-01-11  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

     PR rtl-optimization/68841
     * gcc.dg/pr68841.c: New test.
     * gcc.c-torture/execute/pr68841.c: New test.



[-- Attachment #2: ifcvt-tmp.patch --]
[-- Type: text/x-patch, Size: 4843 bytes --]

diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c
index 5812ce30151ed7425780890c66e7763f5758df7e..76561d2fa03590462ea880d79377844d5d6b53f1 100644
--- a/gcc/ifcvt.c
+++ b/gcc/ifcvt.c
@@ -792,6 +792,9 @@ struct noce_if_info
   /* The SET_DEST of INSN_A.  */
   rtx x;
 
+  /* The original set destination that the THEN and ELSE basic blocks finally
+     write their result to.  */
+  rtx orig_x;
   /* True if this if block is not canonical.  In the canonical form of
      if blocks, the THEN_BB is the block reached via the fallthru edge
      from TEST_BB.  For the noce transformations, we allow the symmetric
@@ -1896,11 +1899,13 @@ insn_valid_noce_process_p (rtx_insn *insn, rtx cc)
 }
 
 
-/* Return true iff the registers that the insns in BB_A set do not
-   get used in BB_B.  */
+/* Return true iff the registers that the insns in BB_A set do not get
+   used in BB_B.  If TO_RENAME is non-NULL then it is a location that will be
+   renamed later by the caller and so conflicts on it should be ignored
+   in this function.  */
 
 static bool
-bbs_ok_for_cmove_arith (basic_block bb_a, basic_block bb_b)
+bbs_ok_for_cmove_arith (basic_block bb_a, basic_block bb_b, rtx to_rename)
 {
   rtx_insn *a_insn;
   bitmap bba_sets = BITMAP_ALLOC (&reg_obstack);
@@ -1920,10 +1925,10 @@ bbs_ok_for_cmove_arith (basic_block bb_a, basic_block bb_b)
 	  BITMAP_FREE (bba_sets);
 	  return false;
 	}
-
       /* Record all registers that BB_A sets.  */
       FOR_EACH_INSN_DEF (def, a_insn)
-	bitmap_set_bit (bba_sets, DF_REF_REGNO (def));
+	if (!(to_rename && DF_REF_REG (def) == to_rename))
+	  bitmap_set_bit (bba_sets, DF_REF_REGNO (def));
     }
 
   rtx_insn *b_insn;
@@ -1942,8 +1947,15 @@ bbs_ok_for_cmove_arith (basic_block bb_a, basic_block bb_b)
 	}
 
       /* Make sure this is a REG and not some instance
-	 of ZERO_EXTRACT or SUBREG or other dangerous stuff.  */
-      if (!REG_P (SET_DEST (sset_b)))
+	 of ZERO_EXTRACT or SUBREG or other dangerous stuff.
+	 If we have a memory destination then we have a pair of simple
+	 basic blocks performing an operation of the form [addr] = c ? a : b.
+	 bb_valid_for_noce_process_p will have ensured that these are
+	 the only stores present.  In that case [addr] should be the location
+	 to be renamed.  Assert that the callers set this up properly.  */
+      if (MEM_P (SET_DEST (sset_b)))
+	gcc_assert (rtx_equal_p (SET_DEST (sset_b), to_rename));
+      else if (!REG_P (SET_DEST (sset_b)))
 	{
 	  BITMAP_FREE (bba_sets);
 	  return false;
@@ -2112,9 +2124,9 @@ noce_try_cmove_arith (struct noce_if_info *if_info)
 	}
     }
 
-  if (then_bb && else_bb && !a_simple && !b_simple
-      && (!bbs_ok_for_cmove_arith (then_bb, else_bb)
-	  || !bbs_ok_for_cmove_arith (else_bb, then_bb)))
+  if (then_bb && else_bb
+      && (!bbs_ok_for_cmove_arith (then_bb, else_bb,  if_info->orig_x)
+	  || !bbs_ok_for_cmove_arith (else_bb, then_bb,  if_info->orig_x)))
     return FALSE;
 
   start_sequence ();
@@ -3430,6 +3442,7 @@ noce_process_if_block (struct noce_if_info *if_info)
   /* Only operate on register destinations, and even then avoid extending
      the lifetime of hard registers on small register class machines.  */
   orig_x = x;
+  if_info->orig_x = orig_x;
   if (!REG_P (x)
       || (HARD_REGISTER_P (x)
 	  && targetm.small_register_classes_for_mode_p (GET_MODE (x))))
diff --git a/gcc/testsuite/gcc.c-torture/execute/pr68841.c b/gcc/testsuite/gcc.c-torture/execute/pr68841.c
new file mode 100644
index 0000000000000000000000000000000000000000..15a27e7dc382d97398ca05427f431f5ecd3b89da
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/execute/pr68841.c
@@ -0,0 +1,31 @@
+static inline int
+foo (int *x, int y)
+{
+  int z = *x;
+  while (y > z)
+    z *= 2;
+  return z;
+}
+
+int
+main ()
+{
+  int i;
+  for (i = 1; i < 17; i++)
+    {
+      int j;
+      int k;
+      j = foo (&i, 7);
+      if (i >= 7)
+	k = i;
+      else if (i >= 4)
+	k = 8 + (i - 4) * 2;
+      else if (i == 3)
+	k = 12;
+      else
+	k = 8;
+      if (j != k)
+	__builtin_abort ();
+    }
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.dg/pr68841.c b/gcc/testsuite/gcc.dg/pr68841.c
new file mode 100644
index 0000000000000000000000000000000000000000..470048cc24f0d7150ed1e3141181bc1e8472ae12
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr68841.c
@@ -0,0 +1,34 @@
+/* { dg-do run } */
+/* { dg-options "-Og -fif-conversion -flive-range-shrinkage -fpeel-loops -frerun-cse-after-loop" } */
+
+static inline int
+foo (int *x, int y)
+{
+  int z = *x;
+  while (y > z)
+    z *= 2;
+  return z;
+}
+
+int
+main ()
+{
+  int i;
+  for (i = 1; i < 17; i++)
+    {
+      int j;
+      int k;
+      j = foo (&i, 7);
+      if (i >= 7)
+	k = i;
+      else if (i >= 4)
+	k = 8 + (i - 4) * 2;
+      else if (i == 3)
+	k = 12;
+      else
+	k = 8;
+      if (j != k)
+	__builtin_abort ();
+    }
+  return 0;
+}

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH][RTL-ifcvt] PR rtl-optimization/68841: Make sure one basic block doesn't clobber CC reg usage of the other
  2016-01-11 11:10                               ` Kyrill Tkachov
@ 2016-01-11 11:58                                 ` Bernd Schmidt
  0 siblings, 0 replies; 20+ messages in thread
From: Bernd Schmidt @ 2016-01-11 11:58 UTC (permalink / raw)
  To: Kyrill Tkachov, GCC Patches

> Ok, this works too. Here is a version that adds orig_x to the if_info
> struct and passes it down
> to bbs_ok_for_cmove_arith.
>
> This passes bootstrap and test on arm, aarch64, x86_64.

I think this is ok.


Bernd

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2016-01-11 11:58 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-12-18 13:07 [PATCH][RTL-ifcvt] PR rtl-optimization/68841: Make sure one basic block doesn't clobber CC reg usage of the other Kyrill Tkachov
2015-12-18 13:16 ` Bernd Schmidt
2015-12-18 13:57   ` Kyrill Tkachov
2015-12-21  9:23   ` Kyrill Tkachov
2016-01-05 12:04     ` Kyrill Tkachov
2016-01-05 13:42     ` Bernd Schmidt
2016-01-05 14:22       ` Kyrill Tkachov
2016-01-05 16:34         ` Bernd Schmidt
2016-01-05 17:06           ` Kyrill Tkachov
2016-01-05 18:19             ` Kyrill Tkachov
2016-01-06 10:36               ` Kyrill Tkachov
2016-01-07  9:22                 ` Kyrill Tkachov
2016-01-07 11:52                   ` Bernd Schmidt
2016-01-07 13:45                     ` Kyrill Tkachov
2016-01-07 13:47                       ` Bernd Schmidt
2016-01-07 14:00                         ` Kyrill Tkachov
2016-01-08 13:12                           ` Kyrill Tkachov
2016-01-08 13:23                             ` Bernd Schmidt
2016-01-11 11:10                               ` Kyrill Tkachov
2016-01-11 11:58                                 ` Bernd Schmidt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).