* [PATCH] ARM: Don't clobber CC reg when it is live after the peephole window
@ 2013-05-29 17:15 Meador Inge
2013-06-05 20:36 ` Meador Inge
2013-06-06 13:11 ` Richard Earnshaw
0 siblings, 2 replies; 7+ messages in thread
From: Meador Inge @ 2013-05-29 17:15 UTC (permalink / raw)
To: gcc-patches; +Cc: ramana.radhakrishnan, richard.earnshaw
Hi All,
This patch fixes a bug in one of the ARM peephole2 optimizations. The
peephole2 optimization in question was changed to use the CC-updating
form for all of the instructions produced by the peephole so that the
encoding will be smaller when compiling for Thumb [1]. However, I don't
think that is always safe.
For example, the CC register might be used by something *after* the
peephole window. The current peephole will transform:
(insn:TI 7 49 18 2 (set (reg:CC 24 cc)
(compare:CC (reg:SI 3 r3 [orig:136 *endname_1(D) ] [136])
(const_int 0 [0]))) repro.c:5 212 {*arm_cmpsi_insn}
(nil))
(insn:TI 18 7 11 2 (cond_exec (ne (reg:CC 24 cc)
(const_int 0 [0]))
(set (reg:SI 3 r3 [140])
(const_int 0 [0]))) repro.c:8 3366 {*p *arm_movsi_vfp}
(expr_list:REG_EQUIV (const_int 0 [0])
(nil)))
(insn 11 18 19 2 (cond_exec (eq (reg:CC 24 cc)
(const_int 0 [0]))
(set (reg:SI 3 r3 [138])
(const_int 1 [0x1]))) repro.c:6 3366 {*p *arm_movsi_vfp}
(expr_list:REG_EQUIV (const_int 1 [0x1])
(nil)))
(insn:TI 19 11 12 2 (cond_exec (ne (reg:CC 24 cc)
(const_int 0 [0]))
(set (mem/c:SI (reg/f:SI 2 r2 [143]) [2 atend+0 S4 A32])
(reg:SI 3 r3 [140]))) repro.c:8 3366 {*p *arm_movsi_vfp}
(expr_list:REG_DEAD (reg/f:SI 2 r2 [143])
(nil)))
(insn:TI 12 19 22 2 (cond_exec (eq (reg:CC 24 cc)
(const_int 0 [0]))
(set (mem/c:SI (reg/f:SI 2 r2 [143]) [2 atend+0 S4 A32])
(reg:SI 3 r3 [138]))) repro.c:6 3366 {*p *arm_movsi_vfp}
(nil))
(insn:TI 22 12 58 2 (cond_exec (ne (reg:CC 24 cc)
(const_int 0 [0]))
(set (mem:QI (reg/v/f:SI 0 r0 [orig:135 endname ] [135]) [0 *endname_1(D)+0 S1 A8])
(reg:QI 3 r3 [140]))) repro.c:9 3115 {*p *arm_movqi_insn}
(expr_list:REG_DEAD (reg:CC 24 cc)
(expr_list:REG_DEAD (reg:QI 3 r3 [140])
(expr_list:REG_DEAD (reg/v/f:SI 0 r0 [orig:135 endname ] [135])
(nil)))))
into the following:
(insn 59 49 60 2 (parallel [
(set (reg:CC 24 cc)
(compare:CC (reg:SI 3 r3 [orig:136 *endname_1(D) ] [136])
(const_int 0 [0])))
(set (reg:SI 1 r1)
(minus:SI (reg:SI 3 r3 [orig:136 *endname_1(D) ] [136])
(const_int 0 [0])))
]) repro.c:6 -1
(nil))
(insn 60 59 61 2 (parallel [
(set (reg:CC 24 cc)
(compare:CC (const_int 0 [0])
(reg:SI 1 r1)))
(set (reg:SI 3 r3 [140])
(minus:SI (const_int 0 [0])
(reg:SI 1 r1)))
]) repro.c:6 -1
(nil))
(insn 61 60 19 2 (parallel [
(set (reg:SI 3 r3 [140])
(plus:SI (plus:SI (reg:SI 3 r3 [140])
(reg:SI 1 r1))
(geu:SI (reg:CC 24 cc)
(const_int 0 [0]))))
(clobber (reg:CC 24 cc))
]) repro.c:6 -1
(nil))
(insn:TI 19 61 12 2 (cond_exec (ne (reg:CC 24 cc)
(const_int 0 [0]))
(set (mem/c:SI (reg/f:SI 2 r2 [143]) [2 atend+0 S4 A32])
(reg:SI 3 r3 [140]))) repro.c:8 3366 {*p *arm_movsi_vfp}
(nil))
(insn:TI 12 19 22 2 (cond_exec (eq (reg:CC 24 cc)
(const_int 0 [0]))
(set (mem/c:SI (reg/f:SI 2 r2 [143]) [2 atend+0 S4 A32])
(reg:SI 3 r3 [138]))) repro.c:6 3366 {*p *arm_movsi_vfp}
(expr_list:REG_DEAD (reg/f:SI 2 r2 [143])
(nil)))
(insn:TI 22 12 58 2 (cond_exec (ne (reg:CC 24 cc)
(const_int 0 [0]))
(set (mem:QI (reg/v/f:SI 0 r0 [orig:135 endname ] [135]) [0 *endname_1(D)+0 S1 A8])
(reg:QI 3 r3 [140]))) repro.c:9 3115 {*p *arm_movqi_insn}
(expr_list:REG_DEAD (reg:CC 24 cc)
(expr_list:REG_DEAD (reg:QI 3 r3 [140])
(expr_list:REG_DEAD (reg/v/f:SI 0 r0 [orig:135 endname ] [135])
(nil)))))
This gets compiled into the incorrect sequence:
ldrb r3, [r0, #0]
ldr r2, .L4
subs r1, r3, #0
rsbs r3, r1, #0
adcs r3, r3, r1
strne r3, [r2, #0]
streq r3, [r2, #0]
strneb r3, [r0, #0]
The conditional stores now test the flags left by the adcs instruction
rather than the flags set by the original comparison.
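To make the failure concrete, here is a small sketch that models the three flag-setting instructions on 32-bit values and compares the Z flag they leave behind with the Z flag of the original compare. The helper names (add_with_carry, subs, peephole_sequence, z_after_cmp) are made up for illustration and are not part of GCC; only the Z and C flags are modelled.

```python
MASK = 0xFFFFFFFF

def add_with_carry(a, b, carry_in):
    """32-bit a + b + carry_in; returns (result, Z, C) as an ARM ADCS would."""
    total = a + b + carry_in
    result = total & MASK
    return result, result == 0, total > MASK

def subs(a, b):
    """a - b computed as a + NOT(b) + 1, so C set means 'no borrow'."""
    return add_with_carry(a, (~b) & MASK, 1)

def peephole_sequence(x):
    """Model: subs r1, r3, #0 ; rsbs r3, r1, #0 ; adcs r3, r3, r1."""
    r1, _, _ = subs(x, 0)                          # subs r1, r3, #0
    r3, _, c = subs(0, r1)                         # rsbs r3, r1, #0
    r3, z, _ = add_with_carry(r3, r1, int(c))      # adcs r3, r3, r1
    return r3, z

def z_after_cmp(x):
    """Z flag left by the original 'cmp r3, #0'."""
    _, z, _ = subs(x, 0)
    return z

for x in (0, 1, 0x80000000, 0xFFFFFFFF):
    r3, z = peephole_sequence(x)
    print(f"x={x:#x}: r3={r3} Z(after adcs)={z} Z(after cmp)={z_after_cmp(x)}")
```

Under this model the value in r3 is still (x == 0), but the Z flag after the adcs is the inverse of the one the compare produced, so a later strne/streq pair keyed on the original comparison takes the wrong arm.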
This patch fixes the problem by ensuring that the CC reg is dead after the
peephole window for the current peephole definition and falls back on the
original pre-PR46975 peephole when it is live. Unfortunately I had trouble
coming up with a reproduction case against mainline. I only noticed the bug
while working with some local changes that exposed it.
Built and tested a full ARM GNU/Linux toolchain. No regressions in the GCC
test suite.
OK?
gcc/
2013-05-29 Meador Inge <meadori@codesourcery.com>
* config/arm/arm.md (conditional move peephole2): Only clobber CC
register when it is dead after the peephole window.
[1] http://gcc.gnu.org/ml/gcc-patches/2010-12/msg01336.html
Index: gcc/config/arm/arm.md
===================================================================
--- gcc/config/arm/arm.md (revision 199414)
+++ gcc/config/arm/arm.md (working copy)
@@ -9978,29 +9978,48 @@
;; Attempt to improve the sequence generated by the compare_scc splitters
;; not to use conditional execution.
(define_peephole2
- [(set (reg:CC CC_REGNUM)
+ [(set (match_operand 0 "cc_register" "")
(compare:CC (match_operand:SI 1 "register_operand" "")
(match_operand:SI 2 "arm_rhs_operand" "")))
(cond_exec (ne (reg:CC CC_REGNUM) (const_int 0))
- (set (match_operand:SI 0 "register_operand" "") (const_int 0)))
+ (set (match_operand:SI 3 "register_operand" "") (const_int 0)))
(cond_exec (eq (reg:CC CC_REGNUM) (const_int 0))
- (set (match_dup 0) (const_int 1)))
- (match_scratch:SI 3 "r")]
- "TARGET_32BIT"
+ (set (match_dup 3) (const_int 1)))
+ (match_scratch:SI 4 "r")]
+ "TARGET_32BIT && peep2_reg_dead_p (3, operands[0])"
[(parallel
[(set (reg:CC CC_REGNUM)
(compare:CC (match_dup 1) (match_dup 2)))
- (set (match_dup 3) (minus:SI (match_dup 1) (match_dup 2)))])
+ (set (match_dup 4) (minus:SI (match_dup 1) (match_dup 2)))])
(parallel
[(set (reg:CC CC_REGNUM)
- (compare:CC (const_int 0) (match_dup 3)))
- (set (match_dup 0) (minus:SI (const_int 0) (match_dup 3)))])
+ (compare:CC (const_int 0) (match_dup 4)))
+ (set (match_dup 3) (minus:SI (const_int 0) (match_dup 4)))])
(parallel
- [(set (match_dup 0)
- (plus:SI (plus:SI (match_dup 0) (match_dup 3))
+ [(set (match_dup 3)
+ (plus:SI (plus:SI (match_dup 3) (match_dup 4))
(geu:SI (reg:CC CC_REGNUM) (const_int 0))))
(clobber (reg:CC CC_REGNUM))])])
+(define_peephole2
+ [(set (reg:CC CC_REGNUM)
+ (compare:CC (match_operand:SI 0 "register_operand" "")
+ (match_operand:SI 1 "arm_rhs_operand" "")))
+ (cond_exec (ne (reg:CC CC_REGNUM) (const_int 0))
+ (set (match_operand:SI 2 "register_operand" "") (const_int 0)))
+ (cond_exec (eq (reg:CC CC_REGNUM) (const_int 0))
+ (set (match_dup 2) (const_int 1)))
+ (match_scratch:SI 3 "r")]
+ "TARGET_32BIT && !peep2_reg_dead_p (3, operands[0])"
+ [(set (match_dup 3) (minus:SI (match_dup 0) (match_dup 1)))
+ (parallel
+ [(set (reg:CC CC_REGNUM)
+ (compare:CC (const_int 0) (match_dup 3)))
+ (set (match_dup 2) (minus:SI (const_int 0) (match_dup 3)))])
+ (set (match_dup 2)
+ (plus:SI (plus:SI (match_dup 2) (match_dup 3))
+ (geu:SI (reg:CC CC_REGNUM) (const_int 0))))])
+
(define_insn "*cond_move"
[(set (match_operand:SI 0 "s_register_operand" "=r,r,r")
(if_then_else:SI (match_operator 3 "equality_operator"
* Re: [PATCH] ARM: Don't clobber CC reg when it is live after the peephole window
From: Meador Inge @ 2013-06-05 20:36 UTC (permalink / raw)
To: gcc-patches; +Cc: ramana.radhakrishnan, richard.earnshaw
Ping.
--
Meador Inge
CodeSourcery / Mentor Embedded
* Re: [PATCH] ARM: Don't clobber CC reg when it is live after the peephole window
From: Richard Earnshaw @ 2013-06-06 13:11 UTC (permalink / raw)
To: Meador Inge; +Cc: gcc-patches, Ramana Radhakrishnan
On 29/05/13 18:15, Meador Inge wrote:
> Hi All,
>
> This patch fixes a bug in one of the ARM peephole2 optimizations. The
> peephole2 optimization in question was changed to use the CC-updating
> form for all of the instructions produced by the peephole so that the
> encoding will be smaller when compiling for thumb [1]. However, I don't
> think that is always safe.
>
> For example, the CC register might be used by something *after* the
> peephole window. The current peephole will transform:
>
>
> (insn:TI 7 49 18 2 (set (reg:CC 24 cc)
> (compare:CC (reg:SI 3 r3 [orig:136 *endname_1(D) ] [136])
> (const_int 0 [0]))) repro.c:5 212 {*arm_cmpsi_insn}
> (nil))
>
> (insn:TI 18 7 11 2 (cond_exec (ne (reg:CC 24 cc)
> (const_int 0 [0]))
> (set (reg:SI 3 r3 [140])
> (const_int 0 [0]))) repro.c:8 3366 {*p *arm_movsi_vfp}
> (expr_list:REG_EQUIV (const_int 0 [0])
> (nil)))
>
> (insn 11 18 19 2 (cond_exec (eq (reg:CC 24 cc)
> (const_int 0 [0]))
> (set (reg:SI 3 r3 [138])
> (const_int 1 [0x1]))) repro.c:6 3366 {*p *arm_movsi_vfp}
> (expr_list:REG_EQUIV (const_int 1 [0x1])
> (nil)))
>
> (insn:TI 19 11 12 2 (cond_exec (ne (reg:CC 24 cc)
> (const_int 0 [0]))
> (set (mem/c:SI (reg/f:SI 2 r2 [143]) [2 atend+0 S4 A32])
> (reg:SI 3 r3 [140]))) repro.c:8 3366 {*p *arm_movsi_vfp}
> (expr_list:REG_DEAD (reg/f:SI 2 r2 [143])
> (nil)))
>
> (insn:TI 12 19 22 2 (cond_exec (eq (reg:CC 24 cc)
> (const_int 0 [0]))
> (set (mem/c:SI (reg/f:SI 2 r2 [143]) [2 atend+0 S4 A32])
> (reg:SI 3 r3 [138]))) repro.c:6 3366 {*p *arm_movsi_vfp}
> (nil))
>
> (insn:TI 22 12 58 2 (cond_exec (ne (reg:CC 24 cc)
> (const_int 0 [0]))
> (set (mem:QI (reg/v/f:SI 0 r0 [orig:135 endname ] [135]) [0 *endname_1(D)+0 S1 A8])
> (reg:QI 3 r3 [140]))) repro.c:9 3115 {*p *arm_movqi_insn}
> (expr_list:REG_DEAD (reg:CC 24 cc)
> (expr_list:REG_DEAD (reg:QI 3 r3 [140])
> (expr_list:REG_DEAD (reg/v/f:SI 0 r0 [orig:135 endname ] [135])
> (nil)))))
>
> into the following:
>
>
> (insn 59 49 60 2 (parallel [
> (set (reg:CC 24 cc)
> (compare:CC (reg:SI 3 r3 [orig:136 *endname_1(D) ] [136])
> (const_int 0 [0])))
> (set (reg:SI 1 r1)
> (minus:SI (reg:SI 3 r3 [orig:136 *endname_1(D) ] [136])
> (const_int 0 [0])))
> ]) repro.c:6 -1
> (nil))
>
> (insn 60 59 61 2 (parallel [
> (set (reg:CC 24 cc)
> (compare:CC (const_int 0 [0])
> (reg:SI 1 r1)))
> (set (reg:SI 3 r3 [140])
> (minus:SI (const_int 0 [0])
> (reg:SI 1 r1)))
> ]) repro.c:6 -1
> (nil))
>
> (insn 61 60 19 2 (parallel [
> (set (reg:SI 3 r3 [140])
> (plus:SI (plus:SI (reg:SI 3 r3 [140])
> (reg:SI 1 r1))
> (geu:SI (reg:CC 24 cc)
> (const_int 0 [0]))))
> (clobber (reg:CC 24 cc))
> ]) repro.c:6 -1
> (nil))
>
> (insn:TI 19 61 12 2 (cond_exec (ne (reg:CC 24 cc)
> (const_int 0 [0]))
> (set (mem/c:SI (reg/f:SI 2 r2 [143]) [2 atend+0 S4 A32])
> (reg:SI 3 r3 [140]))) repro.c:8 3366 {*p *arm_movsi_vfp}
> (nil))
>
> (insn:TI 12 19 22 2 (cond_exec (eq (reg:CC 24 cc)
> (const_int 0 [0]))
> (set (mem/c:SI (reg/f:SI 2 r2 [143]) [2 atend+0 S4 A32])
> (reg:SI 3 r3 [138]))) repro.c:6 3366 {*p *arm_movsi_vfp}
> (expr_list:REG_DEAD (reg/f:SI 2 r2 [143])
> (nil)))
>
> (insn:TI 22 12 58 2 (cond_exec (ne (reg:CC 24 cc)
> (const_int 0 [0]))
> (set (mem:QI (reg/v/f:SI 0 r0 [orig:135 endname ] [135]) [0 *endname_1(D)+0 S1 A8])
> (reg:QI 3 r3 [140]))) repro.c:9 3115 {*p *arm_movqi_insn}
> (expr_list:REG_DEAD (reg:CC 24 cc)
> (expr_list:REG_DEAD (reg:QI 3 r3 [140])
> (expr_list:REG_DEAD (reg/v/f:SI 0 r0 [orig:135 endname ] [135])
> (nil)))))
>
>
> This gets compiled into the incorrect sequence:
>
>
> ldrb r3, [r0, #0]
> ldr r2, .L4
> subs r1, r3, #0
> rsbs r3, r1, #0
> adcs r3, r3, r1
> strne r3, [r2, #0]
> streq r3, [r2, #0]
> strneb r3, [r0, #0]
>
>
> The conditional stores are now dealing with an incorrect condition state.
>
> This patch fixes the problem by ensuring that the CC reg is dead after the
> peephole window for the current peephole definition and falls back on the
> original pre-PR46975 peephole when it is live. Unfortunately I had trouble
> coming up with a reproduction case against mainline. I only noticed the bug
> while working with some local changes that exposed it.
>
> Built and tested a full ARM GNU/Linux toolchain. No regressions in the GCC
> test suite.
>
> OK?
>
> gcc/
>
> 2013-05-29 Meador Inge <meadori@codesourcery.com>
>
> * config/arm/arm.md (conditional move peephole2): Only clobber CC
> register when it is dead after the peephole window.
>
> [1] http://gcc.gnu.org/ml/gcc-patches/2010-12/msg01336.html
>
> Index: gcc/config/arm/arm.md
> ===================================================================
> --- gcc/config/arm/arm.md (revision 199414)
> +++ gcc/config/arm/arm.md (working copy)
> @@ -9978,29 +9978,48 @@
> ;; Attempt to improve the sequence generated by the compare_scc splitters
> ;; not to use conditional execution.
> (define_peephole2
> - [(set (reg:CC CC_REGNUM)
> + [(set (match_operand 0 "cc_register" "")
> (compare:CC (match_operand:SI 1 "register_operand" "")
> (match_operand:SI 2 "arm_rhs_operand" "")))
> (cond_exec (ne (reg:CC CC_REGNUM) (const_int 0))
> - (set (match_operand:SI 0 "register_operand" "") (const_int 0)))
> + (set (match_operand:SI 3 "register_operand" "") (const_int 0)))
> (cond_exec (eq (reg:CC CC_REGNUM) (const_int 0))
> - (set (match_dup 0) (const_int 1)))
> - (match_scratch:SI 3 "r")]
> - "TARGET_32BIT"
> + (set (match_dup 3) (const_int 1)))
> + (match_scratch:SI 4 "r")]
> + "TARGET_32BIT && peep2_reg_dead_p (3, operands[0])"
> [(parallel
> [(set (reg:CC CC_REGNUM)
> (compare:CC (match_dup 1) (match_dup 2)))
> - (set (match_dup 3) (minus:SI (match_dup 1) (match_dup 2)))])
> + (set (match_dup 4) (minus:SI (match_dup 1) (match_dup 2)))])
> (parallel
> [(set (reg:CC CC_REGNUM)
> - (compare:CC (const_int 0) (match_dup 3)))
> - (set (match_dup 0) (minus:SI (const_int 0) (match_dup 3)))])
> + (compare:CC (const_int 0) (match_dup 4)))
> + (set (match_dup 3) (minus:SI (const_int 0) (match_dup 4)))])
> (parallel
> - [(set (match_dup 0)
> - (plus:SI (plus:SI (match_dup 0) (match_dup 3))
> + [(set (match_dup 3)
> + (plus:SI (plus:SI (match_dup 3) (match_dup 4))
> (geu:SI (reg:CC CC_REGNUM) (const_int 0))))
> (clobber (reg:CC CC_REGNUM))])])
>
I understand (and agree with) this bit...
> +(define_peephole2
> + [(set (reg:CC CC_REGNUM)
> + (compare:CC (match_operand:SI 0 "register_operand" "")
> + (match_operand:SI 1 "arm_rhs_operand" "")))
> + (cond_exec (ne (reg:CC CC_REGNUM) (const_int 0))
> + (set (match_operand:SI 2 "register_operand" "") (const_int 0)))
> + (cond_exec (eq (reg:CC CC_REGNUM) (const_int 0))
> + (set (match_dup 2) (const_int 1)))
> + (match_scratch:SI 3 "r")]
> + "TARGET_32BIT && !peep2_reg_dead_p (3, operands[0])"
> + [(set (match_dup 3) (minus:SI (match_dup 0) (match_dup 1)))
> + (parallel
> + [(set (reg:CC CC_REGNUM)
> + (compare:CC (const_int 0) (match_dup 3)))
> + (set (match_dup 2) (minus:SI (const_int 0) (match_dup 3)))])
> + (set (match_dup 2)
> + (plus:SI (plus:SI (match_dup 2) (match_dup 3))
> + (geu:SI (reg:CC CC_REGNUM) (const_int 0))))])
> +
... but what's this bit about?
R.
* Re: [PATCH] ARM: Don't clobber CC reg when it is live after the peephole window
From: Meador Inge @ 2013-06-06 18:23 UTC (permalink / raw)
To: Richard Earnshaw; +Cc: gcc-patches, Ramana Radhakrishnan
On 06/06/2013 08:11 AM, Richard Earnshaw wrote:
> I understand (and agree with) this bit...
>
>> +(define_peephole2
>> + [(set (reg:CC CC_REGNUM)
>> + (compare:CC (match_operand:SI 0 "register_operand" "")
>> + (match_operand:SI 1 "arm_rhs_operand" "")))
>> + (cond_exec (ne (reg:CC CC_REGNUM) (const_int 0))
>> + (set (match_operand:SI 2 "register_operand" "") (const_int 0)))
>> + (cond_exec (eq (reg:CC CC_REGNUM) (const_int 0))
>> + (set (match_dup 2) (const_int 1)))
>> + (match_scratch:SI 3 "r")]
>> + "TARGET_32BIT && !peep2_reg_dead_p (3, operands[0])"
>> + [(set (match_dup 3) (minus:SI (match_dup 0) (match_dup 1)))
>> + (parallel
>> + [(set (reg:CC CC_REGNUM)
>> + (compare:CC (const_int 0) (match_dup 3)))
>> + (set (match_dup 2) (minus:SI (const_int 0) (match_dup 3)))])
>> + (set (match_dup 2)
>> + (plus:SI (plus:SI (match_dup 2) (match_dup 3))
>> + (geu:SI (reg:CC CC_REGNUM) (const_int 0))))])
>> +
>
> ... but what's this bit about?
The original intent was to fall back to the original peephole pattern
(pre-PR 46975) when the CC reg is still live, but that doesn't properly
preserve the CC state either. It only happened to pass in the test case
I was looking at because that case depends solely on the Z flag, which
both sequences set identically.
OK with the above bit left out?
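As a rough illustration of why the fallback pattern is not safe either, the sketch below compares the NZCV flags left by the original compare with those left by the sub / rsbs / adc sequence, where only the rsbs updates the flags. The function names (sub_flags, flags_after_cmp, flags_after_fallback) are hypothetical, and the model assumes 32-bit unsigned inputs.

```python
MASK = 0xFFFFFFFF

def sub_flags(a, b):
    """Return (result, (N, Z, C, V)) for the 32-bit subtraction a - b."""
    result = (a - b) & MASK
    n = bool(result >> 31)                      # sign bit of the result
    z = result == 0
    c = a >= b                                  # ARM carry: set when no borrow
    v = bool(((a ^ b) & (a ^ result)) >> 31)    # signed overflow
    return result, (n, z, c, v)

def flags_after_cmp(x, y):
    """Flags left by 'cmp x, y'."""
    return sub_flags(x, y)[1]

def flags_after_fallback(x, y):
    """Flags left by: sub t, x, y ; rsbs rd, t, #0 ; adc rd, rd, t."""
    t = (x - y) & MASK           # sub does not touch the flags
    return sub_flags(0, t)[1]    # rsbs sets them; the trailing adc leaves them alone

for x, y in ((3, 3), (5, 3), (3, 5)):
    print((x, y), "cmp:", flags_after_cmp(x, y),
          "fallback:", flags_after_fallback(x, y))
```

In this model Z always agrees between the two, while N and C can differ whenever x != y, so any later consumer that tests something other than eq/ne would see the wrong condition.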
--
Meador Inge
CodeSourcery / Mentor Embedded
* Re: [PATCH] ARM: Don't clobber CC reg when it is live after the peephole window
From: Meador Inge @ 2013-06-11 4:47 UTC (permalink / raw)
To: Richard Earnshaw; +Cc: gcc-patches, Ramana Radhakrishnan
On 06/06/2013 01:23 PM, Meador Inge wrote:
> On 06/06/2013 08:11 AM, Richard Earnshaw wrote:
>
>> I understand (and agree with) this bit...
>>
>>> +(define_peephole2
>>> + [(set (reg:CC CC_REGNUM)
>>> + (compare:CC (match_operand:SI 0 "register_operand" "")
>>> + (match_operand:SI 1 "arm_rhs_operand" "")))
>>> + (cond_exec (ne (reg:CC CC_REGNUM) (const_int 0))
>>> + (set (match_operand:SI 2 "register_operand" "") (const_int 0)))
>>> + (cond_exec (eq (reg:CC CC_REGNUM) (const_int 0))
>>> + (set (match_dup 2) (const_int 1)))
>>> + (match_scratch:SI 3 "r")]
>>> + "TARGET_32BIT && !peep2_reg_dead_p (3, operands[0])"
>>> + [(set (match_dup 3) (minus:SI (match_dup 0) (match_dup 1)))
>>> + (parallel
>>> + [(set (reg:CC CC_REGNUM)
>>> + (compare:CC (const_int 0) (match_dup 3)))
>>> + (set (match_dup 2) (minus:SI (const_int 0) (match_dup 3)))])
>>> + (set (match_dup 2)
>>> + (plus:SI (plus:SI (match_dup 2) (match_dup 3))
>>> + (geu:SI (reg:CC CC_REGNUM) (const_int 0))))])
>>> +
>>
>> ... but what's this bit about?
>
> The original intent was to revert back to the original peephole pattern
> (pre-PR 46975) when the CC reg is still live, but that doesn't properly
> maintain the CC state either (it just happened to pass in the test
> case I was looking at because I only cared about the Z flag, which is
> maintained the same).
>
> OK with the above bit left out?
OK?
--
Meador Inge
CodeSourcery / Mentor Embedded
* Re: [PATCH] ARM: Don't clobber CC reg when it is live after the peephole window
From: Meador Inge @ 2013-06-18 16:23 UTC (permalink / raw)
To: Richard Earnshaw; +Cc: gcc-patches, Ramana Radhakrishnan
Ping.
On 06/06/2013 01:23 PM, Meador Inge wrote:
> On 06/06/2013 08:11 AM, Richard Earnshaw wrote:
>
>> I understand (and agree with) this bit...
>>
>>> +(define_peephole2
>>> + [(set (reg:CC CC_REGNUM)
>>> + (compare:CC (match_operand:SI 0 "register_operand" "")
>>> + (match_operand:SI 1 "arm_rhs_operand" "")))
>>> + (cond_exec (ne (reg:CC CC_REGNUM) (const_int 0))
>>> + (set (match_operand:SI 2 "register_operand" "") (const_int 0)))
>>> + (cond_exec (eq (reg:CC CC_REGNUM) (const_int 0))
>>> + (set (match_dup 2) (const_int 1)))
>>> + (match_scratch:SI 3 "r")]
>>> + "TARGET_32BIT && !peep2_reg_dead_p (3, operands[0])"
>>> + [(set (match_dup 3) (minus:SI (match_dup 0) (match_dup 1)))
>>> + (parallel
>>> + [(set (reg:CC CC_REGNUM)
>>> + (compare:CC (const_int 0) (match_dup 3)))
>>> + (set (match_dup 2) (minus:SI (const_int 0) (match_dup 3)))])
>>> + (set (match_dup 2)
>>> + (plus:SI (plus:SI (match_dup 2) (match_dup 3))
>>> + (geu:SI (reg:CC CC_REGNUM) (const_int 0))))])
>>> +
>>
>> ... but what's this bit about?
>
> The original intent was to revert back to the original peephole pattern
> (pre-PR 46975) when the CC reg is still live, but that doesn't properly
> maintain the CC state either (it just happened to pass in the test
> case I was looking at because I only cared about the Z flag, which is
> maintained the same).
>
> OK with the above bit left out?
>
--
Meador Inge
CodeSourcery / Mentor Embedded
* Re: [PATCH] ARM: Don't clobber CC reg when it is live after the peephole window
From: Richard Earnshaw @ 2013-06-19 12:33 UTC (permalink / raw)
To: Meador Inge; +Cc: gcc-patches, Ramana Radhakrishnan
On 18/06/13 17:22, Meador Inge wrote:
> Ping.
>
> On 06/06/2013 01:23 PM, Meador Inge wrote:
>> On 06/06/2013 08:11 AM, Richard Earnshaw wrote:
>>
>>> I understand (and agree with) this bit...
>>>
>>>> +(define_peephole2
>>>> + [(set (reg:CC CC_REGNUM)
>>>> + (compare:CC (match_operand:SI 0 "register_operand" "")
>>>> + (match_operand:SI 1 "arm_rhs_operand" "")))
>>>> + (cond_exec (ne (reg:CC CC_REGNUM) (const_int 0))
>>>> + (set (match_operand:SI 2 "register_operand" "") (const_int 0)))
>>>> + (cond_exec (eq (reg:CC CC_REGNUM) (const_int 0))
>>>> + (set (match_dup 2) (const_int 1)))
>>>> + (match_scratch:SI 3 "r")]
>>>> + "TARGET_32BIT && !peep2_reg_dead_p (3, operands[0])"
>>>> + [(set (match_dup 3) (minus:SI (match_dup 0) (match_dup 1)))
>>>> + (parallel
>>>> + [(set (reg:CC CC_REGNUM)
>>>> + (compare:CC (const_int 0) (match_dup 3)))
>>>> + (set (match_dup 2) (minus:SI (const_int 0) (match_dup 3)))])
>>>> + (set (match_dup 2)
>>>> + (plus:SI (plus:SI (match_dup 2) (match_dup 3))
>>>> + (geu:SI (reg:CC CC_REGNUM) (const_int 0))))])
>>>> +
>>>
>>> ... but what's this bit about?
>>
>> The original intent was to revert back to the original peephole pattern
>> (pre-PR 46975) when the CC reg is still live, but that doesn't properly
>> maintain the CC state either (it just happened to pass in the test
>> case I was looking at because I only cared about the Z flag, which is
>> maintained the same).
>>
>> OK with the above bit left out?
>>
>
>
Sorry for the delay, I've been sidetracked onto other things.
Having looked at this patch I realized that we were missing a trick on
ARMv5 and later, where a more efficient sequence exists, particularly for
Cortex-A15. By using CLZ we can avoid the need to set the condition
code register at all, which gives us far more scheduling freedom. It's
also best not to unnecessarily clobber the condition code register even
if there are other instructions in the sequence that do set/use the
flags (the peepholer pass right at the end will do this optimization
when it is useful), so I've tweaked some of the existing alternatives as
well.
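The CLZ trick can be checked with a short sketch. It assumes the ARM behaviour that CLZ of zero yields 32, so shifting the count right by 5 leaves 1 only for a zero input. The helper names (clz32, eq_zero, eq) are invented for this illustration.

```python
def clz32(x):
    """Count leading zeros of a 32-bit value; returns 32 for x == 0, as ARM CLZ does."""
    return 32 - x.bit_length()

def eq_zero(x):
    """Model: clz Rd, reg1 ; lsr Rd, Rd, #5  =>  Rd = (reg1 == 0)."""
    return clz32(x) >> 5

def eq(x, y):
    """Model: sub Rd, reg1, reg2 ; clz Rd, Rd ; lsr Rd, Rd, #5  =>  Rd = (reg1 == reg2)."""
    return eq_zero((x - y) & 0xFFFFFFFF)

for x in (0, 1, 0x80000000, 0xFFFFFFFF):
    print(f"eq_zero({x:#x}) = {eq_zero(x)}")
```

Only an input of zero has 32 leading zeros, and 32 is the only possible count with bit 5 set, so the shift isolates exactly the equality result without touching the condition flags.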
Finally, we can use peep2_regno_dead_p (rather than peep2_reg_dead_p) to
avoid having to create extra match_operand values.
The result is that I've now committed the patch below.
R.
2013-06-19 Richard Earnshaw <rearnsha@arm.com>
arm.md (split for eq(reg, 0)): Add variants for ARMv5 and
Thumb2.
(peepholes for eq(reg, not-0)): Ensure condition register is
dead after pattern. Use more efficient sequences on ARMv5 and
Thumb2.
[-- Attachment #2: gcc-eq.patch --]
[-- Type: text/plain, Size: 4475 bytes --]
--- gcc/config/arm/arm.md (revision 200187)
+++ gcc/config/arm/arm.md (local)
@@ -10021,6 +10021,16 @@ (define_split
(eq:SI (match_operand:SI 1 "s_register_operand" "")
(const_int 0)))
(clobber (reg:CC CC_REGNUM))]
+ "arm_arch5 && TARGET_32BIT"
+ [(set (match_dup 0) (clz:SI (match_dup 1)))
+ (set (match_dup 0) (lshiftrt:SI (match_dup 0) (const_int 5)))]
+)
+
+(define_split
+ [(set (match_operand:SI 0 "s_register_operand" "")
+ (eq:SI (match_operand:SI 1 "s_register_operand" "")
+ (const_int 0)))
+ (clobber (reg:CC CC_REGNUM))]
"TARGET_32BIT && reload_completed"
[(parallel
[(set (reg:CC CC_REGNUM)
@@ -10090,29 +10100,87 @@ (define_insn_and_split "*compare_scc"
;; Attempt to improve the sequence generated by the compare_scc splitters
;; not to use conditional execution.
+
+;; Rd = (eq (reg1) (const_int0)) // ARMv5
+;; clz Rd, reg1
+;; lsr Rd, Rd, #5
(define_peephole2
[(set (reg:CC CC_REGNUM)
(compare:CC (match_operand:SI 1 "register_operand" "")
- (match_operand:SI 2 "arm_rhs_operand" "")))
+ (const_int 0)))
+ (cond_exec (ne (reg:CC CC_REGNUM) (const_int 0))
+ (set (match_operand:SI 0 "register_operand" "") (const_int 0)))
+ (cond_exec (eq (reg:CC CC_REGNUM) (const_int 0))
+ (set (match_dup 0) (const_int 1)))]
+ "arm_arch5 && TARGET_32BIT && peep2_regno_dead_p (3, CC_REGNUM)"
+ [(set (match_dup 0) (clz:SI (match_dup 1)))
+ (set (match_dup 0) (lshiftrt:SI (match_dup 0) (const_int 5)))]
+)
+
+;; Rd = (eq (reg1) (const_int0)) // !ARMv5
+;; negs Rd, reg1
+;; adc Rd, Rd, reg1
+(define_peephole2
+ [(set (reg:CC CC_REGNUM)
+ (compare:CC (match_operand:SI 1 "register_operand" "")
+ (const_int 0)))
(cond_exec (ne (reg:CC CC_REGNUM) (const_int 0))
(set (match_operand:SI 0 "register_operand" "") (const_int 0)))
(cond_exec (eq (reg:CC CC_REGNUM) (const_int 0))
(set (match_dup 0) (const_int 1)))
- (match_scratch:SI 3 "r")]
- "TARGET_32BIT"
+ (match_scratch:SI 2 "r")]
+ "TARGET_32BIT && peep2_regno_dead_p (3, CC_REGNUM)"
[(parallel
[(set (reg:CC CC_REGNUM)
- (compare:CC (match_dup 1) (match_dup 2)))
- (set (match_dup 3) (minus:SI (match_dup 1) (match_dup 2)))])
+ (compare:CC (const_int 0) (match_dup 1)))
+ (set (match_dup 2) (minus:SI (const_int 0) (match_dup 1)))])
+ (set (match_dup 0)
+ (plus:SI (plus:SI (match_dup 1) (match_dup 2))
+ (geu:SI (reg:CC CC_REGNUM) (const_int 0))))]
+)
+
+;; Rd = (eq (reg1) (reg2/imm)) // ARMv5
+;; sub Rd, Reg1, reg2
+;; clz Rd, Rd
+;; lsr Rd, Rd, #5
+(define_peephole2
+ [(set (reg:CC CC_REGNUM)
+ (compare:CC (match_operand:SI 1 "register_operand" "")
+ (match_operand:SI 2 "arm_rhs_operand" "")))
+ (cond_exec (ne (reg:CC CC_REGNUM) (const_int 0))
+ (set (match_operand:SI 0 "register_operand" "") (const_int 0)))
+ (cond_exec (eq (reg:CC CC_REGNUM) (const_int 0))
+ (set (match_dup 0) (const_int 1)))]
+ "arm_arch5 && TARGET_32BIT && peep2_regno_dead_p (3, CC_REGNUM)"
+ [(set (match_dup 0) (minus:SI (match_dup 1) (match_dup 2)))
+ (set (match_dup 0) (clz:SI (match_dup 0)))
+ (set (match_dup 0) (lshiftrt:SI (match_dup 0) (const_int 5)))]
+)
+
+
+;; Rd = (eq (reg1) (reg2/imm)) // ! ARMv5
+;; sub T1, Reg1, reg2
+;; negs Rd, T1
+;; adc Rd, Rd, T1
+(define_peephole2
+ [(set (reg:CC CC_REGNUM)
+ (compare:CC (match_operand:SI 1 "register_operand" "")
+ (match_operand:SI 2 "arm_rhs_operand" "")))
+ (cond_exec (ne (reg:CC CC_REGNUM) (const_int 0))
+ (set (match_operand:SI 0 "register_operand" "") (const_int 0)))
+ (cond_exec (eq (reg:CC CC_REGNUM) (const_int 0))
+ (set (match_dup 0) (const_int 1)))
+ (match_scratch:SI 3 "r")]
+ "TARGET_32BIT && peep2_regno_dead_p (3, CC_REGNUM)"
+ [(set (match_dup 3) (minus:SI (match_dup 1) (match_dup 2)))
(parallel
[(set (reg:CC CC_REGNUM)
(compare:CC (const_int 0) (match_dup 3)))
(set (match_dup 0) (minus:SI (const_int 0) (match_dup 3)))])
- (parallel
- [(set (match_dup 0)
- (plus:SI (plus:SI (match_dup 0) (match_dup 3))
- (geu:SI (reg:CC CC_REGNUM) (const_int 0))))
- (clobber (reg:CC CC_REGNUM))])])
+ (set (match_dup 0)
+ (plus:SI (plus:SI (match_dup 0) (match_dup 3))
+ (geu:SI (reg:CC CC_REGNUM) (const_int 0))))]
+)
(define_insn "*cond_move"
[(set (match_operand:SI 0 "s_register_operand" "=r,r,r")