* ARM patch (PR42172): Rework the {sign,zero}-extend patterns
@ 2010-06-08 22:48 Bernd Schmidt
2010-06-18 0:37 ` Ping: " Bernd Schmidt
2010-07-01 13:11 ` Richard Earnshaw
0 siblings, 2 replies; 5+ messages in thread
From: Bernd Schmidt @ 2010-06-08 22:48 UTC (permalink / raw)
To: GCC Patches
[-- Attachment #1: Type: text/plain, Size: 2873 bytes --]
PR42172 is a case where we generate terrible code for bitfield
assignments on Thumb1. Part of the problem is that we fail to optimize
strb r3, [r0]
ldrb r3, [r0]
(insn 11 (set (mem:QI (reg:SI 133) (subreg:QI (reg:SI 136) 0))
(insn 12 (set (reg:SI 140) (zero_extend:SI (mem:QI (reg/v/f:SI 133)))))
cse will of course try to replace the second MEM with the previous value
in a register, but the ARM backend (if !arch_v6) does not accept
zero_extend from a reg, only from a mem. This makes some sense at first
glance as there isn't a specific instruction, so the expander creates a
sequence of shifts when given a REG input.
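The new test at the end of the patch is a case in point; a minimal
variant of it (shown here purely for illustration, not part of the
patch) looks like this.  Each bitfield assignment is a read-modify-write
of the same byte, and without the patch the second assignment reloads
the byte the first one has just stored:

  struct bits
  {
    unsigned int f1 : 3;
    unsigned int f2 : 3;
  };

  void
  set_bits (struct bits *p)
  {
    p->f1 = 1;   /* ldrb / bic / orr / strb on the byte at *p.  */
    p->f2 = 2;   /* Should reuse the value still in a register
                    rather than issue a second ldrb.  */
  }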
However, to make CSE do its job, I believe it's best to make the ARM
extension patterns accept registers. We should avoid creating such
patterns when possible, both at expand time (generating the sequence of
shifts as before), and in the combiner (by setting appropriate
rtx_costs), since the two-shift sequence can often be optimized further.
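For the curious, on a !arch_v6 target the register fallback is the
classic two-shift idiom that the splitters below emit.  In C terms it
computes roughly the following (illustration only; it assumes 32-bit
int, and the signed variant would rely on GCC's arithmetic right
shift):

  /* Zero-extend the low byte of X: lsl #24 ; lsr #24 on Thumb1.
     Sign extension is the same with asr instead of lsr.  */
  static inline unsigned int
  zext_low_byte (unsigned int x)
  {
    unsigned int t = x << 24;
    return t >> 24;
  }

Later passes can often merge these shifts with surrounding shifts or
masks, which is why the rtx_costs changes make the explicit extend
slightly more expensive than the plain two-shift sequence.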
The following patch does that, and it also fixes a few other problems:
* The expanders are a little ugly:
[(set (match_dup 2)
(ashift:SI (match_operand:HI 1 "nonimmediate_operand" "")
(const_int 16)))
This is invalid RTL; it works because we modify operand[1], but I've
changed the patterns to avoid this (and to make them a lot smaller,
and, IMO, clearer)
* Slightly broken Thumb1 patterns, which test which_alternative == 0
twice, so the second return below can never be reached:
if (which_alternative == 0)
return \"sxtb\\t%0, %1\";
[....]
if (which_alternative == 0)
return \"ldrsb\\t%0, %1\";
* Removed massive duplication of Thumb1 patterns for v6 and !v6
* Removed apparently useless code to handle LABEL_REFs in addresses
in Thumb1 extend patterns.
Here are some examples of the code generation changes:
strb r3, [r4, #4]
ldrb r2, [r8, #4] @ zero_extendqisi2
- ldrb r3, [r4, #4] @ zero_extendqisi2
+ and r3, r3, #255
====
- ldrb r3, [sp, #36] @ zero_extendqisi2
- bic r3, r3, #176
- orr r3, r3, #64
- strb r3, [sp, #36]
- ldrb r3, [sp, #36] @ zero_extendqisi2
- bic r3, r3, #10
- orr r3, r3, #5
+ mov r3, #69
====
- strb r2, [ip, #4]
- ldrb r2, [ip, #4] @ zero_extendqisi2
bic r2, r2, #2
strb r2, [ip, #4]
====
- ldrh r3, [r5, #2]
- lsl r3, r3, #16
- asr r3, r3, #16
- add r2, r3, #1
+ mov r0, #2
+ ldrsh r3, [r5, r0]
+ add r1, r3, #1
On the whole things looked very good in the testcases I looked at.
Unfortunately PR42172 isn't completely solved by this, but it gets a lot
better.
Regression tested on:
Target is arm-none-linux-gnueabi
Host is i686-pc-linux-gnu
Schedule of variations:
qemu-system-armv7/arch=armv7-a/thumb
qemu-system-armv7/thumb
qemu-system-armv7
Ok?
Bernd
[-- Attachment #2: pr42172.diff --]
[-- Type: text/plain, Size: 30081 bytes --]
* config/arm/arm.c (thumb1_rtx_costs): Improve support for SIGN_EXTEND
and ZERO_EXTEND.
(arm_rtx_costs_1): Likewise.
(arm_size_rtx_costs): Use arm_rtx_costs_1 for these codes.
* config/arm/arm.md (is_arch6): New attribute.
(zero_extendhisi2, zero_extendqisi2, extendhisi2,
extendqisi2): Tighten the code somewhat, avoiding invalid
RTL in the expander patterns.
(thumb1_zero_extendhisi2): Merge with thumb1_zero_extendhisi2_v6.
(thumb1_zero_extendhisi2_v6): Delete.
(thumb1_extendhisi2): Merge with thumb1_extendhisi2_v6.
(thumb1_extendhisi2_v6): Delete.
(thumb1_extendqisi2): Merge with thumb1_extendqisi2_v6.
(thumb1_extendqisi2_v6): Delete.
(zero_extendhisi2 for register input splitter): New.
(zero_extendqisi2 for register input splitter): New.
(thumb1_extendhisi2 for register input splitter): New.
(extendhisi2 for register input splitter): New.
(extendqisi2 for register input splitter): New.
(TARGET_THUMB1 extendqisi2 for memory input splitter): New.
(arm_zero_extendhisi2): Allow nonimmediate_operand for operand 1,
and add support for a register alternative requiring a split.
(thumb1_zero_extendqisi2): Likewise.
(arm_zero_extendqisi2): Likewise.
(arm_extendhisi2): Likewise.
(arm_extendqisi2): Likewise.
* gcc.target/arm/pr42172-1.c: New test.
Index: config/arm/arm.c
===================================================================
--- config/arm/arm.c (revision 160260)
+++ config/arm/arm.c (working copy)
@@ -6193,6 +6193,7 @@ static inline int
thumb1_rtx_costs (rtx x, enum rtx_code code, enum rtx_code outer)
{
enum machine_mode mode = GET_MODE (x);
+ int total;
switch (code)
{
@@ -6291,24 +6292,20 @@ thumb1_rtx_costs (rtx x, enum rtx_code c
return 14;
return 2;
+ case SIGN_EXTEND:
case ZERO_EXTEND:
- /* XXX still guessing. */
- switch (GET_MODE (XEXP (x, 0)))
- {
- case QImode:
- return (1 + (mode == DImode ? 4 : 0)
- + (GET_CODE (XEXP (x, 0)) == MEM ? 10 : 0));
+ total = mode == DImode ? COSTS_N_INSNS (1) : 0;
+ total += thumb1_rtx_costs (XEXP (x, 0), GET_CODE (XEXP (x, 0)), code);
- case HImode:
- return (4 + (mode == DImode ? 4 : 0)
- + (GET_CODE (XEXP (x, 0)) == MEM ? 10 : 0));
+ if (mode == SImode)
+ return total;
- case SImode:
- return (1 + (GET_CODE (XEXP (x, 0)) == MEM ? 10 : 0));
+ if (arm_arch6)
+ return total + COSTS_N_INSNS (1);
- default:
- return 99;
- }
+ /* Assume a two-shift sequence. Increase the cost slightly so
+ we prefer actual shifts over an extend operation. */
+ return total + 1 + COSTS_N_INSNS (2);
default:
return 99;
@@ -6794,44 +6791,39 @@ arm_rtx_costs_1 (rtx x, enum rtx_code ou
return false;
case SIGN_EXTEND:
- if (GET_MODE_CLASS (mode) == MODE_INT)
- {
- *total = 0;
- if (mode == DImode)
- *total += COSTS_N_INSNS (1);
-
- if (GET_MODE (XEXP (x, 0)) != SImode)
- {
- if (arm_arch6)
- {
- if (GET_CODE (XEXP (x, 0)) != MEM)
- *total += COSTS_N_INSNS (1);
- }
- else if (!arm_arch4 || GET_CODE (XEXP (x, 0)) != MEM)
- *total += COSTS_N_INSNS (2);
- }
-
- return false;
- }
-
- /* Fall through */
case ZERO_EXTEND:
*total = 0;
if (GET_MODE_CLASS (mode) == MODE_INT)
{
+ rtx op = XEXP (x, 0);
+ enum machine_mode opmode = GET_MODE (op);
+
if (mode == DImode)
*total += COSTS_N_INSNS (1);
- if (GET_MODE (XEXP (x, 0)) != SImode)
+ if (opmode != SImode)
{
- if (arm_arch6)
+ if (MEM_P (op))
{
- if (GET_CODE (XEXP (x, 0)) != MEM)
- *total += COSTS_N_INSNS (1);
+ /* If !arm_arch4, we use one of the extendhisi2_mem
+ or movhi_bytes patterns for HImode. For a QImode
+ sign extension, we first zero-extend from memory
+ and then perform a shift sequence. */
+ if (!arm_arch4 && (opmode != QImode || code == SIGN_EXTEND))
+ *total += COSTS_N_INSNS (2);
}
- else if (!arm_arch4 || GET_CODE (XEXP (x, 0)) != MEM)
- *total += COSTS_N_INSNS (GET_MODE (XEXP (x, 0)) == QImode ?
- 1 : 2);
+ else if (arm_arch6)
+ *total += COSTS_N_INSNS (1);
+
+ /* We don't have the necessary insn, so we need to perform some
+ other operation. */
+ else if (TARGET_ARM && code == ZERO_EXTEND && mode == QImode)
+ /* An and with constant 255. */
+ *total += COSTS_N_INSNS (1);
+ else
+ /* A shift sequence. Increase costs slightly to avoid
+ combining two shifts into an extend operation. */
+ *total += COSTS_N_INSNS (2) + 1;
}
return false;
@@ -7187,41 +7179,8 @@ arm_size_rtx_costs (rtx x, enum rtx_code
return false;
case SIGN_EXTEND:
- *total = 0;
- if (GET_MODE_SIZE (GET_MODE (XEXP (x, 0))) < 4)
- {
- if (!(arm_arch4 && MEM_P (XEXP (x, 0))))
- *total += COSTS_N_INSNS (arm_arch6 ? 1 : 2);
- }
- if (mode == DImode)
- *total += COSTS_N_INSNS (1);
- return false;
-
case ZERO_EXTEND:
- *total = 0;
- if (!(arm_arch4 && MEM_P (XEXP (x, 0))))
- {
- switch (GET_MODE (XEXP (x, 0)))
- {
- case QImode:
- *total += COSTS_N_INSNS (1);
- break;
-
- case HImode:
- *total += COSTS_N_INSNS (arm_arch6 ? 1 : 2);
-
- case SImode:
- break;
-
- default:
- *total += COSTS_N_INSNS (2);
- }
- }
-
- if (mode == DImode)
- *total += COSTS_N_INSNS (1);
-
- return false;
+ return arm_rtx_costs_1 (x, outer_code, total, 0);
case CONST_INT:
if (const_ok_for_arm (INTVAL (x)))
Index: config/arm/ldmstm.md
===================================================================
Index: config/arm/arm-ldmstm.ml
===================================================================
Index: config/arm/arm.md
===================================================================
--- config/arm/arm.md (revision 160260)
+++ config/arm/arm.md (working copy)
@@ -148,6 +148,9 @@ (define_constants
; patterns that share the same RTL in both ARM and Thumb code.
(define_attr "is_thumb" "no,yes" (const (symbol_ref "thumb_code")))
+; IS_ARCH6 is set to 'yes' when we are generating code for ARMv6.
+(define_attr "is_arch6" "no,yes" (const (symbol_ref "arm_arch6")))
+
;; Operand number of an input operand that is shifted. Zero if the
;; given instruction does not shift one of its input operands.
(define_attr "shift" "" (const_int 0))
@@ -3958,93 +3961,46 @@ (define_insn "*arm_extendsidi2"
)
(define_expand "zero_extendhisi2"
- [(set (match_dup 2)
- (ashift:SI (match_operand:HI 1 "nonimmediate_operand" "")
- (const_int 16)))
- (set (match_operand:SI 0 "s_register_operand" "")
- (lshiftrt:SI (match_dup 2) (const_int 16)))]
+ [(set (match_operand:SI 0 "s_register_operand" "")
+ (zero_extend:SI (match_operand:HI 1 "nonimmediate_operand" "")))]
"TARGET_EITHER"
- "
- {
- if ((TARGET_THUMB1 || arm_arch4) && GET_CODE (operands[1]) == MEM)
- {
- emit_insn (gen_rtx_SET (VOIDmode, operands[0],
- gen_rtx_ZERO_EXTEND (SImode, operands[1])));
- DONE;
- }
-
- if (TARGET_ARM && GET_CODE (operands[1]) == MEM)
- {
- emit_insn (gen_movhi_bytes (operands[0], operands[1]));
- DONE;
- }
-
- if (!s_register_operand (operands[1], HImode))
- operands[1] = copy_to_mode_reg (HImode, operands[1]);
-
- if (arm_arch6)
- {
- emit_insn (gen_rtx_SET (VOIDmode, operands[0],
- gen_rtx_ZERO_EXTEND (SImode, operands[1])));
- DONE;
- }
-
- operands[1] = gen_lowpart (SImode, operands[1]);
- operands[2] = gen_reg_rtx (SImode);
- }"
-)
-
-(define_insn "*thumb1_zero_extendhisi2"
- [(set (match_operand:SI 0 "register_operand" "=l")
- (zero_extend:SI (match_operand:HI 1 "memory_operand" "m")))]
- "TARGET_THUMB1 && !arm_arch6"
- "*
- rtx mem = XEXP (operands[1], 0);
-
- if (GET_CODE (mem) == CONST)
- mem = XEXP (mem, 0);
-
- if (GET_CODE (mem) == LABEL_REF)
- return \"ldr\\t%0, %1\";
-
- if (GET_CODE (mem) == PLUS)
+{
+ if (TARGET_ARM && !arm_arch4 && MEM_P (operands[1]))
{
- rtx a = XEXP (mem, 0);
- rtx b = XEXP (mem, 1);
-
- /* This can happen due to bugs in reload. */
- if (GET_CODE (a) == REG && REGNO (a) == SP_REGNUM)
- {
- rtx ops[2];
- ops[0] = operands[0];
- ops[1] = a;
-
- output_asm_insn (\"mov %0, %1\", ops);
-
- XEXP (mem, 0) = operands[0];
- }
-
- else if ( GET_CODE (a) == LABEL_REF
- && GET_CODE (b) == CONST_INT)
- return \"ldr\\t%0, %1\";
+ emit_insn (gen_movhi_bytes (operands[0], operands[1]));
+ DONE;
}
-
- return \"ldrh\\t%0, %1\";
- "
- [(set_attr "length" "4")
- (set_attr "type" "load_byte")
- (set_attr "pool_range" "60")]
-)
+ if (!arm_arch6 && !MEM_P (operands[1]))
+ {
+ rtx t = gen_lowpart (SImode, operands[1]);
+ rtx tmp = gen_reg_rtx (SImode);
+ emit_insn (gen_ashlsi3 (tmp, t, GEN_INT (16)));
+ emit_insn (gen_lshrsi3 (operands[0], tmp, GEN_INT (16)));
+ DONE;
+ }
+})
-(define_insn "*thumb1_zero_extendhisi2_v6"
+(define_split
+ [(set (match_operand:SI 0 "register_operand" "")
+ (zero_extend:SI (match_operand:HI 1 "register_operand" "l,m")))]
+ "!TARGET_THUMB2 && !arm_arch6"
+ [(set (match_dup 0) (ashift:SI (match_dup 2) (const_int 16)))
+ (set (match_dup 0) (lshiftrt:SI (match_dup 0) (const_int 16)))]
+{
+ operands[2] = gen_lowpart (SImode, operands[1]);
+})
+
+(define_insn "*thumb1_zero_extendhisi2"
[(set (match_operand:SI 0 "register_operand" "=l,l")
(zero_extend:SI (match_operand:HI 1 "nonimmediate_operand" "l,m")))]
- "TARGET_THUMB1 && arm_arch6"
+ "TARGET_THUMB1"
"*
rtx mem;
- if (which_alternative == 0)
+ if (which_alternative == 0 && arm_arch6)
return \"uxth\\t%0, %1\";
+ if (which_alternative == 0)
+ return \"#\";
mem = XEXP (operands[1], 0);
@@ -4078,20 +4034,25 @@ (define_insn "*thumb1_zero_extendhisi2_v
return \"ldrh\\t%0, %1\";
"
- [(set_attr "length" "2,4")
+ [(set_attr_alternative "length"
+ [(if_then_else (eq_attr "is_arch6" "yes")
+ (const_int 2) (const_int 4))
+ (const_int 4)])
(set_attr "type" "alu_shift,load_byte")
(set_attr "pool_range" "*,60")]
)
(define_insn "*arm_zero_extendhisi2"
- [(set (match_operand:SI 0 "s_register_operand" "=r")
- (zero_extend:SI (match_operand:HI 1 "memory_operand" "m")))]
+ [(set (match_operand:SI 0 "s_register_operand" "=r,r")
+ (zero_extend:SI (match_operand:HI 1 "nonimmediate_operand" "r,m")))]
"TARGET_ARM && arm_arch4 && !arm_arch6"
- "ldr%(h%)\\t%0, %1"
- [(set_attr "type" "load_byte")
+ "@
+ #
+ ldr%(h%)\\t%0, %1"
+ [(set_attr "type" "alu_shift,load_byte")
(set_attr "predicable" "yes")
- (set_attr "pool_range" "256")
- (set_attr "neg_pool_range" "244")]
+ (set_attr "pool_range" "*,256")
+ (set_attr "neg_pool_range" "*,244")]
)
(define_insn "*arm_zero_extendhisi2_v6"
@@ -4121,50 +4082,49 @@ (define_expand "zero_extendqisi2"
[(set (match_operand:SI 0 "s_register_operand" "")
(zero_extend:SI (match_operand:QI 1 "nonimmediate_operand" "")))]
"TARGET_EITHER"
- "
- if (!arm_arch6 && GET_CODE (operands[1]) != MEM)
+{
+ if (TARGET_ARM && !arm_arch6 && GET_CODE (operands[1]) != MEM)
{
- if (TARGET_ARM)
- {
- emit_insn (gen_andsi3 (operands[0],
- gen_lowpart (SImode, operands[1]),
- GEN_INT (255)));
- }
- else /* TARGET_THUMB */
- {
- rtx temp = gen_reg_rtx (SImode);
- rtx ops[3];
-
- operands[1] = copy_to_mode_reg (QImode, operands[1]);
- operands[1] = gen_lowpart (SImode, operands[1]);
-
- ops[0] = temp;
- ops[1] = operands[1];
- ops[2] = GEN_INT (24);
-
- emit_insn (gen_rtx_SET (VOIDmode, ops[0],
- gen_rtx_ASHIFT (SImode, ops[1], ops[2])));
-
- ops[0] = operands[0];
- ops[1] = temp;
- ops[2] = GEN_INT (24);
+ emit_insn (gen_andsi3 (operands[0],
+ gen_lowpart (SImode, operands[1]),
+ GEN_INT (255)));
+ DONE;
+ }
+ if (!arm_arch6 && !MEM_P (operands[1]))
+ {
+ rtx t = gen_lowpart (SImode, operands[1]);
+ rtx tmp = gen_reg_rtx (SImode);
+ emit_insn (gen_ashlsi3 (tmp, t, GEN_INT (24)));
+ emit_insn (gen_lshrsi3 (operands[0], tmp, GEN_INT (24)));
+ DONE;
+ }
+})
- emit_insn (gen_rtx_SET (VOIDmode, ops[0],
- gen_rtx_LSHIFTRT (SImode, ops[1], ops[2])));
- }
+(define_split
+ [(set (match_operand:SI 0 "register_operand" "")
+ (zero_extend:SI (match_operand:QI 1 "register_operand" "")))]
+ "!arm_arch6"
+ [(set (match_dup 0) (ashift:SI (match_dup 2) (const_int 24)))
+ (set (match_dup 0) (lshiftrt:SI (match_dup 0) (const_int 24)))]
+{
+ operands[2] = simplify_gen_subreg (SImode, operands[1], QImode, 0);
+ if (TARGET_ARM)
+ {
+ emit_insn (gen_andsi3 (operands[0], operands[2], GEN_INT (255)));
DONE;
}
- "
-)
+})
(define_insn "*thumb1_zero_extendqisi2"
- [(set (match_operand:SI 0 "register_operand" "=l")
- (zero_extend:SI (match_operand:QI 1 "memory_operand" "m")))]
+ [(set (match_operand:SI 0 "register_operand" "=l,l")
+ (zero_extend:SI (match_operand:QI 1 "nonimmediate_operand" "l,m")))]
"TARGET_THUMB1 && !arm_arch6"
- "ldrb\\t%0, %1"
- [(set_attr "length" "2")
- (set_attr "type" "load_byte")
- (set_attr "pool_range" "32")]
+ "@
+ #
+ ldrb\\t%0, %1"
+ [(set_attr "length" "4,2")
+ (set_attr "type" "alu_shift,load_byte")
+ (set_attr "pool_range" "*,32")]
)
(define_insn "*thumb1_zero_extendqisi2_v6"
@@ -4180,14 +4140,17 @@ (define_insn "*thumb1_zero_extendqisi2_v
)
(define_insn "*arm_zero_extendqisi2"
- [(set (match_operand:SI 0 "s_register_operand" "=r")
- (zero_extend:SI (match_operand:QI 1 "memory_operand" "m")))]
+ [(set (match_operand:SI 0 "s_register_operand" "=r,r")
+ (zero_extend:SI (match_operand:QI 1 "nonimmediate_operand" "r,m")))]
"TARGET_ARM && !arm_arch6"
- "ldr%(b%)\\t%0, %1\\t%@ zero_extendqisi2"
- [(set_attr "type" "load_byte")
+ "@
+ #
+ ldr%(b%)\\t%0, %1\\t%@ zero_extendqisi2"
+ [(set_attr "length" "8,4")
+ (set_attr "type" "alu_shift,load_byte")
(set_attr "predicable" "yes")
- (set_attr "pool_range" "4096")
- (set_attr "neg_pool_range" "4084")]
+ (set_attr "pool_range" "*,4096")
+ (set_attr "neg_pool_range" "*,4084")]
)
(define_insn "*arm_zero_extendqisi2_v6"
@@ -4266,108 +4229,42 @@ (define_insn "*compareqi_eq0"
)
(define_expand "extendhisi2"
- [(set (match_dup 2)
- (ashift:SI (match_operand:HI 1 "nonimmediate_operand" "")
- (const_int 16)))
- (set (match_operand:SI 0 "s_register_operand" "")
- (ashiftrt:SI (match_dup 2)
- (const_int 16)))]
+ [(set (match_operand:SI 0 "s_register_operand" "")
+ (sign_extend:SI (match_operand:HI 1 "nonimmediate_operand" "")))]
"TARGET_EITHER"
- "
- {
- if (GET_CODE (operands[1]) == MEM)
- {
- if (TARGET_THUMB1)
- {
- emit_insn (gen_thumb1_extendhisi2 (operands[0], operands[1]));
- DONE;
- }
- else if (arm_arch4)
- {
- emit_insn (gen_rtx_SET (VOIDmode, operands[0],
- gen_rtx_SIGN_EXTEND (SImode, operands[1])));
- DONE;
- }
- }
-
- if (TARGET_ARM && GET_CODE (operands[1]) == MEM)
- {
- emit_insn (gen_extendhisi2_mem (operands[0], operands[1]));
- DONE;
- }
-
- if (!s_register_operand (operands[1], HImode))
- operands[1] = copy_to_mode_reg (HImode, operands[1]);
-
- if (arm_arch6)
- {
- if (TARGET_THUMB1)
- emit_insn (gen_thumb1_extendhisi2 (operands[0], operands[1]));
- else
- emit_insn (gen_rtx_SET (VOIDmode, operands[0],
- gen_rtx_SIGN_EXTEND (SImode, operands[1])));
-
- DONE;
- }
-
- operands[1] = gen_lowpart (SImode, operands[1]);
- operands[2] = gen_reg_rtx (SImode);
- }"
-)
-
-(define_insn "thumb1_extendhisi2"
- [(set (match_operand:SI 0 "register_operand" "=l")
- (sign_extend:SI (match_operand:HI 1 "memory_operand" "m")))
- (clobber (match_scratch:SI 2 "=&l"))]
- "TARGET_THUMB1 && !arm_arch6"
- "*
- {
- rtx ops[4];
- rtx mem = XEXP (operands[1], 0);
-
- /* This code used to try to use 'V', and fix the address only if it was
- offsettable, but this fails for e.g. REG+48 because 48 is outside the
- range of QImode offsets, and offsettable_address_p does a QImode
- address check. */
-
- if (GET_CODE (mem) == CONST)
- mem = XEXP (mem, 0);
-
- if (GET_CODE (mem) == LABEL_REF)
- return \"ldr\\t%0, %1\";
-
- if (GET_CODE (mem) == PLUS)
- {
- rtx a = XEXP (mem, 0);
- rtx b = XEXP (mem, 1);
-
- if (GET_CODE (a) == LABEL_REF
- && GET_CODE (b) == CONST_INT)
- return \"ldr\\t%0, %1\";
-
- if (GET_CODE (b) == REG)
- return \"ldrsh\\t%0, %1\";
-
- ops[1] = a;
- ops[2] = b;
- }
- else
- {
- ops[1] = mem;
- ops[2] = const0_rtx;
- }
+{
+ if (TARGET_THUMB1)
+ {
+ emit_insn (gen_thumb1_extendhisi2 (operands[0], operands[1]));
+ DONE;
+ }
+ if (MEM_P (operands[1]) && TARGET_ARM && !arm_arch4)
+ {
+ emit_insn (gen_extendhisi2_mem (operands[0], operands[1]));
+ DONE;
+ }
- gcc_assert (GET_CODE (ops[1]) == REG);
+ if (!arm_arch6 && !MEM_P (operands[1]))
+ {
+ rtx t = gen_lowpart (SImode, operands[1]);
+ rtx tmp = gen_reg_rtx (SImode);
+ emit_insn (gen_ashlsi3 (tmp, t, GEN_INT (16)));
+ emit_insn (gen_ashrsi3 (operands[0], tmp, GEN_INT (16)));
+ DONE;
+ }
+})
- ops[0] = operands[0];
- ops[3] = operands[2];
- output_asm_insn (\"mov\\t%3, %2\;ldrsh\\t%0, [%1, %3]\", ops);
- return \"\";
- }"
- [(set_attr "length" "4")
- (set_attr "type" "load_byte")
- (set_attr "pool_range" "1020")]
-)
+(define_split
+ [(parallel
+ [(set (match_operand:SI 0 "register_operand" "")
+ (sign_extend:SI (match_operand:HI 1 "register_operand" "")))
+ (clobber (match_scratch:SI 2 ""))])]
+ "!arm_arch6"
+ [(set (match_dup 0) (ashift:SI (match_dup 2) (const_int 16)))
+ (set (match_dup 0) (ashiftrt:SI (match_dup 0) (const_int 16)))]
+{
+ operands[2] = simplify_gen_subreg (SImode, operands[1], HImode, 0);
+})
;; We used to have an early-clobber on the scratch register here.
;; However, there's a bug somewhere in reload which means that this
@@ -4376,16 +4273,18 @@ (define_insn "thumb1_extendhisi2"
;; we try to verify the operands. Fortunately, we don't really need
;; the early-clobber: we can always use operand 0 if operand 2
;; overlaps the address.
-(define_insn "*thumb1_extendhisi2_insn_v6"
+(define_insn "thumb1_extendhisi2"
[(set (match_operand:SI 0 "register_operand" "=l,l")
(sign_extend:SI (match_operand:HI 1 "nonimmediate_operand" "l,m")))
(clobber (match_scratch:SI 2 "=X,l"))]
- "TARGET_THUMB1 && arm_arch6"
+ "TARGET_THUMB1"
"*
{
rtx ops[4];
rtx mem;
+ if (which_alternative == 0 && !arm_arch6)
+ return \"#\";
if (which_alternative == 0)
return \"sxth\\t%0, %1\";
@@ -4433,7 +4332,10 @@ (define_insn "*thumb1_extendhisi2_insn_v
output_asm_insn (\"mov\\t%3, %2\;ldrsh\\t%0, [%1, %3]\", ops);
return \"\";
}"
- [(set_attr "length" "2,4")
+ [(set_attr_alternative "length"
+ [(if_then_else (eq_attr "is_arch6" "yes")
+ (const_int 2) (const_int 4))
+ (const_int 4)])
(set_attr "type" "alu_shift,load_byte")
(set_attr "pool_range" "*,1020")]
)
@@ -4474,15 +4376,28 @@ (define_expand "extendhisi2_mem"
}"
)
+(define_split
+ [(set (match_operand:SI 0 "register_operand" "")
+ (sign_extend:SI (match_operand:HI 1 "register_operand" "")))]
+ "!arm_arch6"
+ [(set (match_dup 0) (ashift:SI (match_dup 2) (const_int 16)))
+ (set (match_dup 0) (ashiftrt:SI (match_dup 0) (const_int 16)))]
+{
+ operands[2] = simplify_gen_subreg (SImode, operands[1], HImode, 0);
+})
+
(define_insn "*arm_extendhisi2"
- [(set (match_operand:SI 0 "s_register_operand" "=r")
- (sign_extend:SI (match_operand:HI 1 "memory_operand" "m")))]
+ [(set (match_operand:SI 0 "s_register_operand" "=r,r")
+ (sign_extend:SI (match_operand:HI 1 "nonimmediate_operand" "r,m")))]
"TARGET_ARM && arm_arch4 && !arm_arch6"
- "ldr%(sh%)\\t%0, %1"
- [(set_attr "type" "load_byte")
+ "@
+ #
+ ldr%(sh%)\\t%0, %1"
+ [(set_attr "length" "8,4")
+ (set_attr "type" "alu_shift,load_byte")
(set_attr "predicable" "yes")
- (set_attr "pool_range" "256")
- (set_attr "neg_pool_range" "244")]
+ (set_attr "pool_range" "*,256")
+ (set_attr "neg_pool_range" "*,244")]
)
;; ??? Check Thumb-2 pool range
@@ -4544,46 +4459,45 @@ (define_insn "*arm_extendqihi_insn"
)
(define_expand "extendqisi2"
- [(set (match_dup 2)
- (ashift:SI (match_operand:QI 1 "arm_reg_or_extendqisi_mem_op" "")
- (const_int 24)))
- (set (match_operand:SI 0 "s_register_operand" "")
- (ashiftrt:SI (match_dup 2)
- (const_int 24)))]
+ [(set (match_operand:SI 0 "s_register_operand" "")
+ (sign_extend:SI (match_operand:QI 1 "arm_reg_or_extendqisi_mem_op" "")))]
"TARGET_EITHER"
- "
- {
- if ((TARGET_THUMB || arm_arch4) && GET_CODE (operands[1]) == MEM)
- {
- emit_insn (gen_rtx_SET (VOIDmode, operands[0],
- gen_rtx_SIGN_EXTEND (SImode, operands[1])));
- DONE;
- }
-
- if (!s_register_operand (operands[1], QImode))
- operands[1] = copy_to_mode_reg (QImode, operands[1]);
+{
+ if (!arm_arch4 && MEM_P (operands[1]))
+ operands[1] = copy_to_mode_reg (QImode, operands[1]);
- if (arm_arch6)
- {
- emit_insn (gen_rtx_SET (VOIDmode, operands[0],
- gen_rtx_SIGN_EXTEND (SImode, operands[1])));
- DONE;
- }
+ if (!arm_arch6 && !MEM_P (operands[1]))
+ {
+ rtx t = gen_lowpart (SImode, operands[1]);
+ rtx tmp = gen_reg_rtx (SImode);
+ emit_insn (gen_ashlsi3 (tmp, t, GEN_INT (24)));
+ emit_insn (gen_ashrsi3 (operands[0], tmp, GEN_INT (24)));
+ DONE;
+ }
+})
- operands[1] = gen_lowpart (SImode, operands[1]);
- operands[2] = gen_reg_rtx (SImode);
- }"
-)
+(define_split
+ [(set (match_operand:SI 0 "register_operand" "")
+ (sign_extend:SI (match_operand:QI 1 "register_operand" "")))]
+ "!arm_arch6"
+ [(set (match_dup 0) (ashift:SI (match_dup 2) (const_int 24)))
+ (set (match_dup 0) (ashiftrt:SI (match_dup 0) (const_int 24)))]
+{
+ operands[2] = simplify_gen_subreg (SImode, operands[1], QImode, 0);
+})
(define_insn "*arm_extendqisi"
- [(set (match_operand:SI 0 "s_register_operand" "=r")
- (sign_extend:SI (match_operand:QI 1 "arm_extendqisi_mem_op" "Uq")))]
+ [(set (match_operand:SI 0 "s_register_operand" "=r,r")
+ (sign_extend:SI (match_operand:QI 1 "arm_reg_or_extendqisi_mem_op" "r,Uq")))]
"TARGET_ARM && arm_arch4 && !arm_arch6"
- "ldr%(sb%)\\t%0, %1"
- [(set_attr "type" "load_byte")
+ "@
+ #
+ ldr%(sb%)\\t%0, %1"
+ [(set_attr "length" "8,4")
+ (set_attr "type" "alu_shift,load_byte")
(set_attr "predicable" "yes")
- (set_attr "pool_range" "256")
- (set_attr "neg_pool_range" "244")]
+ (set_attr "pool_range" "*,256")
+ (set_attr "neg_pool_range" "*,244")]
)
(define_insn "*arm_extendqisi_v6"
@@ -4611,162 +4525,78 @@ (define_insn "*arm_extendqisi2addsi"
(set_attr "predicable" "yes")]
)
-(define_insn "*thumb1_extendqisi2"
- [(set (match_operand:SI 0 "register_operand" "=l,l")
- (sign_extend:SI (match_operand:QI 1 "memory_operand" "V,m")))]
- "TARGET_THUMB1 && !arm_arch6"
- "*
- {
- rtx ops[3];
- rtx mem = XEXP (operands[1], 0);
-
- if (GET_CODE (mem) == CONST)
- mem = XEXP (mem, 0);
-
- if (GET_CODE (mem) == LABEL_REF)
- return \"ldr\\t%0, %1\";
+(define_split
+ [(set (match_operand:SI 0 "register_operand" "")
+ (sign_extend:SI (match_operand:QI 1 "memory_operand" "")))]
+ "TARGET_THUMB1 && reload_completed"
+ [(set (match_dup 0) (match_dup 2))
+ (set (match_dup 0) (sign_extend:SI (match_dup 3)))]
+{
+ rtx addr = XEXP (operands[1], 0);
- if (GET_CODE (mem) == PLUS
- && GET_CODE (XEXP (mem, 0)) == LABEL_REF)
- return \"ldr\\t%0, %1\";
-
- if (which_alternative == 0)
- return \"ldrsb\\t%0, %1\";
-
- ops[0] = operands[0];
-
- if (GET_CODE (mem) == PLUS)
- {
- rtx a = XEXP (mem, 0);
- rtx b = XEXP (mem, 1);
-
- ops[1] = a;
- ops[2] = b;
+ if (GET_CODE (addr) == CONST)
+ addr = XEXP (addr, 0);
+ if (GET_CODE (addr) == PLUS
+ && REG_P (XEXP (addr, 0)) && REG_P (XEXP (addr, 1)))
+ /* No split necessary. */
+ FAIL;
+ if (GET_CODE (addr) == PLUS
+ && !REG_P (XEXP (addr, 0)) && !REG_P (XEXP (addr, 1)))
+ FAIL;
- if (GET_CODE (a) == REG)
- {
- if (GET_CODE (b) == REG)
- output_asm_insn (\"ldrsb\\t%0, [%1, %2]\", ops);
- else if (REGNO (a) == REGNO (ops[0]))
- {
- output_asm_insn (\"ldrb\\t%0, [%1, %2]\", ops);
- output_asm_insn (\"lsl\\t%0, %0, #24\", ops);
- output_asm_insn (\"asr\\t%0, %0, #24\", ops);
- }
- else
- output_asm_insn (\"mov\\t%0, %2\;ldrsb\\t%0, [%1, %0]\", ops);
- }
- else
- {
- gcc_assert (GET_CODE (b) == REG);
- if (REGNO (b) == REGNO (ops[0]))
- {
- output_asm_insn (\"ldrb\\t%0, [%2, %1]\", ops);
- output_asm_insn (\"lsl\\t%0, %0, #24\", ops);
- output_asm_insn (\"asr\\t%0, %0, #24\", ops);
- }
- else
- output_asm_insn (\"mov\\t%0, %2\;ldrsb\\t%0, [%1, %0]\", ops);
- }
- }
- else if (GET_CODE (mem) == REG && REGNO (ops[0]) == REGNO (mem))
- {
- output_asm_insn (\"ldrb\\t%0, [%0, #0]\", ops);
- output_asm_insn (\"lsl\\t%0, %0, #24\", ops);
- output_asm_insn (\"asr\\t%0, %0, #24\", ops);
- }
- else
- {
- ops[1] = mem;
- ops[2] = const0_rtx;
-
- output_asm_insn (\"mov\\t%0, %2\;ldrsb\\t%0, [%1, %0]\", ops);
- }
- return \"\";
- }"
- [(set_attr "length" "2,6")
- (set_attr "type" "load_byte,load_byte")
- (set_attr "pool_range" "32,32")]
-)
+ if (reg_overlap_mentioned_p (operands[0], addr))
+ {
+ rtx t = gen_lowpart (QImode, operands[0]);
+ emit_move_insn (t, operands[1]);
+ emit_insn (gen_thumb1_extendqisi2 (operands[0], t));
+ DONE;
+ }
+ if (REG_P (addr))
+ {
+ addr = gen_rtx_PLUS (Pmode, addr, operands[0]);
+ operands[2] = const0_rtx;
+ }
+ else if (GET_CODE (addr) != PLUS)
+ FAIL;
+ else if (REG_P (XEXP (addr, 0)))
+ {
+ operands[2] = XEXP (addr, 1);
+ addr = gen_rtx_PLUS (Pmode, XEXP (addr, 0), operands[0]);
+ }
+ else
+ {
+ operands[2] = XEXP (addr, 0);
+ addr = gen_rtx_PLUS (Pmode, XEXP (addr, 1), operands[0]);
+ }
+ operands[3] = change_address (operands[1], QImode, addr);
+})
-(define_insn "*thumb1_extendqisi2_v6"
+(define_insn "thumb1_extendqisi2"
[(set (match_operand:SI 0 "register_operand" "=l,l,l")
(sign_extend:SI (match_operand:QI 1 "nonimmediate_operand" "l,V,m")))]
- "TARGET_THUMB1 && arm_arch6"
- "*
- {
- rtx ops[3];
- rtx mem;
-
- if (which_alternative == 0)
- return \"sxtb\\t%0, %1\";
+ "TARGET_THUMB1"
+{
+ rtx addr;
- mem = XEXP (operands[1], 0);
-
- if (GET_CODE (mem) == CONST)
- mem = XEXP (mem, 0);
-
- if (GET_CODE (mem) == LABEL_REF)
- return \"ldr\\t%0, %1\";
+ if (which_alternative == 0 && arm_arch6)
+ return "sxtb\\t%0, %1";
+ if (which_alternative == 0)
+ return "#";
- if (GET_CODE (mem) == PLUS
- && GET_CODE (XEXP (mem, 0)) == LABEL_REF)
- return \"ldr\\t%0, %1\";
-
- if (which_alternative == 0)
- return \"ldrsb\\t%0, %1\";
+ addr = XEXP (operands[1], 0);
+ if (GET_CODE (addr) == PLUS
+ && REG_P (XEXP (addr, 0)) && REG_P (XEXP (addr, 1)))
+ return "ldrsb\\t%0, %1";
- ops[0] = operands[0];
-
- if (GET_CODE (mem) == PLUS)
- {
- rtx a = XEXP (mem, 0);
- rtx b = XEXP (mem, 1);
-
- ops[1] = a;
- ops[2] = b;
-
- if (GET_CODE (a) == REG)
- {
- if (GET_CODE (b) == REG)
- output_asm_insn (\"ldrsb\\t%0, [%1, %2]\", ops);
- else if (REGNO (a) == REGNO (ops[0]))
- {
- output_asm_insn (\"ldrb\\t%0, [%1, %2]\", ops);
- output_asm_insn (\"sxtb\\t%0, %0\", ops);
- }
- else
- output_asm_insn (\"mov\\t%0, %2\;ldrsb\\t%0, [%1, %0]\", ops);
- }
- else
- {
- gcc_assert (GET_CODE (b) == REG);
- if (REGNO (b) == REGNO (ops[0]))
- {
- output_asm_insn (\"ldrb\\t%0, [%2, %1]\", ops);
- output_asm_insn (\"sxtb\\t%0, %0\", ops);
- }
- else
- output_asm_insn (\"mov\\t%0, %2\;ldrsb\\t%0, [%1, %0]\", ops);
- }
- }
- else if (GET_CODE (mem) == REG && REGNO (ops[0]) == REGNO (mem))
- {
- output_asm_insn (\"ldrb\\t%0, [%0, #0]\", ops);
- output_asm_insn (\"sxtb\\t%0, %0\", ops);
- }
- else
- {
- ops[1] = mem;
- ops[2] = const0_rtx;
-
- output_asm_insn (\"mov\\t%0, %2\;ldrsb\\t%0, [%1, %0]\", ops);
- }
- return \"\";
- }"
- [(set_attr "length" "2,2,4")
- (set_attr "type" "alu_shift,load_byte,load_byte")
- (set_attr "pool_range" "*,32,32")]
+ return "#";
+}
+ [(set_attr_alternative "length"
+ [(if_then_else (eq_attr "is_arch6" "yes")
+ (const_int 2) (const_int 4))
+ (const_int 2)
+ (if_then_else (eq_attr "is_arch6" "yes")
+ (const_int 4) (const_int 6))])
+ (set_attr "type" "alu_shift,load_byte,load_byte")]
)
(define_expand "extendsfdf2"
Index: testsuite/gcc.target/arm/pr42172-1.c
===================================================================
--- testsuite/gcc.target/arm/pr42172-1.c (revision 0)
+++ testsuite/gcc.target/arm/pr42172-1.c (revision 0)
@@ -0,0 +1,19 @@
+/* { dg-options "-O2" } */
+
+struct A {
+ unsigned int f1 : 3;
+ unsigned int f2 : 3;
+ unsigned int f3 : 1;
+ unsigned int f4 : 1;
+
+};
+
+void init_A (struct A *this)
+{
+ this->f1 = 0;
+ this->f2 = 1;
+ this->f3 = 0;
+ this->f4 = 0;
+}
+
+/* { dg-final { scan-assembler-times "ldr" 1 } } */
* Ping: ARM patch (PR42172): Rework the {sign,zero}-extend patterns
2010-06-08 22:48 ARM patch (PR42172): Rework the {sign,zero}-extend patterns Bernd Schmidt
@ 2010-06-18 0:37 ` Bernd Schmidt
2010-06-21 14:17 ` Ping^2 " Bernd Schmidt
2010-07-01 13:11 ` Richard Earnshaw
1 sibling, 1 reply; 5+ messages in thread
From: Bernd Schmidt @ 2010-06-18 0:37 UTC (permalink / raw)
To: GCC Patches
Ping.
Bernd
http://gcc.gnu.org/ml/gcc-patches/2010-06/msg00832.html
* config/arm/arm.c (thumb1_rtx_costs): Improve support for SIGN_EXTEND
and ZERO_EXTEND.
(arm_rtx_costs_1): Likewise.
(arm_size_rtx_costs): Use arm_rtx_costs_1 for these codes.
* config/arm/arm.md (is_arch6): New attribute.
(zero_extendhisi2, zero_extendqisi2, extendhisi2,
extendqisi2): Tighten the code somewhat, avoiding invalid
RTL in the expander patterns.
(thumb1_zero_extendhisi2): Merge with thumb1_zero_extendhisi2_v6.
(thumb1_zero_extendhisi2_v6): Delete.
(thumb1_extendhisi2): Merge with thumb1_extendhisi2_v6.
(thumb1_extendhisi2_v6): Delete.
(thumb1_extendqisi2): Merge with thumb1_extendqisi2_v6.
(thumb1_extendqisi2_v6): Delete.
(zero_extendhisi2 for register input splitter): New.
(zero_extendqisi2 for register input splitter): New.
(thumb1_extendhisi2 for register input splitter): New.
(extendhisi2 for register input splitter): New.
(extendqisi2 for register input splitter): New.
(TARGET_THUMB1 extendqisi2 for memory input splitter): New.
(arm_zero_extendhisi2): Allow nonimmediate_operand for operand 1,
and add support for a register alternative requiring a split.
(thumb1_zero_extendqisi2): Likewise.
(arm_zero_extendqisi2): Likewise.
(arm_extendhisi2): Likewise.
(arm_extendqisi2): Likewise.
* gcc.target/arm/pr42172-1.c: New test.
* Ping^2 ARM patch (PR42172): Rework the {sign,zero}-extend patterns
2010-06-18 0:37 ` Ping: " Bernd Schmidt
@ 2010-06-21 14:17 ` Bernd Schmidt
2010-06-28 12:26 ` Ping^3 " Bernd Schmidt
0 siblings, 1 reply; 5+ messages in thread
From: Bernd Schmidt @ 2010-06-21 14:17 UTC (permalink / raw)
To: GCC Patches
Ping^2
Bernd
> http://gcc.gnu.org/ml/gcc-patches/2010-06/msg00832.html
>
> * config/arm/arm.c (thumb1_rtx_costs): Improve support for SIGN_EXTEND
> and ZERO_EXTEND.
> (arm_rtx_costs_1): Likewise.
> (arm_size_rtx_costs): Use arm_rtx_costs_1 for these codes.
> * config/arm/arm.md (is_arch6): New attribute.
> (zero_extendhisi2, zero_extendqisi2, extendhisi2,
> extendqisi2): Tighten the code somewhat, avoiding invalid
> RTL in the expander patterns.
> (thumb1_zero_extendhisi2): Merge with thumb1_zero_extendhisi2_v6.
> (thumb1_zero_extendhisi2_v6): Delete.
> (thumb1_extendhisi2): Merge with thumb1_extendhisi2_v6.
> (thumb1_extendhisi2_v6): Delete.
> (thumb1_extendqisi2): Merge with thumb1_extendqisi2_v6.
> (thumb1_extendqisi2_v6): Delete.
> (zero_extendhisi2 for register input splitter): New.
> (zero_extendqisi2 for register input splitter): New.
> (thumb1_extendhisi2 for register input splitter): New.
> (extendhisi2 for register input splitter): New.
> (extendqisi2 for register input splitter): New.
> (TARGET_THUMB1 extendqisi2 for memory input splitter): New.
> (arm_zero_extendhisi2): Allow nonimmediate_operand for operand 1,
> and add support for a register alternative requiring a split.
> (thumb1_zero_extendqisi2): Likewise.
> (arm_zero_extendqisi2): Likewise.
> (arm_extendhisi2): Likewise.
> (arm_extendqisi2): Likewise.
>
> * gcc.target/arm/pr42172-1.c: New test.
>
* Ping^3 ARM patch (PR42172): Rework the {sign,zero}-extend patterns
2010-06-21 14:17 ` Ping^2 " Bernd Schmidt
@ 2010-06-28 12:26 ` Bernd Schmidt
0 siblings, 0 replies; 5+ messages in thread
From: Bernd Schmidt @ 2010-06-28 12:26 UTC (permalink / raw)
To: GCC Patches; +Cc: Richard Earnshaw
Ping^3
Bernd
>> http://gcc.gnu.org/ml/gcc-patches/2010-06/msg00832.html
>>
>> * config/arm/arm.c (thumb1_rtx_costs): Improve support for SIGN_EXTEND
>> and ZERO_EXTEND.
>> (arm_rtx_costs_1): Likewise.
>> (arm_size_rtx_costs): Use arm_rtx_costs_1 for these codes.
>> * config/arm/arm.md (is_arch6): New attribute.
>> (zero_extendhisi2, zero_extendqisi2, extendhisi2,
>> extendqisi2): Tighten the code somewhat, avoiding invalid
>> RTL in the expander patterns.
>> (thumb1_zero_extendhisi2): Merge with thumb1_zero_extendhisi2_v6.
>> (thumb1_zero_extendhisi2_v6): Delete.
>> (thumb1_extendhisi2): Merge with thumb1_extendhisi2_v6.
>> (thumb1_extendhisi2_v6): Delete.
>> (thumb1_extendqisi2): Merge with thumb1_extendqisi2_v6.
>> (thumb1_extendqisi2_v6): Delete.
>> (zero_extendhisi2 for register input splitter): New.
>> (zero_extendqisi2 for register input splitter): New.
>> (thumb1_extendhisi2 for register input splitter): New.
>> (extendhisi2 for register input splitter): New.
>> (extendqisi2 for register input splitter): New.
>> (TARGET_THUMB1 extendqisi2 for memory input splitter): New.
>> (arm_zero_extendhisi2): Allow nonimmediate_operand for operand 1,
>> and add support for a register alternative requiring a split.
>> (thumb1_zero_extendqisi2): Likewise.
>> (arm_zero_extendqisi2): Likewise.
>> (arm_extendhisi2): Likewise.
>> (arm_extendqisi2): Likewise.
>>
>> * gcc.target/arm/pr42172-1.c: New test.
* Re: ARM patch (PR42172): Rework the {sign,zero}-extend patterns
2010-06-08 22:48 ARM patch (PR42172): Rework the {sign,zero}-extend patterns Bernd Schmidt
2010-06-18 0:37 ` Ping: " Bernd Schmidt
@ 2010-07-01 13:11 ` Richard Earnshaw
1 sibling, 0 replies; 5+ messages in thread
From: Richard Earnshaw @ 2010-07-01 13:11 UTC (permalink / raw)
To: Bernd Schmidt; +Cc: GCC Patches
On Wed, 2010-06-09 at 00:34 +0200, Bernd Schmidt wrote:
> PR42172 is a case where we generate terrible code for bitfield
> assignments on Thumb1. Part of the problem is that we fail to optimize
>
> strb r3, [r0]
> ldrb r3, [r0]
>
> (insn 11 (set (mem:QI (reg:SI 133) (subreg:QI (reg:SI 136) 0))
> (insn 12 (set (reg:SI 140) (zero_extend:SI (mem:QI (reg/v/f:SI 133)))))
>
> cse will of course try to replace the second MEM with the previous value
> in a register, but the ARM backend (if !arch_v6) does not accept
> zero_extend from a reg, only from a mem. This makes some sense at first
> glance as there isn't a specific instruction, so the expander creates a
> sequence of shifts when given a REG input.
>
> However, to make CSE do its job, I believe it's best to make the ARM
> extension patterns accept registers. We should avoid creating such
> patterns when possible, both at expand time (generating the sequence of
> shifts as before), and in the combiner (by setting appropriate
> rtx_costs), since the two-shift sequence can often be optimized further.
>
> The following patch does that, and it also fixes a few other problems:
>
> * The expanders are a little ugly:
> [(set (match_dup 2)
> (ashift:SI (match_operand:HI 1 "nonimmediate_operand" "")
> (const_int 16)))
> This is invalid RTL; it works because we modify operand[1], but I've
> changed the patterns to avoid this (and to make them a lot smaller,
> and, IMO, clearer)
> * Slightly broken Thumb1 patterns:
> if (which_alternative == 0)
> return \"sxtb\\t%0, %1\";
> [....]
> if (which_alternative == 0)
> return \"ldrsb\\t%0, %1\";
> * Removed massive duplication of Thumb1 patterns for v6 and !v6
> * Removed apparently useless code to handle LABEL_REFs in addresses
> in Thumb1 extend patterns.
>
> Here are some examples of the code generation changes:
> strb r3, [r4, #4]
> ldrb r2, [r8, #4] @ zero_extendqisi2
> - ldrb r3, [r4, #4] @ zero_extendqisi2
> + and r3, r3, #255
> ====
> - ldrb r3, [sp, #36] @ zero_extendqisi2
> - bic r3, r3, #176
> - orr r3, r3, #64
> - strb r3, [sp, #36]
> - ldrb r3, [sp, #36] @ zero_extendqisi2
> - bic r3, r3, #10
> - orr r3, r3, #5
> + mov r3, #69
> ====
> - strb r2, [ip, #4]
> - ldrb r2, [ip, #4] @ zero_extendqisi2
> bic r2, r2, #2
> strb r2, [ip, #4]
> ====
> - ldrh r3, [r5, #2]
> - lsl r3, r3, #16
> - asr r3, r3, #16
> - add r2, r3, #1
> + mov r0, #2
> + ldrsh r3, [r5, r0]
> + add r1, r3, #1
>
> On the whole things looked very good in the testcases I looked at.
> Unfortunately PR42172 isn't completely solved by this, but it gets a lot
> better.
>
> Regression tested on:
> Target is arm-none-linux-gnueabi
> Host is i686-pc-linux-gnu
>
> Schedule of variations:
> qemu-system-armv7/arch=armv7-a/thumb
> qemu-system-armv7/thumb
> qemu-system-armv7
>
This is ok.
Minor coding style nit, however:
+ if (reg_overlap_mentioned_p (operands[0], addr))
+ {
+ ...
+ }
+ if (REG_P (addr))
There should be a blank line at the end of each if...[else ...]
sequence.
R.