public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* ARM patch (PR42172): Rework the {sign,zero}-extend patterns
@ 2010-06-08 22:48 Bernd Schmidt
  2010-06-18  0:37 ` Ping: " Bernd Schmidt
  2010-07-01 13:11 ` Richard Earnshaw
  0 siblings, 2 replies; 5+ messages in thread
From: Bernd Schmidt @ 2010-06-08 22:48 UTC (permalink / raw)
  To: GCC Patches

[-- Attachment #1: Type: text/plain, Size: 2873 bytes --]

PR42172 is a case where we generate terrible code for bitfield
assignments on Thumb1.  Part of the problem is that we fail to optimize

	strb	r3, [r0]
	ldrb	r3, [r0]

(insn 11 (set (mem:QI (reg:SI 133) (subreg:QI (reg:SI 136) 0))
(insn 12 (set (reg:SI 140) (zero_extend:SI (mem:QI (reg/v/f:SI 133)))))

cse will of course try to replace the second MEM with the previous value
in a register, but the ARM backend (if !arch_v6) does not accept
zero_extend from a reg, only from a mem.  This makes some sense at first
glance as there isn't a specific instruction, so the expander creates a
sequence of shifts when given a REG input.

However, to make CSE do its job, I believe it's best to make the ARM
extension patterns accept registers.  We should avoid creating such
patterns when possible, both at expand time (generating the sequence of
shifts as before), and in the combiner (by setting appropriate
rtx_costs), since the two-shift sequence can often be optimized further.

The following patch does that, and it also fixes a few other problems:

* The expanders are a little ugly:
  [(set (match_dup 2)
	(ashift:SI (match_operand:HI 1 "nonimmediate_operand" "")
		   (const_int 16)))
  This is invalid RTL; it works because we modify operand[1], but I've
  changed the patterns to avoid this (and to make them a lot smaller,
  and, IMO, clearer)
* Slightly broken Thumb1 patterns:
    if (which_alternative == 0)
      return \"sxtb\\t%0, %1\";
    [....]
    if (which_alternative == 0)
      return \"ldrsb\\t%0, %1\";
* Removed massive duplication of Thumb1 patterns for v6 and !v6
* Removed apparently useless code to handle LABEL_REFs in addresses
  in Thumb1 extend patterns.

Here are some examples of the code generation changes:
        strb    r3, [r4, #4]
        ldrb    r2, [r8, #4]    @ zero_extendqisi2
-       ldrb    r3, [r4, #4]    @ zero_extendqisi2
+       and     r3, r3, #255
====
-       ldrb    r3, [sp, #36]   @ zero_extendqisi2
-       bic     r3, r3, #176
-       orr     r3, r3, #64
-       strb    r3, [sp, #36]
-       ldrb    r3, [sp, #36]   @ zero_extendqisi2
-       bic     r3, r3, #10
-       orr     r3, r3, #5
+       mov     r3, #69
====
-       strb    r2, [ip, #4]
-       ldrb    r2, [ip, #4]    @ zero_extendqisi2
        bic     r2, r2, #2
        strb    r2, [ip, #4]
====
-       ldrh    r3, [r5, #2]
-       lsl     r3, r3, #16
-       asr     r3, r3, #16
-       add     r2, r3, #1
+       mov     r0, #2
+       ldrsh   r3, [r5, r0]
+       add     r1, r3, #1

On the whole things looked very good in the testcases I looked at.
Unfortunately PR42172 isn't completely solved by this, but it gets a lot
better.

Regression tested on:
Target is arm-none-linux-gnueabi
Host   is i686-pc-linux-gnu

Schedule of variations:
    qemu-system-armv7/arch=armv7-a/thumb
    qemu-system-armv7/thumb
    qemu-system-armv7


Ok?


Bernd

[-- Attachment #2: pr42172.diff --]
[-- Type: text/plain, Size: 30081 bytes --]

	* config/arm/arm.c (thumb1_rtx_costs): Improve support for SIGN_EXTEND
	and ZERO_EXTEND.
	(arm_rtx_costs_1): Likewise.
	(arm_size_rtx_costs): Use arm_rtx_costs_1 for these codes.
	* config/arm/arm.md (is_arch6): New attribute.
	(zero_extendhisi2, zero_extendqisi2, extendhisi2,
	extendqisi2): Tighten the code somewhat, avoiding invalid
	RTL to occur in the expander patterns.
	(thumb1_zero_extendhisi2): Merge with thumb1_zero_extendhisi2_v6.
	(thumb1_zero_extendhisi2_v6): Delete.
	(thumb1_extendhisi2): Merge with thumb1_extendhisi2_v6.
	(thumb1_extendhisi2_v6): Delete.
	(thumb1_extendqisi2): Merge with thumb1_extendhisi2_v6.
	(thumb1_extendqisi2_v6): Delete.
	(zero_extendhisi2 for register input splitter): New.
	(zero_extendqisi2 for register input splitter): New.
	(thumb1_extendhisi2 for register input splitter): New.
	(extendhisi2 for register input splitter): New.
	(extendqisi2 for register input splitter): New.
	(TARGET_THUMB1 extendqisi2 for memory input splitter): New.
	(arm_zero_extendhisi2): Allow nonimmediate_operand for operand 1,
	and add support for a register alternative requiring a split.
	(thumb1_zero_extendqisi2): Likewise.
	(arm_zero_extendqisi2): Likewise.
	(arm_extendhisi2): Likewise.
	(arm_extendqisi2): Likewise.

	* gcc.target/arm/pr42172-1.c: New test.

Index: config/arm/arm.c
===================================================================
--- config/arm/arm.c	(revision 160260)
+++ config/arm/arm.c	(working copy)
@@ -6193,6 +6193,7 @@ static inline int
 thumb1_rtx_costs (rtx x, enum rtx_code code, enum rtx_code outer)
 {
   enum machine_mode mode = GET_MODE (x);
+  int total;
 
   switch (code)
     {
@@ -6291,24 +6292,20 @@ thumb1_rtx_costs (rtx x, enum rtx_code c
 	return 14;
       return 2;
 
+    case SIGN_EXTEND:
     case ZERO_EXTEND:
-      /* XXX still guessing.  */
-      switch (GET_MODE (XEXP (x, 0)))
-	{
-	case QImode:
-	  return (1 + (mode == DImode ? 4 : 0)
-		  + (GET_CODE (XEXP (x, 0)) == MEM ? 10 : 0));
+      total = mode == DImode ? COSTS_N_INSNS (1) : 0;
+      total += thumb1_rtx_costs (XEXP (x, 0), GET_CODE (XEXP (x, 0)), code);
 
-	case HImode:
-	  return (4 + (mode == DImode ? 4 : 0)
-		  + (GET_CODE (XEXP (x, 0)) == MEM ? 10 : 0));
+      if (mode == SImode)
+	return total;
 
-	case SImode:
-	  return (1 + (GET_CODE (XEXP (x, 0)) == MEM ? 10 : 0));
+      if (arm_arch6)
+	return total + COSTS_N_INSNS (1);
 
-	default:
-	  return 99;
-	}
+      /* Assume a two-shift sequence.  Increase the cost slightly so
+	 we prefer actual shifts over an extend operation.  */
+      return total + 1 + COSTS_N_INSNS (2);
 
     default:
       return 99;
@@ -6794,44 +6791,39 @@ arm_rtx_costs_1 (rtx x, enum rtx_code ou
       return false;
 
     case SIGN_EXTEND:
-      if (GET_MODE_CLASS (mode) == MODE_INT)
-	{
-	  *total = 0;
-	  if (mode == DImode)
-	    *total += COSTS_N_INSNS (1);
-
-	  if (GET_MODE (XEXP (x, 0)) != SImode)
-	    {
-	      if (arm_arch6)
-		{
-		  if (GET_CODE (XEXP (x, 0)) != MEM)
-		    *total += COSTS_N_INSNS (1);
-		}
-	      else if (!arm_arch4 || GET_CODE (XEXP (x, 0)) != MEM)
-		*total += COSTS_N_INSNS (2);
-	    }
-
-	  return false;
-	}
-
-      /* Fall through */
     case ZERO_EXTEND:
       *total = 0;
       if (GET_MODE_CLASS (mode) == MODE_INT)
 	{
+	  rtx op = XEXP (x, 0);
+	  enum machine_mode opmode = GET_MODE (op);
+
 	  if (mode == DImode)
 	    *total += COSTS_N_INSNS (1);
 
-	  if (GET_MODE (XEXP (x, 0)) != SImode)
+	  if (opmode != SImode)
 	    {
-	      if (arm_arch6)
+	      if (MEM_P (op))
 		{
-		  if (GET_CODE (XEXP (x, 0)) != MEM)
-		    *total += COSTS_N_INSNS (1);
+		  /* If !arm_arch4, we use one of the extendhisi2_mem
+		     or movhi_bytes patterns for HImode.  For a QImode
+		     sign extension, we first zero-extend from memory
+		     and then perform a shift sequence.  */
+		  if (!arm_arch4 && (opmode != QImode || code == SIGN_EXTEND))
+		    *total += COSTS_N_INSNS (2);
 		}
-	      else if (!arm_arch4 || GET_CODE (XEXP (x, 0)) != MEM)
-		*total += COSTS_N_INSNS (GET_MODE (XEXP (x, 0)) == QImode ?
-					 1 : 2);
+	      else if (arm_arch6)
+		*total += COSTS_N_INSNS (1);
+
+	      /* We don't have the necessary insn, so we need to perform some
+		 other operation.  */
+	      else if (TARGET_ARM && code == ZERO_EXTEND && mode == QImode)
+		/* An and with constant 255.  */
+		*total += COSTS_N_INSNS (1);
+	      else
+		/* A shift sequence.  Increase costs slightly to avoid
+		   combining two shifts into an extend operation.  */
+		*total += COSTS_N_INSNS (2) + 1;
 	    }
 
 	  return false;
@@ -7187,41 +7179,8 @@ arm_size_rtx_costs (rtx x, enum rtx_code
       return false;
 
     case SIGN_EXTEND:
-      *total = 0;
-      if (GET_MODE_SIZE (GET_MODE (XEXP (x, 0))) < 4)
-	{
-	  if (!(arm_arch4 && MEM_P (XEXP (x, 0))))
-	    *total += COSTS_N_INSNS (arm_arch6 ? 1 : 2);
-	}
-      if (mode == DImode)
-	*total += COSTS_N_INSNS (1);
-      return false;
-
     case ZERO_EXTEND:
-      *total = 0;
-      if (!(arm_arch4 && MEM_P (XEXP (x, 0))))
-	{
-	  switch (GET_MODE (XEXP (x, 0)))
-	    {
-	    case QImode:
-	      *total += COSTS_N_INSNS (1);
-	      break;
-
-	    case HImode:
-	      *total += COSTS_N_INSNS (arm_arch6 ? 1 : 2);
-
-	    case SImode:
-	      break;
-
-	    default:
-	      *total += COSTS_N_INSNS (2);
-	    }
-	}
-
-      if (mode == DImode)
-	*total += COSTS_N_INSNS (1);
-
-      return false;
+      return arm_rtx_costs_1 (x, outer_code, total, 0);
 
     case CONST_INT:
       if (const_ok_for_arm (INTVAL (x)))
Index: config/arm/ldmstm.md
===================================================================
Index: config/arm/arm-ldmstm.ml
===================================================================
Index: config/arm/arm.md
===================================================================
--- config/arm/arm.md	(revision 160260)
+++ config/arm/arm.md	(working copy)
@@ -148,6 +148,9 @@ (define_constants
 ; patterns that share the same RTL in both ARM and Thumb code.
 (define_attr "is_thumb" "no,yes" (const (symbol_ref "thumb_code")))
 
+; IS_ARCH6 is set to 'yes' when we are generating code form ARMv6.
+(define_attr "is_arch6" "no,yes" (const (symbol_ref "arm_arch6")))
+
 ;; Operand number of an input operand that is shifted.  Zero if the
 ;; given instruction does not shift one of its input operands.
 (define_attr "shift" "" (const_int 0))
@@ -3958,93 +3961,46 @@ (define_insn "*arm_extendsidi2"
 )
 
 (define_expand "zero_extendhisi2"
-  [(set (match_dup 2)
-	(ashift:SI (match_operand:HI 1 "nonimmediate_operand" "")
-		   (const_int 16)))
-   (set (match_operand:SI 0 "s_register_operand" "")
-	(lshiftrt:SI (match_dup 2) (const_int 16)))]
+  [(set (match_operand:SI 0 "s_register_operand" "")
+	(zero_extend:SI (match_operand:HI 1 "nonimmediate_operand" "")))]
   "TARGET_EITHER"
-  "
-  {
-    if ((TARGET_THUMB1 || arm_arch4) && GET_CODE (operands[1]) == MEM)
-      {
-	emit_insn (gen_rtx_SET (VOIDmode, operands[0],
-				gen_rtx_ZERO_EXTEND (SImode, operands[1])));
-	DONE;
-      }
-
-    if (TARGET_ARM && GET_CODE (operands[1]) == MEM)
-      {
-	emit_insn (gen_movhi_bytes (operands[0], operands[1]));
-	DONE;
-      }
-
-    if (!s_register_operand (operands[1], HImode))
-      operands[1] = copy_to_mode_reg (HImode, operands[1]);
-
-    if (arm_arch6)
-      {
-	emit_insn (gen_rtx_SET (VOIDmode, operands[0],
-				gen_rtx_ZERO_EXTEND (SImode, operands[1])));
-	DONE;
-      }
-
-    operands[1] = gen_lowpart (SImode, operands[1]);
-    operands[2] = gen_reg_rtx (SImode);
-  }"
-)
-
-(define_insn "*thumb1_zero_extendhisi2"
-  [(set (match_operand:SI 0 "register_operand" "=l")
-	(zero_extend:SI (match_operand:HI 1 "memory_operand" "m")))]
-  "TARGET_THUMB1 && !arm_arch6"
-  "*
-  rtx mem = XEXP (operands[1], 0);
-
-  if (GET_CODE (mem) == CONST)
-    mem = XEXP (mem, 0);
-    
-  if (GET_CODE (mem) == LABEL_REF)
-    return \"ldr\\t%0, %1\";
-    
-  if (GET_CODE (mem) == PLUS)
+{
+  if (TARGET_ARM && !arm_arch4 && MEM_P (operands[1]))
     {
-      rtx a = XEXP (mem, 0);
-      rtx b = XEXP (mem, 1);
-
-      /* This can happen due to bugs in reload.  */
-      if (GET_CODE (a) == REG && REGNO (a) == SP_REGNUM)
-        {
-          rtx ops[2];
-          ops[0] = operands[0];
-          ops[1] = a;
-      
-          output_asm_insn (\"mov	%0, %1\", ops);
-
-          XEXP (mem, 0) = operands[0];
-       }
-
-      else if (   GET_CODE (a) == LABEL_REF
-	       && GET_CODE (b) == CONST_INT)
-        return \"ldr\\t%0, %1\";
+      emit_insn (gen_movhi_bytes (operands[0], operands[1]));
+      DONE;
     }
-    
-  return \"ldrh\\t%0, %1\";
-  "
-  [(set_attr "length" "4")
-   (set_attr "type" "load_byte")
-   (set_attr "pool_range" "60")]
-)
+  if (!arm_arch6 && !MEM_P (operands[1]))
+    {
+      rtx t = gen_lowpart (SImode, operands[1]);
+      rtx tmp = gen_reg_rtx (SImode);
+      emit_insn (gen_ashlsi3 (tmp, t, GEN_INT (16)));
+      emit_insn (gen_lshrsi3 (operands[0], tmp, GEN_INT (16)));
+      DONE;
+    }
+})
 
-(define_insn "*thumb1_zero_extendhisi2_v6"
+(define_split
+  [(set (match_operand:SI 0 "register_operand" "")
+	(zero_extend:SI (match_operand:HI 1 "register_operand" "l,m")))]
+  "!TARGET_THUMB2 && !arm_arch6"
+  [(set (match_dup 0) (ashift:SI (match_dup 2) (const_int 16)))
+   (set (match_dup 0) (lshiftrt:SI (match_dup 0) (const_int 16)))]
+{
+  operands[2] = gen_lowpart (SImode, operands[1]);
+})
+
+(define_insn "*thumb1_zero_extendhisi2"
   [(set (match_operand:SI 0 "register_operand" "=l,l")
 	(zero_extend:SI (match_operand:HI 1 "nonimmediate_operand" "l,m")))]
-  "TARGET_THUMB1 && arm_arch6"
+  "TARGET_THUMB1"
   "*
   rtx mem;
 
-  if (which_alternative == 0)
+  if (which_alternative == 0 && arm_arch6)
     return \"uxth\\t%0, %1\";
+  if (which_alternative == 0)
+    return \"#\";
 
   mem = XEXP (operands[1], 0);
 
@@ -4078,20 +4034,25 @@ (define_insn "*thumb1_zero_extendhisi2_v
     
   return \"ldrh\\t%0, %1\";
   "
-  [(set_attr "length" "2,4")
+  [(set_attr_alternative "length"
+			 [(if_then_else (eq_attr "is_arch6" "yes")
+				       (const_int 2) (const_int 4))
+			 (const_int 4)])
    (set_attr "type" "alu_shift,load_byte")
    (set_attr "pool_range" "*,60")]
 )
 
 (define_insn "*arm_zero_extendhisi2"
-  [(set (match_operand:SI 0 "s_register_operand" "=r")
-	(zero_extend:SI (match_operand:HI 1 "memory_operand" "m")))]
+  [(set (match_operand:SI 0 "s_register_operand" "=r,r")
+	(zero_extend:SI (match_operand:HI 1 "nonimmediate_operand" "r,m")))]
   "TARGET_ARM && arm_arch4 && !arm_arch6"
-  "ldr%(h%)\\t%0, %1"
-  [(set_attr "type" "load_byte")
+  "@
+   #
+   ldr%(h%)\\t%0, %1"
+  [(set_attr "type" "alu_shift,load_byte")
    (set_attr "predicable" "yes")
-   (set_attr "pool_range" "256")
-   (set_attr "neg_pool_range" "244")]
+   (set_attr "pool_range" "*,256")
+   (set_attr "neg_pool_range" "*,244")]
 )
 
 (define_insn "*arm_zero_extendhisi2_v6"
@@ -4121,50 +4082,49 @@ (define_expand "zero_extendqisi2"
   [(set (match_operand:SI 0 "s_register_operand" "")
 	(zero_extend:SI (match_operand:QI 1 "nonimmediate_operand" "")))]
   "TARGET_EITHER"
-  "
-  if (!arm_arch6 && GET_CODE (operands[1]) != MEM)
+{
+  if (TARGET_ARM && !arm_arch6 && GET_CODE (operands[1]) != MEM)
     {
-      if (TARGET_ARM)
-        {
-          emit_insn (gen_andsi3 (operands[0],
-				 gen_lowpart (SImode, operands[1]),
-			         GEN_INT (255)));
-        }
-      else /* TARGET_THUMB */
-        {
-          rtx temp = gen_reg_rtx (SImode);
-	  rtx ops[3];
-	  
-          operands[1] = copy_to_mode_reg (QImode, operands[1]);
-          operands[1] = gen_lowpart (SImode, operands[1]);
-
-	  ops[0] = temp;
-	  ops[1] = operands[1];
-	  ops[2] = GEN_INT (24);
-
-	  emit_insn (gen_rtx_SET (VOIDmode, ops[0],
-				  gen_rtx_ASHIFT (SImode, ops[1], ops[2])));
-	  
-          ops[0] = operands[0];
-	  ops[1] = temp;
-	  ops[2] = GEN_INT (24);
+      emit_insn (gen_andsi3 (operands[0],
+			     gen_lowpart (SImode, operands[1]),
+					  GEN_INT (255)));
+      DONE;
+    }
+  if (!arm_arch6 && !MEM_P (operands[1]))
+    {
+      rtx t = gen_lowpart (SImode, operands[1]);
+      rtx tmp = gen_reg_rtx (SImode);
+      emit_insn (gen_ashlsi3 (tmp, t, GEN_INT (24)));
+      emit_insn (gen_lshrsi3 (operands[0], tmp, GEN_INT (24)));
+      DONE;
+    }
+})
 
-	  emit_insn (gen_rtx_SET (VOIDmode, ops[0],
-				  gen_rtx_LSHIFTRT (SImode, ops[1], ops[2])));
-	}
+(define_split
+  [(set (match_operand:SI 0 "register_operand" "")
+	(zero_extend:SI (match_operand:QI 1 "register_operand" "")))]
+  "!arm_arch6"
+  [(set (match_dup 0) (ashift:SI (match_dup 2) (const_int 24)))
+   (set (match_dup 0) (lshiftrt:SI (match_dup 0) (const_int 24)))]
+{
+  operands[2] = simplify_gen_subreg (SImode, operands[1], QImode, 0);
+  if (TARGET_ARM)
+    {
+      emit_insn (gen_andsi3 (operands[0], operands[2], GEN_INT (255)));
       DONE;
     }
-  "
-)
+})
 
 (define_insn "*thumb1_zero_extendqisi2"
-  [(set (match_operand:SI 0 "register_operand" "=l")
-	(zero_extend:SI (match_operand:QI 1 "memory_operand" "m")))]
+  [(set (match_operand:SI 0 "register_operand" "=l,l")
+	(zero_extend:SI (match_operand:QI 1 "nonimmediate_operand" "l,m")))]
   "TARGET_THUMB1 && !arm_arch6"
-  "ldrb\\t%0, %1"
-  [(set_attr "length" "2")
-   (set_attr "type" "load_byte")
-   (set_attr "pool_range" "32")]
+  "@
+   #
+   ldrb\\t%0, %1"
+  [(set_attr "length" "4,2")
+   (set_attr "type" "alu_shift,load_byte")
+   (set_attr "pool_range" "*,32")]
 )
 
 (define_insn "*thumb1_zero_extendqisi2_v6"
@@ -4180,14 +4140,17 @@ (define_insn "*thumb1_zero_extendqisi2_v
 )
 
 (define_insn "*arm_zero_extendqisi2"
-  [(set (match_operand:SI 0 "s_register_operand" "=r")
-	(zero_extend:SI (match_operand:QI 1 "memory_operand" "m")))]
+  [(set (match_operand:SI 0 "s_register_operand" "=r,r")
+	(zero_extend:SI (match_operand:QI 1 "nonimmediate_operand" "r,m")))]
   "TARGET_ARM && !arm_arch6"
-  "ldr%(b%)\\t%0, %1\\t%@ zero_extendqisi2"
-  [(set_attr "type" "load_byte")
+  "@
+   #
+   ldr%(b%)\\t%0, %1\\t%@ zero_extendqisi2"
+  [(set_attr "length" "8,4")
+   (set_attr "type" "alu_shift,load_byte")
    (set_attr "predicable" "yes")
-   (set_attr "pool_range" "4096")
-   (set_attr "neg_pool_range" "4084")]
+   (set_attr "pool_range" "*,4096")
+   (set_attr "neg_pool_range" "*,4084")]
 )
 
 (define_insn "*arm_zero_extendqisi2_v6"
@@ -4266,108 +4229,42 @@ (define_insn "*compareqi_eq0"
 )
 
 (define_expand "extendhisi2"
-  [(set (match_dup 2)
-	(ashift:SI (match_operand:HI 1 "nonimmediate_operand" "")
-		   (const_int 16)))
-   (set (match_operand:SI 0 "s_register_operand" "")
-	(ashiftrt:SI (match_dup 2)
-		     (const_int 16)))]
+  [(set (match_operand:SI 0 "s_register_operand" "")
+	(sign_extend:SI (match_operand:HI 1 "nonimmediate_operand" "")))]
   "TARGET_EITHER"
-  "
-  {
-    if (GET_CODE (operands[1]) == MEM)
-      {
-	if (TARGET_THUMB1)
-	  {
-	    emit_insn (gen_thumb1_extendhisi2 (operands[0], operands[1]));
-	    DONE;
-          }
-	else if (arm_arch4)
-	  {
-	    emit_insn (gen_rtx_SET (VOIDmode, operands[0],
-		       gen_rtx_SIGN_EXTEND (SImode, operands[1])));
-	    DONE;
-	  }
-      }
-
-    if (TARGET_ARM && GET_CODE (operands[1]) == MEM)
-      {
-        emit_insn (gen_extendhisi2_mem (operands[0], operands[1]));
-        DONE;
-      }
-
-    if (!s_register_operand (operands[1], HImode))
-      operands[1] = copy_to_mode_reg (HImode, operands[1]);
-
-    if (arm_arch6)
-      {
-	if (TARGET_THUMB1)
-	  emit_insn (gen_thumb1_extendhisi2 (operands[0], operands[1]));
-	else
-	  emit_insn (gen_rtx_SET (VOIDmode, operands[0],
-		     gen_rtx_SIGN_EXTEND (SImode, operands[1])));
-
-	DONE;
-      }
-
-    operands[1] = gen_lowpart (SImode, operands[1]);
-    operands[2] = gen_reg_rtx (SImode);
-  }"
-)
-
-(define_insn "thumb1_extendhisi2"
-  [(set (match_operand:SI 0 "register_operand" "=l")
-	(sign_extend:SI (match_operand:HI 1 "memory_operand" "m")))
-   (clobber (match_scratch:SI 2 "=&l"))]
-  "TARGET_THUMB1 && !arm_arch6"
-  "*
-  {
-    rtx ops[4];
-    rtx mem = XEXP (operands[1], 0);
-
-    /* This code used to try to use 'V', and fix the address only if it was
-       offsettable, but this fails for e.g. REG+48 because 48 is outside the
-       range of QImode offsets, and offsettable_address_p does a QImode
-       address check.  */
-       
-    if (GET_CODE (mem) == CONST)
-      mem = XEXP (mem, 0);
-    
-    if (GET_CODE (mem) == LABEL_REF)
-      return \"ldr\\t%0, %1\";
-    
-    if (GET_CODE (mem) == PLUS)
-      {
-        rtx a = XEXP (mem, 0);
-        rtx b = XEXP (mem, 1);
-
-        if (GET_CODE (a) == LABEL_REF
-	    && GET_CODE (b) == CONST_INT)
-          return \"ldr\\t%0, %1\";
-
-        if (GET_CODE (b) == REG)
-          return \"ldrsh\\t%0, %1\";
-	  
-        ops[1] = a;
-        ops[2] = b;
-      }
-    else
-      {
-        ops[1] = mem;
-        ops[2] = const0_rtx;
-      }
+{
+  if (TARGET_THUMB1)
+    {
+      emit_insn (gen_thumb1_extendhisi2 (operands[0], operands[1]));
+      DONE;
+    }
+  if (MEM_P (operands[1]) && TARGET_ARM && !arm_arch4)
+    {
+      emit_insn (gen_extendhisi2_mem (operands[0], operands[1]));
+      DONE;
+    }
 
-    gcc_assert (GET_CODE (ops[1]) == REG);
+  if (!arm_arch6 && !MEM_P (operands[1]))
+    {
+      rtx t = gen_lowpart (SImode, operands[1]);
+      rtx tmp = gen_reg_rtx (SImode);
+      emit_insn (gen_ashlsi3 (tmp, t, GEN_INT (16)));
+      emit_insn (gen_ashrsi3 (operands[0], tmp, GEN_INT (16)));
+      DONE;
+    }
+})
 
-    ops[0] = operands[0];
-    ops[3] = operands[2];
-    output_asm_insn (\"mov\\t%3, %2\;ldrsh\\t%0, [%1, %3]\", ops);
-    return \"\";
-  }"
-  [(set_attr "length" "4")
-   (set_attr "type" "load_byte")
-   (set_attr "pool_range" "1020")]
-)
+(define_split
+  [(parallel
+    [(set (match_operand:SI 0 "register_operand" "")
+	  (sign_extend:SI (match_operand:HI 1 "register_operand" "")))
+     (clobber (match_scratch:SI 2 ""))])]
+  "!arm_arch6"
+  [(set (match_dup 0) (ashift:SI (match_dup 2) (const_int 16)))
+   (set (match_dup 0) (ashiftrt:SI (match_dup 0) (const_int 16)))]
+{
+  operands[2] = simplify_gen_subreg (SImode, operands[1], HImode, 0);
+})
 
 ;; We used to have an early-clobber on the scratch register here.
 ;; However, there's a bug somewhere in reload which means that this
@@ -4376,16 +4273,18 @@ (define_insn "thumb1_extendhisi2"
 ;; we try to verify the operands.  Fortunately, we don't really need
 ;; the early-clobber: we can always use operand 0 if operand 2
 ;; overlaps the address.
-(define_insn "*thumb1_extendhisi2_insn_v6"
+(define_insn "thumb1_extendhisi2"
   [(set (match_operand:SI 0 "register_operand" "=l,l")
 	(sign_extend:SI (match_operand:HI 1 "nonimmediate_operand" "l,m")))
    (clobber (match_scratch:SI 2 "=X,l"))]
-  "TARGET_THUMB1 && arm_arch6"
+  "TARGET_THUMB1"
   "*
   {
     rtx ops[4];
     rtx mem;
 
+    if (which_alternative == 0 && !arm_arch6)
+      return \"#\";
     if (which_alternative == 0)
       return \"sxth\\t%0, %1\";
 
@@ -4433,7 +4332,10 @@ (define_insn "*thumb1_extendhisi2_insn_v
     output_asm_insn (\"mov\\t%3, %2\;ldrsh\\t%0, [%1, %3]\", ops);
     return \"\";
   }"
-  [(set_attr "length" "2,4")
+  [(set_attr_alternative "length"
+			 [(if_then_else (eq_attr "is_arch6" "yes")
+					(const_int 2) (const_int 4))
+			  (const_int 4)])
    (set_attr "type" "alu_shift,load_byte")
    (set_attr "pool_range" "*,1020")]
 )
@@ -4474,15 +4376,28 @@ (define_expand "extendhisi2_mem"
   }"
 )
 
+(define_split
+  [(set (match_operand:SI 0 "register_operand" "")
+	(sign_extend:SI (match_operand:HI 1 "register_operand" "")))]
+  "!arm_arch6"
+  [(set (match_dup 0) (ashift:SI (match_dup 2) (const_int 16)))
+   (set (match_dup 0) (ashiftrt:SI (match_dup 0) (const_int 16)))]
+{
+  operands[2] = simplify_gen_subreg (SImode, operands[1], HImode, 0);
+})
+
 (define_insn "*arm_extendhisi2"
-  [(set (match_operand:SI 0 "s_register_operand" "=r")
-	(sign_extend:SI (match_operand:HI 1 "memory_operand" "m")))]
+  [(set (match_operand:SI 0 "s_register_operand" "=r,r")
+	(sign_extend:SI (match_operand:HI 1 "nonimmediate_operand" "r,m")))]
   "TARGET_ARM && arm_arch4 && !arm_arch6"
-  "ldr%(sh%)\\t%0, %1"
-  [(set_attr "type" "load_byte")
+  "@
+   #
+   ldr%(sh%)\\t%0, %1"
+  [(set_attr "length" "8,4")
+   (set_attr "type" "alu_shift,load_byte")
    (set_attr "predicable" "yes")
-   (set_attr "pool_range" "256")
-   (set_attr "neg_pool_range" "244")]
+   (set_attr "pool_range" "*,256")
+   (set_attr "neg_pool_range" "*,244")]
 )
 
 ;; ??? Check Thumb-2 pool range
@@ -4544,46 +4459,45 @@ (define_insn "*arm_extendqihi_insn"
 )
 
 (define_expand "extendqisi2"
-  [(set (match_dup 2)
-	(ashift:SI (match_operand:QI 1 "arm_reg_or_extendqisi_mem_op" "")
-		   (const_int 24)))
-   (set (match_operand:SI 0 "s_register_operand" "")
-	(ashiftrt:SI (match_dup 2)
-		     (const_int 24)))]
+  [(set (match_operand:SI 0 "s_register_operand" "")
+	(sign_extend:SI (match_operand:QI 1 "arm_reg_or_extendqisi_mem_op" "")))]
   "TARGET_EITHER"
-  "
-  {
-    if ((TARGET_THUMB || arm_arch4) && GET_CODE (operands[1]) == MEM)
-      {
-        emit_insn (gen_rtx_SET (VOIDmode, operands[0],
-			        gen_rtx_SIGN_EXTEND (SImode, operands[1])));
-        DONE;
-      }
-
-    if (!s_register_operand (operands[1], QImode))
-      operands[1] = copy_to_mode_reg (QImode, operands[1]);
+{
+  if (!arm_arch4 && MEM_P (operands[1]))
+    operands[1] = copy_to_mode_reg (QImode, operands[1]);
 
-    if (arm_arch6)
-      {
-        emit_insn (gen_rtx_SET (VOIDmode, operands[0],
-			        gen_rtx_SIGN_EXTEND (SImode, operands[1])));
-        DONE;
-      }
+  if (!arm_arch6 && !MEM_P (operands[1]))
+    {
+      rtx t = gen_lowpart (SImode, operands[1]);
+      rtx tmp = gen_reg_rtx (SImode);
+      emit_insn (gen_ashlsi3 (tmp, t, GEN_INT (24)));
+      emit_insn (gen_ashrsi3 (operands[0], tmp, GEN_INT (24)));
+      DONE;
+    }
+})
 
-    operands[1] = gen_lowpart (SImode, operands[1]);
-    operands[2] = gen_reg_rtx (SImode);
-  }"
-)
+(define_split
+  [(set (match_operand:SI 0 "register_operand" "")
+	(sign_extend:SI (match_operand:QI 1 "register_operand" "")))]
+  "!arm_arch6"
+  [(set (match_dup 0) (ashift:SI (match_dup 2) (const_int 24)))
+   (set (match_dup 0) (ashiftrt:SI (match_dup 0) (const_int 24)))]
+{
+  operands[2] = simplify_gen_subreg (SImode, operands[1], QImode, 0);
+})
 
 (define_insn "*arm_extendqisi"
-  [(set (match_operand:SI 0 "s_register_operand" "=r")
-	(sign_extend:SI (match_operand:QI 1 "arm_extendqisi_mem_op" "Uq")))]
+  [(set (match_operand:SI 0 "s_register_operand" "=r,r")
+	(sign_extend:SI (match_operand:QI 1 "arm_reg_or_extendqisi_mem_op" "r,Uq")))]
   "TARGET_ARM && arm_arch4 && !arm_arch6"
-  "ldr%(sb%)\\t%0, %1"
-  [(set_attr "type" "load_byte")
+  "@
+   #
+   ldr%(sb%)\\t%0, %1"
+  [(set_attr "length" "8,4")
+   (set_attr "type" "alu_shift,load_byte")
    (set_attr "predicable" "yes")
-   (set_attr "pool_range" "256")
-   (set_attr "neg_pool_range" "244")]
+   (set_attr "pool_range" "*,256")
+   (set_attr "neg_pool_range" "*,244")]
 )
 
 (define_insn "*arm_extendqisi_v6"
@@ -4611,162 +4525,78 @@ (define_insn "*arm_extendqisi2addsi"
    (set_attr "predicable" "yes")]
 )
 
-(define_insn "*thumb1_extendqisi2"
-  [(set (match_operand:SI 0 "register_operand" "=l,l")
-	(sign_extend:SI (match_operand:QI 1 "memory_operand" "V,m")))]
-  "TARGET_THUMB1 && !arm_arch6"
-  "*
-  {
-    rtx ops[3];
-    rtx mem = XEXP (operands[1], 0);
-    
-    if (GET_CODE (mem) == CONST)
-      mem = XEXP (mem, 0);
-    
-    if (GET_CODE (mem) == LABEL_REF)
-      return \"ldr\\t%0, %1\";
+(define_split
+  [(set (match_operand:SI 0 "register_operand" "")
+	(sign_extend:SI (match_operand:QI 1 "memory_operand" "")))]
+  "TARGET_THUMB1 && reload_completed"
+  [(set (match_dup 0) (match_dup 2))
+   (set (match_dup 0) (sign_extend:SI (match_dup 3)))]
+{
+  rtx addr = XEXP (operands[1], 0);
 
-    if (GET_CODE (mem) == PLUS
-        && GET_CODE (XEXP (mem, 0)) == LABEL_REF)
-      return \"ldr\\t%0, %1\";
-      
-    if (which_alternative == 0)
-      return \"ldrsb\\t%0, %1\";
-      
-    ops[0] = operands[0];
-    
-    if (GET_CODE (mem) == PLUS)
-      {
-        rtx a = XEXP (mem, 0);
-	rtx b = XEXP (mem, 1);
-	
-        ops[1] = a;
-        ops[2] = b;
+  if (GET_CODE (addr) == CONST)
+    addr = XEXP (addr, 0);
+  if (GET_CODE (addr) == PLUS
+      && REG_P (XEXP (addr, 0)) && REG_P (XEXP (addr, 1)))
+    /* No split necessary.  */
+    FAIL;
+  if (GET_CODE (addr) == PLUS
+      && !REG_P (XEXP (addr, 0)) && !REG_P (XEXP (addr, 1)))
+    FAIL;
 
-        if (GET_CODE (a) == REG)
-	  {
-	    if (GET_CODE (b) == REG)
-              output_asm_insn (\"ldrsb\\t%0, [%1, %2]\", ops);
-            else if (REGNO (a) == REGNO (ops[0]))
-	      {
-                output_asm_insn (\"ldrb\\t%0, [%1, %2]\", ops);
-		output_asm_insn (\"lsl\\t%0, %0, #24\", ops);
-		output_asm_insn (\"asr\\t%0, %0, #24\", ops);
-	      }
-	    else
-              output_asm_insn (\"mov\\t%0, %2\;ldrsb\\t%0, [%1, %0]\", ops);
-	  }
-	else
-          {
-	    gcc_assert (GET_CODE (b) == REG);
-            if (REGNO (b) == REGNO (ops[0]))
-	      {
-                output_asm_insn (\"ldrb\\t%0, [%2, %1]\", ops);
-		output_asm_insn (\"lsl\\t%0, %0, #24\", ops);
-		output_asm_insn (\"asr\\t%0, %0, #24\", ops);
-	      }
-	    else
-              output_asm_insn (\"mov\\t%0, %2\;ldrsb\\t%0, [%1, %0]\", ops);
-          }
-      }
-    else if (GET_CODE (mem) == REG && REGNO (ops[0]) == REGNO (mem))
-      {
-        output_asm_insn (\"ldrb\\t%0, [%0, #0]\", ops);
-	output_asm_insn (\"lsl\\t%0, %0, #24\", ops);
-	output_asm_insn (\"asr\\t%0, %0, #24\", ops);
-      }
-    else
-      {
-        ops[1] = mem;
-        ops[2] = const0_rtx;
-	
-        output_asm_insn (\"mov\\t%0, %2\;ldrsb\\t%0, [%1, %0]\", ops);
-      }
-    return \"\";
-  }"
-  [(set_attr "length" "2,6")
-   (set_attr "type" "load_byte,load_byte")
-   (set_attr "pool_range" "32,32")]
-)
+  if (reg_overlap_mentioned_p (operands[0], addr))
+    {
+      rtx t = gen_lowpart (QImode, operands[0]);
+      emit_move_insn (t, operands[1]);
+      emit_insn (gen_thumb1_extendqisi2 (operands[0], t));
+      DONE;
+    }
+  if (REG_P (addr))
+    {
+      addr = gen_rtx_PLUS (Pmode, addr, operands[0]);
+      operands[2] = const0_rtx;
+    }
+  else if (GET_CODE (addr) != PLUS)
+    FAIL;
+  else if (REG_P (XEXP (addr, 0)))
+    {
+      operands[2] = XEXP (addr, 1);
+      addr = gen_rtx_PLUS (Pmode, XEXP (addr, 0), operands[0]);
+    }
+  else
+    {
+      operands[2] = XEXP (addr, 0);
+      addr = gen_rtx_PLUS (Pmode, XEXP (addr, 1), operands[0]);
+    }
+  operands[3] = change_address (operands[1], QImode, addr);
+})
 
-(define_insn "*thumb1_extendqisi2_v6"
+(define_insn "thumb1_extendqisi2"
   [(set (match_operand:SI 0 "register_operand" "=l,l,l")
 	(sign_extend:SI (match_operand:QI 1 "nonimmediate_operand" "l,V,m")))]
-  "TARGET_THUMB1 && arm_arch6"
-  "*
-  {
-    rtx ops[3];
-    rtx mem;
-
-    if (which_alternative == 0)
-      return \"sxtb\\t%0, %1\";
+  "TARGET_THUMB1"
+{
+  rtx addr;
 
-    mem = XEXP (operands[1], 0);
-    
-    if (GET_CODE (mem) == CONST)
-      mem = XEXP (mem, 0);
-    
-    if (GET_CODE (mem) == LABEL_REF)
-      return \"ldr\\t%0, %1\";
+  if (which_alternative == 0 && arm_arch6)
+    return "sxtb\\t%0, %1";
+  if (which_alternative == 0)
+    return "#";
 
-    if (GET_CODE (mem) == PLUS
-        && GET_CODE (XEXP (mem, 0)) == LABEL_REF)
-      return \"ldr\\t%0, %1\";
-      
-    if (which_alternative == 0)
-      return \"ldrsb\\t%0, %1\";
+  addr = XEXP (operands[1], 0);
+  if (GET_CODE (addr) == PLUS
+      && REG_P (XEXP (addr, 0)) && REG_P (XEXP (addr, 1)))
+    return "ldrsb\\t%0, %1";
       
-    ops[0] = operands[0];
-    
-    if (GET_CODE (mem) == PLUS)
-      {
-        rtx a = XEXP (mem, 0);
-	rtx b = XEXP (mem, 1);
-	
-        ops[1] = a;
-        ops[2] = b;
-
-        if (GET_CODE (a) == REG)
-	  {
-	    if (GET_CODE (b) == REG)
-              output_asm_insn (\"ldrsb\\t%0, [%1, %2]\", ops);
-            else if (REGNO (a) == REGNO (ops[0]))
-	      {
-                output_asm_insn (\"ldrb\\t%0, [%1, %2]\", ops);
-		output_asm_insn (\"sxtb\\t%0, %0\", ops);
-	      }
-	    else
-              output_asm_insn (\"mov\\t%0, %2\;ldrsb\\t%0, [%1, %0]\", ops);
-	  }
-	else
-          {
-	    gcc_assert (GET_CODE (b) == REG);
-            if (REGNO (b) == REGNO (ops[0]))
-	      {
-                output_asm_insn (\"ldrb\\t%0, [%2, %1]\", ops);
-		output_asm_insn (\"sxtb\\t%0, %0\", ops);
-	      }
-	    else
-              output_asm_insn (\"mov\\t%0, %2\;ldrsb\\t%0, [%1, %0]\", ops);
-          }
-      }
-    else if (GET_CODE (mem) == REG && REGNO (ops[0]) == REGNO (mem))
-      {
-        output_asm_insn (\"ldrb\\t%0, [%0, #0]\", ops);
-	output_asm_insn (\"sxtb\\t%0, %0\", ops);
-      }
-    else
-      {
-        ops[1] = mem;
-        ops[2] = const0_rtx;
-	
-        output_asm_insn (\"mov\\t%0, %2\;ldrsb\\t%0, [%1, %0]\", ops);
-      }
-    return \"\";
-  }"
-  [(set_attr "length" "2,2,4")
-   (set_attr "type" "alu_shift,load_byte,load_byte")
-   (set_attr "pool_range" "*,32,32")]
+  return "#";
+}
+  [(set_attr_alternative "length"
+			 [(if_then_else (eq_attr "is_arch6" "yes")
+					(const_int 2) (const_int 4))
+			  (const_int 2)
+			  (if_then_else (eq_attr "is_arch6" "yes")
+					(const_int 4) (const_int 6))])
+   (set_attr "type" "alu_shift,load_byte,load_byte")]
 )
 
 (define_expand "extendsfdf2"
Index: testsuite/gcc.target/arm/pr42172-1.c
===================================================================
--- testsuite/gcc.target/arm/pr42172-1.c	(revision 0)
+++ testsuite/gcc.target/arm/pr42172-1.c	(revision 0)
@@ -0,0 +1,19 @@
+/* { dg-options "-O2" }  */
+
+struct A {
+  unsigned int f1 : 3;
+  unsigned int f2 : 3;
+  unsigned int f3 : 1;
+  unsigned int f4 : 1;
+
+};
+
+void init_A (struct A *this)
+{
+  this->f1 = 0;
+  this->f2 = 1;
+  this->f3 = 0;
+  this->f4 = 0;
+}
+
+/* { dg-final { scan-assembler-times "ldr" 1 } } */

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Ping: ARM patch (PR42172): Rework the {sign,zero}-extend patterns
  2010-06-08 22:48 ARM patch (PR42172): Rework the {sign,zero}-extend patterns Bernd Schmidt
@ 2010-06-18  0:37 ` Bernd Schmidt
  2010-06-21 14:17   ` Ping^2 " Bernd Schmidt
  2010-07-01 13:11 ` Richard Earnshaw
  1 sibling, 1 reply; 5+ messages in thread
From: Bernd Schmidt @ 2010-06-18  0:37 UTC (permalink / raw)
  To: GCC Patches

Ping.


Bernd

http://gcc.gnu.org/ml/gcc-patches/2010-06/msg00832.html

	* config/arm/arm.c (thumb1_rtx_costs): Improve support for SIGN_EXTEND
	and ZERO_EXTEND.
	(arm_rtx_costs_1): Likewise.
	(arm_size_rtx_costs): Use arm_rtx_costs_1 for these codes.
	* config/arm/arm.md (is_arch6): New attribute.
	(zero_extendhisi2, zero_extendqisi2, extendhisi2,
	extendqisi2): Tighten the code somewhat, avoiding invalid
	RTL to occur in the expander patterns.
	(thumb1_zero_extendhisi2): Merge with thumb1_zero_extendhisi2_v6.
	(thumb1_zero_extendhisi2_v6): Delete.
	(thumb1_extendhisi2): Merge with thumb1_extendhisi2_v6.
	(thumb1_extendhisi2_v6): Delete.
	(thumb1_extendqisi2): Merge with thumb1_extendhisi2_v6.
	(thumb1_extendqisi2_v6): Delete.
	(zero_extendhisi2 for register input splitter): New.
	(zero_extendqisi2 for register input splitter): New.
	(thumb1_extendhisi2 for register input splitter): New.
	(extendhisi2 for register input splitter): New.
	(extendqisi2 for register input splitter): New.
	(TARGET_THUMB1 extendqisi2 for memory input splitter): New.
	(arm_zero_extendhisi2): Allow nonimmediate_operand for operand 1,
	and add support for a register alternative requiring a split.
	(thumb1_zero_extendqisi2): Likewise.
	(arm_zero_extendqisi2): Likewise.
	(arm_extendhisi2): Likewise.
	(arm_extendqisi2): Likewise.

	* gcc.target/arm/pr42172-1.c: New test.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Ping^2 ARM patch (PR42172): Rework the {sign,zero}-extend patterns
  2010-06-18  0:37 ` Ping: " Bernd Schmidt
@ 2010-06-21 14:17   ` Bernd Schmidt
  2010-06-28 12:26     ` Ping^3 " Bernd Schmidt
  0 siblings, 1 reply; 5+ messages in thread
From: Bernd Schmidt @ 2010-06-21 14:17 UTC (permalink / raw)
  To: GCC Patches

Ping^2


Bernd

> http://gcc.gnu.org/ml/gcc-patches/2010-06/msg00832.html
> 
> 	* config/arm/arm.c (thumb1_rtx_costs): Improve support for SIGN_EXTEND
> 	and ZERO_EXTEND.
> 	(arm_rtx_costs_1): Likewise.
> 	(arm_size_rtx_costs): Use arm_rtx_costs_1 for these codes.
> 	* config/arm/arm.md (is_arch6): New attribute.
> 	(zero_extendhisi2, zero_extendqisi2, extendhisi2,
> 	extendqisi2): Tighten the code somewhat, avoiding invalid
> 	RTL to occur in the expander patterns.
> 	(thumb1_zero_extendhisi2): Merge with thumb1_zero_extendhisi2_v6.
> 	(thumb1_zero_extendhisi2_v6): Delete.
> 	(thumb1_extendhisi2): Merge with thumb1_extendhisi2_v6.
> 	(thumb1_extendhisi2_v6): Delete.
> 	(thumb1_extendqisi2): Merge with thumb1_extendhisi2_v6.
> 	(thumb1_extendqisi2_v6): Delete.
> 	(zero_extendhisi2 for register input splitter): New.
> 	(zero_extendqisi2 for register input splitter): New.
> 	(thumb1_extendhisi2 for register input splitter): New.
> 	(extendhisi2 for register input splitter): New.
> 	(extendqisi2 for register input splitter): New.
> 	(TARGET_THUMB1 extendqisi2 for memory input splitter): New.
> 	(arm_zero_extendhisi2): Allow nonimmediate_operand for operand 1,
> 	and add support for a register alternative requiring a split.
> 	(thumb1_zero_extendqisi2): Likewise.
> 	(arm_zero_extendqisi2): Likewise.
> 	(arm_extendhisi2): Likewise.
> 	(arm_extendqisi2): Likewise.
> 
> 	* gcc.target/arm/pr42172-1.c: New test.
> 

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Ping^3 ARM patch (PR42172): Rework the {sign,zero}-extend patterns
  2010-06-21 14:17   ` Ping^2 " Bernd Schmidt
@ 2010-06-28 12:26     ` Bernd Schmidt
  0 siblings, 0 replies; 5+ messages in thread
From: Bernd Schmidt @ 2010-06-28 12:26 UTC (permalink / raw)
  To: GCC Patches; +Cc: Richard Earnshaw

Ping^3


Bernd

>> http://gcc.gnu.org/ml/gcc-patches/2010-06/msg00832.html
>>
>> 	* config/arm/arm.c (thumb1_rtx_costs): Improve support for SIGN_EXTEND
>> 	and ZERO_EXTEND.
>> 	(arm_rtx_costs_1): Likewise.
>> 	(arm_size_rtx_costs): Use arm_rtx_costs_1 for these codes.
>> 	* config/arm/arm.md (is_arch6): New attribute.
>> 	(zero_extendhisi2, zero_extendqisi2, extendhisi2,
>> 	extendqisi2): Tighten the code somewhat, avoiding invalid
>> 	RTL to occur in the expander patterns.
>> 	(thumb1_zero_extendhisi2): Merge with thumb1_zero_extendhisi2_v6.
>> 	(thumb1_zero_extendhisi2_v6): Delete.
>> 	(thumb1_extendhisi2): Merge with thumb1_extendhisi2_v6.
>> 	(thumb1_extendhisi2_v6): Delete.
>> 	(thumb1_extendqisi2): Merge with thumb1_extendhisi2_v6.
>> 	(thumb1_extendqisi2_v6): Delete.
>> 	(zero_extendhisi2 for register input splitter): New.
>> 	(zero_extendqisi2 for register input splitter): New.
>> 	(thumb1_extendhisi2 for register input splitter): New.
>> 	(extendhisi2 for register input splitter): New.
>> 	(extendqisi2 for register input splitter): New.
>> 	(TARGET_THUMB1 extendqisi2 for memory input splitter): New.
>> 	(arm_zero_extendhisi2): Allow nonimmediate_operand for operand 1,
>> 	and add support for a register alternative requiring a split.
>> 	(thumb1_zero_extendqisi2): Likewise.
>> 	(arm_zero_extendqisi2): Likewise.
>> 	(arm_extendhisi2): Likewise.
>> 	(arm_extendqisi2): Likewise.
>>
>> 	* gcc.target/arm/pr42172-1.c: New test.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: ARM patch (PR42172): Rework the {sign,zero}-extend patterns
  2010-06-08 22:48 ARM patch (PR42172): Rework the {sign,zero}-extend patterns Bernd Schmidt
  2010-06-18  0:37 ` Ping: " Bernd Schmidt
@ 2010-07-01 13:11 ` Richard Earnshaw
  1 sibling, 0 replies; 5+ messages in thread
From: Richard Earnshaw @ 2010-07-01 13:11 UTC (permalink / raw)
  To: Bernd Schmidt; +Cc: GCC Patches


On Wed, 2010-06-09 at 00:34 +0200, Bernd Schmidt wrote:
> PR42172 is a case where we generate terrible code for bitfield
> assignments on Thumb1.  Part of the problem is that we fail to optimize
> 
> 	strb	r3, [r0]
> 	ldrb	r3, [r0]
> 
> (insn 11 (set (mem:QI (reg:SI 133) (subreg:QI (reg:SI 136) 0))
> (insn 12 (set (reg:SI 140) (zero_extend:SI (mem:QI (reg/v/f:SI 133)))))
> 
> cse will of course try to replace the second MEM with the previous value
> in a register, but the ARM backend (if !arch_v6) does not accept
> zero_extend from a reg, only from a mem.  This makes some sense at first
> glance as there isn't a specific instruction, so the expander creates a
> sequence of shifts when given a REG input.
> 
> However, to make CSE do its job, I believe it's best to make the ARM
> extension patterns accept registers.  We should avoid creating such
> patterns when possible, both at expand time (generating the sequence of
> shifts as before), and in the combiner (by setting appropriate
> rtx_costs), since the two-shift sequence can often be optimized further.
> 
> The following patch does that, and it also fixes a few other problems:
> 
> * The expanders are a little ugly:
>   [(set (match_dup 2)
> 	(ashift:SI (match_operand:HI 1 "nonimmediate_operand" "")
> 		   (const_int 16)))
>   This is invalid RTL; it works because we modify operand[1], but I've
>   changed the patterns to avoid this (and to make them a lot smaller,
>   and, IMO, clearer)
> * Slightly broken Thumb1 patterns:
>     if (which_alternative == 0)
>       return \"sxtb\\t%0, %1\";
>     [....]
>     if (which_alternative == 0)
>       return \"ldrsb\\t%0, %1\";
> * Removed massive duplication of Thumb1 patterns for v6 and !v6
> * Removed apparently useless code to handle LABEL_REFs in addresses
>   in Thumb1 extend patterns.
> 
> Here are some examples of the code generation changes:
>         strb    r3, [r4, #4]
>         ldrb    r2, [r8, #4]    @ zero_extendqisi2
> -       ldrb    r3, [r4, #4]    @ zero_extendqisi2
> +       and     r3, r3, #255
> ====
> -       ldrb    r3, [sp, #36]   @ zero_extendqisi2
> -       bic     r3, r3, #176
> -       orr     r3, r3, #64
> -       strb    r3, [sp, #36]
> -       ldrb    r3, [sp, #36]   @ zero_extendqisi2
> -       bic     r3, r3, #10
> -       orr     r3, r3, #5
> +       mov     r3, #69
> ====
> -       strb    r2, [ip, #4]
> -       ldrb    r2, [ip, #4]    @ zero_extendqisi2
>         bic     r2, r2, #2
>         strb    r2, [ip, #4]
> ====
> -       ldrh    r3, [r5, #2]
> -       lsl     r3, r3, #16
> -       asr     r3, r3, #16
> -       add     r2, r3, #1
> +       mov     r0, #2
> +       ldrsh   r3, [r5, r0]
> +       add     r1, r3, #1
> 
> On the whole things looked very good in the testcases I looked at.
> Unfortunately PR42172 isn't completely solved by this, but it gets a lot
> better.
> 
> Regression tested on:
> Target is arm-none-linux-gnueabi
> Host   is i686-pc-linux-gnu
> 
> Schedule of variations:
>     qemu-system-armv7/arch=armv7-a/thumb
>     qemu-system-armv7/thumb
>     qemu-system-armv7
> 

This is ok.

Minor coding style nit, however:  

+  if (reg_overlap_mentioned_p (operands[0], addr))
+    {
+      ...
+    }
+  if (REG_P (addr))


There should be a blank line at the end of each if...[else ...]
sequence.

R.

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2010-07-01 13:11 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-06-08 22:48 ARM patch (PR42172): Rework the {sign,zero}-extend patterns Bernd Schmidt
2010-06-18  0:37 ` Ping: " Bernd Schmidt
2010-06-21 14:17   ` Ping^2 " Bernd Schmidt
2010-06-28 12:26     ` Ping^3 " Bernd Schmidt
2010-07-01 13:11 ` Richard Earnshaw

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).