From mboxrd@z Thu Jan 1 00:00:00 1970
Subject: Re: [PATCH, ARM] PR47855 Compute attr length for thumb2 insns, 3/3 (issue4475042)
From: Richard Earnshaw
To: Guozhi Wei
Cc: reply@codereview.appspotmail.com, gcc-patches@gcc.gnu.org
In-Reply-To: <20110505065105.BBA6820A24@guozhiwei.sha.corp.google.com>
References: <20110505065105.BBA6820A24@guozhiwei.sha.corp.google.com>
Date: Thu, 05 May 2011 09:43:00 -0000
Message-Id: <1304588525.19537.67.camel@e102346-lin.cambridge.arm.com>
X-SW-Source: 2011-05/txt/msg00391.txt.bz2

On Thu, 2011-05-05 at 14:51 +0800, Guozhi Wei wrote:
> Hi
>
> This is the third part of the fixing for
>
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47855
>
> This patch contains the length computation/refinement for insn patterns
> "*thumb2_movsi_insn", "*thumb2_cbz" and "*thumb2_cbnz".
>
> At the same time this patch revealed two bugs.  The first is the maximum
> offset of cbz/cbnz: it should be 126, but it is 128 in patterns
> "*thumb2_cbz" and "*thumb2_cbnz".  The second is that only the 2-register
> form of shift instructions can be 16 bit, but the 3-register form is
> allowed in "*thumb2_shiftsi3_short" and the related peephole2.  The fix is
> also contained in this patch.
>
> The patch has been tested on arm qemu.
>
> thanks
> Carrot
>
>
> 2011-05-05  Guozhi Wei
>
> 	PR target/47855
> 	* config/arm/thumb2.md (thumb2_movsi_insn): Add length attribute.
> 	(thumb2_shiftsi3_short and peephole2): Remove 3-register case.
> 	(thumb2_cbz): Refine length computation.
> 	(thumb2_cbnz): Likewise.
>

Hmm, although these changes are all related to length calculations, they
are really three patches that are unrelated to each other.  It would be
easier to review this if they were kept separate.

1) thumb2_shiftsi3_short

This appears to be a straight bug: we are putting out a 32-bit instruction
while claiming it to be only 16 bits.  This part is OK.

2) thumb2_movsi_insn

There are two things here.

a) Thumb-2 has a 16-bit move instruction for all core register-to-register
transfers, so the separation of alternatives 1 and 2 is unnecessary -- just
code these as "rk".

b) The ldm form does not support unaligned memory accesses.  I'm aware that
work is being done to add unaligned support to GCC for ARM, so I need to
find out whether this patch will interfere with those changes.  I'll try to
find out what the situation is here and get back to you.

3) thumb2_cbz and thumb2_cbnz

The range calculations look wrong here.  Remember that the 'pc', as far as
GCC is concerned, is the address of the start of the insn.  So for a
backwards branch you need to account for all the bytes in the insn pattern
that occur before the branch instruction itself, and secondly you also have
to remember that the 'pc' the CPU uses is the address of the branch
instruction plus 4.  All these conspire to reduce the backwards range of a
short branch to several bytes less than the 256 that you currently have
coded.
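
To make the arithmetic concrete, here is a rough, illustrative sketch (not
the exact fix) for the narrow-beq fallback of the first alternative,
assuming the cmp in front of the branch is the 16-bit encoding: the cmp
occupies the first two bytes of the pattern, so the beq sits at GCC's pc +
2 and the PC the CPU sees for it is GCC's pc + 6.  A hardware branch range
of [-256, 254] then corresponds roughly to the following, measured from
GCC's pc:

  ;; Illustrative numbers only -- the exact bounds still need checking,
  ;; and the alternative with a 32-bit cmp loses a further 2 bytes.
  (and (ge (minus (match_dup 1) (pc)) (const_int -250))
       (le (minus (match_dup 1) (pc)) (const_int 254)))

(The forward bound of 254 is simply left conservative here.)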
R.

> Index: config/arm/thumb2.md
> ===================================================================
> --- config/arm/thumb2.md	(revision 173350)
> +++ config/arm/thumb2.md	(working copy)
> @@ -165,25 +165,49 @@
>  ;; regs.  The high register alternatives are not taken into account when
>  ;; choosing register preferences in order to reflect their expense.
>  (define_insn "*thumb2_movsi_insn"
> -  [(set (match_operand:SI 0 "nonimmediate_operand" "=rk,r,r,r,l ,*hk,m,*m")
> -	(match_operand:SI 1 "general_operand"	     "rk ,I,K,j,mi,*mi,l,*hk"))]
> +  [(set (match_operand:SI 0 "nonimmediate_operand" "=l,rk,r,r,r,l ,*rk,Uu,*m")
> +	(match_operand:SI 1 "general_operand"	     "l ,rk,I,K,j,Uu,*mi,l ,*rk"))]
>    "TARGET_THUMB2 && ! TARGET_IWMMXT
>     && !(TARGET_HARD_FLOAT && TARGET_VFP)
>     && (   register_operand (operands[0], SImode)
>         || register_operand (operands[1], SImode))"
> -  "@
> -   mov%?\\t%0, %1
> -   mov%?\\t%0, %1
> -   mvn%?\\t%0, #%B1
> -   movw%?\\t%0, %1
> -   ldr%?\\t%0, %1
> -   ldr%?\\t%0, %1
> -   str%?\\t%1, %0
> -   str%?\\t%1, %0"
> -  [(set_attr "type" "*,*,*,*,load1,load1,store1,store1")
> +  "*
> +  switch (which_alternative)
> +    {
> +    case 0: return \"mov%?\\t%0, %1\";
> +    case 1: return \"mov%?\\t%0, %1\";
> +    case 2: return \"mov%?\\t%0, %1\";
> +    case 3: return \"mvn%?\\t%0, #%B1\";
> +    case 4: return \"movw%?\\t%0, %1\";
> +
> +    case 5:
> +      if (GET_CODE (XEXP (operands[1], 0)) == POST_INC)
> +	{
> +	  operands[1] = XEXP (XEXP (operands[1], 0), 0);
> +	  return \"ldm%(ia%)\t%1!, {%0}\";
> +	}
> +      else
> +	return \"ldr%?\\t%0, %1\";
> +
> +    case 6: return \"ldr%?\\t%0, %1\";
> +
> +    case 7:
> +      if (GET_CODE (XEXP (operands[0], 0)) == POST_INC)
> +	{
> +	  operands[0] = XEXP (XEXP (operands[0], 0), 0);
> +	  return \"stm%(ia%)\t%0!, {%1}\";
> +	}
> +      else
> +	return \"str%?\\t%1, %0\";
> +
> +    case 8: return \"str%?\\t%1, %0\";
> +    default: gcc_unreachable ();
> +    }"
> +  [(set_attr "type" "*,*,*,*,*,load1,load1,store1,store1")
>     (set_attr "predicable" "yes")
> -   (set_attr "pool_range" "*,*,*,*,1020,4096,*,*")
> -   (set_attr "neg_pool_range" "*,*,*,*,0,0,*,*")]
> +   (set_attr "length" "2,4,4,4,4,2,4,2,4")
> +   (set_attr "pool_range" "*,*,*,*,*,1020,4096,*,*")
> +   (set_attr "neg_pool_range" "*,*,*,*,*,0,0,*,*")]
>  )
>  
>  (define_insn "tls_load_dot_plus_four"
> @@ -685,7 +709,8 @@
>    "TARGET_THUMB2
>     && peep2_regno_dead_p(0, CC_REGNUM)
>     && ((GET_CODE(operands[3]) != ROTATE && GET_CODE(operands[3]) != ROTATERT)
> -       || REG_P(operands[2]))"
> +       || REG_P(operands[2]))
> +   && (CONSTANT_P (operands[2]) || (operands[0] == operands[1]))"
>   [(parallel
>     [(set (match_dup 0)
> 	   (match_op_dup 3
> @@ -696,10 +721,10 @@
>  )
>  
>  (define_insn "*thumb2_shiftsi3_short"
> -  [(set (match_operand:SI 0 "low_register_operand" "=l")
> +  [(set (match_operand:SI 0 "low_register_operand" "=l,l")
> 	(match_operator:SI 3 "shift_operator"
> -	 [(match_operand:SI 1 "low_register_operand" "l")
> -	  (match_operand:SI 2 "low_reg_or_int_operand" "lM")]))
> +	 [(match_operand:SI 1 "low_register_operand" "0,l")
> +	  (match_operand:SI 2 "low_reg_or_int_operand" "l,M")]))
>    (clobber (reg:CC CC_REGNUM))]
>   "TARGET_THUMB2 && reload_completed
>    && ((GET_CODE(operands[3]) != ROTATE && GET_CODE(operands[3]) != ROTATERT)
> @@ -707,7 +732,7 @@
>    "* return arm_output_shift(operands, 2);"
>    [(set_attr "predicable" "yes")
>     (set_attr "shift" "1")
> -   (set_attr "length" "2")
> +   (set_attr "length" "2,2")
>     (set (attr "type") (if_then_else (match_operand 2 "const_int_operand" "")
> 		      (const_string "alu_shift")
> 		      (const_string "alu_shift_reg")))]
> @@ -965,13 +990,23 @@
>      else
>        return \"cmp\\t%0, #0\;beq\\t%l1\";
>    "
> -  [(set (attr "length")
> -	(if_then_else
> -	    (and (ge (minus (match_dup 1) (pc)) (const_int 2))
> -		 (le (minus (match_dup 1) (pc)) (const_int 128))
> -		 (eq (symbol_ref ("which_alternative")) (const_int 0)))
> -	    (const_int 2)
> -	    (const_int 8)))]
> +  [(set (attr "length")
> +	(if_then_else
> +	    (eq (symbol_ref ("which_alternative")) (const_int 0))
> +	    (if_then_else
> +		(and (ge (minus (match_dup 1) (pc)) (const_int 2))
> +		     (le (minus (match_dup 1) (pc)) (const_int 126)))
> +		(const_int 2)
> +		(if_then_else
> +		    (and (ge (minus (match_dup 1) (pc)) (const_int -256))
> +			 (le (minus (match_dup 1) (pc)) (const_int 254)))
> +		    (const_int 4)
> +		    (const_int 6)))
> +	    (if_then_else
> +		(and (ge (minus (match_dup 1) (pc)) (const_int -256))
> +		     (le (minus (match_dup 1) (pc)) (const_int 254)))
> +		(const_int 6)
> +		(const_int 8))))]
>  )
>  
>  (define_insn "*thumb2_cbnz"
> @@ -988,13 +1023,23 @@
>      else
>        return \"cmp\\t%0, #0\;bne\\t%l1\";
>    "
> -  [(set (attr "length")
> -	(if_then_else
> -	    (and (ge (minus (match_dup 1) (pc)) (const_int 2))
> -		 (le (minus (match_dup 1) (pc)) (const_int 128))
> -		 (eq (symbol_ref ("which_alternative")) (const_int 0)))
> -	    (const_int 2)
> -	    (const_int 8)))]
> +  [(set (attr "length")
> +	(if_then_else
> +	    (eq (symbol_ref ("which_alternative")) (const_int 0))
> +	    (if_then_else
> +		(and (ge (minus (match_dup 1) (pc)) (const_int 2))
> +		     (le (minus (match_dup 1) (pc)) (const_int 126)))
> +		(const_int 2)
> +		(if_then_else
> +		    (and (ge (minus (match_dup 1) (pc)) (const_int -256))
> +			 (le (minus (match_dup 1) (pc)) (const_int 254)))
> +		    (const_int 4)
> +		    (const_int 6)))
> +	    (if_then_else
> +		(and (ge (minus (match_dup 1) (pc)) (const_int -256))
> +		     (le (minus (match_dup 1) (pc)) (const_int 254)))
> +		(const_int 6)
> +		(const_int 8))))]
>  )
>  
>  ;; 16-bit complement
>
> --
> This patch is available for review at http://codereview.appspot.com/4475042