From: Carrot Wei
To: Richard Earnshaw
Cc: ramana.radhakrishnan@arm.com, gcc-patches@gcc.gnu.org
Date: Mon, 13 Sep 2010 14:54:00 -0000
Subject: Re: [PATCH: ARM] PR 45335 Use ldrd and strd to access two consecutive words
References: <1282658136.22948.34.camel@e102325-lin.cambridge.arm.com> <1283354531.25967.50.camel@e102346-lin.cambridge.arm.com>
X-SW-Source: 2010-09/txt/msg01104.txt.bz2

Ping

On Sat, Sep 4, 2010 at 8:41 PM, Carrot Wei wrote:
>
> On Wed, Sep 1, 2010 at 11:22 PM, Richard Earnshaw wrote:
> > If you submit an updated patch, please re-include the changelog entry,
> > even if it's the same.
> >
> > There are two obvious problems with this patch:
> >
> > 1) You presume that ldrd is always cheaper than ldm(2 regs).  This isn't
> > the case on Cortex-a9.  I'm not expecting you to work out all the
> > details of when A9 should use LDM and when it should use ldrd, but your
> > code needs to ascertain the costs of each alternative and make a
> > decision based on that answer, not on a static choice.
> >
> > 2) Your code fails to check for volatile mems.  These must not be
> > transformed and the original load/store instructions must be preserved.
> >
>
> 1. A new function thumb2_prefer_ldmstm is used to choose ldm/stm or ldrd/strd.
> The default behavior is to output ldrd/strd. One should update this function if
> ldm/stm is better.
>
> 2. Function thumb2_legitimate_ldrd_p is updated to check volatile memory access.
>
> Following is the new patch
>
> ChangeLog:
> 2010-09-04  Wei Guozhi
>
>         PR target/45335
>         * gcc/config/arm/thumb2.md (thumb2_ldrd, thumb2_ldrd_reg1,
>         thumb2_ldrd_reg2 and peephole2): New insn pattern and related
>         peephole2.
>         (thumb2_strd, thumb2_strd_reg1, thumb2_strd_reg2 and peephole2):
>         New insn pattern and related peephole2.
>         * gcc/config/arm/arm.c (thumb2_legitimate_ldrd_p): New function.
>         (thumb2_check_ldrd_operands): New function.
>         (thumb2_prefer_ldmstm): New function.
>         * gcc/config/arm/arm-protos.h (thumb2_legitimate_ldrd_p): New prototype.
>         (thumb2_check_ldrd_operands): New prototype.
>         (thumb2_prefer_ldmstm): New prototype.
>         * gcc/config/arm/ldmstm.md (ldm2_ia, stm2_ia, ldm2_db, stm2_db):
>         Change the ldm/stm patterns with 2 words to ARM only.
>         * gcc/config/arm/constraints.md (Py): New thumb2 constant constraint
>         suitable to ldrd/strd instructions.
>
>
> 2010-09-04  Wei Guozhi
>
>         PR target/45335
>         * gcc.target/arm/pr45335.c: New test.
>         * gcc.target/arm/pr40457-1.c: Changed to load 3 words.
>         * gcc.target/arm/pr40457-2.c: Changed to store 3 words.
>
>
>
> Index: thumb2.md
> ===================================================================
> --- thumb2.md   (revision 163853)
> +++ thumb2.md   (working copy)
> @@ -1257,3 +1257,226 @@ (define_peephole2
>    "
>    operands[2] = GEN_INT (32 - INTVAL (operands[2]));
>    ")
> +
> +(define_insn "*thumb2_ldrd"
> +  [(parallel [(set (match_operand:SI 0 "s_register_operand" "")
> +                  (mem:SI (plus:SI
> +                               (match_operand:SI 2 "s_register_operand" "rk")
> +                               (match_operand:SI 3 "const_int_operand" "Py"))))
> +             (set (match_operand:SI 1 "s_register_operand" "")
> +                  (mem:SI (plus:SI (match_dup 2)
> +                            (match_operand:SI 4 "const_int_operand" "Py"))))])]
> +  "TARGET_THUMB2 && thumb2_check_ldrd_operands (operands[0], operands[1],
> +                               operands[2], operands[3], operands[4], 1)"
> +  "*
> +  {
> +    HOST_WIDE_INT offset1 = INTVAL (operands[3]);
> +    HOST_WIDE_INT offset2 = INTVAL (operands[4]);
> +    if (offset1 > offset2)
> +      {
> +       /* Swap the operands so that memory [base+offset1] is loaded into
> +          operands[0].  */
> +       rtx tmp = operands[0];
> +       operands[0] = operands[1];
> +       operands[1] = tmp;
> +       tmp = operands[3];
> +       operands[3] = operands[4];
> +       operands[4] = tmp;
> +       offset1 = INTVAL (operands[3]);
> +       offset2 = INTVAL (operands[4]);
> +      }
> +    if (thumb2_prefer_ldmstm (operands[0], operands[1],
> +                             operands[2], operands[3], operands[4], 1))
> +      return \"ldmdb\\t%2, {%0, %1}\";
> +    else if (fix_cm3_ldrd && (operands[2] == operands[0]))
> +      {
> +       if (offset1 <= -256)
> +         {
> +           output_asm_insn (\"sub\\t%2, %2, %n3\", operands);
> +           output_asm_insn (\"ldr\\t%1, [%2, #4]\", operands);
> +           output_asm_insn (\"ldr\\t%0, [%2]\", operands);
> +         }
> +       else
> +         {
> +           output_asm_insn (\"ldr\\t%1, [%2, %4]\", operands);
> +           output_asm_insn (\"ldr\\t%0, [%2, %3]\", operands);
> +         }
> +       return \"\";
> +      }
> +    else
> +      return \"ldrd\\t%0, %1, [%2, %3]\";
> +  }"
> +)
> +
> +(define_insn "*thumb2_ldrd_reg1"
> +  [(parallel [(set (match_operand:SI 0 "s_register_operand" "")
> +                  (mem:SI (match_operand:SI 2 "s_register_operand" "rk")))
> +             (set (match_operand:SI 1 "s_register_operand" "")
> +                  (mem:SI (plus:SI (match_dup 2)
> +                            (match_operand:SI 3 "const_int_operand" "Py"))))])]
> +  "TARGET_THUMB2 && thumb2_check_ldrd_operands (operands[0], operands[1],
> +                                               operands[2], 0, operands[3], 1)"
> +  "*
> +  {
> +    HOST_WIDE_INT offset2 = INTVAL (operands[3]);
> +    if (offset2 == 4)
> +      {
> +       if (thumb2_prefer_ldmstm (operands[0], operands[1],
> +                                 operands[2], 0, operands[3], 1))
> +         return \"ldmia\\t%2, {%0, %1}\";
> +       if (fix_cm3_ldrd && (operands[2] == operands[0]))
> +         {
> +           output_asm_insn (\"ldr\\t%1, [%2, %3]\", operands);
> +           output_asm_insn (\"ldr\\t%0, [%2]\", operands);
> +           return \"\";
> +         }
> +       return \"ldrd\\t%0, %1, [%2]\";
> +      }
> +    else
> +      {
> +       if (fix_cm3_ldrd && (operands[2] == operands[1]))
> +         {
> +           output_asm_insn (\"ldr\\t%0, [%2]\", operands);
> +           output_asm_insn (\"ldr\\t%1, [%2, %3]\", operands);
> +         }
> +       return \"ldrd\\t%1, %0, [%2, %3]\";
> +      }
> +  }"
> +)
> +
> +(define_insn "*thumb2_ldrd_reg2"
> +  [(parallel [(set (match_operand:SI 0 "s_register_operand" "")
> +                  (mem:SI (plus:SI
> +                               (match_operand:SI 2 "s_register_operand" "rk")
> +                               (match_operand:SI 3 "const_int_operand" "Py"))))
> +             (set (match_operand:SI 1 "s_register_operand" "")
> +                  (mem:SI (match_dup 2)))])]
> +  "TARGET_THUMB2 && thumb2_check_ldrd_operands (operands[0], operands[1],
> +                                               operands[2], operands[3], 0, 1)"
> +  "*
> +  {
> +    HOST_WIDE_INT offset1 = INTVAL (operands[3]);
> +    if (offset1 == -4)
> +      {
> +       if (fix_cm3_ldrd && (operands[2] == operands[0]))
> +         {
> +           output_asm_insn (\"ldr\\t%1, [%2]\", operands);
> +           output_asm_insn (\"ldr\\t%0, [%2, %3]\", operands);
> +           return \"\";
> +         }
> +       return \"ldrd\\t%0, %1, [%2, %3]\";
> +      }
> +    else
> +      {
> +       if (thumb2_prefer_ldmstm (operands[0], operands[1],
> +                                 operands[2], operands[3], 0, 1))
> +         return \"ldmia\\t%2, {%1, %0}\";
> +       if (fix_cm3_ldrd && (operands[2] == operands[1]))
> +         {
> +           output_asm_insn (\"ldr\\t%0, [%2, %3]\", operands);
> +           output_asm_insn (\"ldr\\t%1, [%2]\", operands);
> +           return \"\";
> +         }
> +       return \"ldrd\\t%1, %0, [%2]\";
> +      }
> +  }"
> +)
> +
> +(define_peephole2
> +  [(set (match_operand:SI 0 "s_register_operand" "")
> +       (match_operand:SI 2 "memory_operand" ""))
> +   (set (match_operand:SI 1 "s_register_operand" "")
> +       (match_operand:SI 3 "memory_operand" ""))]
> +  "TARGET_THUMB2 && thumb2_legitimate_ldrd_p (operands[0], operands[1],
> +                                             operands[2], operands[3], 1)"
> +  [(parallel [(set (match_operand:SI 0 "s_register_operand" "")
> +                  (match_operand:SI 2 "memory_operand" ""))
> +             (set (match_operand:SI 1 "s_register_operand" "")
> +                  (match_operand:SI 3 "memory_operand" ""))])]
> +  ""
> +)
> +
> +(define_insn "*thumb2_strd"
> +  [(parallel [(set (mem:SI
> +                       (plus:SI (match_operand:SI 2 "s_register_operand" "rk")
> +                                (match_operand:SI 3 "const_int_operand" "Py")))
> +                  (match_operand:SI 0 "s_register_operand" ""))
> +             (set (mem:SI (plus:SI (match_dup 2)
> +                                (match_operand:SI 4 "const_int_operand" "Py")))
> +                  (match_operand:SI 1 "s_register_operand" ""))])]
> +  "TARGET_THUMB2 && thumb2_check_ldrd_operands (operands[0], operands[1],
> +                               operands[2], operands[3], operands[4], 0)"
> +  "*
> +  {
> +    HOST_WIDE_INT offset1 = INTVAL (operands[3]);
> +    HOST_WIDE_INT offset2 = INTVAL (operands[4]);
> +    if (thumb2_prefer_ldmstm (operands[0], operands[1],
> +                             operands[2], operands[3], operands[4], 0))
> +      return \"stmdb\\t%2, {%0, %1}\";
> +    if (offset1 < offset2 )
> +      return \"strd\\t%0, %1, [%2, %3]\";
> +    else
> +      return \"strd\\t%1, %0, [%2, %4]\";
> +  }"
> +)
> +
> +(define_insn "*thumb2_strd_reg1"
> +  [(parallel [(set (mem:SI (match_operand:SI 2 "s_register_operand" "rk"))
> +                  (match_operand:SI 0 "s_register_operand" ""))
> +             (set (mem:SI (plus:SI (match_dup 2)
> +                               (match_operand:SI 3 "const_int_operand" "Py")))
> +                  (match_operand:SI 1 "s_register_operand" ""))])]
> +  "TARGET_THUMB2 && thumb2_check_ldrd_operands (operands[0], operands[1],
> +                                               operands[2], 0, operands[3], 0)"
> +  "*
> +  {
> +    HOST_WIDE_INT offset2 = INTVAL (operands[3]);
> +    if (offset2 == 4)
> +      {
> +       if (thumb2_prefer_ldmstm (operands[0], operands[1],
> +                                 operands[2], 0, operands[3], 0))
> +         return \"stmia\\t%2, {%0, %1}\";
> +       return \"strd\\t%0, %1, [%2]\";
> +      }
> +    else
> +      return \"strd\\t%1, %0, [%2, %3]\";
> +  }"
> +)
> +
> +(define_insn "*thumb2_strd_reg2"
> +  [(parallel [(set (mem:SI (plus:SI
> +                               (match_operand:SI 2 "s_register_operand" "rk")
> +                               (match_operand:SI 3 "const_int_operand" "Py")))
> +                  (match_operand:SI 0 "s_register_operand" ""))
> +             (set (mem:SI (match_dup 2))
> +                  (match_operand:SI 1 "s_register_operand" ""))])]
> +  "TARGET_THUMB2 && thumb2_check_ldrd_operands (operands[0], operands[1],
> +                                               operands[2], operands[3], 0, 0)"
> +  "*
> +  {
> +    HOST_WIDE_INT offset1 = INTVAL (operands[3]);
> +    if (offset1 == -4)
> +      return \"strd\\t%0, %1, [%2, %3]\";
> +    else
> +      {
> +       if (thumb2_prefer_ldmstm (operands[0], operands[1],
> +                                 operands[2], operands[3], 0, 0))
> +         return \"stmia\\t%2, {%1, %0}\";
> +       return \"strd\\t%1, %0, [%2]\";
> +      }
> +  }"
> +)
> +
> +(define_peephole2
> +  [(set (match_operand:SI 2 "memory_operand" "")
> +       (match_operand:SI 0 "s_register_operand" ""))
> +   (set (match_operand:SI 3 "memory_operand" "")
> +       (match_operand:SI 1 "s_register_operand" ""))]
> +  "TARGET_THUMB2 && thumb2_legitimate_ldrd_p (operands[0], operands[1],
> +                                             operands[2], operands[3], 0)"
> +  [(parallel [(set (match_operand:SI 2 "memory_operand" "")
> +                  (match_operand:SI 0 "s_register_operand" ""))
> +             (set (match_operand:SI 3 "memory_operand" "")
> +                  (match_operand:SI 1 "s_register_operand" ""))])]
> +  ""
> +)
> Index: arm.c
> ===================================================================
> --- arm.c       (revision 163853)
> +++ arm.c       (working copy)
> @@ -22976,4 +22976,125 @@ arm_expand_sync (enum machine_mode mode,
>      }
>  }
>
> +/* Check the legality of operands in an ldrd/strd instruction.  */
> +bool
> +thumb2_check_ldrd_operands (rtx reg1, rtx reg2, rtx base,
> +                           rtx off1, rtx off2, bool ldrd)
> +{
> +  HOST_WIDE_INT offset1 = 0;
> +  HOST_WIDE_INT offset2 = 0;
> +
> +  if (off1 != NULL)
> +    offset1 = INTVAL (off1);
> +  if (off2 != NULL)
> +    offset2 = INTVAL (off2);
> +
> +  if (ldrd && (reg1 == reg2))
> +    return false;
> +
> +  if ((offset1 + 4) == offset2)
> +    return true;
> +  if ((offset2 + 4) == offset1)
> +    return true;
> +
> +  return false;
> +}
> +
> +/* Check if the two memory accesses can be merged to an ldrd/strd instruction.
> +   That is they use the same base register, and the gap between constant
> +   offsets should be 4.  */
> +bool
> +thumb2_legitimate_ldrd_p (rtx reg1, rtx reg2, rtx mem1, rtx mem2, bool ldrd)
> +{
> +  rtx base1, base2, op1;
> +  rtx addr1 = XEXP (mem1, 0);
> +  rtx addr2 = XEXP (mem2, 0);
> +  HOST_WIDE_INT offset1 = 0;
> +  HOST_WIDE_INT offset2 = 0;
> +
> +  if (MEM_VOLATILE_P (mem1) || MEM_VOLATILE_P (mem2))
> +    return false;
> +
> +  if (REG_P (addr1))
> +    base1 = addr1;
> +  else if (GET_CODE (addr1) == PLUS)
> +    {
> +      base1 = XEXP (addr1, 0);
> +      op1 = XEXP (addr1, 1);
> +      if (!REG_P (base1) || (GET_CODE (op1) != CONST_INT))
> +       return false;
> +      offset1 = INTVAL (op1);
> +    }
> +  else
> +    return false;
> +
> +  if (REG_P (addr2))
> +    base2 = addr2;
> +  else if (GET_CODE (addr2) == PLUS)
> +    {
> +      base2 = XEXP (addr2, 0);
> +      op1 = XEXP (addr2, 1);
> +      if (!REG_P (base2) || (GET_CODE (op1) != CONST_INT))
> +       return false;
> +      offset2 = INTVAL (op1);
> +    }
> +  else
> +    return false;
> +
> +  if (base1 != base2)
> +    return false;
> +
> +  if ((offset1 > 1024) || (offset1 < -1020) || ((offset1 & 3) != 0))
> +    return false;
> +  if ((offset2 > 1024) || (offset2 < -1020) || ((offset2 & 3) != 0))
> +    return false;
> +
> +  if (ldrd && ((reg1 == reg2) || (reg1 == base1)))
> +    return false;
> +
> +  if ((offset1 + 4) == offset2)
> +    return true;
> +  if ((offset2 + 4) == offset1)
> +    return true;
> +
> +  return false;
> +}
> +
> +/* Check if the insn can be expressed as ldm/stm with less cost.  */
> +bool
> +thumb2_prefer_ldmstm (rtx reg1, rtx reg2, rtx base,
> +                     rtx off1, rtx off2, bool ldrd)
> +{
> +  HOST_WIDE_INT offset1 = 0;
> +  HOST_WIDE_INT offset2 = 0;
> +
> +  if (off1 != NULL)
> +    offset1 = INTVAL (off1);
> +  if (off2 != NULL)
> +    offset2 = INTVAL (off2);
> +
> +  if (offset1 > offset2)
> +  {
> +    rtx tmp;
> +    HOST_WIDE_INT t = offset1;
> +    offset1 = offset2;
> +    offset2 = t;
> +    tmp = reg1;
> +    reg1 = reg2;
> +    reg2 = tmp;
> +  }
> +
> +  /* The offset of ldmdb is -8, the offset of ldmia is 0.  */
> +  if ((offset1 != -8) && (offset1 != 0))
> +    return false;
> +
> +  /* Lower register corresponds to lower memory.  */
> +  if (REGNO (reg1) > REGNO (reg2))
> +    return false;
> +
> +  /* Now ldm/stm is possible. Check for special cases ldm/stm has lower
> +     cost.  */
> +  return false;
> +}
> +
>  #include "gt-arm.h"
>
> Index: arm-protos.h
> ===================================================================
> --- arm-protos.h        (revision 163853)
> +++ arm-protos.h        (working copy)
> @@ -149,7 +149,9 @@ extern void arm_expand_sync (enum machin
>  extern const char *arm_output_memory_barrier (rtx *);
>  extern const char *arm_output_sync_insn (rtx, rtx *);
>  extern unsigned int arm_sync_loop_insns (rtx , rtx *);
> -
> +extern bool thumb2_check_ldrd_operands (rtx, rtx, rtx, rtx, rtx, bool);
> +extern bool thumb2_legitimate_ldrd_p (rtx, rtx, rtx, rtx, bool);
> +extern bool thumb2_prefer_ldmstm (rtx, rtx, rtx, rtx, rtx, bool);
>  extern bool arm_output_addr_const_extra (FILE *, rtx);
>
>  #if defined TREE_CODE
> Index: ldmstm.md
> ===================================================================
> --- ldmstm.md   (revision 163853)
> +++ ldmstm.md   (working copy)
> @@ -852,7 +852,7 @@ (define_insn "*ldm2_ia"
>       (set (match_operand:SI 2 "arm_hard_register_operand" "")
>            (mem:SI (plus:SI (match_dup 3)
>                    (const_int 4))))])]
> -  "TARGET_32BIT && XVECLEN (operands[0], 0) == 2"
> +  "TARGET_ARM && XVECLEN (operands[0], 0) == 2"
>    "ldm%(ia%)\t%3, {%1, %2}"
>    [(set_attr "type" "load2")
>     (set_attr "predicable" "yes")])
> @@ -901,7 +901,7 @@ (define_insn "*stm2_ia"
>            (match_operand:SI 1 "arm_hard_register_operand" ""))
>       (set (mem:SI (plus:SI (match_dup 3) (const_int 4)))
>            (match_operand:SI 2 "arm_hard_register_operand" ""))])]
> -  "TARGET_32BIT && XVECLEN (operands[0], 0) == 2"
> +  "TARGET_ARM && XVECLEN (operands[0], 0) == 2"
>    "stm%(ia%)\t%3, {%1, %2}"
>    [(set_attr "type" "store2")
>     (set_attr "predicable" "yes")])
> @@ -1041,7 +1041,7 @@ (define_insn "*ldm2_db"
>       (set (match_operand:SI 2 "arm_hard_register_operand" "")
>            (mem:SI (plus:SI (match_dup 3)
>                    (const_int -4))))])]
> -  "TARGET_32BIT && XVECLEN (operands[0], 0) == 2"
> +  "TARGET_ARM && XVECLEN (operands[0], 0) == 2"
>    "ldm%(db%)\t%3, {%1, %2}"
>    [(set_attr "type" "load2")
>     (set_attr "predicable" "yes")])
> @@ -1067,7 +1067,7 @@ (define_insn "*stm2_db"
>            (match_operand:SI 1 "arm_hard_register_operand" ""))
>       (set (mem:SI (plus:SI (match_dup 3) (const_int -4)))
>            (match_operand:SI 2 "arm_hard_register_operand" ""))])]
> -  "TARGET_32BIT && XVECLEN (operands[0], 0) == 2"
> +  "TARGET_ARM && XVECLEN (operands[0], 0) == 2"
>    "stm%(db%)\t%3, {%1, %2}"
>    [(set_attr "type" "store2")
>     (set_attr "predicable" "yes")])
> Index: constraints.md
> ===================================================================
> --- constraints.md      (revision 163853)
> +++ constraints.md      (working copy)
> @@ -31,7 +31,7 @@
>  ;; The following multi-letter normal constraints have been used:
>  ;; in ARM/Thumb-2 state: Da, Db, Dc, Dn, Dl, DL, Dv, Dy, Di, Dz
>  ;; in Thumb-1 state: Pa, Pb, Pc, Pd
> -;; in Thumb-2 state: Ps, Pt, Pu, Pv, Pw, Px
> +;; in Thumb-2 state: Ps, Pt, Pu, Pv, Pw, Px, Py
>
>  ;; The following memory constraints have been used:
>  ;; in ARM/Thumb-2 state: Q, Ut, Uv, Uy, Un, Um, Us
> @@ -189,6 +189,13 @@ (define_constraint "Px"
>    (and (match_code "const_int")
>         (match_test "TARGET_THUMB2 && ival >= -7 && ival <= -1")))
>
> +(define_constraint "Py"
> +  "@internal In Thumb-2 state a constant that is a multiple of 4 in the
> +   range -1020 to 1024"
> +  (and (match_code "const_int")
> +       (match_test "TARGET_THUMB2 && ival >= -1020 && ival <= 1024
> +                   && (ival & 3) == 0")))
> +
>  (define_constraint "G"
>  "In ARM/Thumb-2 state a valid FPA immediate constant."
>  (and (match_code "const_double")
>
>
> Index: pr40457-1.c
> ===================================================================
> --- pr40457-1.c (revision 163853)
> +++ pr40457-1.c (working copy)
> @@ -1,9 +1,9 @@
> -/* { dg-options "-Os" }  */
> +/* { dg-options "-O2" }  */
>  /* { dg-do compile } */
>
>  int bar(int* p)
>  {
> -  int x = p[0] + p[1];
> +  int x = p[0] + p[1] + p[2];
>    return x;
>  }
>
> Index: pr40457-2.c
> ===================================================================
> --- pr40457-2.c (revision 163853)
> +++ pr40457-2.c (working copy)
> @@ -5,6 +5,7 @@ void foo(int* p)
>  {
>    p[0] = 1;
>    p[1] = 0;
> +  p[2] = 2;
>  }
>
>  /* { dg-final { scan-assembler "stm" } } */
> Index: pr45335.c
> ===================================================================
> --- pr45335.c   (revision 0)
> +++ pr45335.c   (revision 0)
> @@ -0,0 +1,22 @@
> +/* { dg-options "-mthumb -O2" } */
> +/* { dg-require-effective-target arm_thumb2_ok } */
> +/* { dg-final { scan-assembler "ldrd" } } */
> +/* { dg-final { scan-assembler "strd" } } */
> +
> +struct S
> +{
> +    void* p1;
> +    void* p2;
> +    void* p3;
> +    void* p4;
> +};
> +
> +extern printf(char*, ...);
> +
> +void foo1(struct S* fp, struct S* otherSaveArea)
> +{
> +    struct S* saveA = fp - 1;
> +    printf("StackSaveArea for fp %p [%p/%p]:\n", fp, saveA, otherSaveArea);
> +    printf("prevFrame=%p savedPc=%p meth=%p curPc=%p fp[0]=0x%08x\n",
> +        saveA->p1, saveA->p2, saveA->p3, saveA->p4, *(unsigned int*)fp);
> +}
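
For readers following the offset checks in the patch, the pairing rule can be sketched outside GCC. This is only a minimal sketch under stated assumptions, not the patch's code: it models the base register as a plain register number and the offsets as long values, and the names py_offset_ok and can_pair_ldrd are hypothetical. It mirrors the address conditions tested in thumb2_legitimate_ldrd_p (same base, both offsets multiples of 4 in the range -1020 to 1024, offsets exactly 4 apart) and leaves out the register-overlap and volatile checks.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the "Py" constraint in the patch:
   a multiple of 4 in the range -1020 to 1024.  */
bool
py_offset_ok (long off)
{
  return off >= -1020 && off <= 1024 && off % 4 == 0;
}

/* Hypothetical model of the address part of the pairing rule: two
   word accesses can share one ldrd/strd only if they use the same
   base register and their offsets are valid and exactly 4 bytes
   apart, in either order.  Register-overlap and volatile checks
   from the patch are deliberately omitted.  */
bool
can_pair_ldrd (int base1, int base2, long off1, long off2)
{
  if (base1 != base2)
    return false;
  if (!py_offset_ok (off1) || !py_offset_ok (off2))
    return false;
  return off1 + 4 == off2 || off2 + 4 == off1;
}
```

Under this model, can_pair_ldrd (1, 1, 0, 4) and can_pair_ldrd (1, 1, -4, -8) hold, while offsets 0 and 8, offsets 1024 and 1028, or two different base registers are rejected.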