From mboxrd@z Thu Jan 1 00:00:00 1970
Received: (qmail 3157 invoked by alias); 8 Nov 2011 10:58:24 -0000
Received: (qmail 3130 invoked by uid 22791); 8 Nov 2011 10:58:20 -0000
X-SWARE-Spam-Status: No, hits=-1.4 required=5.0 tests=AWL,BAYES_00,RCVD_IN_DNSWL_LOW
X-Spam-Check-By: sourceware.org
Received: from service87.mimecast.com (HELO service87.mimecast.com) (91.220.42.44) by sourceware.org (qpsmtpd/0.43rc1) with ESMTP; Tue, 08 Nov 2011 10:58:06 +0000
Received: from cam-owa1.Emea.Arm.com (fw-tnat.cambridge.arm.com [217.140.96.21]) by service87.mimecast.com; Tue, 08 Nov 2011 10:58:03 +0000
Received: from [10.1.79.40] ([10.1.255.212]) by cam-owa1.Emea.Arm.com with Microsoft SMTPSVC(6.0.3790.0); Tue, 8 Nov 2011 10:58:01 +0000
Subject: Re: [RFA/ARM][Patch 05/05]: LDRD generation instead of POP in A15 ARM epilogue.
From: Sameera Deshpande
To: Ramana Radhakrishnan
Cc: "gcc-patches@gcc.gnu.org" , "nickc@redhat.com" , Richard Earnshaw , "paul@codesourcery.com" , Ramana Radhakrishnan
References: <1318324138.2186.40.camel@e102549-lin.cambridge.arm.com> <1318325869.2186.67.camel@e102549-lin.cambridge.arm.com>
Date: Tue, 08 Nov 2011 11:15:00 -0000
Message-ID: <1320749880.28506.87.camel@e102549-lin.cambridge.arm.com>
Mime-Version: 1.0
X-MC-Unique: 111110810580305701
Content-Type: multipart/mixed; boundary="=-jXEhHHY8idxHzp0xW9yq"
X-IsSubscribed: yes
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
Sender: gcc-patches-owner@gcc.gnu.org
X-SW-Source: 2011-11/txt/msg01162.txt.bz2

On Fri, 2011-10-21 at 13:45 +0100, Ramana Radhakrishnan wrote:
> change that. Other than that this patch looks OK and please watch out
> for stylistic issues from the previous patch.

Ramana, please find attached reworked patch.
The patch is tested with check-gcc, check-gdb and bootstrap with no regression.

- Thanks and regards,
Sameera D.

Content-Disposition: attachment; filename="a15_arm_ldrd_epilogue-4Nov.patch"; size=8814; creation-date="Fri, 04 Nov 2011 18:34:41 GMT"; modification-date="Fri, 04 Nov 2011 18:34:41 GMT"

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index deee78b..4a86749 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -15960,6 +15960,135 @@ bad_reg_pair_for_thumb_ldrd_strd (rtx src1, rtx src2)
           || (REGNO (src2) == SP_REGNUM));
 }
 
+/* LDRD in ARM mode needs consecutive registers to be stored.  This function
+   keeps accumulating non-consecutive registers until first consecutive register
+   pair is found.  It then generates multi-reg POP for all accumulated
+   registers, and then generates LDRD with write-back for consecutive register
+   pair.  This process is repeated until all the registers are loaded from
+   stack.  multi register POP takes care of lone registers as well.  However,
+   LDRD cannot be generated for PC, as results are unpredictable.  Hence, if PC
+   is in SAVED_REGS_MASK, generate multi-reg POP with RETURN or LDR with RETURN
+   depending upon number of registers in REGS_TO_BE_POPPED_MASK.
+   */
+static void
+arm_emit_ldrd_pop (unsigned long saved_regs_mask, bool really_return)
+{
+  int num_regs = 0;
+  int i, j;
+  rtx par = NULL_RTX;
+  rtx insn = NULL_RTX;
+  rtx dwarf = NULL_RTX;
+  rtx tmp;
+  unsigned long regs_to_be_popped_mask = 0;
+  bool pc_in_list = false;
+
+  for (i = 0; i <= LAST_ARM_REGNUM; i++)
+    if (saved_regs_mask & (1 << i))
+      num_regs++;
+
+  gcc_assert (num_regs && num_regs <= 16);
+
+  for (i = 0, j = 0; i < num_regs; j++)
+    if (saved_regs_mask & (1 << j))
+      {
+        i++;
+        if ((j % 2) == 0
+            && (saved_regs_mask & (1 << (j + 1)))
+            && (j + 1) != SP_REGNUM
+            && (j + 1) != PC_REGNUM
+            && regs_to_be_popped_mask)
+          {
+            /* Current register and next register form register pair for which
+               LDRD can be generated.  Generate POP for accumulated registers
+               and reset regs_to_be_popped_mask.  SP should be handled here as
+               the results are unpredictable if register being stored is same
+               as index register (in this case, SP).  PC is always the last
+               register being popped.  Hence, we don't have to worry about PC
+               here.  */
+            arm_emit_multi_reg_pop (regs_to_be_popped_mask, pc_in_list);
+            pc_in_list = false;
+            regs_to_be_popped_mask = 0;
+            continue;
+          }
+
+        if (j == PC_REGNUM)
+          {
+            gcc_assert (really_return);
+            pc_in_list = 1;
+          }
+
+        regs_to_be_popped_mask |= (1 << j);
+
+        if ((j % 2) == 1
+            && (saved_regs_mask & (1 << (j - 1)))
+            && j != SP_REGNUM
+            && j != PC_REGNUM)
+          {
+            /* Generate a LDRD for register pair R_, R_.  The pattern
+               generated here is
+               [(SET SP, (PLUS SP, 8))
+                (SET R_, (MEM SP))
+                (SET R_, (MEM (PLUS SP, 4)))].
+             */
+            par = gen_rtx_PARALLEL (VOIDmode, rtvec_alloc (3));
+
+            tmp = gen_rtx_SET (VOIDmode,
+                               stack_pointer_rtx,
+                               plus_constant (stack_pointer_rtx, 8));
+            RTX_FRAME_RELATED_P (tmp) = 1;
+            XVECEXP (par, 0, 0) = tmp;
+
+            tmp = gen_rtx_SET (SImode,
+                               gen_rtx_REG (SImode, j - 1),
+                               gen_frame_mem (SImode, stack_pointer_rtx));
+            RTX_FRAME_RELATED_P (tmp) = 1;
+            XVECEXP (par, 0, 1) = tmp;
+            dwarf = alloc_reg_note (REG_CFA_RESTORE,
+                                    gen_rtx_REG (SImode, j - 1),
+                                    dwarf);
+
+            tmp = gen_rtx_SET (SImode,
+                               gen_rtx_REG (SImode, j),
+                               gen_frame_mem (SImode,
+                                       plus_constant (stack_pointer_rtx, 4)));
+            RTX_FRAME_RELATED_P (tmp) = 1;
+            XVECEXP (par, 0, 2) = tmp;
+            dwarf = alloc_reg_note (REG_CFA_RESTORE,
+                                    gen_rtx_REG (SImode, j),
+                                    dwarf);
+
+            insn = emit_insn (par);
+            REG_NOTES (insn) = dwarf;
+            pc_in_list = false;
+            regs_to_be_popped_mask = 0;
+            dwarf = NULL_RTX;
+          }
+      }
+
+  if (regs_to_be_popped_mask)
+    {
+      /* single PC pop can happen here.  Take care of that.  */
+      if (pc_in_list && (regs_to_be_popped_mask == (1 << PC_REGNUM)))
+        {
+          /* Only PC is to be popped.  */
+          par = gen_rtx_PARALLEL (VOIDmode, rtvec_alloc (2));
+          XVECEXP (par, 0, 0) = ret_rtx;
+          tmp = gen_rtx_SET (SImode,
+                             gen_rtx_REG (SImode, PC_REGNUM),
+                             gen_frame_mem (SImode,
+                                            gen_rtx_POST_INC (SImode,
+                                                              stack_pointer_rtx)));
+          RTX_FRAME_RELATED_P (tmp) = 1;
+          XVECEXP (par, 0, 1) = tmp;
+          emit_jump_insn (par);
+        }
+      else
+        {
+          arm_emit_multi_reg_pop (regs_to_be_popped_mask, pc_in_list);
+        }
+    }
+
+  return;
+}
+
 /* Generate and emit a pattern that will be recognized as LDRD pattern.  If even
    number of registers are being popped, multiple LDRD patterns are created for
    all register pairs.
    If odd number of registers are popped, last register is
@@ -22807,7 +22936,13 @@ arm_expand_epilogue (bool really_return)
               return_in_pc = true;
             }
 
-          arm_emit_multi_reg_pop (saved_regs_mask, return_in_pc);
+          if (!current_tune->prefer_ldrd_strd
+              || optimize_function_for_size_p (cfun))
+            arm_emit_multi_reg_pop (saved_regs_mask, return_in_pc);
+          else
+            /* Generate LDRD pattern instead of POP pattern.  */
+            arm_emit_ldrd_pop (saved_regs_mask, return_in_pc);
+
           if (return_in_pc == true)
             return;
         }
diff --git a/gcc/config/arm/ldmstm.md b/gcc/config/arm/ldmstm.md
index ffa675d..149fd8b 100644
--- a/gcc/config/arm/ldmstm.md
+++ b/gcc/config/arm/ldmstm.md
@@ -109,6 +109,54 @@
   "operands[1] = gen_rtx_REG (DImode, REGNO (operands[1]));"
 )
 
+(define_insn "*arm_ldrd_base_update"
+  [(set (match_operand:SI 0 "arm_hard_register_operand" "+rk")
+        (plus:SI (match_dup 0)
+                 (const_int 8)))
+   (set (match_operand:SI 1 "arm_hard_register_operand" "=r")
+        (mem:SI (match_dup 0)))
+   (set (match_operand:SI 2 "arm_hard_register_operand" "=r")
+        (mem:SI (plus:SI (match_dup 0)
+                         (const_int 4))))]
+  "(TARGET_ARM && current_tune->prefer_ldrd_strd
+     && (!bad_reg_pair_for_arm_ldrd_strd (operands[1], operands[2]))
+     && (REGNO (operands[1]) != REGNO (operands[0]))
+     && (REGNO (operands[2]) != REGNO (operands[0])))"
+  "ldr%(d%)\t%1, %2, [%0], #8"
+  [(set_attr "type" "load2")
+   (set_attr "predicable" "yes")])
+
+(define_peephole2
+  [(parallel
+    [(set (match_operand:SI 0 "arm_hard_register_operand" "")
+          (plus:SI (match_dup 0)
+                   (const_int 8)))
+     (set (match_operand:SI 1 "arm_hard_register_operand" "")
+          (mem:SI (match_dup 0)))
+     (set (match_operand:SI 2 "arm_hard_register_operand" "")
+          (mem:SI (plus:SI (match_dup 0)
+                           (const_int 4))))])]
+  "(TARGET_ARM && current_tune->prefer_ldrd_strd
+     && (!bad_reg_pair_for_arm_ldrd_strd (operands[1], operands[2]))
+     && (REGNO (operands[1]) != REGNO (operands[0]))
+     && (REGNO (operands[2]) != REGNO (operands[0])))"
+  [(set (match_dup 1)
+        (mem:DI (post_inc:SI (match_dup 0))))]
+  "operands[1] = gen_rtx_REG (DImode, REGNO (operands[1]));"
+)
+
+(define_insn "*arm_ldr_with_update"
+  [(parallel
+    [(set (match_operand:SI 0 "arm_hard_register_operand" "")
+          (plus:SI (match_dup 0)
+                   (const_int 4)))
+     (set (match_operand:SI 1 "arm_hard_register_operand" "")
+          (mem:SI (match_dup 0)))])]
+  "(TARGET_ARM && current_tune->prefer_ldrd_strd)"
+  "ldr%?\t%1, [%0], #4"
+  [(set_attr "type" "load1")
+   (set_attr "predicable" "yes")])
+
 (define_insn "*ldm4_ia"
   [(match_parallel 0 "load_multiple_operation"
     [(set (match_operand:SI 1 "arm_hard_register_operand" "")
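
For readers who want to see which instruction sequence the accumulate-and-flush loop in arm_emit_ldrd_pop would choose for a given SAVED_REGS_MASK, the control flow can be modelled outside GCC. The following is only a sketch under stated assumptions: it emits no RTL and is not part of the patch. It writes a textual plan instead, and the names plan_ldrd_pop, plan_multi_reg_pop, append and reg_names are hypothetical helpers invented for this illustration. The register-number macros are defined locally with the usual ARM values (SP = 13, PC = 15).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Local stand-ins for the GCC macros; the values are the usual ARM ones.  */
#define SP_REGNUM 13
#define PC_REGNUM 15
#define LAST_ARM_REGNUM 15

static const char *const reg_names[16] = {
  "r0", "r1", "r2", "r3", "r4", "r5", "r6", "r7",
  "r8", "r9", "r10", "r11", "r12", "sp", "lr", "pc"
};

/* Append TEXT to the NUL-terminated buffer OUT of capacity SIZE.  */
static void
append (char *out, size_t size, const char *text)
{
  size_t len = strlen (out);
  assert (len + strlen (text) < size);
  strcat (out, text);
}

/* Hypothetical stand-in for arm_emit_multi_reg_pop: append "pop {...}; "
   listing every register set in MASK, lowest register first.  */
static void
plan_multi_reg_pop (char *out, size_t size, unsigned long mask)
{
  const char *sep = "";
  int r;

  append (out, size, "pop {");
  for (r = 0; r <= LAST_ARM_REGNUM; r++)
    if (mask & (1UL << r))
      {
        append (out, size, sep);
        append (out, size, reg_names[r]);
        sep = ", ";
      }
  append (out, size, "}; ");
}

/* Model of the loop in arm_emit_ldrd_pop: write the planned sequence for
   SAVED_REGS_MASK into OUT as text.  */
static void
plan_ldrd_pop (unsigned long saved_regs_mask, char *out, size_t size)
{
  unsigned long pending = 0;	/* Plays the role of regs_to_be_popped_mask.  */
  int j;

  assert (size > 0);
  out[0] = '\0';
  for (j = 0; j <= LAST_ARM_REGNUM; j++)
    {
      if (!(saved_regs_mask & (1UL << j)))
        continue;

      if ((j % 2) == 0
          && (saved_regs_mask & (1UL << (j + 1)))
          && (j + 1) != SP_REGNUM
          && (j + 1) != PC_REGNUM
          && pending)
        {
          /* An even/odd pair starts here: flush the accumulated registers.  */
          plan_multi_reg_pop (out, size, pending);
          pending = 0;
          continue;
        }

      pending |= 1UL << j;

      if ((j % 2) == 1
          && (saved_regs_mask & (1UL << (j - 1)))
          && j != SP_REGNUM
          && j != PC_REGNUM)
        {
          /* Odd register whose even partner is saved: one LDRD with
             write-back loads both.  */
          char buf[40];
          snprintf (buf, sizeof buf, "ldrd %s, %s, [sp], #8; ",
                    reg_names[j - 1], reg_names[j]);
          append (out, size, buf);
          pending = 0;
        }
    }

  if (pending == (1UL << PC_REGNUM))
    append (out, size, "ldr pc, [sp], #4; ");	/* Lone PC: LDR with return.  */
  else if (pending)
    plan_multi_reg_pop (out, size, pending);
}
```

With this model, a mask of 0x8038 (r3, r4, r5 and pc) produces the plan "pop {r3}; ldrd r4, r5, [sp], #8; ldr pc, [sp], #4; ": the lone r3 is flushed with a POP before the r4/r5 pair is loaded by LDRD, and PC is loaded last on its own, since LDRD must not target it.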