Subject: Re: [RFA/ARM][Patch 05/05]: LDRD generation instead of POP in A15 ARM epilogue.
From: Sameera Deshpande
To: Ramana Radhakrishnan
Cc: "gcc-patches@gcc.gnu.org", "nickc@redhat.com", Richard Earnshaw, "paul@codesourcery.com", Ramana Radhakrishnan
In-Reply-To: <1320749880.28506.87.camel@e102549-lin.cambridge.arm.com>
References: <1318324138.2186.40.camel@e102549-lin.cambridge.arm.com> <1318325869.2186.67.camel@e102549-lin.cambridge.arm.com> <1320749880.28506.87.camel@e102549-lin.cambridge.arm.com>
Date: Fri, 30 Dec 2011 13:29:00 -0000
Message-ID: <1325248900.20655.204.camel@e102549-lin.cambridge.arm.com>

Hi Ramana,

Please find attached revised LDRD generation patch for A15 ARM mode.
Because of the major rework in ARM RTL epilogue patch, this patch has
undergone some changes. The patch is tested with check-gcc, bootstrap
and check-gdb without regression.

Ok for trunk?

Attachment: a15_arm_ldrd_epilogue_final.patch

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index d5c651c..46becfb 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -16101,6 +16101,135 @@ bad_reg_pair_for_thumb_ldrd_strd (rtx src1, rtx src2)
           || (REGNO (src2) == SP_REGNUM));
 }
 
+/* LDRD in ARM mode needs consecutive registers to be stored.  This function
+   keeps accumulating non-consecutive registers until first consecutive register
+   pair is found.  It then generates multi-reg POP for all accumulated
+   registers, and then generates LDRD with write-back for consecutive register
+   pair.  This process is repeated until all the registers are loaded from
+   stack.  multi register POP takes care of lone registers as well.  However,
+   LDRD cannot be generated for PC, as results are unpredictable.  Hence, if PC
+   is in SAVED_REGS_MASK, generate multi-reg POP with RETURN or LDR with RETURN
+   depending upon number of registers in REGS_TO_BE_POPPED_MASK.  */
+static void
+arm_emit_ldrd_pop (unsigned long saved_regs_mask, bool really_return)
+{
+  int num_regs = 0;
+  int i, j;
+  rtx par = NULL_RTX;
+  rtx insn = NULL_RTX;
+  rtx dwarf = NULL_RTX;
+  rtx tmp;
+  unsigned long regs_to_be_popped_mask = 0;
+  bool pc_in_list = false;
+
+  for (i = 0; i <= LAST_ARM_REGNUM; i++)
+    if (saved_regs_mask & (1 << i))
+      num_regs++;
+
+  gcc_assert (num_regs && num_regs <= 16);
+
+  for (i = 0, j = 0; i < num_regs; j++)
+    if (saved_regs_mask & (1 << j))
+      {
+        i++;
+        if ((j % 2) == 0
+            && (saved_regs_mask & (1 << (j + 1)))
+            && (j + 1) != SP_REGNUM
+            && (j + 1) != PC_REGNUM
+            && regs_to_be_popped_mask)
+          {
+            /* Current register and next register form register pair for which
+               LDRD can be generated.  Generate POP for accumulated registers
+               and reset regs_to_be_popped_mask.  SP should be handled here as
+               the results are unpredictable if register being stored is same
+               as index register (in this case, SP).  PC is always the last
+               register being popped.  Hence, we don't have to worry about PC
+               here.  */
+            arm_emit_multi_reg_pop (regs_to_be_popped_mask, pc_in_list);
+            pc_in_list = false;
+            regs_to_be_popped_mask = 0;
+            continue;
+          }
+
+        if (j == PC_REGNUM)
+          {
+            gcc_assert (really_return);
+            pc_in_list = 1;
+          }
+
+        regs_to_be_popped_mask |= (1 << j);
+
+        if ((j % 2) == 1
+            && (saved_regs_mask & (1 << (j - 1)))
+            && j != SP_REGNUM
+            && j != PC_REGNUM)
+          {
+            /* Generate a LDRD for register pair R_, R_.  The pattern
+               generated here is
+               [(SET SP, (PLUS SP, 8))
+                (SET R_, (MEM SP))
+                (SET R_, (MEM (PLUS SP, 4)))].  */
+            par = gen_rtx_PARALLEL (VOIDmode, rtvec_alloc (3));
+
+            tmp = gen_rtx_SET (VOIDmode,
+                               stack_pointer_rtx,
+                               plus_constant (stack_pointer_rtx, 8));
+            RTX_FRAME_RELATED_P (tmp) = 1;
+            XVECEXP (par, 0, 0) = tmp;
+
+            tmp = gen_rtx_SET (SImode,
+                               gen_rtx_REG (SImode, j - 1),
+                               gen_frame_mem (SImode, stack_pointer_rtx));
+            RTX_FRAME_RELATED_P (tmp) = 1;
+            XVECEXP (par, 0, 1) = tmp;
+            dwarf = alloc_reg_note (REG_CFA_RESTORE,
+                                    gen_rtx_REG (SImode, j - 1),
+                                    dwarf);
+
+            tmp = gen_rtx_SET (SImode,
+                               gen_rtx_REG (SImode, j),
+                               gen_frame_mem (SImode,
+                                       plus_constant (stack_pointer_rtx, 4)));
+            RTX_FRAME_RELATED_P (tmp) = 1;
+            XVECEXP (par, 0, 2) = tmp;
+            dwarf = alloc_reg_note (REG_CFA_RESTORE,
+                                    gen_rtx_REG (SImode, j),
+                                    dwarf);
+
+            insn = emit_insn (par);
+            REG_NOTES (insn) = dwarf;
+            pc_in_list = false;
+            regs_to_be_popped_mask = 0;
+            dwarf = NULL_RTX;
+          }
+      }
+
+  if (regs_to_be_popped_mask)
+    {
+      /* single PC pop can happen here.  Take care of that.  */
+      if (pc_in_list && (regs_to_be_popped_mask == (1 << PC_REGNUM)))
+        {
+          /* Only PC is to be popped.  */
+          par = gen_rtx_PARALLEL (VOIDmode, rtvec_alloc (2));
+          XVECEXP (par, 0, 0) = ret_rtx;
+          tmp = gen_rtx_SET (SImode,
+                             gen_rtx_REG (SImode, PC_REGNUM),
+                             gen_frame_mem (SImode,
+                                            gen_rtx_POST_INC (SImode,
+                                                              stack_pointer_rtx)));
+          RTX_FRAME_RELATED_P (tmp) = 1;
+          XVECEXP (par, 0, 1) = tmp;
+          emit_jump_insn (par);
+        }
+      else
+        {
+          arm_emit_multi_reg_pop (regs_to_be_popped_mask, pc_in_list);
+        }
+    }
+
+  return;
+}
+
 /* Generate and emit a pattern that will be recognized as LDRD pattern.  If even
    number of registers are being popped, multiple LDRD patterns are created for
    all register pairs.  If odd number of registers are popped, last register is
@@ -23019,12 +23148,14 @@ arm_expand_epilogue (bool really_return)
           else
             {
               if (!current_tune->prefer_ldrd_strd
-                  || optimize_function_for_size_p (cfun)
-                  || TARGET_ARM)
+                  || optimize_function_for_size_p (cfun))
                 arm_emit_multi_reg_pop (saved_regs_mask, return_in_pc);
               else
                 /* Generate LDRD pattern instead of POP pattern.  */
-                thumb2_emit_ldrd_pop (saved_regs_mask, return_in_pc);
+                if (TARGET_THUMB2)
+                  thumb2_emit_ldrd_pop (saved_regs_mask, return_in_pc);
+                else
+                  arm_emit_ldrd_pop (saved_regs_mask, return_in_pc);
             }
 
           if (return_in_pc == true)
diff --git a/gcc/config/arm/ldmstm.md b/gcc/config/arm/ldmstm.md
index ffa675d..149fd8b 100644
--- a/gcc/config/arm/ldmstm.md
+++ b/gcc/config/arm/ldmstm.md
@@ -109,6 +109,54 @@
   "operands[1] = gen_rtx_REG (DImode, REGNO (operands[1]));"
 )
 
+(define_insn "*arm_ldrd_base_update"
+  [(set (match_operand:SI 0 "arm_hard_register_operand" "+rk")
+        (plus:SI (match_dup 0)
+                 (const_int 8)))
+   (set (match_operand:SI 1 "arm_hard_register_operand" "=r")
+        (mem:SI (match_dup 0)))
+   (set (match_operand:SI 2 "arm_hard_register_operand" "=r")
+        (mem:SI (plus:SI (match_dup 0)
+                         (const_int 4))))]
+  "(TARGET_ARM && current_tune->prefer_ldrd_strd
+     && (!bad_reg_pair_for_arm_ldrd_strd (operands[1], operands[2]))
+     && (REGNO (operands[1]) != REGNO (operands[0]))
+     && (REGNO (operands[2]) != REGNO (operands[0])))"
+  "ldr%(d%)\t%1, %2, [%0], #8"
+  [(set_attr "type" "load2")
+   (set_attr "predicable" "yes")])
+
+(define_peephole2
+  [(parallel
+    [(set (match_operand:SI 0 "arm_hard_register_operand" "")
+          (plus:SI (match_dup 0)
+                   (const_int 8)))
+     (set (match_operand:SI 1 "arm_hard_register_operand" "")
+          (mem:SI (match_dup 0)))
+     (set (match_operand:SI 2 "arm_hard_register_operand" "")
+          (mem:SI (plus:SI (match_dup 0)
+                           (const_int 4))))])]
+  "(TARGET_ARM && current_tune->prefer_ldrd_strd
+     && (!bad_reg_pair_for_arm_ldrd_strd (operands[1], operands[2]))
+     && (REGNO (operands[1]) != REGNO (operands[0]))
+     && (REGNO (operands[2]) != REGNO (operands[0])))"
+  [(set (match_dup 1)
+        (mem:DI (post_inc:SI (match_dup 0))))]
+  "operands[1] = gen_rtx_REG (DImode, REGNO (operands[1]));"
+)
+
+(define_insn "*arm_ldr_with_update"
+  [(parallel
+    [(set (match_operand:SI 0 "arm_hard_register_operand" "")
+          (plus:SI (match_dup 0)
+                   (const_int 4)))
+     (set (match_operand:SI 1 "arm_hard_register_operand" "")
+          (mem:SI (match_dup 0)))])]
+  "(TARGET_ARM && current_tune->prefer_ldrd_strd)"
+  "ldr%?\t%1, [%0], #4"
+  [(set_attr "type" "load1")
+   (set_attr "predicable" "yes")])
+
 (define_insn "*ldm4_ia"
   [(match_parallel 0 "load_multiple_operation"
     [(set (match_operand:SI 1 "arm_hard_register_operand" "")
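
The register-grouping walk described in the comment on arm_emit_ldrd_pop can be tried outside the compiler. The standalone C sketch below is not part of the patch. It mirrors only the mask walk and writes the instruction sequence it would choose as text. The names plan_ldrd_pop, append and append_pop, the register numbering macros as redefined here, and the printed mnemonics are assumptions made for illustration. No RTL, DWARF notes or return handling are modelled, so the output only approximates what the patterns in the patch would assemble to.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Assumed register numbers, redefined locally for this sketch.  */
#define SP_REGNUM 13
#define PC_REGNUM 15
#define LAST_ARM_REGNUM 15

/* Append TEXT to the NUL-terminated buffer OUT of capacity SIZE.  */
static void
append (char *out, size_t size, const char *text)
{
  size_t len = strlen (out);
  if (len < size)
    snprintf (out + len, size - len, "%s", text);
}

/* Append one "pop {...}" line naming every register set in MASK.  */
static void
append_pop (char *out, size_t size, unsigned long mask)
{
  char line[128];
  const char *sep = "";
  int n = snprintf (line, sizeof line, "pop {");

  for (int r = 0; r <= LAST_ARM_REGNUM; r++)
    if (mask & (1UL << r))
      {
        n += snprintf (line + n, sizeof line - n, "%sr%d", sep, r);
        sep = ", ";
      }
  snprintf (line + n, sizeof line - n, "}\n");
  append (out, size, line);
}

/* Hypothetical helper: write to OUT the sequence of POP, LDRD and LDR
   instructions that the grouping walk selects for SAVED_REGS_MASK.  */
void
plan_ldrd_pop (unsigned long saved_regs_mask, char *out, size_t size)
{
  unsigned long pending = 0;   /* Registers waiting for a multi-reg POP.  */
  char line[64];

  assert (saved_regs_mask != 0 && (saved_regs_mask >> 16) == 0);
  out[0] = '\0';

  for (int j = 0; j <= LAST_ARM_REGNUM; j++)
    {
      if (!(saved_regs_mask & (1UL << j)))
        continue;

      /* An even register whose odd neighbour is also saved starts an LDRD
         pair.  Flush any accumulated lone registers first; the pair itself
         is emitted when the walk reaches the odd register.  */
      if ((j % 2) == 0
          && (saved_regs_mask & (1UL << (j + 1)))
          && (j + 1) != SP_REGNUM
          && (j + 1) != PC_REGNUM
          && pending)
        {
          append_pop (out, size, pending);
          pending = 0;
          continue;
        }

      pending |= 1UL << j;

      /* Odd register with its even neighbour saved: emit the LDRD.  */
      if ((j % 2) == 1
          && (saved_regs_mask & (1UL << (j - 1)))
          && j != SP_REGNUM
          && j != PC_REGNUM)
        {
          snprintf (line, sizeof line, "ldrd r%d, r%d, [sp], #8\n", j - 1, j);
          append (out, size, line);
          pending = 0;
        }
    }

  /* Leftovers: a lone PC becomes a single LDR, anything else a POP.  */
  if (pending == (1UL << PC_REGNUM))
    append (out, size, "ldr r15, [sp], #4\n");
  else if (pending)
    append_pop (out, size, pending);
}
```

Calling plan_ldrd_pop with a mask of r4-r7 and lr and a 256-byte buffer yields two LDRD lines followed by a POP of r14, which matches the behaviour the comment describes: pairs go through LDRD with write-back, and lone registers fall back to a multi-register POP.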