Subject: Re: [RFA/ARM][Patch 04/05]: STRD generation instead of PUSH in A15 ARM prologue.
From: Sameera Deshpande
To: Ramana Radhakrishnan
Cc: "gcc-patches@gcc.gnu.org", "nickc@redhat.com", Richard Earnshaw, "paul@codesourcery.com", Ramana Radhakrishnan
Date: Tue, 08 Nov 2011 11:14:00 -0000
Message-ID: <1320749852.28506.86.camel@e102549-lin.cambridge.arm.com>
References: <1318324138.2186.40.camel@e102549-lin.cambridge.arm.com> <1318325448.2186.62.camel@e102549-lin.cambridge.arm.com>

On Fri, 2011-10-21 at 13:45 +0100, Ramana Radhakrishnan wrote:
> >+arm_emit_strd_push (unsigned long saved_regs_mask)
>
> How different is this from the thumb2 version you sent out in Patch 03/05?
>
Thumb-2 STRD can handle non-consecutive registers; ARM STRD cannot, because
it needs a consecutive pair whose first register is even-numbered. Because of
this, in ARM mode we accumulate the registers that cannot be paired and emit
an STM (multi-register PUSH) for them; for each consecutive pair, STRD is
generated.

> >@@ -15958,7 +16081,8 @@ arm_get_frame_offsets (void)
> >        use 32-bit push/pop instructions.  */
> >     if (! any_sibcall_uses_r3 ()
> >         && arm_size_return_regs () <= 12
> >-        && (offsets->saved_regs_mask & (1 << 3)) == 0)
> >+        && (offsets->saved_regs_mask & (1 << 3)) == 0
> >+        && (TARGET_THUMB2 || !current_tune->prefer_ldrd_strd))
>
> Not sure I completely follow this change yet.
>
If the stack would not be 8-byte aligned after the registers are saved, the
prologue has to adjust it. Instead of adjusting the stack explicitly, we also
push R3, so that no additional instruction is needed for the stack
adjustment. This works fine when multi-register load/store instructions are
generated.

However, when STRD is generated in ARM mode, registers that cannot be paired
are stored using STR/STM. Since R3's pair register (R2) is never pushed on
the stack, pushing R3 always costs a separate STR instruction, which is more
expensive than a single SUB SP, SP, #4 for the stack adjustment.

For example, if we are pushing {R4, R5, R6}, the stack is not aligned, hence
we push {R3, R4, R5, R6}. The instructions generated are (offsets relative
to the incoming SP):

  STR  R6, [sp, #-4]
  STRD R4, R5, [sp, #-12]
  STR  R3, [sp, #-16]

If, instead of R3, another unused callee-saved register is pushed, we push
{R4, R5, R6, R7} and generate

  STRD R6, R7, [sp, #-8]
  STRD R4, R5, [sp, #-16]

If no such register is available, we emit an explicit stack adjustment,
which is still better than generating an extra STR.

>
> Hmmm the question remains if we want to put these into ldmstm.md since
> it was theoretically auto-generated from ldmstm.ml. If this has to be
> marked to be separate then I'd like to regenerate ldmstm.md from ldmstm.ml
> and differentiate between the bits that can be auto-generated and the
> bits that have been added since.
>
The current patterns are quite different from the patterns generated by
arm-ldmstm.ml. I will submit an updated arm-ldmstm.ml that generates the
LDRD/STRD patterns as a separate patch. Is that fine?

The patch has been tested with check-gcc, check-gdb and bootstrap. I see one
regression in gcc:

FAIL: gcc.c-torture/execute/vector-compare-1.c compilation,  -O3 -fomit-frame-pointer -funroll-loops

with the error message

/tmp/ccC13odV.s: Assembler messages:
/tmp/ccC13odV.s:544: Error: co-processor offset out of range

This seems to be a latent bug uncovered by this patch, and I am looking
into it.

- Thanks and regards,
Sameera D.

Attachment: a15_arm_strd_prologue-4Nov.patch

diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index e71ead5..ccf05c7 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -163,6 +163,7 @@ extern const char *arm_output_memory_barrier (rtx *);
 extern const char *arm_output_sync_insn (rtx, rtx *);
 extern unsigned int arm_sync_loop_insns (rtx , rtx *);
 extern int arm_attr_length_push_multi(rtx, rtx);
+extern bool bad_reg_pair_for_arm_ldrd_strd (rtx, rtx);
 
 #if defined TREE_CODE
 extern void arm_init_cumulative_args (CUMULATIVE_ARGS *, tree, rtx, tree);
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 334a25f..deee78b 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -93,6 +93,7 @@ static bool arm_assemble_integer (rtx, unsigned int, int);
 static void arm_print_operand (FILE *, rtx, int);
 static void arm_print_operand_address (FILE *, rtx);
 static bool arm_print_operand_punct_valid_p (unsigned char code);
+static rtx emit_multi_reg_push (unsigned long);
 static const char *fp_const_from_val (REAL_VALUE_TYPE *);
 static arm_cc get_arm_condition_code (rtx);
 static HOST_WIDE_INT int_log2 (HOST_WIDE_INT);
@@ -15438,6 +15439,117 @@ arm_output_function_epilogue (FILE *file ATTRIBUTE_UNUSED,
     }
 }
 
+/* STRD in ARM mode needs consecutive registers to be stored.  This function
+   keeps accumulating non-consecutive registers until first consecutive register
+   pair is found.  It then generates multi register PUSH for all accumulated
+   registers, and then generates STRD with write-back for consecutive register
+   pair.  This process is repeated until all the registers are stored on stack.
+   multi register PUSH takes care of lone registers as well.  */
+static void
+arm_emit_strd_push (unsigned long saved_regs_mask)
+{
+  int num_regs = 0;
+  int i, j;
+  rtx par = NULL_RTX;
+  rtx dwarf = NULL_RTX;
+  rtx insn = NULL_RTX;
+  rtx tmp, tmp1;
+  unsigned long regs_to_be_pushed_mask;
+
+  for (i = 0; i <= LAST_ARM_REGNUM; i++)
+    if (saved_regs_mask & (1 << i))
+      num_regs++;
+
+  gcc_assert (num_regs && num_regs <= 16);
+
+  /* Var j iterates over all registers to gather all registers in
+     saved_regs_mask.  Var i is used to count number of registers stored on
+     stack.  regs_to_be_pushed_mask accumulates non-consecutive registers
+     that can be pushed using multi register PUSH before STRD is
+     generated.  */
+  for (i=0, j = LAST_ARM_REGNUM, regs_to_be_pushed_mask = 0; i < num_regs; j--)
+    if (saved_regs_mask & (1 << j))
+      {
+        gcc_assert (j != SP_REGNUM);
+        gcc_assert (j != PC_REGNUM);
+        i++;
+
+        if ((j % 2 == 1)
+            && (saved_regs_mask & (1 << (j - 1)))
+            && regs_to_be_pushed_mask)
+          {
+            /* Current register and previous register form register pair for
+               which STRD can be generated.  Hence, emit PUSH for accumulated
+               registers and reset regs_to_be_pushed_mask.  */
+            insn = emit_multi_reg_push (regs_to_be_pushed_mask);
+            regs_to_be_pushed_mask = 0;
+            RTX_FRAME_RELATED_P (insn) = 1;
+            continue;
+          }
+
+        regs_to_be_pushed_mask |= (1 << j);
+
+        if ((j % 2) == 0 && (saved_regs_mask & (1 << (j + 1))))
+          {
+            /* We have found 2 consecutive registers, for which STRD can be
+               generated.  Generate pattern to emit STRD as accumulated
+               registers have already been pushed.  */
+            par = gen_rtx_PARALLEL (VOIDmode, rtvec_alloc (3));
+            dwarf = gen_rtx_SEQUENCE (VOIDmode, rtvec_alloc (3));
+
+            tmp = gen_rtx_SET (VOIDmode,
+                               stack_pointer_rtx,
+                               plus_constant (stack_pointer_rtx, -8));
+            tmp1 = gen_rtx_SET (VOIDmode,
+                                stack_pointer_rtx,
+                                plus_constant (stack_pointer_rtx, -8));
+            RTX_FRAME_RELATED_P (tmp) = 1;
+            RTX_FRAME_RELATED_P (tmp1) = 1;
+            XVECEXP (par, 0, 0) = tmp;
+            XVECEXP (dwarf, 0, 0) = tmp1;
+
+            tmp = gen_rtx_SET (SImode,
+                               gen_frame_mem (SImode, stack_pointer_rtx),
+                               gen_rtx_REG (SImode, j));
+            tmp1 = gen_rtx_SET (SImode,
+                                gen_frame_mem (SImode, stack_pointer_rtx),
+                                gen_rtx_REG (SImode, j));
+            RTX_FRAME_RELATED_P (tmp) = 1;
+            RTX_FRAME_RELATED_P (tmp1) = 1;
+            XVECEXP (par, 0, 1) = tmp;
+            XVECEXP (dwarf, 0, 1) = tmp1;
+
+            tmp = gen_rtx_SET (SImode,
+                               gen_frame_mem (SImode,
+                                              plus_constant (stack_pointer_rtx, 4)),
+                               gen_rtx_REG (SImode, j + 1));
+            tmp1 = gen_rtx_SET (SImode,
+                                gen_frame_mem (SImode,
+                                               plus_constant (stack_pointer_rtx, 4)),
+                                gen_rtx_REG (SImode, j + 1));
+            RTX_FRAME_RELATED_P (tmp) = 1;
+            RTX_FRAME_RELATED_P (tmp1) = 1;
+            XVECEXP (par, 0, 2) = tmp;
+            XVECEXP (dwarf, 0, 2) = tmp1;
+
+            insn = emit_insn (par);
+            add_reg_note (insn, REG_FRAME_RELATED_EXPR, dwarf);
+            RTX_FRAME_RELATED_P (insn) = 1;
+            regs_to_be_pushed_mask = 0;
+          }
+      }
+
+  /* Check if any accumulated registers are yet to be pushed, and generate
+     multi register PUSH for them.  */
+  if (regs_to_be_pushed_mask)
+    {
+      insn = emit_multi_reg_push (regs_to_be_pushed_mask);
+      RTX_FRAME_RELATED_P (insn) = 1;
+    }
+
+  return;
+}
+
 /* Generate and emit a pattern that will be recognized as STRD pattern.  If even
    number of registers are being pushed, multiple STRD patterns are created for
    all register pairs.  If odd number of registers are pushed, emit a
@@ -15826,6 +15938,17 @@ arm_emit_vfp_multi_reg_pop (int first_reg, int num_regs, rtx base_reg)
 }
 
 bool
+bad_reg_pair_for_arm_ldrd_strd (rtx src1, rtx src2)
+{
+  return (GET_CODE (src1) != REG
+          || GET_CODE (src2) != REG
+          || ((REGNO (src1) + 1) != REGNO (src2))
+          || ((REGNO (src1) % 2) != 0)
+          || (REGNO (src2) == PC_REGNUM)
+          || (REGNO (src2) == SP_REGNUM));
+}
+
+bool
 bad_reg_pair_for_thumb_ldrd_strd (rtx src1, rtx src2)
 {
   return (GET_CODE (src1) != REG
@@ -16249,7 +16372,8 @@ arm_get_frame_offsets (void)
          use 32-bit push/pop instructions.  */
       if (! any_sibcall_uses_r3 ()
           && arm_size_return_regs () <= 12
-          && (offsets->saved_regs_mask & (1 << 3)) == 0)
+          && (offsets->saved_regs_mask & (1 << 3)) == 0
+          && (TARGET_THUMB2 || !current_tune->prefer_ldrd_strd))
         {
           reg = 3;
         }
@@ -16718,11 +16842,13 @@ arm_expand_prologue (void)
         }
     }
 
-  if (TARGET_THUMB2
-      && current_tune->prefer_ldrd_strd
+  if (current_tune->prefer_ldrd_strd
       && !optimize_function_for_size_p (cfun))
     {
-      thumb2_emit_strd_push (live_regs_mask);
+      if (TARGET_THUMB2)
+        thumb2_emit_strd_push (live_regs_mask);
+      else
+        arm_emit_strd_push (live_regs_mask);
     }
   else
     {
diff --git a/gcc/config/arm/ldmstm.md b/gcc/config/arm/ldmstm.md
index e3dcd4f..ffa675d 100644
--- a/gcc/config/arm/ldmstm.md
+++ b/gcc/config/arm/ldmstm.md
@@ -73,6 +73,42 @@
   [(set_attr "type" "store2")
    (set_attr "predicable" "yes")])
 
+(define_insn "*arm_strd_base_update"
+  [(set (match_operand:SI 0 "arm_hard_register_operand" "+&rk")
+        (plus:SI (match_dup 0)
+                 (const_int -8)))
+   (set (mem:SI (match_dup 0))
+        (match_operand:SI 1 "arm_hard_register_operand" "r"))
+   (set (mem:SI (plus:SI (match_dup 0)
+                         (const_int 4)))
+        (match_operand:SI 2 "arm_hard_register_operand" "r"))]
+  "(TARGET_ARM && current_tune->prefer_ldrd_strd
+     && (!bad_reg_pair_for_arm_ldrd_strd (operands[1], operands[2]))
+     && (REGNO (operands[1]) != REGNO (operands[0]))
+     && (REGNO (operands[2]) != REGNO (operands[0])))"
+  "str%(d%)\t%1, %2, [%0, #-8]!"
+  [(set_attr "type" "store2")
+   (set_attr "predicable" "yes")])
+
+(define_peephole2
+  [(parallel
+    [(set (match_operand:SI 0 "arm_hard_register_operand" "")
+          (plus:SI (match_dup 0)
+                   (const_int -8)))
+     (set (mem:SI (match_dup 0))
+          (match_operand:SI 1 "arm_hard_register_operand" ""))
+     (set (mem:SI (plus:SI (match_dup 0)
+                           (const_int 4)))
+          (match_operand:SI 2 "arm_hard_register_operand" ""))])]
+  "(TARGET_ARM && current_tune->prefer_ldrd_strd
+     && (!bad_reg_pair_for_arm_ldrd_strd (operands[1], operands[2]))
+     && (REGNO (operands[1]) != REGNO (operands[0]))
+     && (REGNO (operands[2]) != REGNO (operands[0])))"
+  [(set (mem:DI (pre_dec:SI (match_dup 0)))
+        (match_dup 1))]
+  "operands[1] = gen_rtx_REG (DImode, REGNO (operands[1]));"
+)
+
 (define_insn "*ldm4_ia"
   [(match_parallel 0 "load_multiple_operation"
     [(set (match_operand:SI 1 "arm_hard_register_operand" "")