In-Reply-To: <1318325232.2186.58.camel@e102549-lin.cambridge.arm.com>
References: <1318324138.2186.40.camel@e102549-lin.cambridge.arm.com> <1318325232.2186.58.camel@e102549-lin.cambridge.arm.com>
Date: Fri, 21 Oct 2011 13:00:00 -0000
Subject: Re: [RFA/ARM][Patch 03/05]: STRD generation instead of PUSH in A15 Thumb2 prologue.
From: Ramana Radhakrishnan
To: Sameera Deshpande
Cc: "gcc-patches@gcc.gnu.org", "nickc@redhat.com", Richard Earnshaw, "paul@codesourcery.com", Ramana Radhakrishnan
X-SW-Source: 2011-10/txt/msg01969.txt.bz2

On 11 October 2011 10:27, Sameera Deshpande wrote:
> Hi!
>
> This patch generates STRD instruction instead of PUSH in thumb2 mode for
> A15.
>
> For optimize_size, original prologue is generated for A15.
> The work involves defining new functions, predicates and patterns.
>

> +/* Generate and emit a pattern that will be recognized as STRD pattern.  If even
> +   number of registers are being pushed, multiple STRD patterns are created for
> +   all register pairs.  If odd number of registers are pushed, first register is

This line exceeds 80 characters.

> +   stored by using STR pattern.  */

s/stored/Stored. A better comment would be
"Emit a combination of strd and str's for the prologue saves."

> +static void
> +thumb2_emit_strd_push (unsigned long saved_regs_mask)
> +{
> +  int num_regs = 0;
> +  int i, j;
> +  rtx par = NULL_RTX;
> +  rtx insn = NULL_RTX;
> +  rtx dwarf = NULL_RTX;
> +  rtx tmp, reg, tmp1;
> +
> +  for (i = 0; i <= LAST_ARM_REGNUM; i++)
> +    if (saved_regs_mask & (1 << i))
> +      num_regs++;
> +
> +  gcc_assert (num_regs && num_regs <= 16);
> +
> +  /* Pre-decrement the stack pointer, based on there being num_regs 4-byte
> +     registers to push.  */
> +  tmp = gen_rtx_SET (VOIDmode,
> +                     stack_pointer_rtx,
> +                     plus_constant (stack_pointer_rtx, -4 * num_regs));
> +  RTX_FRAME_RELATED_P (tmp) = 1;
> +  insn = emit_insn (tmp);
> +
> +  /* Create sequence for DWARF info.  */
> +  dwarf = gen_rtx_SEQUENCE (VOIDmode, rtvec_alloc (num_regs + 1));
> +
> +  /* RTLs cannot be shared, hence create new copy for dwarf.  */
> +  tmp1 = gen_rtx_SET (VOIDmode,
> +                      stack_pointer_rtx,
> +                      plus_constant (stack_pointer_rtx, -4 * num_regs));
> +  RTX_FRAME_RELATED_P (tmp1) = 1;
> +  XVECEXP (dwarf, 0, 0) = tmp1;
> +
> +  for (i = num_regs - 1, j = LAST_ARM_REGNUM; i >= (num_regs % 2); j--)
> +    /* Var j iterates over all the registers to gather all the registers in
> +       saved_regs_mask.  Var i gives index of register R_j in stack frame.
> +       A PARALLEL RTX of register-pair is created here, so that pattern for
> +       STRD can be matched.  If num_regs is odd, 1st register will be pushed
> +       using STR and remaining registers will be pushed with STRD in pairs.
> +       If num_regs is even, all registers are pushed with STRD in pairs.
> +       Hence, skip first element for odd num_regs.  */

Comment before the loop please.

> +    if (saved_regs_mask & (1 << j))
> +      {
> +        gcc_assert (j != SP_REGNUM);
> +        gcc_assert (j != PC_REGNUM);
> +
> +        /* Create RTX for store.  New RTX is created for dwarf as
> +           they are not sharable.  */
> +        reg = gen_rtx_REG (SImode, j);
> +        tmp = gen_rtx_SET (SImode,
> +                           gen_frame_mem
> +                           (SImode,
> +                            plus_constant (stack_pointer_rtx, 4 * i)),
> +                           reg);
> +
> +        tmp1 = gen_rtx_SET (SImode,
> +                            gen_frame_mem
> +                            (SImode,
> +                             plus_constant (stack_pointer_rtx, 4 * i)),
> +                            reg);
> +        RTX_FRAME_RELATED_P (tmp) = 1;
> +        RTX_FRAME_RELATED_P (tmp1) = 1;
> +
> +        if (((i - (num_regs % 2)) % 2) == 1)
> +          /* When (i - (num_regs % 2)) is odd, the RTX to be emitted is yet to
> +             be created.  Hence create it first.  The STRD pattern we are
> +             generating is :
> +             [ (SET (MEM (PLUS (SP) (NUM))) (reg_t1))
> +               (SET (MEM (PLUS (SP) (NUM + 4))) (reg_t2)) ]
> +             were target registers need not be consecutive.  */
> +          par = gen_rtx_PARALLEL (VOIDmode, rtvec_alloc (2));
> +
> +        /* Register R_j is added in PARALLEL RTX.  If (i - (num_regs % 2)) is
> +           even, the reg_j is added as 0th element and if it is odd, reg_i is
> +           added as 1st element of STRD pattern shown above.  */
> +        XVECEXP (par, 0, ((i - (num_regs % 2)) % 2)) = tmp;
> +        XVECEXP (dwarf, 0, (i + 1)) = tmp1;
> +
> +        if (((i - (num_regs % 2)) % 2) == 0)
> +          /* When (i - (num_regs % 2)) is even, RTXs for both the registers
> +             to be loaded are generated in above given STRD pattern, and the
> +             pattern can be emitted now.  */
> +          emit_insn (par);
> +
> +        i--;
> +      }
> +
> +  if ((num_regs % 2) == 1)
> +    {
> +      /* If odd number of registers are pushed, generate STR pattern to store
> +         lone register.  */
> +      for (; (saved_regs_mask & (1 << j)) == 0; j--);
> +
> +      tmp1 = gen_frame_mem (SImode, plus_constant (stack_pointer_rtx, 4 * i));
> +      reg = gen_rtx_REG (SImode, j);
> +      tmp = gen_rtx_SET (SImode, tmp1, reg);
> +      RTX_FRAME_RELATED_P (tmp) = 1;
> +
> +      emit_insn (tmp);
> +
> +      tmp1 = gen_rtx_SET (SImode,
> +                          gen_frame_mem
> +                          (SImode,
> +                           plus_constant (stack_pointer_rtx, 4 * i)),
> +                          reg);
> +      RTX_FRAME_RELATED_P (tmp1) = 1;
> +      XVECEXP (dwarf, 0, (i + 1)) = tmp1;
> +    }
> +
> +  add_reg_note (insn, REG_FRAME_RELATED_EXPR, dwarf);
> +  RTX_FRAME_RELATED_P (insn) = 1;
> +  return;
> +}
> +
>  /* Generate and emit an insn that we will recognize as a push_multi.
>     Unfortunately, since this insn does not reflect very well the actual
>     semantics of the operation, we need to annotate the insn for the benefit
> @@ -16307,8 +16426,16 @@ arm_expand_prologue (void)
>               saved_regs += frame;
>             }
>         }
> -      insn = emit_multi_reg_push (live_regs_mask);
> -      RTX_FRAME_RELATED_P (insn) = 1;
> +
> +      if (TARGET_THUMB2 && current_tune->prefer_ldrd_strd && !optimize_size)

Replace optimize_size by optimize_function_for_size_p ().

OK with those changes.

ramana
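
For anyone following the index arithmetic in the quoted thumb2_emit_strd_push, here is a minimal standalone sketch of how the registers end up grouped into STRD pairs and stack slots. It is not GCC code and builds no RTL or DWARF notes: plan_strd_push, struct store_op and LAST_REGNUM are hypothetical names invented for illustration, and the sketch assumes only what the quoted loop shows (walk from the highest register down, pair adjacent slots, leave the lowest register for a lone STR when the count is odd).

```c
#include <assert.h>

/* Assumption: mirrors LAST_ARM_REGNUM in the quoted patch (r0..r15).  */
#define LAST_REGNUM 15

/* One planned store (hypothetical type, for illustration only).  */
struct store_op
{
  int is_pair;  /* 1 for a STRD of two registers, 0 for a single STR.  */
  int reg_lo;   /* Register stored at OFFSET.  */
  int reg_hi;   /* Register stored at OFFSET + 4; -1 for a single STR.  */
  int offset;   /* Byte offset from the already decremented SP.  */
};

/* Fill OPS (room for at least 8 entries) in the order the quoted loop
   would emit the stores, set *FRAME_BYTES to the SP decrement, and
   return the number of stores.  */
int
plan_strd_push (unsigned long mask, struct store_op *ops, int *frame_bytes)
{
  int num_regs = 0, n_ops = 0, pending_hi = -1;
  int i, j, odd;

  for (j = 0; j <= LAST_REGNUM; j++)
    if (mask & (1UL << j))
      num_regs++;

  assert (num_regs > 0 && num_regs <= 16);

  odd = num_regs % 2;
  *frame_bytes = 4 * num_regs;

  /* Slot i is the word index of register j in the frame; the highest
     register gets the highest slot.  */
  i = num_regs - 1;
  for (j = LAST_REGNUM; j >= 0; j--)
    {
      if (!(mask & (1UL << j)))
        continue;

      if (i < odd)
        {
          /* Only reached for an odd count: the lowest register is
             stored alone at offset 0.  */
          ops[n_ops].is_pair = 0;
          ops[n_ops].reg_lo = j;
          ops[n_ops].reg_hi = -1;
          ops[n_ops].offset = 4 * i;
          n_ops++;
        }
      else if ((i - odd) % 2 == 1)
        /* Upper half of a pair: remember it until its partner shows up.  */
        pending_hi = j;
      else
        {
          /* Lower half of a pair: the pair is now complete.  */
          ops[n_ops].is_pair = 1;
          ops[n_ops].reg_lo = j;
          ops[n_ops].reg_hi = pending_hi;
          ops[n_ops].offset = 4 * i;
          n_ops++;
        }
      i--;
    }

  return n_ops;
}
```

With the mask for {r4, r5, r6, lr} this plans two pairs, (r6, lr) at offset 8 and (r4, r5) at offset 0; with {r4, r5, lr} it plans the pair (r5, lr) at offset 4 followed by a lone store of r4 at offset 0, which matches the "skip first element for odd num_regs" behaviour described in the patch comment.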