On Fri, 2011-10-21 at 13:45 +0100, Ramana Radhakrishnan wrote:
> >+arm_emit_strd_push (unsigned long saved_regs_mask)
>
> How different is this from the thumb2 version you sent out in Patch 03/05 ?
>
Thumb-2 STRD can handle non-consecutive registers; ARM STRD cannot.
Because of this, in ARM mode we accumulate the non-consecutive registers
and emit an STM instruction for them. For consecutive registers, STRD is
generated.

> >@@ -15958,7 +16081,8 @@ arm_get_frame_offsets (void)
> >          use 32-bit push/pop instructions.  */
> >       if (! any_sibcall_uses_r3 ()
> >           && arm_size_return_regs () <= 12
> >-          && (offsets->saved_regs_mask & (1 << 3)) == 0)
> >+          && (offsets->saved_regs_mask & (1 << 3)) == 0
> >+          && (TARGET_THUMB2 || !current_tune->prefer_ldrd_strd))
>
> Not sure I completely follow this change yet.
>
If the stack is not aligned, we need to adjust the stack in the prologue.
Here, instead of adjusting the stack, we PUSH register R3 onto the stack,
so that no additional ADD instruction is needed for the stack adjustment.
This works fine when we generate multi-register load/store instructions.
However, when we generate STRD in ARM mode, non-consecutive registers
are stored using STR/STM instructions. As the pair register of R3
(register R2) is never pushed onto the stack, we always end up generating
an STR instruction to PUSH R3. This is more expensive than doing
ADD SP, SP, #4 for the stack adjustment.

For example, if we are PUSHing registers {R4, R5, R6}, the stack is not
aligned, hence we PUSH {R3, R4, R5, R6}.
So, the instructions generated are:

    STR R6, [sp, #4]
    STRD R4, R5, [sp, #12]
    STR R3, [sp, #16]

However, if another caller-saved register is PUSHed instead of R3, we
push {R4, R5, R6, R7}, to generate

    STRD R6, R7, [sp, #8]
    STRD R4, R5, [sp, #16]

If no caller-saved register is available, we generate an ADD
instruction, which is still better than generating an STR.

>
> Hmmm the question remains if we want to put these into ldmstm.md since
> it was theoretically
> auto-generated from ldmstm.ml. If this has to be marked to be separate
> then I'd like
> to regenerate ldmstm.md from ldmstm.ml and differentiate between the
> bits that can be auto-generated
> and the bits that have been added since.
>
The current patterns are quite different from the patterns generated
using arm-ldmstm.ml. I will submit an updated arm-ldmstm.ml file that
generates the ldrd/strd patterns as a new patch. Is that fine?

The patch is tested with check-gcc, check-gdb and bootstrap.
I see a regression in gcc:

FAIL: gcc.c-torture/execute/vector-compare-1.c compilation,  -O3
-fomit-frame-pointer -funroll-loops

with the error message

/tmp/ccC13odV.s: Assembler messages:
/tmp/ccC13odV.s:544: Error: co-processor offset out of range

This seems to be a latent bug that the patch has uncovered, and I am
looking into it.

-
Thanks and regards,
Sameera D.
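
To illustrate the ARM-mode pairing rule discussed in this message, here is a minimal sketch in C. It is not code from the patch: `plan_arm_strd_push` and `struct push_plan` are hypothetical names used only for illustration, and the sketch assumes the A32 constraint that STRD stores an even-numbered register together with the next odd-numbered one. It only counts instructions; it does not model offsets or the STR versus STM choice for the leftover registers.

```c
/* Hypothetical illustration, not part of the patch: classify the
   registers in MASK (bit N set means rN is saved) into ARM-mode STRD
   pairs and registers that must be stored some other way.  */
struct push_plan
{
  int strd_pairs;     /* (even, even + 1) pairs that can use STRD.  */
  int leftover_regs;  /* Registers whose pair partner is not saved.  */
};

static struct push_plan
plan_arm_strd_push (unsigned long mask)
{
  struct push_plan plan = { 0, 0 };

  /* Assumption: ARM-mode STRD needs an even first register and the
     immediately following odd register as the second one.  */
  for (int even = 0; even < 16; even += 2)
    {
      int lo = (mask >> even) & 1;
      int hi = (mask >> (even + 1)) & 1;

      if (lo && hi)
        plan.strd_pairs++;
      else
        plan.leftover_regs += lo + hi;
    }

  return plan;
}
```

For the two register sets in the example, mask 0x78 ({R3, R4, R5, R6}) gives one STRD pair and two leftover registers, while mask 0xF0 ({R4, R5, R6, R7}) gives two STRD pairs and no leftovers.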