public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
From: James Greenhalgh <james.greenhalgh@arm.com>
To: Kyrill Tkachov <kyrylo.tkachov@foss.arm.com>
Cc: Segher Boessenkool <segher@kernel.crashing.org>,
	Andrew Pinski	<pinskia@gmail.com>,
	GCC Patches <gcc-patches@gcc.gnu.org>,
	Marcus Shawcroft	<marcus.shawcroft@arm.com>,
	Richard Earnshaw <Richard.Earnshaw@arm.com>,	<nd@arm.com>
Subject: Re: [PATCH][AArch64] Separate shrink wrapping hooks implementation
Date: Tue, 29 Nov 2016 10:58:00 -0000	[thread overview]
Message-ID: <20161129105732.GA35224@arm.com> (raw)
In-Reply-To: <5829C958.2010601@foss.arm.com>

On Mon, Nov 14, 2016 at 02:25:28PM +0000, Kyrill Tkachov wrote:
> 
> On 11/11/16 15:31, Kyrill Tkachov wrote:
> >
> >On 11/11/16 10:17, Kyrill Tkachov wrote:
> >>
> >>On 10/11/16 23:39, Segher Boessenkool wrote:
> >>>On Thu, Nov 10, 2016 at 02:42:24PM -0800, Andrew Pinski wrote:
> >>>>On Thu, Nov 10, 2016 at 6:25 AM, Kyrill Tkachov
> >>>>>I ran SPEC2006 on a Cortex-A72. Overall scores were neutral but there were
> >>>>>some interesting swings.
> >>>>>458.sjeng     +1.45%
> >>>>>471.omnetpp   +2.19%
> >>>>>445.gobmk     -2.01%
> >>>>>
> >>>>>On SPECFP:
> >>>>>453.povray    +7.00%
> >>>>
> >>>>Wow, this looks really good.  Thank you for implementing this.  If I
> >>>>get some time I am going to try it out on other processors than A72
> >>>>but I doubt I have time any time soon.
> >>>I'd love to hear what causes the slowdown for gobmk as well, btw.
> >>
> >>I haven't yet gotten a direct answer for that (through performance analysis
> >>tools) but I have noticed that load/store pairs are not generated as
> >>aggressively as I hoped.  They are being merged by the sched fusion pass
> >>and peepholes (which runs after this) but it still misses cases. I've
> >>hacked the SWS hooks to generate pairs explicitly and that increases the
> >>number of pairs and helps code size to boot. It complicates the logic of
> >>the hooks a bit but not too much.
> >>
> >>I'll make those changes and re-benchmark, hopefully that
> >>will help performance.
> >>
> >
> >And here's a version that explicitly emits pairs. I've looked at assembly
> >codegen on SPEC2006 and it generates quite a few more LDP/STP pairs than the
> >original version.  I kicked off benchmarks over the weekend to see the
> >effect.  Andrew, if you want to try it out (more benchmarking and testing
> >always welcome) this is the one to try.
> >
> 
> And I discovered over the weekend that gamess and wrf have validation errors.
> This version runs correctly.  SPECINT results were fine though and there is
> even a small overall gain due to sjeng and omnetpp. However, gobmk still has
> the regression.  I'll rerun SPECFP with this patch (it's really just a small
> bugfix over the previous version) and get on with analysing gobmk.

I have some comments in line, most of which are about hardcoding the
maximum register number, but at a high level this looks good to me.

Thanks,
James

> 2016-11-11  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
> 
>     * config/aarch64/aarch64.h (machine_function): Add
>     reg_is_wrapped_separately field.
>     * config/aarch64/aarch64.c (emit_set_insn): Change return type to
>     rtx_insn *.
>     (aarch64_save_callee_saves): Don't save registers that are wrapped
>     separately.
>     (aarch64_restore_callee_saves): Don't restore registers that are
>     wrapped separately.
>     (offset_9bit_signed_unscaled_p, offset_12bit_unsigned_scaled_p,
>     aarch64_offset_7bit_signed_scaled_p): Move earlier in the file.
>     (aarch64_get_separate_components): New function.
>     (aarch64_get_next_set_bit): Likewise.
>     (aarch64_components_for_bb): Likewise.
>     (aarch64_disqualify_components): Likewise.
>     (aarch64_emit_prologue_components): Likewise.
>     (aarch64_emit_epilogue_components): Likewise.
>     (aarch64_set_handled_components): Likewise.
>     (TARGET_SHRINK_WRAP_GET_SEPARATE_COMPONENTS,
>     TARGET_SHRINK_WRAP_COMPONENTS_FOR_BB,
>     TARGET_SHRINK_WRAP_DISQUALIFY_COMPONENTS,
>     TARGET_SHRINK_WRAP_EMIT_PROLOGUE_COMPONENTS,
>     TARGET_SHRINK_WRAP_EMIT_EPILOGUE_COMPONENTS,
>     TARGET_SHRINK_WRAP_SET_HANDLED_COMPONENTS): Define.
> 

> commit 06ac3c30d8aa38781ee9019e60a5fcf727b85231
> Author: Kyrylo Tkachov <kyrylo.tkachov@arm.com>
> Date:   Tue Oct 11 09:25:54 2016 +0100
> 
>     [AArch64] Separate shrink wrapping hooks implementation
> 
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 325e725..2d33ef6 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -1138,7 +1138,7 @@ aarch64_is_extend_from_extract (machine_mode mode, rtx mult_imm,
>  
>  /* Emit an insn that's a simple single-set.  Both the operands must be
>     known to be valid.  */
> -inline static rtx
> +inline static rtx_insn *
>  emit_set_insn (rtx x, rtx y)
>  {
>    return emit_insn (gen_rtx_SET (x, y));
> @@ -3135,6 +3135,9 @@ aarch64_save_callee_saves (machine_mode mode, HOST_WIDE_INT start_offset,
>  	      || regno == cfun->machine->frame.wb_candidate2))
>  	continue;
>  
> +      if (cfun->machine->reg_is_wrapped_separately[regno])
> +       continue;
> +
>        reg = gen_rtx_REG (mode, regno);
>        offset = start_offset + cfun->machine->frame.reg_offset[regno];
>        mem = gen_mem_ref (mode, plus_constant (Pmode, stack_pointer_rtx,
> @@ -3143,6 +3146,7 @@ aarch64_save_callee_saves (machine_mode mode, HOST_WIDE_INT start_offset,
>        regno2 = aarch64_next_callee_save (regno + 1, limit);
>  
>        if (regno2 <= limit
> +	  && !cfun->machine->reg_is_wrapped_separately[regno2]
>  	  && ((cfun->machine->frame.reg_offset[regno] + UNITS_PER_WORD)
>  	      == cfun->machine->frame.reg_offset[regno2]))
>  
> @@ -3191,6 +3195,9 @@ aarch64_restore_callee_saves (machine_mode mode,
>         regno <= limit;
>         regno = aarch64_next_callee_save (regno + 1, limit))
>      {
> +      if (cfun->machine->reg_is_wrapped_separately[regno])
> +       continue;
> +
>        rtx reg, mem;
>  
>        if (skip_wb
> @@ -3205,6 +3212,7 @@ aarch64_restore_callee_saves (machine_mode mode,
>        regno2 = aarch64_next_callee_save (regno + 1, limit);
>  
>        if (regno2 <= limit
> +	  && !cfun->machine->reg_is_wrapped_separately[regno2]
>  	  && ((cfun->machine->frame.reg_offset[regno] + UNITS_PER_WORD)
>  	      == cfun->machine->frame.reg_offset[regno2]))
>  	{
> @@ -3224,6 +3232,272 @@ aarch64_restore_callee_saves (machine_mode mode,
>      }
>  }
>  
> +static inline bool
> +offset_9bit_signed_unscaled_p (machine_mode mode ATTRIBUTE_UNUSED,
> +			       HOST_WIDE_INT offset)
> +{
> +  return offset >= -256 && offset < 256;

A follow-up changing this to use IN_RANGE would seem correct.

> +}
> +
> +static inline bool
> +offset_12bit_unsigned_scaled_p (machine_mode mode, HOST_WIDE_INT offset)
> +{
> +  return (offset >= 0
> +	  && offset < 4096 * GET_MODE_SIZE (mode)
> +	  && offset % GET_MODE_SIZE (mode) == 0);
> +}
> +
> +bool
> +aarch64_offset_7bit_signed_scaled_p (machine_mode mode, HOST_WIDE_INT offset)
> +{
> +  return (offset >= -64 * GET_MODE_SIZE (mode)
> +	  && offset < 64 * GET_MODE_SIZE (mode)
> +	  && offset % GET_MODE_SIZE (mode) == 0);

And certainly here, IN_RANGE would be neater.

> +}
> +
> +/* Implement TARGET_SHRINK_WRAP_GET_SEPARATE_COMPONENTS.  */
> +
> +static sbitmap
> +aarch64_get_separate_components (void)
> +{
> +  aarch64_layout_frame ();
> +
> +  sbitmap components = sbitmap_alloc (V31_REGNUM + 1);
> +  bitmap_clear (components);
> +
> +  /* The registers we need saved to the frame.  */
> +  for (unsigned regno = R0_REGNUM; regno <= V31_REGNUM; regno++)
> +    if (aarch64_register_saved_on_entry (regno))
> +      {
> +	HOST_WIDE_INT offset = cfun->machine->frame.reg_offset[regno];
> +	if (!frame_pointer_needed)
> +	  offset += cfun->machine->frame.frame_size
> +		    - cfun->machine->frame.hard_fp_offset;
> +	/* Check that we can access the stack slot of the register with one
> +	   direct load with no adjustments needed.  */
> +	if (offset_12bit_unsigned_scaled_p (DImode, offset))
> +	  bitmap_set_bit (components, regno);
> +      }
> +
> +  /* Don't mess with the hard frame pointer.  */
> +  if (frame_pointer_needed)
> +    bitmap_clear_bit (components, HARD_FRAME_POINTER_REGNUM);
> +
> +  unsigned reg1 = cfun->machine->frame.wb_candidate1;
> +  unsigned reg2 = cfun->machine->frame.wb_candidate2;
> +  /* If aarch64_layout_frame has chosen registers to store/restore with
> +     writeback don't interfere with them to avoid having to output explicit
> +     stack adjustment instructions.  */
> +  if (reg2 != INVALID_REGNUM)
> +    bitmap_clear_bit (components, reg2);
> +  if (reg1 != INVALID_REGNUM)
> +    bitmap_clear_bit (components, reg1);
> +
> +  bitmap_clear_bit (components, LR_REGNUM);
> +  bitmap_clear_bit (components, SP_REGNUM);
> +
> +  return components;
> +}
> +
> +/* Implement TARGET_SHRINK_WRAP_COMPONENTS_FOR_BB.  */
> +
> +static sbitmap
> +aarch64_components_for_bb (basic_block bb)
> +{
> +  bitmap in = DF_LIVE_IN (bb);
> +  bitmap gen = &DF_LIVE_BB_INFO (bb)->gen;
> +  bitmap kill = &DF_LIVE_BB_INFO (bb)->kill;
> +
> +  sbitmap components = sbitmap_alloc (V31_REGNUM + 1);
> +  bitmap_clear (components);
> +
> +  /* GPRs are used in a bb if they are in the IN, GEN, or KILL sets.  */
> +  for (unsigned regno = R0_REGNUM; regno <= V31_REGNUM; regno++)

The use of R0_REGNUM and V31_REGNUM scare me a little bit, as we're hardcoding
where the end of the register file is (does this, for example, fall apart
with the SVE work that was recently posted). Something like a
LAST_HARDREG_NUM might work?

> +    if ((!call_used_regs[regno])
> +       && (bitmap_bit_p (in, regno)
> +	   || bitmap_bit_p (gen, regno)
> +	   || bitmap_bit_p (kill, regno)))
> +	  bitmap_set_bit (components, regno);
> +
> +  return components;
> +}
> +
> +/* Implement TARGET_SHRINK_WRAP_DISQUALIFY_COMPONENTS.
> +   Nothing to do for aarch64.  */
> +
> +static void
> +aarch64_disqualify_components (sbitmap, edge, sbitmap, bool)
> +{
> +}

Is there no default "do nothing" hook for this?

> +
> +/* Return the next set bit in BMP from START onwards.  Return the total number
> +   of bits in BMP if no set bit is found at or after START.  */
> +
> +static unsigned int
> +aarch64_get_next_set_bit (sbitmap bmp, unsigned int start)
> +{
> +  unsigned int nbits = SBITMAP_SIZE (bmp);
> +  if (start == nbits)
> +    return start;
> +
> +  gcc_assert (start < nbits);
> +  for (unsigned int i = start; i < nbits; i++)
> +    if (bitmap_bit_p (bmp, i))
> +      return i;
> +
> +  return nbits;
> +}
> +
> +/* Implement TARGET_SHRINK_WRAP_EMIT_PROLOGUE_COMPONENTS.  */
> +
> +static void
> +aarch64_emit_prologue_components (sbitmap components)
> +{
> +  rtx ptr_reg = gen_rtx_REG (Pmode, frame_pointer_needed
> +			     ? HARD_FRAME_POINTER_REGNUM
> +			     : STACK_POINTER_REGNUM);
> +
> +  unsigned total_bits = SBITMAP_SIZE (components);

Would this be clearer called last_regno ?

> +  unsigned regno = aarch64_get_next_set_bit (components, R0_REGNUM);
> +  rtx_insn *insn = NULL;
> +
> +  while (regno != total_bits)
> +    {
> +      machine_mode mode = GP_REGNUM_P (regno) ? DImode : DFmode;

Comment about why this can be DFmode rather than some 128-bit mode might
be useful here.

> +      rtx reg = gen_rtx_REG (mode, regno);
> +      HOST_WIDE_INT offset = cfun->machine->frame.reg_offset[regno];
> +      if (!frame_pointer_needed)
> +	offset += cfun->machine->frame.frame_size
> +		  - cfun->machine->frame.hard_fp_offset;
> +      rtx addr = plus_constant (Pmode, ptr_reg, offset);
> +      rtx mem = gen_frame_mem (mode, addr);
> +
> +      rtx set = gen_rtx_SET (mem, reg);
> +      unsigned regno2 = aarch64_get_next_set_bit (components, regno + 1);
> +      /* No more registers to save after REGNO.
> +	 Emit a single save and exit.  */
> +      if (regno2 == total_bits)
> +	{
> +	  insn = emit_insn (set);
> +	  RTX_FRAME_RELATED_P (insn) = 1;
> +	  add_reg_note (insn, REG_CFA_OFFSET, copy_rtx (set));
> +	  break;
> +	}
> +
> +      HOST_WIDE_INT offset2 = cfun->machine->frame.reg_offset[regno2];
> +      /* The next register is not of the same class or its offset is not
> +	 mergeable with the current one into a pair.  */
> +      if (!satisfies_constraint_Ump (mem)
> +	  || GP_REGNUM_P (regno) != GP_REGNUM_P (regno2)
> +	  || (offset2 - cfun->machine->frame.reg_offset[regno])
> +		!= GET_MODE_SIZE (DImode))
> +	{
> +	  insn = emit_insn (set);
> +	  RTX_FRAME_RELATED_P (insn) = 1;
> +	  add_reg_note (insn, REG_CFA_OFFSET, copy_rtx (set));
> +
> +	  regno = regno2;
> +	  continue;
> +	}
> +
> +      /* REGNO2 can be stored in a pair with REGNO.  */
> +      rtx reg2 = gen_rtx_REG (mode, regno2);
> +      if (!frame_pointer_needed)
> +	offset2 += cfun->machine->frame.frame_size
> +		  - cfun->machine->frame.hard_fp_offset;
> +      rtx addr2 = plus_constant (Pmode, ptr_reg, offset2);
> +      rtx mem2 = gen_frame_mem (mode, addr2);
> +      rtx set2 = gen_rtx_SET (mem2, reg2);
> +
> +      insn = emit_insn (aarch64_gen_store_pair (mode, mem, reg, mem2, reg2));
> +      RTX_FRAME_RELATED_P (insn) = 1;
> +      add_reg_note (insn, REG_CFA_OFFSET, set);
> +      add_reg_note (insn, REG_CFA_OFFSET, set2);
> +
> +      regno = aarch64_get_next_set_bit (components, regno2 + 1);
> +    }
> +
> +}
> +
> +/* Implement TARGET_SHRINK_WRAP_EMIT_EPILOGUE_COMPONENTS.  */
> +
> +static void
> +aarch64_emit_epilogue_components (sbitmap components)

Given the similarity of the logic, is there any way you can refactor this
with the prologue code above?

> +{
> +
> +  rtx ptr_reg = gen_rtx_REG (Pmode, frame_pointer_needed
> +			     ? HARD_FRAME_POINTER_REGNUM
> +			     : STACK_POINTER_REGNUM);
> +  unsigned total_bits = SBITMAP_SIZE (components);
> +  unsigned regno = aarch64_get_next_set_bit (components, R0_REGNUM);
> +  rtx_insn *insn = NULL;
> +
> +  while (regno != total_bits)
> +    {
> +      machine_mode mode = GP_REGNUM_P (regno) ? DImode : DFmode;
> +      rtx reg = gen_rtx_REG (mode, regno);
> +      HOST_WIDE_INT offset = cfun->machine->frame.reg_offset[regno];
> +      if (!frame_pointer_needed)
> +	offset += cfun->machine->frame.frame_size
> +		  - cfun->machine->frame.hard_fp_offset;
> +      rtx addr = plus_constant (Pmode, ptr_reg, offset);
> +      rtx mem = gen_frame_mem (mode, addr);
> +
> +      unsigned regno2 = aarch64_get_next_set_bit (components, regno + 1);
> +      /* No more registers after REGNO to restore.
> +	 Emit a single restore and exit.  */
> +      if (regno2 == total_bits)
> +	{
> +	  insn = emit_move_insn (reg, mem);
> +	  RTX_FRAME_RELATED_P (insn) = 1;
> +	  add_reg_note (insn, REG_CFA_RESTORE, reg);
> +	  break;
> +	}
> +
> +      HOST_WIDE_INT offset2 = cfun->machine->frame.reg_offset[regno2];
> +      /* The next register is not of the same class or its offset is not
> +	 mergeable with the current one into a pair or the offset doesn't fit
> +	 for a load pair.  Emit a single restore and continue from REGNO2.  */
> +      if (!satisfies_constraint_Ump (mem)
> +	  || GP_REGNUM_P (regno) != GP_REGNUM_P (regno2)
> +	  || (offset2 - cfun->machine->frame.reg_offset[regno])
> +		!= GET_MODE_SIZE (DImode))
> +	{
> +	  insn = emit_move_insn (reg, mem);
> +	  RTX_FRAME_RELATED_P (insn) = 1;
> +	  add_reg_note (insn, REG_CFA_RESTORE, reg);
> +
> +	  regno = regno2;
> +	  continue;
> +	}
> +
> +      /* REGNO2 can be loaded in a pair with REGNO.  */
> +      rtx reg2 = gen_rtx_REG (mode, regno2);
> +      if (!frame_pointer_needed)
> +	offset2 += cfun->machine->frame.frame_size
> +		  - cfun->machine->frame.hard_fp_offset;
> +      rtx addr2 = plus_constant (Pmode, ptr_reg, offset2);
> +      rtx mem2 = gen_frame_mem (mode, addr2);
> +
> +      insn = emit_insn (aarch64_gen_load_pair (mode, reg, mem, reg2, mem2));
> +      RTX_FRAME_RELATED_P (insn) = 1;
> +      add_reg_note (insn, REG_CFA_RESTORE, reg);
> +      add_reg_note (insn, REG_CFA_RESTORE, reg2);
> +
> +      regno = aarch64_get_next_set_bit (components, regno2 + 1);
> +    }
> +}
> +
> +/* Implement TARGET_SHRINK_WRAP_SET_HANDLED_COMPONENTS.  */
> +
> +static void
> +aarch64_set_handled_components (sbitmap components)
> +{
> +  for (unsigned regno = R0_REGNUM; regno <= V31_REGNUM; regno++)
> +    if (bitmap_bit_p (components, regno))
> +      cfun->machine->reg_is_wrapped_separately[regno] = true;
> +}
> +
>  /* AArch64 stack frames generated by this compiler look like:
>  
>  	+-------------------------------+
> @@ -3944,29 +4218,6 @@ aarch64_classify_index (struct aarch64_address_info *info, rtx x,
>    return false;
>  }
>  
> -bool
> -aarch64_offset_7bit_signed_scaled_p (machine_mode mode, HOST_WIDE_INT offset)
> -{
> -  return (offset >= -64 * GET_MODE_SIZE (mode)
> -	  && offset < 64 * GET_MODE_SIZE (mode)
> -	  && offset % GET_MODE_SIZE (mode) == 0);
> -}
> -
> -static inline bool
> -offset_9bit_signed_unscaled_p (machine_mode mode ATTRIBUTE_UNUSED,
> -			       HOST_WIDE_INT offset)
> -{
> -  return offset >= -256 && offset < 256;
> -}
> -
> -static inline bool
> -offset_12bit_unsigned_scaled_p (machine_mode mode, HOST_WIDE_INT offset)
> -{
> -  return (offset >= 0
> -	  && offset < 4096 * GET_MODE_SIZE (mode)
> -	  && offset % GET_MODE_SIZE (mode) == 0);
> -}
> -
>  /* Return true if MODE is one of the modes for which we
>     support LDP/STP operations.  */
>  
> @@ -14452,6 +14703,30 @@ aarch64_optab_supported_p (int op, machine_mode mode1, machine_mode,
>  #define TARGET_SCHED_FIRST_CYCLE_MULTIPASS_DFA_LOOKAHEAD_GUARD \
>    aarch64_first_cycle_multipass_dfa_lookahead_guard
>  
> +#undef TARGET_SHRINK_WRAP_GET_SEPARATE_COMPONENTS
> +#define TARGET_SHRINK_WRAP_GET_SEPARATE_COMPONENTS \
> +  aarch64_get_separate_components
> +
> +#undef TARGET_SHRINK_WRAP_COMPONENTS_FOR_BB
> +#define TARGET_SHRINK_WRAP_COMPONENTS_FOR_BB \
> +  aarch64_components_for_bb
> +
> +#undef TARGET_SHRINK_WRAP_DISQUALIFY_COMPONENTS
> +#define TARGET_SHRINK_WRAP_DISQUALIFY_COMPONENTS \
> +  aarch64_disqualify_components
> +
> +#undef TARGET_SHRINK_WRAP_EMIT_PROLOGUE_COMPONENTS
> +#define TARGET_SHRINK_WRAP_EMIT_PROLOGUE_COMPONENTS \
> +  aarch64_emit_prologue_components
> +
> +#undef TARGET_SHRINK_WRAP_EMIT_EPILOGUE_COMPONENTS
> +#define TARGET_SHRINK_WRAP_EMIT_EPILOGUE_COMPONENTS \
> +  aarch64_emit_epilogue_components
> +
> +#undef TARGET_SHRINK_WRAP_SET_HANDLED_COMPONENTS
> +#define TARGET_SHRINK_WRAP_SET_HANDLED_COMPONENTS \
> +  aarch64_set_handled_components
> +
>  #undef TARGET_TRAMPOLINE_INIT
>  #define TARGET_TRAMPOLINE_INIT aarch64_trampoline_init
>  
> diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
> index 584ff5c..fb89e5a 100644
> --- a/gcc/config/aarch64/aarch64.h
> +++ b/gcc/config/aarch64/aarch64.h
> @@ -591,6 +591,8 @@ struct GTY (()) aarch64_frame
>  typedef struct GTY (()) machine_function
>  {
>    struct aarch64_frame frame;
> +  /* One entry for each GPR and FP register.  */
> +  bool reg_is_wrapped_separately[V31_REGNUM + 1];

Another hardcoded use of V31_REGNUM.

Thanks,
James

  parent reply	other threads:[~2016-11-29 10:58 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-11-10 14:26 Kyrill Tkachov
2016-11-10 16:33 ` Segher Boessenkool
2016-11-10 16:45   ` Kyrill Tkachov
2016-11-10 22:42 ` Andrew Pinski
2016-11-10 23:39   ` Segher Boessenkool
2016-11-11 10:18     ` Kyrill Tkachov
2016-11-11 15:31       ` Kyrill Tkachov
2016-11-14 14:25         ` Kyrill Tkachov
2016-11-17 14:22           ` Kyrill Tkachov
2016-11-17 14:44             ` Segher Boessenkool
2016-11-17 14:55               ` Kyrill Tkachov
2016-11-17 15:02                 ` Segher Boessenkool
2016-11-17 15:06                   ` Kyrill Tkachov
2016-11-17 16:50                 ` Kyrill Tkachov
2016-11-17 17:46                   ` Segher Boessenkool
2016-11-18  9:29                     ` Kyrill Tkachov
2016-11-18 12:50                       ` Segher Boessenkool
2016-11-22 16:47                         ` Kyrill Tkachov
2016-11-29 10:58           ` James Greenhalgh [this message]
2016-11-29 11:18             ` Kyrill Tkachov
2016-11-29 11:32               ` Kyrill Tkachov
2016-11-29 11:37                 ` James Greenhalgh
2016-11-29 20:29             ` Segher Boessenkool
2016-11-30 14:08               ` Kyrill Tkachov
2016-12-02 17:09                 ` James Greenhalgh

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20161129105732.GA35224@arm.com \
    --to=james.greenhalgh@arm.com \
    --cc=Richard.Earnshaw@arm.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=kyrylo.tkachov@foss.arm.com \
    --cc=marcus.shawcroft@arm.com \
    --cc=nd@arm.com \
    --cc=pinskia@gmail.com \
    --cc=segher@kernel.crashing.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).