public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
From: Dimitrije Milosevic <Dimitrije.Milosevic@Syrmia.com>
To: Richard Biener <richard.guenther@gmail.com>
Cc: "gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>,
	Djordje Todorovic <Djordje.Todorovic@syrmia.com>
Subject: Re: [PATCH 1/2] ivopts: Revert computation of address cost complexity.
Date: Tue, 25 Oct 2022 13:00:22 +0000	[thread overview]
Message-ID: <AM0PR03MB4882F969E9D82C374962BFA182319@AM0PR03MB4882.eurprd03.prod.outlook.com> (raw)
In-Reply-To: <CAFiYyc1eGhbz3va2EmuFZ9BjjxXyDhP7W6q-pAbHQFUBcFy+Xg@mail.gmail.com>

Hi Richard,

> I don't follow how only having BASE + OFFSET addressing prevents
> calculation of complexity for other addressing modes?  Can you explain?

It's the valid_mem_ref_p target hook that prevents complexity calculation
for other addressing modes (for Mips and RISC-V).
Here's the snippet of the algorithm (after f9f69dd) for the complexity 
calculation, which is located at the beginning of the get_address_cost
function in tree-ssa-loop-ivopts.cc:

  if (!aff_combination_const_p (aff_inv))
    {
      parts.index = integer_one_node;
      /* Addressing mode "base + index".  */
      ok_without_ratio_p = valid_mem_ref_p (mem_mode, as, &parts);
      if (ratio != 1)
	{
	  parts.step = wide_int_to_tree (type, ratio);
	  /* Addressing mode "base + index << scale".  */
	  ok_with_ratio_p = valid_mem_ref_p (mem_mode, as, &parts);
	  if (!ok_with_ratio_p)
	    parts.step = NULL_TREE;
	}
      ...

The algorithm "builds up" the actual addressing mode step-by-step,
starting from BASE + INDEX. However, if valid_mem_ref_p returns
negative, parts.* is set to NULL_TREE and we bail out. For Mips (or 
RISC-V), it always returns negative (given we are building the addressing
mode up from BASE + INDEX), since Mips allows BASE + OFFSET only
(see the case PLUS in mips_classify_address in config/mips/mips.cc).
The result is that all addressing modes besides BASE + OFFSET, for Mips
(or RISC-V) have complexities of 0. f9f69dd introduced calls to valid_mem_ref_p
target hook, which were not there before, and I'm not sure why exactly.

> Do you have a testcase that shows how both changes improve IV selection
> for MIPS?

Certainly, consider the following test case:

void daxpy(float *vector1, float *vector2, int n, float fp_const)
{
	for (int i = 0; i < n; ++i)
		vector1[i] += fp_const * vector2[i];
}

void dgefa(float *vector, int m, int n, int l)
{
	for (int i = 0; i < n - 1; ++i)
	{
		for (int j = i + 1; j < n; ++j)
		{
			float t = vector[m * j + l];
			daxpy(&vector[m * i + i + 1], &vector[m * j + i + 1], n - (i + 1), t);
		}
	}
}

At the third inner loop (which gets inlined from daxpy), an unoptimal candidate
selection takes place.
Worth noting is that f9f69dd doesn't change the costs (they are, however, multiplied by
a factor, but what was lesser/greater before is lesser/greater after). Here's how complexities
stand:

===== Before f9f69dd =====
Group 1:
  cand	cost	compl.	inv.expr.	inv.vars
  1	13	1	3;	NIL;
  2	13	2	4;	NIL;
  4	9	1	5;	NIL;
  5	1	0	NIL;	NIL;
  7	9	1	3;	NIL;
===== Before f9f69dd =====
===== After f9f69dd =====
Group 1:
  cand	cost	compl.	inv.expr.	inv.vars
  1	10	0	4;	NIL;
  2	10	0	5;	NIL;
  4	6	0	6;	NIL;
  5	1	0	NIL;	NIL;
  7	6	0	4;	NIL;
===== After f9f69dd =====

Notice how all complexities are zero, even though the candidates have
different addressing modes.

For this particular example, this leads to a different candidate selection:

===== Before f9f69dd =====
Selected IV set for loop 3 at fp_foo.c:3, 10 avg niters, 2 IVs:
Candidate 4:
  Var befor: ivtmp.17_52
  Var after: ivtmp.17_103
  Incr POS: before exit test
  IV struct:
    Type:	unsigned long
    Base:	(unsigned long) (vector_27(D) + _10)
    Step:	4
    Object:	(void *) vector_27(D)
    Biv:	N
    Overflowness wrto loop niter:	Overflow
Candidate 5:
  Var befor: ivtmp.18_99
  Var after: ivtmp.18_98
  Incr POS: before exit test
  IV struct:
    Type:	unsigned long
    Base:	(unsigned long) (vector_27(D) + _14)
    Step:	4
    Object:	(void *) vector_27(D)
    Biv:	N
    Overflowness wrto loop niter:	Overflow
===== Before f9f69dd =====
===== After f9f69dd =====
Selected IV set for loop 3 at fp_foo.c:3, 10 avg niters, 1 IVs:
Candidate 4:
  Var befor: ivtmp.17_52
  Var after: ivtmp.17_103
  Incr POS: before exit test
  IV struct:
    Type:	unsigned long
    Base:	(unsigned long) (vector_27(D) + _10)
    Step:	4
    Object:	(void *) vector_27(D)
    Biv:	N
    Overflowness wrto loop niter:	Overflow
===== After f9f69dd =====

which, in turn, leads to the following assembly sequence:

===== Before f9f69dd =====
.L83:
	lwc1	$f5,0($3)
	lwc1	$f8,0($2)
	lwc1	$f7,4($2)
	lwc1	$f6,8($2)
	lwc1	$f9,12($2)
	lwc1	$f10,16($2)
	maddf.s	$f8,$f0,$f5
	lwc1	$f11,20($2)
	lwc1	$f12,24($2)
	lwc1	$f13,28($2)
	ld	$12,72($sp)
	swc1	$f8,0($2)
	lwc1	$f14,4($3)
	maddf.s	$f7,$f0,$f14
	swc1	$f7,4($2)
	lwc1	$f15,8($3)
	maddf.s	$f6,$f0,$f15
	swc1	$f6,8($2)
	lwc1	$f16,12($3)
	maddf.s	$f9,$f0,$f16
	swc1	$f9,12($2)
	lwc1	$f17,16($3)
	maddf.s	$f10,$f0,$f17
	swc1	$f10,16($2)
	lwc1	$f18,20($3)
	maddf.s	$f11,$f0,$f18
	swc1	$f11,20($2)
	lwc1	$f19,24($3)
	maddf.s	$f12,$f0,$f19
	swc1	$f12,24($2)
	lwc1	$f20,28($3)
	maddf.s	$f13,$f0,$f20
	swc1	$f13,28($2)
	daddiu	$2,$2,32
	bne	$2,$12,.L83
	daddiu	$3,$3,32
        ...
===== Before f9f69dd =====
===== After f9f69dd =====
.L93:
	dsubu	$18,$2,$4
	lwc1	$f13,0($2)
	daddu	$19,$18,$5
	daddiu	$16,$2,4
	lwc1	$f14,0($19)
	dsubu	$17,$16,$4
	daddu	$25,$17,$5
	lwc1	$f15,4($2)
	daddiu	$19,$2,12
	daddiu	$20,$2,8
	maddf.s	$f13,$f1,$f14
	dsubu	$16,$19,$4
	daddiu	$17,$2,16
	dsubu	$18,$20,$4
	daddu	$19,$16,$5
	daddiu	$16,$2,20
	lwc1	$f10,8($2)
	daddu	$20,$18,$5
	lwc1	$f16,12($2)
	dsubu	$18,$17,$4
	lwc1	$f17,16($2)
	dsubu	$17,$16,$4
	lwc1	$f18,20($2)
	daddiu	$16,$2,24
	lwc1	$f20,24($2)
	daddu	$18,$18,$5
	swc1	$f13,0($2)
	daddu	$17,$17,$5
	lwc1	$f19,0($25)
	daddiu	$25,$2,28
	lwc1	$f11,28($2)
	daddiu	$2,$2,32
	dsubu	$16,$16,$4
	dsubu	$25,$25,$4
	maddf.s	$f15,$f1,$f19
	daddu	$16,$16,$5
	daddu	$25,$25,$5
	swc1	$f15,-28($2)
	lwc1	$f21,0($20)
	ld	$20,48($sp)
	maddf.s	$f10,$f1,$f21
	swc1	$f10,-24($2)
	lwc1	$f22,0($19)
	maddf.s	$f16,$f1,$f22
	swc1	$f16,-20($2)
	lwc1	$f23,0($18)
	maddf.s	$f17,$f1,$f23
	swc1	$f17,-16($2)
	lwc1	$f0,0($17)
	maddf.s	$f18,$f1,$f0
	swc1	$f18,-12($2)
	lwc1	$f7,0($16)
	maddf.s	$f20,$f1,$f7
	swc1	$f20,-8($2)
	lwc1	$f12,0($25)
	maddf.s	$f11,$f1,$f12
	bne	$2,$20,.L93
	swc1	$f11,-4($2)
        ...
===== After f9f69dd =====

Notice the additional instructions used for index calculation, due to
unoptimal candidate selection.

Regards,
Dimitrije

From: Richard Biener <richard.guenther@gmail.com>
Sent: Tuesday, October 25, 2022 1:08 PM
To: Dimitrije Milosevic <Dimitrije.Milosevic@Syrmia.com>
Cc: gcc-patches@gcc.gnu.org <gcc-patches@gcc.gnu.org>; Djordje Todorovic <Djordje.Todorovic@syrmia.com>
Subject: Re: [PATCH 1/2] ivopts: Revert computation of address cost complexity. 
 
On Fri, Oct 21, 2022 at 3:56 PM Dimitrije Milosevic
<dimitrije.milosevic@syrmia.com> wrote:
>
> From: Dimitrije Milošević <dimitrije.milosevic@syrmia.com>
>
> This patch reverts the computation of address cost complexity
> to the legacy one. After f9f69dd, complexity is calculated
> using the valid_mem_ref_p target hook. Architectures like
> Mips only allow BASE + OFFSET addressing modes, which in turn
> prevents the calculation of complexity for other addressing
> modes, resulting in non-optimal candidate selection.

I don't follow how only having BASE + OFFSET addressing prevents
calculation of complexity for other addressing modes?  Can you explain?

Do you have a testcase that shows how both changes improve IV selection
for MIPS?

>
> gcc/ChangeLog:
>
>         * tree-ssa-address.cc (multiplier_allowed_in_address_p): Change
>         to non-static.
>         * tree-ssa-address.h (multiplier_allowed_in_address_p): Declare.
>         * tree-ssa-loop-ivopts.cc (compute_symbol_and_var_present): Reintroduce.
>         (compute_min_and_max_offset): Likewise.
>         (get_address_cost): Revert
>         complexity calculation.
>
> Signed-off-by: Dimitrije Milosevic <dimitrije.milosevic@syrmia.com>
> ---
>  gcc/tree-ssa-address.cc     |   2 +-
>  gcc/tree-ssa-address.h      |   2 +
>  gcc/tree-ssa-loop-ivopts.cc | 214 ++++++++++++++++++++++++++++++++++--
>  3 files changed, 207 insertions(+), 11 deletions(-)
>
> diff --git a/gcc/tree-ssa-address.cc b/gcc/tree-ssa-address.cc
> index ba7b7c93162..442f54f0165 100644
> --- a/gcc/tree-ssa-address.cc
> +++ b/gcc/tree-ssa-address.cc
> @@ -561,7 +561,7 @@ add_to_parts (struct mem_address *parts, tree elt)
>     validity for a memory reference accessing memory of mode MODE in address
>     space AS.  */
>
> -static bool
> +bool
>  multiplier_allowed_in_address_p (HOST_WIDE_INT ratio, machine_mode mode,
>                                  addr_space_t as)
>  {
> diff --git a/gcc/tree-ssa-address.h b/gcc/tree-ssa-address.h
> index 95143a099b9..09f36ee2f19 100644
> --- a/gcc/tree-ssa-address.h
> +++ b/gcc/tree-ssa-address.h
> @@ -38,6 +38,8 @@ tree create_mem_ref (gimple_stmt_iterator *, tree,
>                      class aff_tree *, tree, tree, tree, bool);
>  extern void copy_ref_info (tree, tree);
>  tree maybe_fold_tmr (tree);
> +bool multiplier_allowed_in_address_p (HOST_WIDE_INT ratio, machine_mode mode,
> +                                addr_space_t as);
>
>  extern unsigned int preferred_mem_scale_factor (tree base,
>                                                 machine_mode mem_mode,
> diff --git a/gcc/tree-ssa-loop-ivopts.cc b/gcc/tree-ssa-loop-ivopts.cc
> index a6f926a68ef..d53ba05a4f6 100644
> --- a/gcc/tree-ssa-loop-ivopts.cc
> +++ b/gcc/tree-ssa-loop-ivopts.cc
> @@ -4774,6 +4774,135 @@ get_address_cost_ainc (poly_int64 ainc_step, poly_int64 ainc_offset,
>    return infinite_cost;
>  }
>
> +static void
> +compute_symbol_and_var_present (tree e1, tree e2,
> +       bool *symbol_present, bool *var_present)
> +{
> +  poly_uint64_pod off1, off2;
> +
> +  e1 = strip_offset (e1, &off1);
> +  e2 = strip_offset (e2, &off2);
> +
> +  STRIP_NOPS (e1);
> +  STRIP_NOPS (e2);
> +
> +  if (TREE_CODE (e1) == ADDR_EXPR)
> +    {
> +      poly_int64_pod diff;
> +      if (ptr_difference_const (e1, e2, &diff))
> +  {
> +    *symbol_present = false;
> +    *var_present = false;
> +    return;
> +  }
> +
> +      if (integer_zerop (e2))
> +  {
> +    tree core;
> +    poly_int64_pod bitsize;
> +    poly_int64_pod bitpos;
> +    widest_int mul;
> +    tree toffset;
> +    machine_mode mode;
> +    int unsignedp, reversep, volatilep;
> +
> +    core = get_inner_reference (TREE_OPERAND (e1, 0), &bitsize, &bitpos,
> +      &toffset, &mode, &unsignedp, &reversep, &volatilep);
> +
> +    if (toffset != 0
> +    || !constant_multiple_p (bitpos, BITS_PER_UNIT, &mul)
> +    || reversep
> +    || !VAR_P (core))
> +      {
> +    *symbol_present = false;
> +    *var_present = true;
> +    return;
> +      }
> +
> +    if (TREE_STATIC (core)
> +    || DECL_EXTERNAL (core))
> +      {
> +    *symbol_present = true;
> +    *var_present = false;
> +    return;
> +      }
> +
> +    *symbol_present = false;
> +    *var_present = true;
> +    return;
> +  }
> +
> +      *symbol_present = false;
> +      *var_present = true;
> +    }
> +  *symbol_present = false;
> +
> +  if (operand_equal_p (e1, e2, 0))
> +    {
> +      *var_present = false;
> +      return;
> +    }
> +
> +  *var_present = true;
> +}
> +
> +static void
> +compute_min_and_max_offset (addr_space_t as,
> +       machine_mode mem_mode, poly_int64_pod *min_offset,
> +       poly_int64_pod *max_offset)
> +{
> +  machine_mode address_mode = targetm.addr_space.address_mode (as);
> +  HOST_WIDE_INT i;
> +  poly_int64_pod off, width;
> +  rtx addr;
> +  rtx reg1;
> +
> +  reg1 = gen_raw_REG (address_mode, LAST_VIRTUAL_REGISTER + 1);
> +
> +  width = GET_MODE_BITSIZE (address_mode) - 1;
> +  if (known_gt (width, HOST_BITS_PER_WIDE_INT - 1))
> +         width = HOST_BITS_PER_WIDE_INT - 1;
> +  gcc_assert (width.is_constant ());
> +  addr = gen_rtx_fmt_ee (PLUS, address_mode, reg1, NULL_RTX);
> +
> +  off = 0;
> +  for (i = width.to_constant (); i >= 0; i--)
> +    {
> +      off = -(HOST_WIDE_INT_1U << i);
> +      XEXP (addr, 1) = gen_int_mode (off, address_mode);
> +      if (memory_address_addr_space_p (mem_mode, addr, as))
> +    break;
> +    }
> +  if (i == -1)
> +    *min_offset = 0;
> +  else
> +    *min_offset = off;
> +  // *min_offset = (i == -1? 0 : off);
> +
> +  for (i = width.to_constant (); i >= 0; i--)
> +    {
> +      off = (HOST_WIDE_INT_1U << i) - 1;
> +      XEXP (addr, 1) = gen_int_mode (off, address_mode);
> +      if (memory_address_addr_space_p (mem_mode, addr, as))
> +    break;
> +    /* For some strict-alignment targets, the offset must be naturally
> +      aligned.  Try an aligned offset if mem_mode is not QImode.  */
> +      off = mem_mode != QImode
> +      ? (HOST_WIDE_INT_1U << i)
> +      - (GET_MODE_SIZE (mem_mode))
> +      : 0;
> +      if (known_gt (off, 0))
> +    {
> +      XEXP (addr, 1) = gen_int_mode (off, address_mode);
> +      if (memory_address_addr_space_p (mem_mode, addr, as))
> +    break;
> +    }
> +    }
> +  if (i == -1)
> +         off = 0;
> +  *max_offset = off;
> +}
> +
>  /* Return cost of computing USE's address expression by using CAND.
>     AFF_INV and AFF_VAR represent invariant and variant parts of the
>     address expression, respectively.  If AFF_INV is simple, store
> @@ -4802,6 +4931,13 @@ get_address_cost (struct ivopts_data *data, struct iv_use *use,
>    /* Only true if ratio != 1.  */
>    bool ok_with_ratio_p = false;
>    bool ok_without_ratio_p = false;
> +  tree ubase = use->iv->base;
> +  tree cbase = cand->iv->base, cstep = cand->iv->step;
> +  tree utype = TREE_TYPE (ubase), ctype;
> +  unsigned HOST_WIDE_INT cstepi;
> +  bool symbol_present = false, var_present = false, stmt_is_after_increment;
> +  poly_int64_pod min_offset, max_offset;
> +  bool offset_p, ratio_p;
>
>    if (!aff_combination_const_p (aff_inv))
>      {
> @@ -4915,16 +5051,74 @@ get_address_cost (struct ivopts_data *data, struct iv_use *use,
>    gcc_assert (memory_address_addr_space_p (mem_mode, addr, as));
>    cost += address_cost (addr, mem_mode, as, speed);
>
> -  if (parts.symbol != NULL_TREE)
> -    cost.complexity += 1;
> -  /* Don't increase the complexity of adding a scaled index if it's
> -     the only kind of index that the target allows.  */
> -  if (parts.step != NULL_TREE && ok_without_ratio_p)
> -    cost.complexity += 1;
> -  if (parts.base != NULL_TREE && parts.index != NULL_TREE)
> -    cost.complexity += 1;
> -  if (parts.offset != NULL_TREE && !integer_zerop (parts.offset))
> -    cost.complexity += 1;
> +  if (cst_and_fits_in_hwi (cstep))
> +    cstepi = int_cst_value (cstep);
> +  else
> +    cstepi = 0;
> +
> +  STRIP_NOPS (cbase);
> +  ctype = TREE_TYPE (cbase);
> +
> +  stmt_is_after_increment = stmt_after_increment (data->current_loop, cand,
> +    use->stmt);
> +
> +  if (cst_and_fits_in_hwi (cbase))
> +    compute_symbol_and_var_present (ubase, build_int_cst (utype, 0),
> +      &symbol_present, &var_present);
> +  else if (ratio == 1)
> +    {
> +      tree real_cbase = cbase;
> +
> +      /* Check to see if any adjustment is needed.  */
> +      if (!cst_and_fits_in_hwi (cstep) && stmt_is_after_increment)
> +       {
> +         aff_tree real_cbase_aff;
> +         aff_tree cstep_aff;
> +
> +         tree_to_aff_combination (cbase, TREE_TYPE (real_cbase),
> +                                  &real_cbase_aff);
> +         tree_to_aff_combination (cstep, TREE_TYPE (cstep), &cstep_aff);
> +
> +         aff_combination_add (&real_cbase_aff, &cstep_aff);
> +         real_cbase = aff_combination_to_tree (&real_cbase_aff);
> +       }
> +    compute_symbol_and_var_present (ubase, real_cbase,
> +      &symbol_present, &var_present);
> +    }
> +  else if (!POINTER_TYPE_P (ctype)
> +          && multiplier_allowed_in_address_p
> +               (ratio, mem_mode,
> +                       TYPE_ADDR_SPACE (TREE_TYPE (utype))))
> +    {
> +      tree real_cbase = cbase;
> +
> +      if (cstepi == 0 && stmt_is_after_increment)
> +       {
> +         if (POINTER_TYPE_P (ctype))
> +           real_cbase = fold_build2 (POINTER_PLUS_EXPR, ctype, cbase, cstep);
> +         else
> +           real_cbase = fold_build2 (PLUS_EXPR, ctype, cbase, cstep);
> +       }
> +      real_cbase = fold_build2 (MULT_EXPR, ctype, real_cbase,
> +                               build_int_cst (ctype, ratio));
> +    compute_symbol_and_var_present (ubase, real_cbase,
> +      &symbol_present, &var_present);
> +    }
> +  else
> +    {
> +    compute_symbol_and_var_present (ubase, build_int_cst (utype, 0),
> +      &symbol_present, &var_present);
> +    }
> +
> +  compute_min_and_max_offset (as, mem_mode, &min_offset, &max_offset);
> +  offset_p = maybe_ne (aff_inv->offset, 0)
> +       && known_le (min_offset, aff_inv->offset)
> +       && known_le (aff_inv->offset, max_offset);
> +  ratio_p = (ratio != 1
> +            && multiplier_allowed_in_address_p (ratio, mem_mode, as));
> +
> +  cost.complexity = (symbol_present != 0) + (var_present != 0)
> +       + offset_p + ratio_p;
>
>    return cost;
>  }
> --
> 2.25.1
>

  reply	other threads:[~2022-10-25 13:00 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-10-21 13:52 [PATCH 0/2] ivopts: Fix candidate selection for architectures with limited addressing modes Dimitrije Milosevic
2022-10-21 13:52 ` [PATCH 1/2] ivopts: Revert computation of address cost complexity Dimitrije Milosevic
2022-10-25 11:08   ` Richard Biener
2022-10-25 13:00     ` Dimitrije Milosevic [this message]
2022-10-27 23:02   ` Jeff Law
2022-10-28  6:43     ` Dimitrije Milosevic
2022-10-28  7:00       ` Richard Biener
2022-10-28 13:39         ` Dimitrije Milosevic
2022-11-01 18:46         ` Jeff Law
2022-11-02  8:40           ` Dimitrije Milosevic
2022-11-07 13:35             ` Richard Biener
2022-12-15 15:26               ` Dimitrije Milosevic
2022-12-16  9:58                 ` Richard Biener
2022-12-16 11:37                   ` Dimitrije Milosevic
2022-12-16 11:58                     ` Richard Biener
2022-10-21 13:52 ` [PATCH 2/2] ivopts: Consider number of invariants when calculating register pressure Dimitrije Milosevic
2022-10-25 11:07   ` Richard Biener
2022-10-25 13:00     ` Dimitrije Milosevic
2022-10-28  7:38       ` Richard Biener
2022-10-28 13:39         ` Dimitrije Milosevic
2022-11-07 12:56           ` Richard Biener
2024-03-18 11:28 [PATCH 1/2] ivopts: Revert computation of address cost complexity Aleksandar Rakic
2024-03-18 20:27 Aleksandar Rakic
2024-04-15 13:30 ` Aleksandar Rakic

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=AM0PR03MB4882F969E9D82C374962BFA182319@AM0PR03MB4882.eurprd03.prod.outlook.com \
    --to=dimitrije.milosevic@syrmia.com \
    --cc=Djordje.Todorovic@syrmia.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=richard.guenther@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).