From: 钟居哲 <juzhe.zhong@rivai.ai>
To: richard.sandiford <richard.sandiford@arm.com>
Cc: gcc-patches <gcc-patches@gcc.gnu.org>, rguenther <rguenther@suse.de>
Subject: Re: Re: [PATCH V14] VECT: Add decrement IV iteration loop control by variable amount support
Date: Wed, 24 May 2023 23:31:40 +0800
Message-ID: <445C203252AFFD2F+2023052423314032757830@rivai.ai>
In-Reply-To: <2023052423130398041121@rivai.ai>
Hi, Richard.
Having analyzed it, I think it can work.
Let's look at the code:
short x[100];
int y[200];
void f() {
  for (int i = 0, j = 0; i < 100; i += 2, j += 4) {
    x[i + 0] += 1;
    x[i + 1] += 2;
    y[j + 0] += 1;
    y[j + 1] += 2;
    y[j + 2] += 3;
    y[j + 3] += 4;
  }
}
For "x", each scalar iteration computes 2 elements (x[i + 0] and x[i + 1]).
For "y", each scalar iteration computes 4 elements (y[j + 0], y[j + 1], y[j + 2] and y[j + 3]).
With this patch:
loop_len_34 = MIN_EXPR <ivtmp_72, 8>;
The "x" vector of each iteration holds at most 8 elements, i.e. 128 bits (8 16-bit elements),
so one vector can process 4 scalar iterations (x[i + 0] and x[i + 1] each).
So there is a len_load: vect__1.6_33 = .LEN_LOAD (_17, 16B, loop_len_34, 0);
Since the int16 vector ("x") covers 4 scalar iterations, the int32 vectors ("y") also cover 4 scalar iterations,
each processing 4 scalar elements (y[j + 0], y[j + 1], y[j + 2] and y[j + 3]).
So you can see 4 vector operations of y:
vect__11.18_59 = vect__10.14_52 + { 1, 2, 3, 4 };
vect__11.18_60 = vect__10.15_54 + { 1, 2, 3, 4 };
vect__11.18_61 = vect__10.16_56 + { 1, 2, 3, 4 };
vect__11.18_62 = vect__10.17_58 + { 1, 2, 3, 4 };
.LEN_STORE (_31, 32B, loop_len_48, vect__11.18_59, 0);
.LEN_STORE (_29, 32B, loop_len_49, vect__11.18_60, 0);
.LEN_STORE (_25, 32B, loop_len_50, vect__11.18_61, 0);
.LEN_STORE (_79, 32B, loop_len_51, vect__11.18_62, 0);
So each vector iteration has 1 group of "x" (4 * 2 = 8 elements) and 4 groups of "y" (4 * 4 = 16 elements).
And we adjust the loop length for each control of "y":
loop_len_34 = MIN_EXPR <ivtmp_72, 8>;
loop_len_48 = MIN_EXPR <loop_len_34, 4>;
_74 = loop_len_34 - loop_len_48;
loop_len_49 = MIN_EXPR <_74, 4>;
_75 = _74 - loop_len_49;
loop_len_50 = MIN_EXPR <_75, 4>;
loop_len_51 = _75 - loop_len_50;
It seems to work, so I wonder why we need a multiplication?
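To double-check the chain above, here is a small C sketch that replays its arithmetic on plain integers. split_lens is a hypothetical helper (not GCC code) that mirrors the first/middle/last cases of vect_adjust_loop_lens_control in the patch, with total standing in for loop_len_34 and limit for the per-control cap of 4 ints.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: replays the MIN/MINUS chain emitted for an rgroup
   with N length controls.  TOTAL plays the role of loop_len_34 and LIMIT
   the per-control cap (4 ints per vector(4) int).  */
static void
split_lens (unsigned long total, unsigned long limit,
            unsigned long *lens, size_t n)
{
  assert (n > 0);
  unsigned long remain = total;
  for (size_t i = 0; i < n; ++i)
    {
      if (i == 0)
        /* loop_len_48 = MIN_EXPR <loop_len_34, 4>;  */
        lens[i] = remain < limit ? remain : limit;
      else if (i == n - 1)
        /* loop_len_51 = _75 - loop_len_50;  (no MIN on the last control) */
        lens[i] = remain - lens[i - 1];
      else
        {
          /* _74 = loop_len_34 - loop_len_48;
             loop_len_49 = MIN_EXPR <_74, 4>;  */
          remain -= lens[i - 1];
          lens[i] = remain < limit ? remain : limit;
        }
    }
}
```

For a full iteration (loop_len_34 = 8) this gives 4, 4, 0, 0, and for the final iteration (loop_len_34 = 4) it gives 4, 0, 0, 0; in general the four y lengths sum to loop_len_34. Each of the y pointers (ivtmp.27, ivtmp.30, ivtmp.31 and ivtmp.32 in the dump below) advances by 64 bytes, i.e. 16 ints, per iteration, so comparing that stride with the sum of the y lengths is a quick way to check whether the y controls need the extra scaling Richard asks about.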
Thanks.
juzhe.zhong@rivai.ai
From: 钟居哲
Sent: 2023-05-24 23:13
To: richard.sandiford
Cc: gcc-patches; rguenther
Subject: Re: Re: [PATCH V14] VECT: Add decrement IV iteration loop control by variable amount support
Hi, the .optimized dump looks like this:
<bb 2> [local count: 21045336]:
ivtmp.26_36 = (unsigned long) &x;
ivtmp.27_3 = (unsigned long) &y;
ivtmp.30_6 = (unsigned long) &MEM <int[200]> [(void *)&y + 16B];
ivtmp.31_10 = (unsigned long) &MEM <int[200]> [(void *)&y + 32B];
ivtmp.32_14 = (unsigned long) &MEM <int[200]> [(void *)&y + 48B];
<bb 3> [local count: 273589366]:
# ivtmp_72 = PHI <ivtmp_73(3), 100(2)>
# ivtmp.26_41 = PHI <ivtmp.26_37(3), ivtmp.26_36(2)>
# ivtmp.27_1 = PHI <ivtmp.27_2(3), ivtmp.27_3(2)>
# ivtmp.30_4 = PHI <ivtmp.30_5(3), ivtmp.30_6(2)>
# ivtmp.31_8 = PHI <ivtmp.31_9(3), ivtmp.31_10(2)>
# ivtmp.32_12 = PHI <ivtmp.32_13(3), ivtmp.32_14(2)>
loop_len_34 = MIN_EXPR <ivtmp_72, 8>;
loop_len_48 = MIN_EXPR <loop_len_34, 4>;
_74 = loop_len_34 - loop_len_48;
loop_len_49 = MIN_EXPR <_74, 4>;
_75 = _74 - loop_len_49;
loop_len_50 = MIN_EXPR <_75, 4>;
loop_len_51 = _75 - loop_len_50;
_16 = (void *) ivtmp.26_41;
_17 = &MEM <vector(8) short int> [(short int *)_16];
vect__1.6_33 = .LEN_LOAD (_17, 16B, loop_len_34, 0);
vect__2.7_23 = VIEW_CONVERT_EXPR<vector(8) unsigned short>(vect__1.6_33);
vect__3.8_22 = vect__2.7_23 + { 1, 2, 1, 2, 1, 2, 1, 2 };
vect__4.9_21 = VIEW_CONVERT_EXPR<vector(8) short int>(vect__3.8_22);
.LEN_STORE (_17, 16B, loop_len_34, vect__4.9_21, 0);
_20 = (void *) ivtmp.27_1;
_31 = &MEM <vector(4) int> [(int *)_20];
vect__10.14_52 = .LEN_LOAD (_31, 32B, loop_len_48, 0);
_30 = (void *) ivtmp.30_4;
_29 = &MEM <vector(4) int> [(int *)_30];
vect__10.15_54 = .LEN_LOAD (_29, 32B, loop_len_49, 0);
_26 = (void *) ivtmp.31_8;
_25 = &MEM <vector(4) int> [(int *)_26];
vect__10.16_56 = .LEN_LOAD (_25, 32B, loop_len_50, 0);
_78 = (void *) ivtmp.32_12;
_79 = &MEM <vector(4) int> [(int *)_78];
vect__10.17_58 = .LEN_LOAD (_79, 32B, loop_len_51, 0);
vect__11.18_59 = vect__10.14_52 + { 1, 2, 3, 4 };
vect__11.18_60 = vect__10.15_54 + { 1, 2, 3, 4 };
vect__11.18_61 = vect__10.16_56 + { 1, 2, 3, 4 };
vect__11.18_62 = vect__10.17_58 + { 1, 2, 3, 4 };
.LEN_STORE (_31, 32B, loop_len_48, vect__11.18_59, 0);
.LEN_STORE (_29, 32B, loop_len_49, vect__11.18_60, 0);
.LEN_STORE (_25, 32B, loop_len_50, vect__11.18_61, 0);
.LEN_STORE (_79, 32B, loop_len_51, vect__11.18_62, 0);
ivtmp_73 = ivtmp_72 - loop_len_34;
ivtmp.26_37 = ivtmp.26_41 + 16;
ivtmp.27_2 = ivtmp.27_1 + 64;
ivtmp.30_5 = ivtmp.30_4 + 64;
ivtmp.31_9 = ivtmp.31_8 + 64;
ivtmp.32_13 = ivtmp.32_12 + 64;
if (ivtmp_73 != 0)
goto <bb 3>; [92.31%]
else
goto <bb 4>; [7.69%]
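The trip count of this loop can be checked with a short C sketch that replays only the IV arithmetic (ivtmp_72, loop_len_34, ivtmp_73). trip_count is a hypothetical helper, not part of GCC; it assumes the IV starts at 100 and the cap is 8, as in the dump.

```c
#include <assert.h>

/* Hypothetical helper: counts how many times the vector loop body runs.
   REMAIN is the initial value of ivtmp_72 and VF the MIN_EXPR cap.
   *LAST_LEN receives loop_len_34 of the final iteration.  */
static unsigned
trip_count (unsigned long remain, unsigned long vf, unsigned long *last_len)
{
  assert (vf > 0);
  unsigned iters = 0;
  unsigned long len = 0;
  do
    {
      len = remain < vf ? remain : vf;  /* loop_len_34 = MIN_EXPR <ivtmp_72, 8>;  */
      remain -= len;                    /* ivtmp_73 = ivtmp_72 - loop_len_34;  */
      ++iters;
    }
  while (remain != 0);                  /* if (ivtmp_73 != 0) goto <bb 3>;  */
  *last_len = len;
  return iters;
}
```

With 100 elements and a cap of 8, the body runs 13 times and the final length is 4. That is consistent with the profile in the dump: 12/13 is about 92.31% for the back edge and 1/13 about 7.69% for the exit.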
I am still checking it, but I am sending it to you early.
Thanks.
juzhe.zhong@rivai.ai
From: Richard Sandiford
Date: 2023-05-24 23:07
To: juzhe.zhong
CC: gcc-patches; rguenther
Subject: Re: [PATCH V14] VECT: Add decrement IV iteration loop control by variable amount support
Thanks for trying it. I'm still surprised that no multiplication
is needed though. Does the patch work for:
short x[100];
int y[200];
void f() {
  for (int i = 0, j = 0; i < 100; i += 2, j += 4) {
    x[i + 0] += 1;
    x[i + 1] += 2;
    y[j + 0] += 1;
    y[j + 1] += 2;
    y[j + 2] += 3;
    y[j + 3] += 4;
  }
}
? Here, there should be a single-control rgroup for x, counting
2 units per scalar iteration. I'd expect the IV to use this scale.
There should also be a 4-control rgroup for y, counting 4 units per
scalar iteration. So I think the IV would need to be multiplied by 2
before being used for the y rgroup.
Thanks,
Richard
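A minimal C sketch of the scaling described above, assuming the IV counts x units (2 per scalar iteration), the y rgroup counts 4 units per scalar iteration, and each y control covers at most 4 ints. y_lens_scaled and its parameters are hypothetical; this is only an arithmetic illustration of multiplying the IV-derived length by 2 before splitting it, not GCC code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: converts a length measured in x units into y units
   and splits it across N controls of at most LIMIT elements each.  */
static void
y_lens_scaled (unsigned long x_len, unsigned long x_units,
               unsigned long y_units, unsigned long limit,
               unsigned long *lens, size_t n)
{
  assert (x_units > 0);
  /* 8 x units -> 4 scalar iterations -> 16 y units.  Exact here because
     x_len is a whole number of scalar iterations times x_units.  */
  unsigned long remain = x_len * y_units / x_units;
  for (size_t i = 0; i < n; ++i)
    {
      lens[i] = remain < limit ? remain : limit;
      remain -= lens[i];
    }
}
```

For a full iteration (8 x elements, i.e. 4 scalar iterations) this gives four lengths of 4, matching 16 ints per iteration; for the last iteration (4 x elements, 2 scalar iterations) it gives 4, 4, 0, 0. The split here is a plain MIN-and-subtract loop rather than the exact statement sequence of the patch.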
juzhe.zhong@rivai.ai writes:
> From: Ju-Zhe Zhong <juzhe.zhong@rivai.ai>
>
> This patch supports a decrement IV, following the flow designed by Richard:
>
> (1) In vect_set_loop_condition_partial_vectors, for the first iteration of:
> call vect_set_loop_controls_directly.
>
> (2) vect_set_loop_controls_directly calculates "step" as in your patch.
> If rgc has 1 control, this step is the SSA name created for that control.
> Otherwise the step is a fresh SSA name, as in your patch.
>
> (3) vect_set_loop_controls_directly stores this step somewhere for later
> use, probably in LOOP_VINFO. Let's use "S" to refer to this stored step.
>
> (4) After the vect_set_loop_controls_directly call above, and outside
> the "if" statement that now contains vect_set_loop_controls_directly,
> check whether rgc->controls.length () > 1. If so, use
> vect_adjust_loop_lens_control to set the controls based on S.
>
> Then the only caller of vect_adjust_loop_lens_control is
> vect_set_loop_condition_partial_vectors. And the starting
> step for vect_adjust_loop_lens_control is always S.
>
> This patch has been well tested for single-rgroup and multiple-rgroup (SLP) cases and
> passes all testcases in the RISC-V port.
>
> It also passes tests for multiple-rgroup (non-SLP) cases, tested on vec_pack_trunc.
>
> ---
> gcc/tree-vect-loop-manip.cc | 178 +++++++++++++++++++++++++++++++++---
> gcc/tree-vect-loop.cc | 13 +++
> gcc/tree-vectorizer.h | 12 +++
> 3 files changed, 192 insertions(+), 11 deletions(-)
>
> diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
> index ff6159e08d5..578ac5b783e 100644
> --- a/gcc/tree-vect-loop-manip.cc
> +++ b/gcc/tree-vect-loop-manip.cc
> @@ -468,6 +468,38 @@ vect_set_loop_controls_directly (class loop *loop, loop_vec_info loop_vinfo,
> gimple_stmt_iterator incr_gsi;
> bool insert_after;
> standard_iv_increment_position (loop, &incr_gsi, &insert_after);
> + if (LOOP_VINFO_USING_DECREMENTING_IV_P (loop_vinfo))
> + {
> + /* single rgroup:
> + ...
> + _10 = (unsigned long) count_12(D);
> + ...
> + # ivtmp_9 = PHI <ivtmp_35(6), _10(5)>
> + _36 = MIN_EXPR <ivtmp_9, POLY_INT_CST [4, 4]>;
> + ...
> + vect__4.8_28 = .LEN_LOAD (_17, 32B, _36, 0);
> + ...
> + ivtmp_35 = ivtmp_9 - _36;
> + ...
> + if (ivtmp_35 != 0)
> + goto <bb 4>; [83.33%]
> + else
> + goto <bb 5>; [16.67%]
> + */
> + nitems_total = gimple_convert (preheader_seq, iv_type, nitems_total);
> + tree step = rgc->controls.length () == 1 ? rgc->controls[0]
> + : make_ssa_name (iv_type);
> + /* Create decrement IV. */
> + create_iv (nitems_total, MINUS_EXPR, step, NULL_TREE, loop, &incr_gsi,
> + insert_after, &index_before_incr, &index_after_incr);
> + gimple_seq_add_stmt (header_seq, gimple_build_assign (step, MIN_EXPR,
> + index_before_incr,
> + nitems_step));
> + LOOP_VINFO_DECREMENTING_IV_STEP (loop_vinfo) = step;
> + return index_after_incr;
> + }
> +
> + /* Create increment IV. */
> create_iv (build_int_cst (iv_type, 0), PLUS_EXPR, nitems_step, NULL_TREE,
> loop, &incr_gsi, insert_after, &index_before_incr,
> &index_after_incr);
> @@ -683,6 +715,63 @@ vect_set_loop_controls_directly (class loop *loop, loop_vec_info loop_vinfo,
> return next_ctrl;
> }
>
> +/* Adjust loop lengths for multiple rgroups.
> +
> + _36 = MIN_EXPR <ivtmp_34, VF>;
> +
> + First length (MIN (X, VF/N)):
> + loop_len_15 = MIN_EXPR <_36, VF/N>;
> +
> + Second length:
> + tmp = _36 - loop_len_15;
> + loop_len_16 = MIN (tmp, VF/N);
> +
> + Third length:
> + tmp2 = tmp - loop_len_16;
> + loop_len_17 = MIN (tmp2, VF/N);
> +
> + Last length:
> + loop_len_18 = tmp2 - loop_len_17;
> +*/
> +
> +static void
> +vect_adjust_loop_lens_control (tree iv_type, gimple_seq *seq,
> + rgroup_controls *dest_rgm, tree step)
> +{
> + tree ctrl_type = dest_rgm->type;
> + poly_uint64 nitems_per_ctrl
> + = TYPE_VECTOR_SUBPARTS (ctrl_type) * dest_rgm->factor;
> + tree length_limit = build_int_cst (iv_type, nitems_per_ctrl);
> +
> + for (unsigned int i = 0; i < dest_rgm->controls.length (); ++i)
> + {
> + tree ctrl = dest_rgm->controls[i];
> + if (i == 0)
> + {
> + /* First iteration: MIN (X, VF/N) capped to the range [0, VF/N]. */
> + gassign *assign
> + = gimple_build_assign (ctrl, MIN_EXPR, step, length_limit);
> + gimple_seq_add_stmt (seq, assign);
> + }
> + else if (i == dest_rgm->controls.length () - 1)
> + {
> + /* Last iteration: Remain capped to the range [0, VF/N]. */
> + gassign *assign = gimple_build_assign (ctrl, MINUS_EXPR, step,
> + dest_rgm->controls[i - 1]);
> + gimple_seq_add_stmt (seq, assign);
> + }
> + else
> + {
> + /* (MIN (remain, VF*I/N)) capped to the range [0, VF/N]. */
> + step = gimple_build (seq, MINUS_EXPR, iv_type, step,
> + dest_rgm->controls[i - 1]);
> + gassign *assign
> + = gimple_build_assign (ctrl, MIN_EXPR, step, length_limit);
> + gimple_seq_add_stmt (seq, assign);
> + }
> + }
> +}
> +
> /* Set up the iteration condition and rgroup controls for LOOP, given
> that LOOP_VINFO_USING_PARTIAL_VECTORS_P is true for the vectorized
> loop. LOOP_VINFO describes the vectorization of LOOP. NITERS is
> @@ -753,17 +842,84 @@ vect_set_loop_condition_partial_vectors (class loop *loop,
> continue;
> }
>
> - /* See whether zero-based IV would ever generate all-false masks
> - or zero length before wrapping around. */
> - bool might_wrap_p = vect_rgroup_iv_might_wrap_p (loop_vinfo, rgc);
> -
> - /* Set up all controls for this group. */
> - test_ctrl = vect_set_loop_controls_directly (loop, loop_vinfo,
> - &preheader_seq,
> - &header_seq,
> - loop_cond_gsi, rgc,
> - niters, niters_skip,
> - might_wrap_p);
> + if (!LOOP_VINFO_USING_DECREMENTING_IV_P (loop_vinfo)
> + || !LOOP_VINFO_DECREMENTING_IV_STEP (loop_vinfo))
> + {
> + /* See whether zero-based IV would ever generate all-false masks
> + or zero length before wrapping around. */
> + bool might_wrap_p = vect_rgroup_iv_might_wrap_p (loop_vinfo, rgc);
> +
> + /* Set up all controls for this group. */
> + test_ctrl
> + = vect_set_loop_controls_directly (loop, loop_vinfo,
> + &preheader_seq, &header_seq,
> + loop_cond_gsi, rgc, niters,
> + niters_skip, might_wrap_p);
> + }
> +
> + /* Decrement IV only runs vect_set_loop_controls_directly once. */
> + if (LOOP_VINFO_USING_DECREMENTING_IV_P (loop_vinfo)
> + && rgc->controls.length () > 1)
> + {
> + /*
> + - Multiple rgroup (SLP):
> + ...
> + _38 = (unsigned long) bnd.7_29;
> + _39 = _38 * 2;
> + ...
> + # ivtmp_41 = PHI <ivtmp_42(6), _39(5)>
> + ...
> + _43 = MIN_EXPR <ivtmp_41, 32>;
> + loop_len_26 = MIN_EXPR <_43, 16>;
> + loop_len_25 = _43 - loop_len_26;
> + ...
> + .LEN_STORE (_6, 8B, loop_len_26, ...);
> + ...
> + .LEN_STORE (_25, 8B, loop_len_25, ...);
> + _33 = loop_len_26 / 2;
> + ...
> + .LEN_STORE (_8, 16B, _33, ...);
> + _36 = loop_len_25 / 2;
> + ...
> + .LEN_STORE (_15, 16B, _36, ...);
> + ivtmp_42 = ivtmp_41 - _43;
> + ...
> +
> + - Multiple rgroup (non-SLP):
> + ...
> + _38 = (unsigned long) n_12(D);
> + ...
> + # ivtmp_38 = PHI <ivtmp_39(3), 100(2)>
> + ...
> + _40 = MIN_EXPR <ivtmp_38, POLY_INT_CST [8, 8]>;
> + loop_len_21 = MIN_EXPR <_40, POLY_INT_CST [2, 2]>;
> + _41 = _40 - loop_len_21;
> + loop_len_20 = MIN_EXPR <_41, POLY_INT_CST [2, 2]>;
> + _42 = _41 - loop_len_20;
> + loop_len_19 = MIN_EXPR <_42, POLY_INT_CST [2, 2]>;
> + loop_len_16 = _42 - loop_len_19;
> + ...
> + vect__4.8_15 = .LEN_LOAD (_6, 64B, loop_len_21, 0);
> + ...
> + vect__4.9_8 = .LEN_LOAD (_13, 64B, loop_len_20, 0);
> + ...
> + vect__4.10_28 = .LEN_LOAD (_46, 64B, loop_len_19, 0);
> + ...
> + vect__4.11_30 = .LEN_LOAD (_49, 64B, loop_len_16, 0);
> + vect__7.13_31 = VEC_PACK_TRUNC_EXPR <...>;
> + vect__7.13_32 = VEC_PACK_TRUNC_EXPR <...>;
> + vect__7.12_33 = VEC_PACK_TRUNC_EXPR <...>;
> + ...
> + .LEN_STORE (_14, 16B, _40, vect__7.12_33, 0);
> + ivtmp_39 = ivtmp_38 - _40;
> + ...
> + */
> + tree iv_type = LOOP_VINFO_RGROUP_IV_TYPE (loop_vinfo);
> + tree step = LOOP_VINFO_DECREMENTING_IV_STEP (loop_vinfo);
> + gcc_assert (step);
> + vect_adjust_loop_lens_control (iv_type, &header_seq, rgc, step);
> + }
> }
>
> /* Emit all accumulated statements. */
> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> index cf10132b0bf..456f50fa7cc 100644
> --- a/gcc/tree-vect-loop.cc
> +++ b/gcc/tree-vect-loop.cc
> @@ -973,6 +973,8 @@ _loop_vec_info::_loop_vec_info (class loop *loop_in, vec_info_shared *shared)
> vectorizable (false),
> can_use_partial_vectors_p (param_vect_partial_vector_usage != 0),
> using_partial_vectors_p (false),
> + using_decrementing_iv_p (false),
> + decrementing_iv_step (NULL_TREE),
> epil_using_partial_vectors_p (false),
> partial_load_store_bias (0),
> peeling_for_gaps (false),
> @@ -2725,6 +2727,17 @@ start_over:
> && !vect_verify_loop_lens (loop_vinfo))
> LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo) = false;
>
> + /* If we're vectorizing a loop that uses length "controls" and
> + can iterate more than once, we apply the decrementing IV approach
> + in loop control. */
> + if (LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo)
> + && !LOOP_VINFO_LENS (loop_vinfo).is_empty ()
> + && LOOP_VINFO_PARTIAL_LOAD_STORE_BIAS (loop_vinfo) == 0
> + && !(LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
> + && known_le (LOOP_VINFO_INT_NITERS (loop_vinfo),
> + LOOP_VINFO_VECT_FACTOR (loop_vinfo))))
> + LOOP_VINFO_USING_DECREMENTING_IV_P (loop_vinfo) = true;
> +
> /* If we're vectorizing an epilogue loop, the vectorized loop either needs
> to be able to handle fewer than VF scalars, or needs to have a lower VF
> than the main loop. */
> diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
> index 02d2ad6fba1..7ed079f543a 100644
> --- a/gcc/tree-vectorizer.h
> +++ b/gcc/tree-vectorizer.h
> @@ -818,6 +818,16 @@ public:
> the vector loop can handle fewer than VF scalars. */
> bool using_partial_vectors_p;
>
> + /* True if we've decided to use a decrementing loop control IV that counts
> + scalars. This can be done for any loop that:
> +
> + (a) uses length "controls"; and
> + (b) can iterate more than once. */
> + bool using_decrementing_iv_p;
> +
> + /* The variable amount step for decrement IV. */
> + tree decrementing_iv_step;
> +
> /* True if we've decided to use partially-populated vectors for the
> epilogue of loop. */
> bool epil_using_partial_vectors_p;
> @@ -890,6 +900,8 @@ public:
> #define LOOP_VINFO_VECTORIZABLE_P(L) (L)->vectorizable
> #define LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P(L) (L)->can_use_partial_vectors_p
> #define LOOP_VINFO_USING_PARTIAL_VECTORS_P(L) (L)->using_partial_vectors_p
> +#define LOOP_VINFO_USING_DECREMENTING_IV_P(L) (L)->using_decrementing_iv_p
> +#define LOOP_VINFO_DECREMENTING_IV_STEP(L) (L)->decrementing_iv_step
> #define LOOP_VINFO_EPIL_USING_PARTIAL_VECTORS_P(L) \
> (L)->epil_using_partial_vectors_p
> #define LOOP_VINFO_PARTIAL_LOAD_STORE_BIAS(L) (L)->partial_load_store_bias
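The single-rgroup shape shown in the comment in vect_set_loop_controls_directly (MIN_EXPR of the remaining count and the vector length, then subtracting that length from the IV) can be emulated on plain arrays. This is a C sketch under stated assumptions: a fixed VF of 4 stands in for POLY_INT_CST [4, 4], and len_add is a hypothetical stand-in for a .LEN_LOAD / add / .LEN_STORE sequence with a bias of 0.

```c
#include <assert.h>
#include <stddef.h>

#define VF 4  /* Assumed fixed vector length; the dump uses POLY_INT_CST [4, 4].  */

/* Hypothetical stand-in for a length-controlled vector add: only the
   first LEN lanes are loaded and stored (bias 0).  */
static void
len_add (int *p, size_t len, int addend)
{
  assert (len <= VF);
  int v[VF] = { 0 };
  for (size_t lane = 0; lane < len; ++lane)  /* .LEN_LOAD  */
    v[lane] = p[lane];
  for (size_t lane = 0; lane < len; ++lane)  /* add + .LEN_STORE  */
    p[lane] = v[lane] + addend;
}

/* Decrementing-IV loop: the IV starts at the scalar count and each
   iteration subtracts the length it actually processed.  The while form
   also covers COUNT == 0, which the vectorizer guards separately.  */
static void
add_all (int *a, size_t count, int addend)
{
  size_t ivtmp = count;                     /* _10 = (unsigned long) count_12(D);  */
  int *ptr = a;
  while (ivtmp != 0)                        /* if (ivtmp_35 != 0)  */
    {
      size_t len = ivtmp < VF ? ivtmp : VF; /* _36 = MIN_EXPR <ivtmp_9, VF>;  */
      len_add (ptr, len, addend);
      ptr += len;
      ivtmp -= len;                         /* ivtmp_35 = ivtmp_9 - _36;  */
    }
}
```

Advancing the pointer by len rather than by a full vector is equivalent here because only the final iteration can be partial; it also avoids forming a pointer past the end of the array.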
Thread overview: 13+ messages
2023-05-24 14:48 juzhe.zhong
2023-05-24 15:07 ` Richard Sandiford
2023-05-24 15:13 ` 钟居哲
2023-05-24 15:31 ` Richard Sandiford
2023-05-24 15:42 ` 钟居哲
2023-05-24 15:47 ` Richard Sandiford
2023-05-24 15:52 ` 钟居哲
2023-05-24 16:00 ` Richard Sandiford
2023-05-24 16:15 ` 钟居哲
2023-05-24 16:37 ` 钟居哲
2023-05-24 20:05 ` Richard Sandiford
2023-05-25 3:05 ` juzhe.zhong
[not found] ` <2023052423130398041121@rivai.ai>
2023-05-24 15:31 ` 钟居哲 [this message]