public inbox for libc-alpha@sourceware.org
 help / color / mirror / Atom feed
From: Noah Goldstein <goldstein.w.n@gmail.com>
To: libc-alpha@sourceware.org
Subject: [PATCH v1 08/23] x86: Optimize strspn in strspn-c.c
Date: Wed, 23 Mar 2022 16:57:24 -0500	[thread overview]
Message-ID: <20220323215734.3927131-8-goldstein.w.n@gmail.com> (raw)
In-Reply-To: <20220323215734.3927131-1-goldstein.w.n@gmail.com>

Use _mm_cmpeq_epi8 and _mm_movemask_epi8 to get strlen instead of
_mm_cmpistri. Also change offset to unsigned to avoid unnecessary
sign extensions.

geometric_mean(N=20) of all benchmarks that dont fallback on
sse2; New / Original: .901

All string/memory tests pass.
---
Geomtric Mean N=20 runs; All functions page aligned
len, align1, align2,  pos, New Time / Old Time
  1,      0,      0,  512,               0.768
  1,      1,      0,  512,               0.666
  1,      0,      1,  512,               1.193
  1,      1,      1,  512,               0.872
  2,      0,      0,  512,               0.698
  2,      2,      0,  512,               0.687
  2,      0,      2,  512,               1.393
  2,      2,      2,  512,               0.944
  3,      0,      0,  512,               0.691
  3,      3,      0,  512,               0.676
  3,      0,      3,  512,               1.388
  3,      3,      3,  512,               0.948
  4,      0,      0,  512,                0.74
  4,      4,      0,  512,               0.678
  4,      0,      4,  512,               1.421
  4,      4,      4,  512,               0.943
  5,      0,      0,  512,               0.691
  5,      5,      0,  512,               0.675
  5,      0,      5,  512,               1.348
  5,      5,      5,  512,               0.952
  6,      0,      0,  512,               0.685
  6,      6,      0,  512,                0.67
  6,      0,      6,  512,               1.333
  6,      6,      6,  512,                0.95
  7,      0,      0,  512,               0.688
  7,      7,      0,  512,               0.675
  7,      0,      7,  512,               1.344
  7,      7,      7,  512,               0.919
  8,      0,      0,  512,               0.716
  8,      0,      8,  512,               0.935
  9,      0,      0,  512,               0.716
  9,      1,      0,  512,               0.712
  9,      0,      9,  512,               0.956
  9,      1,      9,  512,               0.992
 10,      0,      0,  512,               0.699
 10,      2,      0,  512,                0.68
 10,      0,     10,  512,               0.952
 10,      2,     10,  512,               0.932
 11,      0,      0,  512,               0.705
 11,      3,      0,  512,               0.685
 11,      0,     11,  512,               0.956
 11,      3,     11,  512,               0.927
 12,      0,      0,  512,               0.695
 12,      4,      0,  512,               0.675
 12,      0,     12,  512,               0.948
 12,      4,     12,  512,               0.928
 13,      0,      0,  512,                 0.7
 13,      5,      0,  512,               0.678
 13,      0,     13,  512,               0.944
 13,      5,     13,  512,               0.931
 14,      0,      0,  512,               0.703
 14,      6,      0,  512,               0.678
 14,      0,     14,  512,               0.949
 14,      6,     14,  512,                0.93
 15,      0,      0,  512,               0.694
 15,      7,      0,  512,               0.678
 15,      0,     15,  512,               0.953
 15,      7,     15,  512,               0.924
 16,      0,      0,  512,               1.021
 16,      0,     16,  512,               1.067
 17,      0,      0,  512,               0.991
 17,      1,      0,  512,               0.984
 17,      0,     17,  512,               0.979
 17,      1,     17,  512,               0.993
 18,      0,      0,  512,               0.992
 18,      2,      0,  512,               1.008
 18,      0,     18,  512,               1.016
 18,      2,     18,  512,               0.993
 19,      0,      0,  512,               0.984
 19,      3,      0,  512,               0.985
 19,      0,     19,  512,               1.007
 19,      3,     19,  512,               1.006
 20,      0,      0,  512,               0.969
 20,      4,      0,  512,               0.968
 20,      0,     20,  512,               0.975
 20,      4,     20,  512,               0.975
 21,      0,      0,  512,               0.992
 21,      5,      0,  512,               0.992
 21,      0,     21,  512,                0.98
 21,      5,     21,  512,                0.97
 22,      0,      0,  512,               0.989
 22,      6,      0,  512,               0.987
 22,      0,     22,  512,                0.99
 22,      6,     22,  512,               0.985
 23,      0,      0,  512,               0.989
 23,      7,      0,  512,                0.98
 23,      0,     23,  512,                 1.0
 23,      7,     23,  512,               0.993
 24,      0,      0,  512,                0.99
 24,      0,     24,  512,               0.998
 25,      0,      0,  512,                1.01
 25,      1,      0,  512,                 1.0
 25,      0,     25,  512,                0.97
 25,      1,     25,  512,               0.967
 26,      0,      0,  512,               1.009
 26,      2,      0,  512,               0.986
 26,      0,     26,  512,               0.997
 26,      2,     26,  512,               0.993
 27,      0,      0,  512,               0.984
 27,      3,      0,  512,               0.997
 27,      0,     27,  512,               0.989
 27,      3,     27,  512,               0.976
 28,      0,      0,  512,               0.991
 28,      4,      0,  512,               1.003
 28,      0,     28,  512,               0.986
 28,      4,     28,  512,               0.989
 29,      0,      0,  512,               0.986
 29,      5,      0,  512,               0.985
 29,      0,     29,  512,               0.984
 29,      5,     29,  512,               0.977
 30,      0,      0,  512,               0.991
 30,      6,      0,  512,               0.987
 30,      0,     30,  512,               0.979
 30,      6,     30,  512,               0.974
 31,      0,      0,  512,               0.995
 31,      7,      0,  512,               0.995
 31,      0,     31,  512,               0.994
 31,      7,     31,  512,               0.984
  4,      0,      0,   32,               0.861
  4,      1,      0,   32,               0.864
  4,      0,      1,   32,               0.962
  4,      1,      1,   32,               0.967
  4,      0,      0,   64,               0.884
  4,      2,      0,   64,               0.818
  4,      0,      2,   64,               0.889
  4,      2,      2,   64,               0.918
  4,      0,      0,  128,               0.942
  4,      3,      0,  128,               0.884
  4,      0,      3,  128,               0.931
  4,      3,      3,  128,               0.883
  4,      0,      0,  256,               0.964
  4,      4,      0,  256,               0.922
  4,      0,      4,  256,               0.956
  4,      4,      4,  256,                0.93
  4,      5,      0,  512,               0.833
  4,      0,      5,  512,               1.027
  4,      5,      5,  512,               0.929
  4,      0,      0, 1024,               0.998
  4,      6,      0, 1024,               0.986
  4,      0,      6, 1024,               0.984
  4,      6,      6, 1024,               0.977
  4,      0,      0, 2048,               0.991
  4,      7,      0, 2048,               0.987
  4,      0,      7, 2048,               0.996
  4,      7,      7, 2048,                0.98
 10,      1,      0,   64,               0.826
 10,      1,      1,   64,               0.907
 10,      2,      0,   64,               0.829
 10,      2,      2,   64,                0.91
 10,      3,      0,   64,                0.83
 10,      3,      3,   64,               0.915
 10,      4,      0,   64,                0.83
 10,      4,      4,   64,               0.911
 10,      5,      0,   64,               0.828
 10,      5,      5,   64,               0.905
 10,      6,      0,   64,               0.828
 10,      6,      6,   64,               0.812
 10,      7,      0,   64,                0.83
 10,      7,      7,   64,               0.819
  6,      0,      0,    0,               1.261
  6,      0,      0,    1,               1.252
  6,      0,      1,    1,               0.845
  6,      0,      0,    2,                1.27
  6,      0,      2,    2,                0.85
  6,      0,      0,    3,               1.269
  6,      0,      3,    3,               0.845
  6,      0,      0,    4,               1.287
  6,      0,      4,    4,               0.852
  6,      0,      0,    5,               1.278
  6,      0,      5,    5,               0.851
  6,      0,      0,    6,               1.269
  6,      0,      6,    6,               0.841
  6,      0,      0,    7,               1.268
  6,      0,      7,    7,               0.851
  6,      0,      0,    8,               1.291
  6,      0,      8,    8,               0.837
  6,      0,      0,    9,               1.283
  6,      0,      9,    9,               0.831
  6,      0,      0,   10,               1.252
  6,      0,     10,   10,               0.997
  6,      0,      0,   11,               1.295
  6,      0,     11,   11,               1.046
  6,      0,      0,   12,               1.296
  6,      0,     12,   12,               1.038
  6,      0,      0,   13,               1.287
  6,      0,     13,   13,               1.082
  6,      0,      0,   14,               1.284
  6,      0,     14,   14,               1.001
  6,      0,      0,   15,               1.286
  6,      0,     15,   15,               1.002
  6,      0,      0,   16,               0.894
  6,      0,     16,   16,               0.874
  6,      0,      0,   17,               0.892
  6,      0,     17,   17,               0.974
  6,      0,      0,   18,               0.907
  6,      0,     18,   18,               0.993
  6,      0,      0,   19,               0.909
  6,      0,     19,   19,                0.99
  6,      0,      0,   20,               0.894
  6,      0,     20,   20,               0.978
  6,      0,      0,   21,                0.89
  6,      0,     21,   21,               0.958
  6,      0,      0,   22,               0.893
  6,      0,     22,   22,                0.99
  6,      0,      0,   23,               0.899
  6,      0,     23,   23,               0.986
  6,      0,      0,   24,               0.893
  6,      0,     24,   24,               0.989
  6,      0,      0,   25,               0.889
  6,      0,     25,   25,               0.982
  6,      0,      0,   26,               0.889
  6,      0,     26,   26,               0.852
  6,      0,      0,   27,                0.89
  6,      0,     27,   27,               0.832
  6,      0,      0,   28,                0.89
  6,      0,     28,   28,               0.831
  6,      0,      0,   29,                0.89
  6,      0,     29,   29,               0.838
  6,      0,      0,   30,               0.907
  6,      0,     30,   30,               0.833
  6,      0,      0,   31,               0.888
  6,      0,     31,   31,               0.837
  6,      0,      0,   32,               0.853
  6,      0,     32,   32,               0.828
  6,      0,      0,   33,               0.857
  6,      0,     33,   33,               0.947
  6,      0,      0,   34,               0.847
  6,      0,     34,   34,               0.954
  6,      0,      0,   35,               0.841
  6,      0,     35,   35,                0.94
  6,      0,      0,   36,               0.854
  6,      0,     36,   36,               0.958
  6,      0,      0,   37,               0.856
  6,      0,     37,   37,               0.957
  6,      0,      0,   38,               0.839
  6,      0,     38,   38,               0.962
  6,      0,      0,   39,               0.866
  6,      0,     39,   39,               0.945
  6,      0,      0,   40,               0.845
  6,      0,     40,   40,               0.961
  6,      0,      0,   41,               0.858
  6,      0,     41,   41,               0.961
  6,      0,      0,   42,               0.862
  6,      0,     42,   42,               0.825
  6,      0,      0,   43,               0.864
  6,      0,     43,   43,                0.82
  6,      0,      0,   44,               0.843
  6,      0,     44,   44,                0.81
  6,      0,      0,   45,               0.859
  6,      0,     45,   45,               0.816
  6,      0,      0,   46,               0.866
  6,      0,     46,   46,                0.81
  6,      0,      0,   47,               0.858
  6,      0,     47,   47,               0.807
  6,      0,      0,   48,                0.87
  6,      0,     48,   48,                0.87
  6,      0,      0,   49,               0.871
  6,      0,     49,   49,               0.874
  6,      0,      0,   50,                0.87
  6,      0,     50,   50,               0.881
  6,      0,      0,   51,               0.868
  6,      0,     51,   51,               0.875
  6,      0,      0,   52,               0.873
  6,      0,     52,   52,               0.871
  6,      0,      0,   53,               0.866
  6,      0,     53,   53,               0.882
  6,      0,      0,   54,               0.863
  6,      0,     54,   54,               0.876
  6,      0,      0,   55,               0.851
  6,      0,     55,   55,               0.871
  6,      0,      0,   56,               0.867
  6,      0,     56,   56,               0.888
  6,      0,      0,   57,               0.862
  6,      0,     57,   57,               0.899
  6,      0,      0,   58,               0.873
  6,      0,     58,   58,               0.798
  6,      0,      0,   59,               0.881
  6,      0,     59,   59,               0.785
  6,      0,      0,   60,               0.867
  6,      0,     60,   60,               0.797
  6,      0,      0,   61,               0.872
  6,      0,     61,   61,               0.791
  6,      0,      0,   62,               0.859
  6,      0,     62,   62,                0.79
  6,      0,      0,   63,                0.87
  6,      0,     63,   63,               0.796

 sysdeps/x86_64/multiarch/strspn-c.c | 86 +++++++++++++----------------
 1 file changed, 39 insertions(+), 47 deletions(-)

diff --git a/sysdeps/x86_64/multiarch/strspn-c.c b/sysdeps/x86_64/multiarch/strspn-c.c
index 8fb3aba64d..6124033ceb 100644
--- a/sysdeps/x86_64/multiarch/strspn-c.c
+++ b/sysdeps/x86_64/multiarch/strspn-c.c
@@ -62,81 +62,73 @@ __strspn_sse42 (const char *s, const char *a)
     return 0;
 
   const char *aligned;
-  __m128i mask;
-  int offset = (int) ((size_t) a & 15);
+  __m128i mask, maskz, zero;
+  unsigned int maskz_bits;
+  unsigned int offset = (int) ((size_t) a & 15);
+  zero = _mm_set1_epi8 (0);
   if (offset != 0)
     {
       /* Load masks.  */
       aligned = (const char *) ((size_t) a & -16L);
       __m128i mask0 = _mm_load_si128 ((__m128i *) aligned);
-
-      mask = __m128i_shift_right (mask0, offset);
+      maskz = _mm_cmpeq_epi8 (mask0, zero);
 
       /* Find where the NULL terminator is.  */
-      int length = _mm_cmpistri (mask, mask, 0x3a);
-      if (length == 16 - offset)
-	{
-	  /* There is no NULL terminator.  */
-	  __m128i mask1 = _mm_load_si128 ((__m128i *) (aligned + 16));
-	  int index = _mm_cmpistri (mask1, mask1, 0x3a);
-	  length += index;
-
-	  /* Don't use SSE4.2 if the length of A > 16.  */
-	  if (length > 16)
-	    return __strspn_sse2 (s, a);
-
-	  if (index != 0)
-	    {
-	      /* Combine mask0 and mask1.  We could play games with
-		 palignr, but frankly this data should be in L1 now
-		 so do the merge via an unaligned load.  */
-	      mask = _mm_loadu_si128 ((__m128i *) a);
-	    }
-	}
+      maskz_bits = _mm_movemask_epi8 (maskz) >> offset;
+      if (maskz_bits != 0)
+        {
+          mask = __m128i_shift_right (mask0, offset);
+          offset = (unsigned int) ((size_t) s & 15);
+          if (offset)
+            goto start_unaligned;
+
+          aligned = s;
+          goto start_loop;
+        }
     }
-  else
-    {
-      /* A is aligned.  */
-      mask = _mm_load_si128 ((__m128i *) a);
 
-      /* Find where the NULL terminator is.  */
-      int length = _mm_cmpistri (mask, mask, 0x3a);
-      if (length == 16)
-	{
-	  /* There is no NULL terminator.  Don't use SSE4.2 if the length
-	     of A > 16.  */
-	  if (a[16] != 0)
-	    return __strspn_sse2 (s, a);
-	}
+  /* A is aligned.  */
+  mask = _mm_loadu_si128 ((__m128i *) a);
+
+  /* Find where the NULL terminator is.  */
+  maskz = _mm_cmpeq_epi8 (mask, zero);
+  maskz_bits = _mm_movemask_epi8 (maskz);
+  if (maskz_bits == 0)
+    {
+      /* There is no NULL terminator.  Don't use SSE4.2 if the length
+         of A > 16.  */
+      if (a[16] != 0)
+        return __strspn_sse2 (s, a);
     }
+  aligned = s;
+  offset = (unsigned int) ((size_t) s & 15);
 
-  offset = (int) ((size_t) s & 15);
   if (offset != 0)
     {
+    start_unaligned:
       /* Check partial string.  */
       aligned = (const char *) ((size_t) s & -16L);
       __m128i value = _mm_load_si128 ((__m128i *) aligned);
+      __m128i adj_value = __m128i_shift_right (value, offset);
 
-      value = __m128i_shift_right (value, offset);
-
-      int length = _mm_cmpistri (mask, value, 0x12);
+      unsigned int length = _mm_cmpistri (mask, adj_value, 0x12);
       /* No need to check CFlag since it is always 1.  */
       if (length < 16 - offset)
 	return length;
       /* Find where the NULL terminator is.  */
-      int index = _mm_cmpistri (value, value, 0x3a);
-      if (index < 16 - offset)
+      maskz = _mm_cmpeq_epi8 (value, zero);
+      maskz_bits = _mm_movemask_epi8 (maskz) >> offset;
+      if (maskz_bits != 0)
 	return length;
       aligned += 16;
     }
-  else
-    aligned = s;
 
+start_loop:
   while (1)
     {
       __m128i value = _mm_load_si128 ((__m128i *) aligned);
-      int index = _mm_cmpistri (mask, value, 0x12);
-      int cflag = _mm_cmpistrc (mask, value, 0x12);
+      unsigned int index = _mm_cmpistri (mask, value, 0x12);
+      unsigned int cflag = _mm_cmpistrc (mask, value, 0x12);
       if (cflag)
 	return (size_t) (aligned + index - s);
       aligned += 16;
-- 
2.25.1


  parent reply	other threads:[~2022-03-23 21:59 UTC|newest]

Thread overview: 76+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-03-23 21:57 [PATCH v1 01/23] benchtests: Use json-lib in bench-strchr.c Noah Goldstein
2022-03-23 21:57 ` [PATCH v1 02/23] benchtests: Add random benchmark " Noah Goldstein
2022-03-24 18:44   ` H.J. Lu
2022-03-23 21:57 ` [PATCH v1 03/23] x86: Code cleanup in strchr-avx2 and comment justifying branch Noah Goldstein
2022-03-24 18:53   ` H.J. Lu
2022-03-24 19:20     ` Noah Goldstein
2022-03-24 19:36       ` H.J. Lu
2022-05-12 19:31         ` Sunil Pandey
2022-03-23 21:57 ` [PATCH v1 04/23] x86: Code cleanup in strchr-evex " Noah Goldstein
2022-03-24 18:54   ` H.J. Lu
2022-05-12 19:32     ` Sunil Pandey
2022-03-23 21:57 ` [PATCH v1 05/23] benchtests: Use json-lib in bench-strpbrk.c Noah Goldstein
2022-03-24 18:54   ` H.J. Lu
2022-03-23 21:57 ` [PATCH v1 06/23] benchtests: Use json-lib in bench-strspn.c Noah Goldstein
2022-03-24 18:54   ` H.J. Lu
2022-03-23 21:57 ` [PATCH v1 07/23] x86: Optimize strcspn and strpbrk in strcspn-c.c Noah Goldstein
2022-03-24 18:55   ` H.J. Lu
2022-05-12 19:34     ` Sunil Pandey
2022-03-23 21:57 ` Noah Goldstein [this message]
2022-03-24 18:56   ` [PATCH v1 08/23] x86: Optimize strspn in strspn-c.c H.J. Lu
2022-05-12 19:39     ` Sunil Pandey
2022-03-23 21:57 ` [PATCH v1 09/23] x86: Remove strcspn-sse2.S and use the generic implementation Noah Goldstein
2022-03-24 18:57   ` H.J. Lu
2022-05-12 19:40     ` Sunil Pandey
2022-03-23 21:57 ` [PATCH v1 10/23] x86: Remove strpbrk-sse2.S " Noah Goldstein
2022-03-24 18:57   ` H.J. Lu
2022-05-12 19:41     ` Sunil Pandey
2022-03-23 21:57 ` [PATCH v1 11/23] x86: Remove strspn-sse2.S " Noah Goldstein
2022-03-24 18:57   ` H.J. Lu
2022-05-12 19:42     ` Sunil Pandey
2022-03-23 21:57 ` [PATCH v1 12/23] x86: Fix fallback for wcsncmp_avx2 in strcmp-avx2.S [BZ #28896] Noah Goldstein
2022-03-24 18:59   ` H.J. Lu
2022-03-24 19:18     ` Noah Goldstein
2022-03-24 19:34       ` H.J. Lu
2022-03-24 19:39         ` Noah Goldstein
2022-03-24 20:50   ` [PATCH v2 12/31] " Noah Goldstein
2022-03-24 21:26     ` H.J. Lu
2022-03-24 21:43       ` Noah Goldstein
2022-03-24 21:58         ` H.J. Lu
2022-05-04  6:05           ` Sunil Pandey
2022-03-23 21:57 ` [PATCH v1 13/23] benchtests: Use json-lib in bench-strcasecmp.c Noah Goldstein
2022-03-24 19:00   ` H.J. Lu
2022-03-23 21:57 ` [PATCH v1 14/23] benchtests: Use json-lib in bench-strncasecmp.c Noah Goldstein
2022-03-24 19:00   ` H.J. Lu
2022-03-23 21:57 ` [PATCH v1 15/23] string: Expand page cross tests in test-strcasecmp.c Noah Goldstein
2022-03-24 19:01   ` H.J. Lu
2022-03-23 21:57 ` [PATCH v1 16/23] string: Expand page cross tests in test-strncasecmp.c Noah Goldstein
2022-03-24 19:01   ` H.J. Lu
2022-03-23 21:57 ` [PATCH v1 17/23] x86: Optimize str{n}casecmp TOLOWER logic in strcmp.S Noah Goldstein
2022-03-24 19:02   ` H.J. Lu
2022-05-12 19:44     ` Sunil Pandey
2022-03-23 21:57 ` [PATCH v1 18/23] x86: Optimize str{n}casecmp TOLOWER logic in strcmp-sse42.S Noah Goldstein
2022-03-24 19:02   ` H.J. Lu
2022-05-12 19:45     ` Sunil Pandey
2022-03-23 21:57 ` [PATCH v1 19/23] string: Expand page cross test cases in test-strcmp.c Noah Goldstein
2022-03-24 19:02   ` H.J. Lu
2022-03-23 21:57 ` [PATCH v1 20/23] string: Expand page cross test cases in test-strncmp.c Noah Goldstein
2022-03-24 19:02   ` H.J. Lu
2022-03-23 21:57 ` [PATCH v1 21/23] x86: Add AVX2 optimized str{n}casecmp Noah Goldstein
2022-03-24 19:03   ` H.J. Lu
2022-03-24 22:41   ` [PATCH v3 " Noah Goldstein
2022-03-24 22:41   ` [PATCH v3 22/23] x86: Add EVEX " Noah Goldstein
2022-03-24 23:56   ` [PATCH v4 21/23] x86: Add AVX2 " Noah Goldstein
2022-03-24 23:56     ` [PATCH v4 22/23] x86: Add EVEX " Noah Goldstein
2022-03-25 18:15       ` H.J. Lu
2022-03-25 18:18         ` Noah Goldstein
2022-05-12 19:47           ` Sunil Pandey
2022-05-12 19:52             ` Sunil Pandey
2022-03-25 18:14     ` [PATCH v4 21/23] x86: Add AVX2 " H.J. Lu
2022-05-12 19:52       ` Sunil Pandey
2022-03-23 21:57 ` [PATCH v1 22/23] x86: Add EVEX " Noah Goldstein
2022-03-24 19:04   ` H.J. Lu
2022-03-23 21:57 ` [PATCH v1 23/23] x86: Remove AVX str{n}casecmp Noah Goldstein
2022-03-24 19:04   ` H.J. Lu
2022-05-12 19:54     ` Sunil Pandey
2022-03-24 18:43 ` [PATCH v1 01/23] benchtests: Use json-lib in bench-strchr.c H.J. Lu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20220323215734.3927131-8-goldstein.w.n@gmail.com \
    --to=goldstein.w.n@gmail.com \
    --cc=libc-alpha@sourceware.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).