public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH 2/2] gcc_qsort: build system changes
  2018-05-10 15:57 [PATCH 0/2] Introduce gcc_qsort Alexander Monakov
@ 2018-05-10 15:57 ` Alexander Monakov
  2018-05-11  9:32   ` Richard Biener
  2018-05-10 17:01 ` [PATCH 1/2] gcc_qsort: source code changes Alexander Monakov
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 24+ messages in thread
From: Alexander Monakov @ 2018-05-10 15:57 UTC (permalink / raw)
  To: gcc-patches; +Cc: Alexander Monakov

	* Makefile.in (OBJS-libcommon): Add sort.o.
        (build/sort.o): New target.  Use it...
        (BUILD_RTL): ... here, and...
        (build/gencfn-macros): ... here, and...
        (build/genmatch): ... here.

---
 gcc/Makefile.in | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 20bee0494b1..8ec0511704d 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1076,7 +1076,7 @@ BUILD_LIBS = $(BUILD_LIBIBERTY)
 
 BUILD_RTL = build/rtl.o build/read-rtl.o build/ggc-none.o \
 	    build/vec.o build/min-insn-modes.o build/gensupport.o \
-	    build/print-rtl.o build/hash-table.o
+	    build/print-rtl.o build/hash-table.o build/sort.o
 BUILD_MD = build/read-md.o
 BUILD_ERRORS = build/errors.o
 
@@ -1611,7 +1611,7 @@ OBJS-libcommon = diagnostic.o diagnostic-color.o diagnostic-show-locus.o \
 	pretty-print.o intl.o \
 	sbitmap.o \
 	vec.o input.o version.o hash-table.o ggc-none.o memory-block.o \
-	selftest.o selftest-diagnostic.o
+	selftest.o selftest-diagnostic.o sort.o
 
 # Objects in libcommon-target.a, used by drivers and by the core
 # compiler and containing target-dependent code.
@@ -2681,6 +2681,7 @@ build/vec.o : vec.c $(BCONFIG_H) $(SYSTEM_H) $(CORETYPES_H) $(VEC_H)	\
   $(GGC_H) toplev.h $(DIAGNOSTIC_CORE_H)
 build/hash-table.o : hash-table.c $(BCONFIG_H) $(SYSTEM_H)		\
   $(CORETYPES_H) $(HASH_TABLE_H) $(GGC_H) toplev.h $(DIAGNOSTIC_CORE_H)
+build/sort.o : sort.cc $(BCONFIG_H) $(SYSTEM_H)
 build/inchash.o : inchash.c $(BCONFIG_H) $(SYSTEM_H) $(CORETYPES_H)	\
   $(HASHTAB_H) inchash.h
 build/gencondmd.o : build/gencondmd.c $(BCONFIG_H) $(SYSTEM_H)		\
@@ -2817,7 +2818,7 @@ build/genautomata$(build_exeext) : BUILD_LIBS += -lm
 
 build/genrecog$(build_exeext) : build/hash-table.o build/inchash.o
 build/gencfn-macros$(build_exeext) : build/hash-table.o build/vec.o \
-  build/ggc-none.o
+  build/ggc-none.o build/sort.o
 
 # For stage1 and when cross-compiling use the build libcpp which is
 # built with NLS disabled.  For stage2+ use the host library and
@@ -2831,7 +2832,7 @@ build/genmatch$(build_exeext): BUILD_LIBS += $(LIBINTL) $(LIBICONV)
 endif
 
 build/genmatch$(build_exeext) : $(BUILD_CPPLIB) \
-  $(BUILD_ERRORS) build/vec.o build/hash-table.o
+  $(BUILD_ERRORS) build/vec.o build/hash-table.o build/sort.o
 
 # These programs are not linked with the MD reader.
 build/gengtype$(build_exeext) : build/gengtype-lex.o build/gengtype-parse.o \
-- 
2.13.3

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH 0/2] Introduce gcc_qsort
@ 2018-05-10 15:57 Alexander Monakov
  2018-05-10 15:57 ` [PATCH 2/2] gcc_qsort: build system changes Alexander Monakov
                   ` (3 more replies)
  0 siblings, 4 replies; 24+ messages in thread
From: Alexander Monakov @ 2018-05-10 15:57 UTC (permalink / raw)
  To: gcc-patches; +Cc: Alexander Monakov

Hello.

This introduces a replacement for qsort() in GCC. The main selling point is
reproducibility (currently compiler output may change depending on how libc
qsort reorders not-bitwise-identical elements that compare equal) with a 
small improvement speed-wise and small code growth (under 2K on x86-64).

The opening comment in sort.cc gives a brief implementation overview:

/* This implements a sort function suitable for GCC use cases:
   - signature-compatible to C qsort, but relaxed contract:
     - may apply the comparator to elements in a temporary buffer
     - may abort on allocation failure
   - deterministic (but not necessarily stable)
   - fast, especially for common cases (0-5 elements of size 8 or 4)

   The implementation uses a network sort for up to 5 elements and
   a merge sort on top of that.  Neither stage has branches depending on
   comparator result, trading extra arithmetic for branch mispredictions.  */

I used a Sandy Bridge CPU to collect statistics on tramp3d -O2 compilation.

Overall the new implementation is roughly 30% faster compared to Glibc qsort,
with 2x or more speedup for cases with tiny element count. I see one instance
where the new approach is significantly (1.5x) slower: it is ipa-icf.c:
sort_congruence_class_groups_by_decl_uid. It sorts a big array (1500 entries)
and needs 14 indirect loads just to reach values to compare, so when branch
prediction manages to guess correctly, it allows to overlap execution of two
comparators and better hide their cache miss latency.

Overall GCC spends about 0.3% time under qsort, but this doesn't automatically
mean that this brings a 0.1% speed improvement: it may be larger or smaller
depending on how new code affects cache behavior and branch predictors in
other code, and it's not trivial to measure precisely.

I can go into more detail about measured stats if there's interest :)

Makefile.in changes are separated to patch 2 in the hope it'd make review
easier, but the two patches will need to be applied together.

Bootstrapped/regtested on x86-64, OK for trunk?

Alexander Monakov (2):
  gcc_qsort: source code changes
  gcc_qsort: build system changes

 gcc/Makefile.in |   9 ++-
 gcc/sort.cc     | 232 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 gcc/system.h    |   7 +-
 gcc/vec.c       |   2 +-
 4 files changed, 243 insertions(+), 7 deletions(-)
 create mode 100644 gcc/sort.cc

-- 
2.13.3

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 1/2] gcc_qsort: source code changes
  2018-05-10 17:01 ` [PATCH 1/2] gcc_qsort: source code changes Alexander Monakov
@ 2018-05-10 17:01   ` David Malcolm
  2018-05-10 17:44   ` Richard Biener
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 24+ messages in thread
From: David Malcolm @ 2018-05-10 17:01 UTC (permalink / raw)
  To: Alexander Monakov, gcc-patches

On Thu, 2018-05-10 at 18:56 +0300, Alexander Monakov wrote:
> 	* sort.cc: New file.
>         * system.h [!CHECKING_P] (qsort): Redirect to gcc_qsort.
>         * vec.c (qsort_chk): Use gcc_qsort.

[...snip...]

I'm not a reviewer for this, but there's a lot of fiddly implementation
logic here, so maybe this code could use the selftest framework?

Maybe, in pseudo-code, something like this:

template <typename T>
static void
test_gcc_sort ()
{
   for (creation_strategy in {in-order, backwards}: // and anything else?
     for (int n = 0; n < some_limit; n++)
       {
          make_a_list_of_t (n, creation_strategy)
          gcc_sort (the_list);
          assert that the list is sorted;
	  assert that the number of calls to the callback was sane
     }
}

void
test_gcc_sort_cc ()
{
   test_gcc_sort<int, int_comparator> ();
   test_gcc_sort<long, long_comparator> ();
   // etc; maybe some custom structs to exercise the deterministic property???
}

...or some such, to quickly get coverage of the various list sizes
(which the implementation seems to rely on heavily), in a non-release
build.



Hope this is constructive
Dave

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH 1/2] gcc_qsort: source code changes
  2018-05-10 15:57 [PATCH 0/2] Introduce gcc_qsort Alexander Monakov
  2018-05-10 15:57 ` [PATCH 2/2] gcc_qsort: build system changes Alexander Monakov
@ 2018-05-10 17:01 ` Alexander Monakov
  2018-05-10 17:01   ` David Malcolm
                     ` (3 more replies)
  2018-05-10 17:35 ` [PATCH 0/2] Introduce gcc_qsort Jakub Jelinek
  2018-05-10 17:43 ` Richard Biener
  3 siblings, 4 replies; 24+ messages in thread
From: Alexander Monakov @ 2018-05-10 17:01 UTC (permalink / raw)
  To: gcc-patches; +Cc: Alexander Monakov

	* sort.cc: New file.
        * system.h [!CHECKING_P] (qsort): Redirect to gcc_qsort.
        * vec.c (qsort_chk): Use gcc_qsort.

---
 gcc/sort.cc  | 232 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 gcc/system.h |   7 +-
 gcc/vec.c    |   2 +-
 3 files changed, 238 insertions(+), 3 deletions(-)
 create mode 100644 gcc/sort.cc

diff --git a/gcc/sort.cc b/gcc/sort.cc
new file mode 100644
index 00000000000..4faf6d45dc6
--- /dev/null
+++ b/gcc/sort.cc
@@ -0,0 +1,232 @@
+/* Platform-independent deterministic sort function.
+   Copyright (C) 2018 Free Software Foundation, Inc.
+   Contributed by Alexander Monakov.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 3, or (at your option) any
+later version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT
+ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+/* This implements a sort function suitable for GCC use cases:
+   - signature-compatible to C qsort, but relaxed contract:
+     - may apply the comparator to elements in a temporary buffer
+     - may abort on allocation failure
+   - deterministic (but not necessarily stable)
+   - fast, especially for common cases (0-5 elements of size 8 or 4)
+
+   The implementation uses a network sort for up to 5 elements and
+   a merge sort on top of that.  Neither stage has branches depending on
+   comparator result, trading extra arithmetic for branch mispredictions.  */
+
+#ifdef GENERATOR_FILE
+#include "bconfig.h"
+#else
+#include "config.h"
+#endif
+
+#include "system.h"
+
+#define likely(cond) __builtin_expect ((cond), 1)
+
+#ifdef __GNUC__
+#define noinline __attribute__ ((__noinline__))
+#else
+#define noinline
+#endif
+
+/* C-style qsort comparator function type.  */
+typedef int cmp_fn (const void *, const void *);
+
+/* Structure holding read-mostly (read-only in netsort) context. */
+struct sort_ctx
+{
+  cmp_fn *cmp; // pointer to comparator
+  char   *out; // output buffer
+  size_t n;    // number of elements
+  size_t size; // element size
+};
+
+/* Helper for netsort. Permute, possibly in-place, 2 or 3 elements,
+   placing E0 to C->OUT, E1 to C->OUT + C->SIZE, and so on. */
+static void
+reorder23 (sort_ctx *c, char *e0, char *e1, char *e2)
+{
+#define REORDER_23(SIZE, STRIDE, OFFSET)        \
+do {                                            \
+  size_t t0, t1;                                \
+  memcpy (&t0, e0 + OFFSET, SIZE);              \
+  memcpy (&t1, e1 + OFFSET, SIZE);              \
+  char *out = c->out + OFFSET;                  \
+  if (likely (c->n == 3))                       \
+    memcpy (out + 2*STRIDE, e2 + OFFSET, SIZE); \
+  memcpy (out, &t0, SIZE); out += STRIDE;       \
+  memcpy (out, &t1, SIZE);                      \
+} while (0)
+
+  if (sizeof (size_t) == 8 && likely (c->size == 8))
+    REORDER_23 (8, 8, 0);
+  else if (likely (c->size == 4))
+    REORDER_23 (4, 4, 0);
+  else
+    {
+      size_t offset = 0, step = sizeof (size_t);
+      for (; offset + step <= c->size; offset += step)
+	REORDER_23 (step, c->size, offset);
+      for (; offset < c->size; offset++)
+	REORDER_23 (1, c->size, offset);
+    }
+}
+
+/* Like reorder23, but permute 4 or 5 elements. */
+static void
+reorder45 (sort_ctx *c, char *e0, char *e1, char *e2, char *e3, char *e4)
+{
+#define REORDER_45(SIZE, STRIDE, OFFSET)        \
+do {                                            \
+  size_t t0, t1, t2, t3;                        \
+  memcpy (&t0, e0 + OFFSET, SIZE);              \
+  memcpy (&t1, e1 + OFFSET, SIZE);              \
+  memcpy (&t2, e2 + OFFSET, SIZE);              \
+  memcpy (&t3, e3 + OFFSET, SIZE);              \
+  char *out = c->out + OFFSET;                  \
+  if (likely (c->n == 5))                       \
+    memcpy (out + 4*STRIDE, e4 + OFFSET, SIZE); \
+  memcpy (out, &t0, SIZE); out += STRIDE;       \
+  memcpy (out, &t1, SIZE); out += STRIDE;       \
+  memcpy (out, &t2, SIZE); out += STRIDE;       \
+  memcpy (out, &t3, SIZE);                      \
+} while (0)
+
+  if (sizeof (size_t) == 8 && likely (c->size == 8))
+    REORDER_45 (8, 8, 0);
+  else if (likely(c->size == 4))
+    REORDER_45 (4, 4, 0);
+  else
+    {
+      size_t offset = 0, step = sizeof (size_t);
+      for (; offset + step <= c->size; offset += step)
+	REORDER_45 (step, c->size, offset);
+      for (; offset < c->size; offset++)
+	REORDER_45 (1, c->size, offset);
+    }
+}
+
+/* Helper for netsort. Invoke comparator CMP on E0 and E1.
+   Return E0^E1 if E0 compares less than E1, zero otherwise.
+   This is noinline to avoid code growth and confine invocation
+   to a single call site, assisting indirect branch prediction. */
+noinline static intptr_t
+cmp1 (char *e0, char *e1, cmp_fn *cmp)
+{
+  intptr_t x = (intptr_t)e0 ^ (intptr_t)e1;
+  return x & (cmp (e0, e1) >> 31);
+}
+
+/* Execute network sort on 2 to 5 elements from IN, placing them into C->OUT.
+   IN may be equal to C->OUT, in which case elements are sorted in place.  */
+static void
+netsort (char *in, sort_ctx *c)
+{
+#define CMP(e0, e1)                   \
+do {                                  \
+  intptr_t x = cmp1 (e1, e0, c->cmp); \
+  e0 = (char *)((intptr_t)e0 ^ x);    \
+  e1 = (char *)((intptr_t)e1 ^ x);    \
+} while (0)
+
+  char *e0 = in, *e1 = e0 + c->size, *e2 = e1 + c->size;
+  CMP (e0, e1);
+  if (likely (c->n == 3))
+    {
+      CMP (e1, e2);
+      CMP (e0, e1);
+    }
+  if (c->n <= 3)
+    return reorder23 (c, e0, e1, e2);
+  char *e3 = e2 + c->size, *e4 = e3 + c->size;
+  if (likely (c->n == 5))
+    {
+      CMP (e3, e4);
+      CMP (e2, e4);
+    }
+  CMP (e2, e3);
+  if (likely (c->n == 5))
+    {
+      CMP (e0, e3);
+      CMP (e1, e4);
+    }
+  CMP (e0, e2);
+  CMP (e1, e3);
+  CMP (e1, e2);
+  reorder45 (c, e0, e1, e2, e3, e4);
+}
+
+/* Execute merge sort on N elements from IN, placing them into OUT,
+   using TMP as temporary storage if IN is equal to OUT.
+   This is a stable sort if netsort is used only for 2 or 3 elements. */
+static void
+mergesort (char *in, sort_ctx *c, size_t n, char *out, char *tmp)
+{
+  if (likely (n <= 5))
+    {
+      c->out = out;
+      c->n = n;
+      return netsort (in, c);
+    }
+  size_t nl = n / 2, nr = n - nl, sz = nl * c->size;
+  char *mid = in + sz, *r = out + sz, *l = in == out ? tmp : in;
+  /* Sort the right half, outputting to right half of OUT. */
+  mergesort (mid, c, nr, r, tmp);
+  /* Sort the left half, leaving left half of OUT free.  */
+  mergesort (in, c, nl, l, mid);
+  /* Merge sorted halves given by L, R to [OUT, END). */
+#define MERGE_ELTSIZE(SIZE)                     \
+do {                                            \
+  intptr_t mr = c->cmp (r, l) >> 31;            \
+  intptr_t lr = (intptr_t)l ^ (intptr_t)r;      \
+  lr = (intptr_t)l ^ (lr & mr);                 \
+  out = (char *)memcpy (out, (char *)lr, SIZE); \
+  out += SIZE;                                  \
+  r += mr & SIZE;                               \
+  if (r == out) return;                         \
+  l += ~mr & SIZE;                              \
+} while (r != end)
+
+  if (likely (c->cmp(r, l + (r - out) - c->size) < 0))
+    {
+      char *end = out + n * c->size;
+      if (sizeof (size_t) == 8 && likely (c->size == 8))
+	MERGE_ELTSIZE (8);
+      else if (likely (c->size == 4))
+	MERGE_ELTSIZE (4);
+      else
+	MERGE_ELTSIZE (c->size);
+    }
+  memcpy (out, l, r - out);
+}
+
+void
+gcc_qsort (void *vbase, size_t n, size_t size, cmp_fn *cmp)
+{
+  if (n < 2)
+    return;
+  char *base = (char *)vbase;
+  sort_ctx c = {cmp, base, n, size};
+  long long scratch[32];
+  size_t bufsz = (n / 2) * size;
+  void *buf = bufsz <= sizeof scratch ? scratch : xmalloc (bufsz);
+  mergesort (base, &c, n, base, (char *)buf);
+  if (buf != scratch)
+    free (buf);
+}
diff --git a/gcc/system.h b/gcc/system.h
index 4abc321c71d..88dffccb8ab 100644
--- a/gcc/system.h
+++ b/gcc/system.h
@@ -1202,11 +1202,14 @@ helper_const_non_const_cast (const char *p)
 /* qsort comparator consistency checking: except in release-checking compilers,
    redirect 4-argument qsort calls to qsort_chk; keep 1-argument invocations
    corresponding to vec::qsort (cmp): they use C qsort internally anyway.  */
-#if CHECKING_P
+void qsort_chk (void *, size_t, size_t, int (*)(const void *, const void *));
+void gcc_qsort (void *, size_t, size_t, int (*)(const void *, const void *));
 #define PP_5th(a1, a2, a3, a4, a5, ...) a5
 #undef qsort
+#if CHECKING_P
 #define qsort(...) PP_5th (__VA_ARGS__, qsort_chk, 3, 2, qsort, 0) (__VA_ARGS__)
-void qsort_chk (void *, size_t, size_t, int (*)(const void *, const void *));
+#else
+#define qsort(...) PP_5th (__VA_ARGS__, gcc_qsort, 3, 2, qsort, 0) (__VA_ARGS__)
 #endif
 
 #endif /* ! GCC_SYSTEM_H */
diff --git a/gcc/vec.c b/gcc/vec.c
index 11924a80a2d..2941715a34a 100644
--- a/gcc/vec.c
+++ b/gcc/vec.c
@@ -215,7 +215,7 @@ void
 qsort_chk (void *base, size_t n, size_t size,
 	   int (*cmp)(const void *, const void *))
 {
-  (qsort) (base, n, size, cmp);
+  gcc_qsort (base, n, size, cmp);
 #if 0
 #define LIM(n) (n)
 #else
-- 
2.13.3

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 0/2] Introduce gcc_qsort
  2018-05-10 15:57 [PATCH 0/2] Introduce gcc_qsort Alexander Monakov
  2018-05-10 15:57 ` [PATCH 2/2] gcc_qsort: build system changes Alexander Monakov
  2018-05-10 17:01 ` [PATCH 1/2] gcc_qsort: source code changes Alexander Monakov
@ 2018-05-10 17:35 ` Jakub Jelinek
  2018-05-10 17:48   ` Alexander Monakov
  2018-05-10 17:43 ` Richard Biener
  3 siblings, 1 reply; 24+ messages in thread
From: Jakub Jelinek @ 2018-05-10 17:35 UTC (permalink / raw)
  To: Alexander Monakov; +Cc: gcc-patches

On Thu, May 10, 2018 at 06:56:39PM +0300, Alexander Monakov wrote:
> Overall the new implementation is roughly 30% faster compared to Glibc qsort,
> with 2x or more speedup for cases with tiny element count. I see one instance
> where the new approach is significantly (1.5x) slower: it is ipa-icf.c:
> sort_congruence_class_groups_by_decl_uid. It sorts a big array (1500 entries)
> and needs 14 indirect loads just to reach values to compare, so when branch
> prediction manages to guess correctly, it allows to overlap execution of two
> comparators and better hide their cache miss latency.
> 
> Overall GCC spends about 0.3% time under qsort, but this doesn't automatically
> mean that this brings a 0.1% speed improvement: it may be larger or smaller
> depending on how new code affects cache behavior and branch predictors in
> other code, and it's not trivial to measure precisely.
> 
> I can go into more detail about measured stats if there's interest :)
> 
> Makefile.in changes are separated to patch 2 in the hope it'd make review
> easier, but the two patches will need to be applied together.
> 
> Bootstrapped/regtested on x86-64, OK for trunk?

Have you gathered some statistics on the element sizes and how often they
appear in qsort calls (perhaps weighted by n*log(n) of the element count)
across bootstrap+regtest?

glibc uses indirect sorting (sorts pointers to the elements or indexes and
just reshuffles at the end) and has special case for the most commonly used
small element size (4/8).  With C++ templates you could achieve that even
without macros, just by instantiating the mergesort and its helpers for the
few common cases.  Or is that not worth it (as in, we never sort really
large (say > 32 bytes) element sizes and the 4 or 8 bytes long element sizes
aren't common enough to see benefit by using constant size memcpy for those
cases?

	Jakub

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 0/2] Introduce gcc_qsort
  2018-05-10 15:57 [PATCH 0/2] Introduce gcc_qsort Alexander Monakov
                   ` (2 preceding siblings ...)
  2018-05-10 17:35 ` [PATCH 0/2] Introduce gcc_qsort Jakub Jelinek
@ 2018-05-10 17:43 ` Richard Biener
  2018-05-10 17:57   ` Alexander Monakov
  2018-05-11 10:35   ` Segher Boessenkool
  3 siblings, 2 replies; 24+ messages in thread
From: Richard Biener @ 2018-05-10 17:43 UTC (permalink / raw)
  To: gcc-patches, Alexander Monakov

On May 10, 2018 5:56:39 PM GMT+02:00, Alexander Monakov <amonakov@ispras.ru> wrote:
>Hello.
>
>This introduces a replacement for qsort() in GCC. The main selling
>point is
>reproducibility (currently compiler output may change depending on how
>libc
>qsort reorders not-bitwise-identical elements that compare equal) with
>a 
>small improvement speed-wise and small code growth (under 2K on
>x86-64).
>
>The opening comment in sort.cc gives a brief implementation overview:
>
>/* This implements a sort function suitable for GCC use cases:
>   - signature-compatible to C qsort, but relaxed contract:
>     - may apply the comparator to elements in a temporary buffer

What consequences has this or rather how is this observable and makes comparators behave differently? 

Otherwise thanks for doing this. Will review tomorrow. 

Richard. 

>     - may abort on allocation failure
>   - deterministic (but not necessarily stable)
>   - fast, especially for common cases (0-5 elements of size 8 or 4)
>
>   The implementation uses a network sort for up to 5 elements and
>  a merge sort on top of that.  Neither stage has branches depending on
>comparator result, trading extra arithmetic for branch mispredictions. 
>*/
>
>I used a Sandy Bridge CPU to collect statistics on tramp3d -O2
>compilation.
>
>Overall the new implementation is roughly 30% faster compared to Glibc
>qsort,
>with 2x or more speedup for cases with tiny element count. I see one
>instance
>where the new approach is significantly (1.5x) slower: it is ipa-icf.c:
>sort_congruence_class_groups_by_decl_uid. It sorts a big array (1500
>entries)
>and needs 14 indirect loads just to reach values to compare, so when
>branch
>prediction manages to guess correctly, it allows to overlap execution
>of two
>comparators and better hide their cache miss latency.
>
>Overall GCC spends about 0.3% time under qsort, but this doesn't
>automatically
>mean that this brings a 0.1% speed improvement: it may be larger or
>smaller
>depending on how new code affects cache behavior and branch predictors
>in
>other code, and it's not trivial to measure precisely.
>
>I can go into more detail about measured stats if there's interest :)
>
>Makefile.in changes are separated to patch 2 in the hope it'd make
>review
>easier, but the two patches will need to be applied together.
>
>Bootstrapped/regtested on x86-64, OK for trunk?
>
>Alexander Monakov (2):
>  gcc_qsort: source code changes
>  gcc_qsort: build system changes
>
> gcc/Makefile.in |   9 ++-
>gcc/sort.cc     | 232
>++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> gcc/system.h    |   7 +-
> gcc/vec.c       |   2 +-
> 4 files changed, 243 insertions(+), 7 deletions(-)
> create mode 100644 gcc/sort.cc

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 1/2] gcc_qsort: source code changes
  2018-05-10 17:01 ` [PATCH 1/2] gcc_qsort: source code changes Alexander Monakov
  2018-05-10 17:01   ` David Malcolm
@ 2018-05-10 17:44   ` Richard Biener
  2018-05-10 18:08     ` Alexander Monakov
  2018-05-11 12:03   ` Richard Biener
  2018-05-13 23:56   ` H.J. Lu
  3 siblings, 1 reply; 24+ messages in thread
From: Richard Biener @ 2018-05-10 17:44 UTC (permalink / raw)
  To: gcc-patches, Alexander Monakov

On May 10, 2018 5:56:40 PM GMT+02:00, Alexander Monakov <amonakov@ispras.ru> wrote:
>	* sort.cc: New file.
>        * system.h [!CHECKING_P] (qsort): Redirect to gcc_qsort.
>        * vec.c (qsort_chk): Use gcc_qsort.

Just a quick first remark - how about putting this into libiberty?  And then name it xqsort? 

Richard. 

>---
>gcc/sort.cc  | 232
>+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> gcc/system.h |   7 +-
> gcc/vec.c    |   2 +-
> 3 files changed, 238 insertions(+), 3 deletions(-)
> create mode 100644 gcc/sort.cc
>
>diff --git a/gcc/sort.cc b/gcc/sort.cc
>new file mode 100644
>index 00000000000..4faf6d45dc6
>--- /dev/null
>+++ b/gcc/sort.cc
>@@ -0,0 +1,232 @@
>+/* Platform-independent deterministic sort function.
>+   Copyright (C) 2018 Free Software Foundation, Inc.
>+   Contributed by Alexander Monakov.
>+
>+This file is part of GCC.
>+
>+GCC is free software; you can redistribute it and/or modify it
>+under the terms of the GNU General Public License as published by the
>+Free Software Foundation; either version 3, or (at your option) any
>+later version.
>+
>+GCC is distributed in the hope that it will be useful, but WITHOUT
>+ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
>+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
>+for more details.
>+
>+You should have received a copy of the GNU General Public License
>+along with GCC; see the file COPYING3.  If not see
>+<http://www.gnu.org/licenses/>.  */
>+
>+/* This implements a sort function suitable for GCC use cases:
>+   - signature-compatible to C qsort, but relaxed contract:
>+     - may apply the comparator to elements in a temporary buffer
>+     - may abort on allocation failure
>+   - deterministic (but not necessarily stable)
>+   - fast, especially for common cases (0-5 elements of size 8 or 4)
>+
>+   The implementation uses a network sort for up to 5 elements and
>+   a merge sort on top of that.  Neither stage has branches depending
>on
>+   comparator result, trading extra arithmetic for branch
>mispredictions.  */
>+
>+#ifdef GENERATOR_FILE
>+#include "bconfig.h"
>+#else
>+#include "config.h"
>+#endif
>+
>+#include "system.h"
>+
>+#define likely(cond) __builtin_expect ((cond), 1)
>+
>+#ifdef __GNUC__
>+#define noinline __attribute__ ((__noinline__))
>+#else
>+#define noinline
>+#endif
>+
>+/* C-style qsort comparator function type.  */
>+typedef int cmp_fn (const void *, const void *);
>+
>+/* Structure holding read-mostly (read-only in netsort) context. */
>+struct sort_ctx
>+{
>+  cmp_fn *cmp; // pointer to comparator
>+  char   *out; // output buffer
>+  size_t n;    // number of elements
>+  size_t size; // element size
>+};
>+
>+/* Helper for netsort. Permute, possibly in-place, 2 or 3 elements,
>+   placing E0 to C->OUT, E1 to C->OUT + C->SIZE, and so on. */
>+static void
>+reorder23 (sort_ctx *c, char *e0, char *e1, char *e2)
>+{
>+#define REORDER_23(SIZE, STRIDE, OFFSET)        \
>+do {                                            \
>+  size_t t0, t1;                                \
>+  memcpy (&t0, e0 + OFFSET, SIZE);              \
>+  memcpy (&t1, e1 + OFFSET, SIZE);              \
>+  char *out = c->out + OFFSET;                  \
>+  if (likely (c->n == 3))                       \
>+    memcpy (out + 2*STRIDE, e2 + OFFSET, SIZE); \
>+  memcpy (out, &t0, SIZE); out += STRIDE;       \
>+  memcpy (out, &t1, SIZE);                      \
>+} while (0)
>+
>+  if (sizeof (size_t) == 8 && likely (c->size == 8))
>+    REORDER_23 (8, 8, 0);
>+  else if (likely (c->size == 4))
>+    REORDER_23 (4, 4, 0);
>+  else
>+    {
>+      size_t offset = 0, step = sizeof (size_t);
>+      for (; offset + step <= c->size; offset += step)
>+	REORDER_23 (step, c->size, offset);
>+      for (; offset < c->size; offset++)
>+	REORDER_23 (1, c->size, offset);
>+    }
>+}
>+
>+/* Like reorder23, but permute 4 or 5 elements. */
>+static void
>+reorder45 (sort_ctx *c, char *e0, char *e1, char *e2, char *e3, char
>*e4)
>+{
>+#define REORDER_45(SIZE, STRIDE, OFFSET)        \
>+do {                                            \
>+  size_t t0, t1, t2, t3;                        \
>+  memcpy (&t0, e0 + OFFSET, SIZE);              \
>+  memcpy (&t1, e1 + OFFSET, SIZE);              \
>+  memcpy (&t2, e2 + OFFSET, SIZE);              \
>+  memcpy (&t3, e3 + OFFSET, SIZE);              \
>+  char *out = c->out + OFFSET;                  \
>+  if (likely (c->n == 5))                       \
>+    memcpy (out + 4*STRIDE, e4 + OFFSET, SIZE); \
>+  memcpy (out, &t0, SIZE); out += STRIDE;       \
>+  memcpy (out, &t1, SIZE); out += STRIDE;       \
>+  memcpy (out, &t2, SIZE); out += STRIDE;       \
>+  memcpy (out, &t3, SIZE);                      \
>+} while (0)
>+
>+  if (sizeof (size_t) == 8 && likely (c->size == 8))
>+    REORDER_45 (8, 8, 0);
>+  else if (likely(c->size == 4))
>+    REORDER_45 (4, 4, 0);
>+  else
>+    {
>+      size_t offset = 0, step = sizeof (size_t);
>+      for (; offset + step <= c->size; offset += step)
>+	REORDER_45 (step, c->size, offset);
>+      for (; offset < c->size; offset++)
>+	REORDER_45 (1, c->size, offset);
>+    }
>+}
>+
>+/* Helper for netsort. Invoke comparator CMP on E0 and E1.
>+   Return E0^E1 if E0 compares less than E1, zero otherwise.
>+   This is noinline to avoid code growth and confine invocation
>+   to a single call site, assisting indirect branch prediction. */
>+noinline static intptr_t
>+cmp1 (char *e0, char *e1, cmp_fn *cmp)
>+{
>+  intptr_t x = (intptr_t)e0 ^ (intptr_t)e1;
>+  return x & (cmp (e0, e1) >> 31);
>+}
>+
>+/* Execute network sort on 2 to 5 elements from IN, placing them into
>C->OUT.
>+   IN may be equal to C->OUT, in which case elements are sorted in
>place.  */
>+static void
>+netsort (char *in, sort_ctx *c)
>+{
>+#define CMP(e0, e1)                   \
>+do {                                  \
>+  intptr_t x = cmp1 (e1, e0, c->cmp); \
>+  e0 = (char *)((intptr_t)e0 ^ x);    \
>+  e1 = (char *)((intptr_t)e1 ^ x);    \
>+} while (0)
>+
>+  char *e0 = in, *e1 = e0 + c->size, *e2 = e1 + c->size;
>+  CMP (e0, e1);
>+  if (likely (c->n == 3))
>+    {
>+      CMP (e1, e2);
>+      CMP (e0, e1);
>+    }
>+  if (c->n <= 3)
>+    return reorder23 (c, e0, e1, e2);
>+  char *e3 = e2 + c->size, *e4 = e3 + c->size;
>+  if (likely (c->n == 5))
>+    {
>+      CMP (e3, e4);
>+      CMP (e2, e4);
>+    }
>+  CMP (e2, e3);
>+  if (likely (c->n == 5))
>+    {
>+      CMP (e0, e3);
>+      CMP (e1, e4);
>+    }
>+  CMP (e0, e2);
>+  CMP (e1, e3);
>+  CMP (e1, e2);
>+  reorder45 (c, e0, e1, e2, e3, e4);
>+}
>+
>+/* Execute merge sort on N elements from IN, placing them into OUT,
>+   using TMP as temporary storage if IN is equal to OUT.
>+   This is a stable sort if netsort is used only for 2 or 3 elements.
>*/
>+static void
>+mergesort (char *in, sort_ctx *c, size_t n, char *out, char *tmp)
>+{
>+  if (likely (n <= 5))
>+    {
>+      c->out = out;
>+      c->n = n;
>+      return netsort (in, c);
>+    }
>+  size_t nl = n / 2, nr = n - nl, sz = nl * c->size;
>+  char *mid = in + sz, *r = out + sz, *l = in == out ? tmp : in;
>+  /* Sort the right half, outputting to right half of OUT. */
>+  mergesort (mid, c, nr, r, tmp);
>+  /* Sort the left half, leaving left half of OUT free.  */
>+  mergesort (in, c, nl, l, mid);
>+  /* Merge sorted halves given by L, R to [OUT, END). */
>+#define MERGE_ELTSIZE(SIZE)                     \
>+do {                                            \
>+  intptr_t mr = c->cmp (r, l) >> 31;            \
>+  intptr_t lr = (intptr_t)l ^ (intptr_t)r;      \
>+  lr = (intptr_t)l ^ (lr & mr);                 \
>+  out = (char *)memcpy (out, (char *)lr, SIZE); \
>+  out += SIZE;                                  \
>+  r += mr & SIZE;                               \
>+  if (r == out) return;                         \
>+  l += ~mr & SIZE;                              \
>+} while (r != end)
>+
>+  if (likely (c->cmp(r, l + (r - out) - c->size) < 0))
>+    {
>+      char *end = out + n * c->size;
>+      if (sizeof (size_t) == 8 && likely (c->size == 8))
>+	MERGE_ELTSIZE (8);
>+      else if (likely (c->size == 4))
>+	MERGE_ELTSIZE (4);
>+      else
>+	MERGE_ELTSIZE (c->size);
>+    }
>+  memcpy (out, l, r - out);
>+}
>+
>+void
>+gcc_qsort (void *vbase, size_t n, size_t size, cmp_fn *cmp)
>+{
>+  if (n < 2)
>+    return;
>+  char *base = (char *)vbase;
>+  sort_ctx c = {cmp, base, n, size};
>+  long long scratch[32];
>+  size_t bufsz = (n / 2) * size;
>+  void *buf = bufsz <= sizeof scratch ? scratch : xmalloc (bufsz);
>+  mergesort (base, &c, n, base, (char *)buf);
>+  if (buf != scratch)
>+    free (buf);
>+}
>diff --git a/gcc/system.h b/gcc/system.h
>index 4abc321c71d..88dffccb8ab 100644
>--- a/gcc/system.h
>+++ b/gcc/system.h
>@@ -1202,11 +1202,14 @@ helper_const_non_const_cast (const char *p)
>/* qsort comparator consistency checking: except in release-checking
>compilers,
>redirect 4-argument qsort calls to qsort_chk; keep 1-argument
>invocations
>corresponding to vec::qsort (cmp): they use C qsort internally anyway. 
>*/
>-#if CHECKING_P
>+void qsort_chk (void *, size_t, size_t, int (*)(const void *, const
>void *));
>+void gcc_qsort (void *, size_t, size_t, int (*)(const void *, const
>void *));
> #define PP_5th(a1, a2, a3, a4, a5, ...) a5
> #undef qsort
>+#if CHECKING_P
>#define qsort(...) PP_5th (__VA_ARGS__, qsort_chk, 3, 2, qsort, 0)
>(__VA_ARGS__)
>-void qsort_chk (void *, size_t, size_t, int (*)(const void *, const
>void *));
>+#else
>+#define qsort(...) PP_5th (__VA_ARGS__, gcc_qsort, 3, 2, qsort, 0)
>(__VA_ARGS__)
> #endif
> 
> #endif /* ! GCC_SYSTEM_H */
>diff --git a/gcc/vec.c b/gcc/vec.c
>index 11924a80a2d..2941715a34a 100644
>--- a/gcc/vec.c
>+++ b/gcc/vec.c
>@@ -215,7 +215,7 @@ void
> qsort_chk (void *base, size_t n, size_t size,
> 	   int (*cmp)(const void *, const void *))
> {
>-  (qsort) (base, n, size, cmp);
>+  gcc_qsort (base, n, size, cmp);
> #if 0
> #define LIM(n) (n)
> #else

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 0/2] Introduce gcc_qsort
  2018-05-10 17:35 ` [PATCH 0/2] Introduce gcc_qsort Jakub Jelinek
@ 2018-05-10 17:48   ` Alexander Monakov
  0 siblings, 0 replies; 24+ messages in thread
From: Alexander Monakov @ 2018-05-10 17:48 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: gcc-patches

On Thu, 10 May 2018, Jakub Jelinek wrote:
> Have you gathered some statistics on the element sizes and how often they
> appear in qsort calls (perhaps weighted by n*log(n) of the element count)
> across bootstrap+regtest?

No, but Adhemerval Zanella collected stats on bootstrap, and they are similar
to my observations: https://sourceware.org/ml/libc-alpha/2018-01/msg00629.html

> glibc uses indirect sorting (sorts pointers to the elements or indexes and
> just reshuffles at the end) and has special case for the most commonly used
> small element size (4/8).  With C++ templates you could achieve that even
> without macros, just by instantiating the mergesort and its helpers for the
> few common cases.  Or is that not worth it (as in, we never sort really
> large (say > 32 bytes) element sizes and the 4 or 8 bytes long element sizes
> aren't common enough to see benefit by using constant size memcpy for those
> cases?

I think it's not worth it as branches by element size are not too frequent,
off the critical path, and easy for predictors. So doing it via templates
would cause significant code growth for no speed gain.

As for indirect sorting, we rarely sort elements of size other than 4/8, so
I believe that's not worth it either.

Alexander

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 0/2] Introduce gcc_qsort
  2018-05-10 17:43 ` Richard Biener
@ 2018-05-10 17:57   ` Alexander Monakov
  2018-05-11 10:35   ` Segher Boessenkool
  1 sibling, 0 replies; 24+ messages in thread
From: Alexander Monakov @ 2018-05-10 17:57 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches

On Thu, 10 May 2018, Richard Biener wrote:
> >   - signature-compatible to C qsort, but relaxed contract:
> >     - may apply the comparator to elements in a temporary buffer
> 
> What consequences has this or rather how is this observable and makes comparators behave differently? 

The only serious consequence I'm aware of is this:

the buffer must be sufficiently aligned to match the alignment requirement
of elements being sorted

Alexander

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 1/2] gcc_qsort: source code changes
  2018-05-10 17:44   ` Richard Biener
@ 2018-05-10 18:08     ` Alexander Monakov
  2018-05-10 18:57       ` DJ Delorie
  0 siblings, 1 reply; 24+ messages in thread
From: Alexander Monakov @ 2018-05-10 18:08 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches

On Thu, 10 May 2018, Richard Biener wrote:
> 
> Just a quick first remark - how about putting this into libiberty?  And then name it xqsort? 

I'm not sure.  It has a weaker contract compared to qsort, and I believe
functions in libiberty are understood to provide stronger/better replacements.

Alexander

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 1/2] gcc_qsort: source code changes
  2018-05-10 18:08     ` Alexander Monakov
@ 2018-05-10 18:57       ` DJ Delorie
  0 siblings, 0 replies; 24+ messages in thread
From: DJ Delorie @ 2018-05-10 18:57 UTC (permalink / raw)
  To: Alexander Monakov; +Cc: richard.guenther, gcc-patches


Alexander Monakov <amonakov@ispras.ru> writes:
> I'm not sure.  It has a weaker contract compared to qsort, and I believe
> functions in libiberty are understood to provide stronger/better replacements.

The original intent of libiberty was to provide a stronger *portability*
contract, i.e. to work around differences in underlying operating
systems.  The xfoo() variants often handle error conditions also, as
that has traditionally been something that OSs do differently anyway.

Having said that, adding something to libiberty is more complicated than
adding something to gcc (it didn't used to be), and if nobody else needs
a more portable qsort, it's a wasted effort.

Libiberty is *not* a generic "toss things in here because they're useful
and generic" library, despite being used as such.  However, it is common
among a few large projects (which used to share a repo, limiting copies
of libiberty to one), and does help in code re-use.

Given all that, I'd say that an xqsort might be appropriate in
libiberty, if it was (1) able to take over for the generic qsort[1] ,
and (2) the changes are also needed or useful in one of the other
projects using libiberty.  But given that it's currently written in C++
(it would need to be C-compatible) and only used by gcc, IMHO putting it
in libiberty would be inappropriate at this time.  The fact that qsort
is defined to be nondeterministic is not a portability issue[2].

Consider that there is also gnulib, which serves a similar purpose.

[1] i.e. if replacing qsort() with xqsort() in a C or C++ program
    resulted in the same behavior as far as standards imply.

[2] if the nondeterminism is a problem, you probably need to fix your
    compare function ;-)

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/2] gcc_qsort: build system changes
  2018-05-10 15:57 ` [PATCH 2/2] gcc_qsort: build system changes Alexander Monakov
@ 2018-05-11  9:32   ` Richard Biener
  0 siblings, 0 replies; 24+ messages in thread
From: Richard Biener @ 2018-05-11  9:32 UTC (permalink / raw)
  To: Alexander Monakov; +Cc: GCC Patches

On Thu, May 10, 2018 at 5:56 PM, Alexander Monakov <amonakov@ispras.ru> wrote:
>         * Makefile.in (OBJS-libcommon): Add sort.o.
>         (build/sort.o): New target.  Use it...
>         (BUILD_RTL): ... here, and...
>         (build/gencfn-macros): ... here, and...
>         (build/genmatch): ... here.

OK if the rest is approved.

Richard.

> ---
>  gcc/Makefile.in | 9 +++++----
>  1 file changed, 5 insertions(+), 4 deletions(-)
>
> diff --git a/gcc/Makefile.in b/gcc/Makefile.in
> index 20bee0494b1..8ec0511704d 100644
> --- a/gcc/Makefile.in
> +++ b/gcc/Makefile.in
> @@ -1076,7 +1076,7 @@ BUILD_LIBS = $(BUILD_LIBIBERTY)
>
>  BUILD_RTL = build/rtl.o build/read-rtl.o build/ggc-none.o \
>             build/vec.o build/min-insn-modes.o build/gensupport.o \
> -           build/print-rtl.o build/hash-table.o
> +           build/print-rtl.o build/hash-table.o build/sort.o
>  BUILD_MD = build/read-md.o
>  BUILD_ERRORS = build/errors.o
>
> @@ -1611,7 +1611,7 @@ OBJS-libcommon = diagnostic.o diagnostic-color.o diagnostic-show-locus.o \
>         pretty-print.o intl.o \
>         sbitmap.o \
>         vec.o input.o version.o hash-table.o ggc-none.o memory-block.o \
> -       selftest.o selftest-diagnostic.o
> +       selftest.o selftest-diagnostic.o sort.o
>
>  # Objects in libcommon-target.a, used by drivers and by the core
>  # compiler and containing target-dependent code.
> @@ -2681,6 +2681,7 @@ build/vec.o : vec.c $(BCONFIG_H) $(SYSTEM_H) $(CORETYPES_H) $(VEC_H)      \
>    $(GGC_H) toplev.h $(DIAGNOSTIC_CORE_H)
>  build/hash-table.o : hash-table.c $(BCONFIG_H) $(SYSTEM_H)             \
>    $(CORETYPES_H) $(HASH_TABLE_H) $(GGC_H) toplev.h $(DIAGNOSTIC_CORE_H)
> +build/sort.o : sort.cc $(BCONFIG_H) $(SYSTEM_H)
>  build/inchash.o : inchash.c $(BCONFIG_H) $(SYSTEM_H) $(CORETYPES_H)    \
>    $(HASHTAB_H) inchash.h
>  build/gencondmd.o : build/gencondmd.c $(BCONFIG_H) $(SYSTEM_H)         \
> @@ -2817,7 +2818,7 @@ build/genautomata$(build_exeext) : BUILD_LIBS += -lm
>
>  build/genrecog$(build_exeext) : build/hash-table.o build/inchash.o
>  build/gencfn-macros$(build_exeext) : build/hash-table.o build/vec.o \
> -  build/ggc-none.o
> +  build/ggc-none.o build/sort.o
>
>  # For stage1 and when cross-compiling use the build libcpp which is
>  # built with NLS disabled.  For stage2+ use the host library and
> @@ -2831,7 +2832,7 @@ build/genmatch$(build_exeext): BUILD_LIBS += $(LIBINTL) $(LIBICONV)
>  endif
>
>  build/genmatch$(build_exeext) : $(BUILD_CPPLIB) \
> -  $(BUILD_ERRORS) build/vec.o build/hash-table.o
> +  $(BUILD_ERRORS) build/vec.o build/hash-table.o build/sort.o
>
>  # These programs are not linked with the MD reader.
>  build/gengtype$(build_exeext) : build/gengtype-lex.o build/gengtype-parse.o \
> --
> 2.13.3
>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 0/2] Introduce gcc_qsort
  2018-05-10 17:43 ` Richard Biener
  2018-05-10 17:57   ` Alexander Monakov
@ 2018-05-11 10:35   ` Segher Boessenkool
  2018-05-11 10:44     ` Alexander Monakov
  2018-05-11 11:17     ` Jakub Jelinek
  1 sibling, 2 replies; 24+ messages in thread
From: Segher Boessenkool @ 2018-05-11 10:35 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches, Alexander Monakov

On Thu, May 10, 2018 at 07:40:57PM +0200, Richard Biener wrote:
> On May 10, 2018 5:56:39 PM GMT+02:00, Alexander Monakov <amonakov@ispras.ru> wrote:
> >/* This implements a sort function suitable for GCC use cases:
> >   - signature-compatible to C qsort, but relaxed contract:
> >     - may apply the comparator to elements in a temporary buffer
> 
> What consequences has this or rather how is this observable and makes comparators behave differently? 

A comparison function is allowed to compare the addresses of the elements,
or compute what should be the index into the array that is sorted from the
pointers to the elements, etc.  That will not work if the pointers to the
elements do not actually point into the original array.

For example from reload1.c:

static int
reload_reg_class_lower (const void *r1p, const void *r2p)
{
  int r1 = *(const short *) r1p, r2 = *(const short *) r2p;

// ...

  return r1 - r2;
}

(called as
static short reload_order[MAX_RELOADS];
// ...

  qsort (reload_order, n_reloads, sizeof (short), reload_reg_class_lower);
).

Such a comparator will still work with temporary arrays if both elements
are always in the same temp array (in deterministic order, etc.) array; but
it won't work if the comparator did e.g.

  return (r1 - reload_order) - (r2 - reload_order);

(it would be undefined behaviour, to start with).


Segher

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 0/2] Introduce gcc_qsort
  2018-05-11 10:35   ` Segher Boessenkool
@ 2018-05-11 10:44     ` Alexander Monakov
  2018-05-11 12:00       ` Segher Boessenkool
  2018-05-11 11:17     ` Jakub Jelinek
  1 sibling, 1 reply; 24+ messages in thread
From: Alexander Monakov @ 2018-05-11 10:44 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: Richard Biener, gcc-patches

On Fri, 11 May 2018, Segher Boessenkool wrote:
> For example from reload1.c:

No, this example is not relevant, because ...

> static int
> reload_reg_class_lower (const void *r1p, const void *r2p)
> {
>   int r1 = *(const short *) r1p, r2 = *(const short *) r2p;
> 
> // ...
> 
>   return r1 - r2;

this subtracts integers, not pointers.

In general such address-based comparisons steps are invalid; they lack
anti-reflexivity. So comparators are not actually allowed to do that.

Alexander

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 0/2] Introduce gcc_qsort
  2018-05-11 10:35   ` Segher Boessenkool
  2018-05-11 10:44     ` Alexander Monakov
@ 2018-05-11 11:17     ` Jakub Jelinek
  1 sibling, 0 replies; 24+ messages in thread
From: Jakub Jelinek @ 2018-05-11 11:17 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: Richard Biener, gcc-patches, Alexander Monakov

On Fri, May 11, 2018 at 05:29:05AM -0500, Segher Boessenkool wrote:
> Such a comparator will still work with temporary arrays if both elements
> are always in the same temp array (in deterministic order, etc.) array; but
> it won't work if the comparator did e.g.
> 
>   return (r1 - reload_order) - (r2 - reload_order);
> 
> (it would be undefined behaviour, to start with).

Yeah, UB, but likely would still "work".
What wouldn't work is if a comparator assumes certain sizes of the array
and e.g. stores the difference r1 - reload_order in char/short/int
because all callers guarantee that nmemb * size fits into such types.

	Jakub

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 0/2] Introduce gcc_qsort
  2018-05-11 10:44     ` Alexander Monakov
@ 2018-05-11 12:00       ` Segher Boessenkool
  2018-05-11 12:16         ` Alexander Monakov
  0 siblings, 1 reply; 24+ messages in thread
From: Segher Boessenkool @ 2018-05-11 12:00 UTC (permalink / raw)
  To: Alexander Monakov; +Cc: Richard Biener, gcc-patches

On Fri, May 11, 2018 at 01:35:00PM +0300, Alexander Monakov wrote:
> On Fri, 11 May 2018, Segher Boessenkool wrote:
> > For example from reload1.c:
> 
> No, this example is not relevant, because ...
> 
> > static int
> > reload_reg_class_lower (const void *r1p, const void *r2p)
> > {
> >   int r1 = *(const short *) r1p, r2 = *(const short *) r2p;
> > 
> > // ...
> > 
> >   return r1 - r2;
> 
> this subtracts integers, not pointers.

Oh whoops, I read that too fast.  Sorry.

> In general such address-based comparisons steps are invalid; they lack
> anti-reflexivity. So comparators are not actually allowed to do that.

I don't know what you mean here?  Every comparator is required to be
*reflexive*!  Subtracting two pointers to elements of the same array gives
a perfectly fine total order as far as I see (it is the same as the
difference between the array indices of those elements).

In any case, every qsort call that is converted to one with this weaker
contract will need to be checked.  Or we can wait for bug reports ;-)


Segher

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 1/2] gcc_qsort: source code changes
  2018-05-10 17:01 ` [PATCH 1/2] gcc_qsort: source code changes Alexander Monakov
  2018-05-10 17:01   ` David Malcolm
  2018-05-10 17:44   ` Richard Biener
@ 2018-05-11 12:03   ` Richard Biener
  2018-05-11 13:12     ` Alexander Monakov
  2018-05-13 23:56   ` H.J. Lu
  3 siblings, 1 reply; 24+ messages in thread
From: Richard Biener @ 2018-05-11 12:03 UTC (permalink / raw)
  To: Alexander Monakov; +Cc: GCC Patches

On Thu, May 10, 2018 at 5:56 PM, Alexander Monakov <amonakov@ispras.ru> wrote:
>         * sort.cc: New file.
>         * system.h [!CHECKING_P] (qsort): Redirect to gcc_qsort.
>         * vec.c (qsort_chk): Use gcc_qsort.

Looks good to me.  As additional enhancement we might want to provide
(even unconditionally?)
the glibc qsort_r() interface.  I remember adding various globals to
pass down state to the comparator...

I agree self-tests might be good to have.  Also it looks like the
qsort-checking may now be somehow
embedded within our qsort implementation?

But all these things can be done as followup I think.

Thanks,
Richard.

> ---
>  gcc/sort.cc  | 232 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  gcc/system.h |   7 +-
>  gcc/vec.c    |   2 +-
>  3 files changed, 238 insertions(+), 3 deletions(-)
>  create mode 100644 gcc/sort.cc
>
> diff --git a/gcc/sort.cc b/gcc/sort.cc
> new file mode 100644
> index 00000000000..4faf6d45dc6
> --- /dev/null
> +++ b/gcc/sort.cc
> @@ -0,0 +1,232 @@
> +/* Platform-independent deterministic sort function.
> +   Copyright (C) 2018 Free Software Foundation, Inc.
> +   Contributed by Alexander Monakov.
> +
> +This file is part of GCC.
> +
> +GCC is free software; you can redistribute it and/or modify it
> +under the terms of the GNU General Public License as published by the
> +Free Software Foundation; either version 3, or (at your option) any
> +later version.
> +
> +GCC is distributed in the hope that it will be useful, but WITHOUT
> +ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> +FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
> +for more details.
> +
> +You should have received a copy of the GNU General Public License
> +along with GCC; see the file COPYING3.  If not see
> +<http://www.gnu.org/licenses/>.  */
> +
> +/* This implements a sort function suitable for GCC use cases:
> +   - signature-compatible to C qsort, but relaxed contract:
> +     - may apply the comparator to elements in a temporary buffer
> +     - may abort on allocation failure
> +   - deterministic (but not necessarily stable)
> +   - fast, especially for common cases (0-5 elements of size 8 or 4)
> +
> +   The implementation uses a network sort for up to 5 elements and
> +   a merge sort on top of that.  Neither stage has branches depending on
> +   comparator result, trading extra arithmetic for branch mispredictions.  */
> +
> +#ifdef GENERATOR_FILE
> +#include "bconfig.h"
> +#else
> +#include "config.h"
> +#endif
> +
> +#include "system.h"
> +
> +#define likely(cond) __builtin_expect ((cond), 1)
> +
> +#ifdef __GNUC__
> +#define noinline __attribute__ ((__noinline__))
> +#else
> +#define noinline
> +#endif
> +
> +/* C-style qsort comparator function type.  */
> +typedef int cmp_fn (const void *, const void *);
> +
> +/* Structure holding read-mostly (read-only in netsort) context. */
> +struct sort_ctx
> +{
> +  cmp_fn *cmp; // pointer to comparator
> +  char   *out; // output buffer
> +  size_t n;    // number of elements
> +  size_t size; // element size
> +};
> +
> +/* Helper for netsort. Permute, possibly in-place, 2 or 3 elements,
> +   placing E0 to C->OUT, E1 to C->OUT + C->SIZE, and so on. */
> +static void
> +reorder23 (sort_ctx *c, char *e0, char *e1, char *e2)
> +{
> +#define REORDER_23(SIZE, STRIDE, OFFSET)        \
> +do {                                            \
> +  size_t t0, t1;                                \
> +  memcpy (&t0, e0 + OFFSET, SIZE);              \
> +  memcpy (&t1, e1 + OFFSET, SIZE);              \
> +  char *out = c->out + OFFSET;                  \
> +  if (likely (c->n == 3))                       \
> +    memcpy (out + 2*STRIDE, e2 + OFFSET, SIZE); \
> +  memcpy (out, &t0, SIZE); out += STRIDE;       \
> +  memcpy (out, &t1, SIZE);                      \
> +} while (0)
> +
> +  if (sizeof (size_t) == 8 && likely (c->size == 8))
> +    REORDER_23 (8, 8, 0);
> +  else if (likely (c->size == 4))
> +    REORDER_23 (4, 4, 0);
> +  else
> +    {
> +      size_t offset = 0, step = sizeof (size_t);
> +      for (; offset + step <= c->size; offset += step)
> +       REORDER_23 (step, c->size, offset);
> +      for (; offset < c->size; offset++)
> +       REORDER_23 (1, c->size, offset);
> +    }
> +}
> +
> +/* Like reorder23, but permute 4 or 5 elements. */
> +static void
> +reorder45 (sort_ctx *c, char *e0, char *e1, char *e2, char *e3, char *e4)
> +{
> +#define REORDER_45(SIZE, STRIDE, OFFSET)        \
> +do {                                            \
> +  size_t t0, t1, t2, t3;                        \
> +  memcpy (&t0, e0 + OFFSET, SIZE);              \
> +  memcpy (&t1, e1 + OFFSET, SIZE);              \
> +  memcpy (&t2, e2 + OFFSET, SIZE);              \
> +  memcpy (&t3, e3 + OFFSET, SIZE);              \
> +  char *out = c->out + OFFSET;                  \
> +  if (likely (c->n == 5))                       \
> +    memcpy (out + 4*STRIDE, e4 + OFFSET, SIZE); \
> +  memcpy (out, &t0, SIZE); out += STRIDE;       \
> +  memcpy (out, &t1, SIZE); out += STRIDE;       \
> +  memcpy (out, &t2, SIZE); out += STRIDE;       \
> +  memcpy (out, &t3, SIZE);                      \
> +} while (0)
> +
> +  if (sizeof (size_t) == 8 && likely (c->size == 8))
> +    REORDER_45 (8, 8, 0);
> +  else if (likely(c->size == 4))
> +    REORDER_45 (4, 4, 0);
> +  else
> +    {
> +      size_t offset = 0, step = sizeof (size_t);
> +      for (; offset + step <= c->size; offset += step)
> +       REORDER_45 (step, c->size, offset);
> +      for (; offset < c->size; offset++)
> +       REORDER_45 (1, c->size, offset);
> +    }
> +}
> +
> +/* Helper for netsort. Invoke comparator CMP on E0 and E1.
> +   Return E0^E1 if E0 compares less than E1, zero otherwise.
> +   This is noinline to avoid code growth and confine invocation
> +   to a single call site, assisting indirect branch prediction. */
> +noinline static intptr_t
> +cmp1 (char *e0, char *e1, cmp_fn *cmp)
> +{
> +  intptr_t x = (intptr_t)e0 ^ (intptr_t)e1;
> +  return x & (cmp (e0, e1) >> 31);
> +}
> +
> +/* Execute network sort on 2 to 5 elements from IN, placing them into C->OUT.
> +   IN may be equal to C->OUT, in which case elements are sorted in place.  */
> +static void
> +netsort (char *in, sort_ctx *c)
> +{
> +#define CMP(e0, e1)                   \
> +do {                                  \
> +  intptr_t x = cmp1 (e1, e0, c->cmp); \
> +  e0 = (char *)((intptr_t)e0 ^ x);    \
> +  e1 = (char *)((intptr_t)e1 ^ x);    \
> +} while (0)
> +
> +  char *e0 = in, *e1 = e0 + c->size, *e2 = e1 + c->size;
> +  CMP (e0, e1);
> +  if (likely (c->n == 3))
> +    {
> +      CMP (e1, e2);
> +      CMP (e0, e1);
> +    }
> +  if (c->n <= 3)
> +    return reorder23 (c, e0, e1, e2);
> +  char *e3 = e2 + c->size, *e4 = e3 + c->size;
> +  if (likely (c->n == 5))
> +    {
> +      CMP (e3, e4);
> +      CMP (e2, e4);
> +    }
> +  CMP (e2, e3);
> +  if (likely (c->n == 5))
> +    {
> +      CMP (e0, e3);
> +      CMP (e1, e4);
> +    }
> +  CMP (e0, e2);
> +  CMP (e1, e3);
> +  CMP (e1, e2);
> +  reorder45 (c, e0, e1, e2, e3, e4);
> +}
> +
> +/* Execute merge sort on N elements from IN, placing them into OUT,
> +   using TMP as temporary storage if IN is equal to OUT.
> +   This is a stable sort if netsort is used only for 2 or 3 elements. */
> +static void
> +mergesort (char *in, sort_ctx *c, size_t n, char *out, char *tmp)
> +{
> +  if (likely (n <= 5))
> +    {
> +      c->out = out;
> +      c->n = n;
> +      return netsort (in, c);
> +    }
> +  size_t nl = n / 2, nr = n - nl, sz = nl * c->size;
> +  char *mid = in + sz, *r = out + sz, *l = in == out ? tmp : in;
> +  /* Sort the right half, outputting to right half of OUT. */
> +  mergesort (mid, c, nr, r, tmp);
> +  /* Sort the left half, leaving left half of OUT free.  */
> +  mergesort (in, c, nl, l, mid);
> +  /* Merge sorted halves given by L, R to [OUT, END). */
> +#define MERGE_ELTSIZE(SIZE)                     \
> +do {                                            \
> +  intptr_t mr = c->cmp (r, l) >> 31;            \
> +  intptr_t lr = (intptr_t)l ^ (intptr_t)r;      \
> +  lr = (intptr_t)l ^ (lr & mr);                 \
> +  out = (char *)memcpy (out, (char *)lr, SIZE); \
> +  out += SIZE;                                  \
> +  r += mr & SIZE;                               \
> +  if (r == out) return;                         \
> +  l += ~mr & SIZE;                              \
> +} while (r != end)
> +
> +  if (likely (c->cmp(r, l + (r - out) - c->size) < 0))
> +    {
> +      char *end = out + n * c->size;
> +      if (sizeof (size_t) == 8 && likely (c->size == 8))
> +       MERGE_ELTSIZE (8);
> +      else if (likely (c->size == 4))
> +       MERGE_ELTSIZE (4);
> +      else
> +       MERGE_ELTSIZE (c->size);
> +    }
> +  memcpy (out, l, r - out);
> +}
> +
> +void
> +gcc_qsort (void *vbase, size_t n, size_t size, cmp_fn *cmp)
> +{
> +  if (n < 2)
> +    return;
> +  char *base = (char *)vbase;
> +  sort_ctx c = {cmp, base, n, size};
> +  long long scratch[32];
> +  size_t bufsz = (n / 2) * size;
> +  void *buf = bufsz <= sizeof scratch ? scratch : xmalloc (bufsz);
> +  mergesort (base, &c, n, base, (char *)buf);
> +  if (buf != scratch)
> +    free (buf);
> +}
> diff --git a/gcc/system.h b/gcc/system.h
> index 4abc321c71d..88dffccb8ab 100644
> --- a/gcc/system.h
> +++ b/gcc/system.h
> @@ -1202,11 +1202,14 @@ helper_const_non_const_cast (const char *p)
>  /* qsort comparator consistency checking: except in release-checking compilers,
>     redirect 4-argument qsort calls to qsort_chk; keep 1-argument invocations
>     corresponding to vec::qsort (cmp): they use C qsort internally anyway.  */
> -#if CHECKING_P
> +void qsort_chk (void *, size_t, size_t, int (*)(const void *, const void *));
> +void gcc_qsort (void *, size_t, size_t, int (*)(const void *, const void *));
>  #define PP_5th(a1, a2, a3, a4, a5, ...) a5
>  #undef qsort
> +#if CHECKING_P
>  #define qsort(...) PP_5th (__VA_ARGS__, qsort_chk, 3, 2, qsort, 0) (__VA_ARGS__)
> -void qsort_chk (void *, size_t, size_t, int (*)(const void *, const void *));
> +#else
> +#define qsort(...) PP_5th (__VA_ARGS__, gcc_qsort, 3, 2, qsort, 0) (__VA_ARGS__)
>  #endif
>
>  #endif /* ! GCC_SYSTEM_H */
> diff --git a/gcc/vec.c b/gcc/vec.c
> index 11924a80a2d..2941715a34a 100644
> --- a/gcc/vec.c
> +++ b/gcc/vec.c
> @@ -215,7 +215,7 @@ void
>  qsort_chk (void *base, size_t n, size_t size,
>            int (*cmp)(const void *, const void *))
>  {
> -  (qsort) (base, n, size, cmp);
> +  gcc_qsort (base, n, size, cmp);
>  #if 0
>  #define LIM(n) (n)
>  #else
> --
> 2.13.3
>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 0/2] Introduce gcc_qsort
  2018-05-11 12:00       ` Segher Boessenkool
@ 2018-05-11 12:16         ` Alexander Monakov
  2018-05-11 12:52           ` Segher Boessenkool
  2018-05-11 16:54           ` Eric Botcazou
  0 siblings, 2 replies; 24+ messages in thread
From: Alexander Monakov @ 2018-05-11 12:16 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: Richard Biener, gcc-patches

On Fri, 11 May 2018, Segher Boessenkool wrote:
> > In general such address-based comparisons steps are invalid; they lack
> > anti-reflexivity. So comparators are not actually allowed to do that.
> 
> I don't know what you mean here?  Every comparator is required to be
> *reflexive*!  Subtracting two pointers to elements of the same array gives
> a perfectly fine total order as far as I see (it is the same as the
> difference between the array indices of those elements).

I meant "anti-commutativity"; sorry about the mixup. Such strategy does not
give a valid total order because the order changes while in-progress sorting
reorders elements:

suppose you have elements A and B such that A precedes B and compare A < B
via address test. At some point, the sort routine may swap A with some
unrelated element C, moving A past B. Now B < A. This is a contradiction.

> In any case, every qsort call that is converted to one with this weaker
> contract will need to be checked.  Or we can wait for bug reports ;-)

I think such comparators have a good chance of failing existing qsort_chk
validation.

Alexander

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 0/2] Introduce gcc_qsort
  2018-05-11 12:16         ` Alexander Monakov
@ 2018-05-11 12:52           ` Segher Boessenkool
  2018-05-11 16:54           ` Eric Botcazou
  1 sibling, 0 replies; 24+ messages in thread
From: Segher Boessenkool @ 2018-05-11 12:52 UTC (permalink / raw)
  To: Alexander Monakov; +Cc: Richard Biener, gcc-patches

On Fri, May 11, 2018 at 03:03:09PM +0300, Alexander Monakov wrote:
> On Fri, 11 May 2018, Segher Boessenkool wrote:
> > > In general such address-based comparisons steps are invalid; they lack
> > > anti-reflexivity. So comparators are not actually allowed to do that.
> > 
> > I don't know what you mean here?  Every comparator is required to be
> > *reflexive*!  Subtracting two pointers to elements of the same array gives
> > a perfectly fine total order as far as I see (it is the same as the
> > difference between the array indices of those elements).
> 
> I meant "anti-commutativity"; sorry about the mixup. Such strategy does not
> give a valid total order because the order changes while in-progress sorting
> reorders elements:

Ah.  Right.  C11 7.22.5/4.  Sorry for being dense, I'll drink more coffee.


Segher

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 1/2] gcc_qsort: source code changes
  2018-05-11 12:03   ` Richard Biener
@ 2018-05-11 13:12     ` Alexander Monakov
  0 siblings, 0 replies; 24+ messages in thread
From: Alexander Monakov @ 2018-05-11 13:12 UTC (permalink / raw)
  To: Richard Biener; +Cc: GCC Patches

On Fri, 11 May 2018, Richard Biener wrote:
> Looks good to me.  As additional enhancement we might want to provide
> (even unconditionally?)
> the glibc qsort_r() interface.  I remember adding various globals to
> pass down state to the comparator...

Thanks. I have no plans w.r.t qsort_r, but OTOH a stable sort interface
can be added with tiny size/speed cost, and the sole in-tree use can be
converted :)

> I agree self-tests might be good to have.  Also it looks like the
> qsort-checking may now be somehow
> embedded within our qsort implementation?

I gave self-tests some thought after David's mail, and honestly I don't
see much value in that, given that we run qsort_chk on everything.

As for embedding, I don't think that's necessary. I prefer to keep them
separate.

Alexander

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 0/2] Introduce gcc_qsort
  2018-05-11 12:16         ` Alexander Monakov
  2018-05-11 12:52           ` Segher Boessenkool
@ 2018-05-11 16:54           ` Eric Botcazou
  1 sibling, 0 replies; 24+ messages in thread
From: Eric Botcazou @ 2018-05-11 16:54 UTC (permalink / raw)
  To: Alexander Monakov; +Cc: gcc-patches, Segher Boessenkool, Richard Biener

> I meant "anti-commutativity"; sorry about the mixup. 

symmetry/antisymmetry is more appropriate for binary relations (commutativity 
is for binary operations).

-- 
Eric Botcazou

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 1/2] gcc_qsort: source code changes
  2018-05-10 17:01 ` [PATCH 1/2] gcc_qsort: source code changes Alexander Monakov
                     ` (2 preceding siblings ...)
  2018-05-11 12:03   ` Richard Biener
@ 2018-05-13 23:56   ` H.J. Lu
  2018-05-14  8:44     ` Alexander Monakov
  3 siblings, 1 reply; 24+ messages in thread
From: H.J. Lu @ 2018-05-13 23:56 UTC (permalink / raw)
  To: Alexander Monakov; +Cc: GCC Patches

On Thu, May 10, 2018 at 8:56 AM, Alexander Monakov <amonakov@ispras.ru> wrote:
>         * sort.cc: New file.
>         * system.h [!CHECKING_P] (qsort): Redirect to gcc_qsort.
>         * vec.c (qsort_chk): Use gcc_qsort.
>

This breaks bootstrap on Fedora 28/i686:

https://gcc.gnu.org/ml/gcc-regression/2018-05/msg00088.html

../../src-trunk/gcc/sort.cc:112:5: note: in expansion of macro ‘REORDER_45’
     REORDER_45 (8, 8, 0);
     ^~~~~~~~~~
../../src-trunk/gcc/sort.cc:100:10: error: ‘void* memcpy(void*, const
void*, size_t)’ forming offset [5, 8] is out of the bounds [0, 4] of
object ‘t2’ with type ‘size_t’ {aka ‘unsigned int’}
[-Werror=array-bounds]
   memcpy (&t2, e2 + OFFSET, SIZE);              \
   ~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~
../../src-trunk/gcc/sort.cc:112:5: note: in expansion of macro ‘REORDER_45’
     REORDER_45 (8, 8, 0);
     ^~~~~~~~~~
../../src-trunk/gcc/sort.cc:97:18: note: ‘t2’ declared here
   size_t t0, t1, t2, t3;                        \
                  ^~
../../src-trunk/gcc/sort.cc:112:5: note: in expansion of macro ‘REORDER_45’
     REORDER_45 (8, 8, 0);
     ^~~~~~~~~~
../../src-trunk/gcc/sort.cc:101:10: error: ‘void* memcpy(void*, const
void*, size_t)’ forming offset [5, 8] is out of the bounds [0, 4] of
object ‘t3’ with type ‘size_t’ {aka ‘unsigned int’}
[-Werror=array-bounds]
   memcpy (&t3, e3 + OFFSET, SIZE);              \
   ~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~
../../src-trunk/gcc/sort.cc:112:5: note: in expansion of macro ‘REORDER_45’
     REORDER_45 (8, 8, 0);
     ^~~~~~~~~~
../../src-trunk/gcc/sort.cc:97:22: note: ‘t3’ declared here
   size_t t0, t1, t2, t3;                        \
                      ^~
../../src-trunk/gcc/sort.cc:112:5: note: in expansion of macro ‘REORDER_45’
     REORDER_45 (8, 8, 0);
     ^~~~~~~~~~
../../src-trunk/gcc/sort.cc:105:10: error: ‘void* memcpy(void*, const
void*, size_t)’ forming offset [5, 8] is out of the bounds [0, 4] of
object ‘t0’ with type ‘size_t’ {aka ‘unsigned int’}
[-Werror=array-bounds]
   memcpy (out, &t0, SIZE); out += STRIDE;       \
   ~~~~~~~^~~~~~~~~~~~~~~~
../../src-trunk/gcc/sort.cc:112:5: note: in expansion of macro ‘REORDER_45’
     REORDER_45 (8, 8, 0);
     ^~~~~~~~~~
../../src-trunk/gcc/sort.cc:97:10: note: ‘t0’ declared here
   size_t t0, t1, t2, t3;                        \
          ^~
../../src-trunk/gcc/sort.cc:112:5: note: in expansion of macro ‘REORDER_45’
     REORDER_45 (8, 8, 0);
     ^~~~~~~~~~
../../src-trunk/gcc/sort.cc:106:10: error: ‘void* memcpy(void*, const
void*, size_t)’ forming offset [5, 8] is out of the bounds [0, 4] of
object ‘t1’ with type ‘size_t’ {aka ‘unsigned int’}
[-Werror=array-bounds]
   memcpy (out, &t1, SIZE); out += STRIDE;       \
   ~~~~~~~^~~~~~~~~~~~~~~~
../../src-trunk/gcc/sort.cc:112:5: note: in expansion of macro ‘REORDER_45’
     REORDER_45 (8, 8, 0);
     ^~~~~~~~~~
../../src-trunk/gcc/sort.cc:97:14: note: ‘t1’ declared here
   size_t t0, t1, t2, t3;                        \
              ^~
../../src-trunk/gcc/sort.cc:112:5: note: in expansion of macro ‘REORDER_45’
     REORDER_45 (8, 8, 0);
     ^~~~~~~~~~
../../src-trunk/gcc/sort.cc:107:10: error: ‘void* memcpy(void*, const
void*, size_t)’ forming offset [5, 8] is out of the bounds [0, 4] of
object ‘t2’ with type ‘size_t’ {aka ‘unsigned int’}
[-Werror=array-bounds]
   memcpy (out, &t2, SIZE); out += STRIDE;       \
   ~~~~~~~^~~~~~~~~~~~~~~~
../../src-trunk/gcc/sort.cc:112:5: note: in expansion of macro ‘REORDER_45’
     REORDER_45 (8, 8, 0);
     ^~~~~~~~~~
../../src-trunk/gcc/sort.cc:97:18: note: ‘t2’ declared here
   size_t t0, t1, t2, t3;                        \
                  ^~
../../src-trunk/gcc/sort.cc:112:5: note: in expansion of macro ‘REORDER_45’
     REORDER_45 (8, 8, 0);
     ^~~~~~~~~~
../../src-trunk/gcc/sort.cc:108:10: error: ‘void* memcpy(void*, const
void*, size_t)’ forming offset [5, 8] is out of the bounds [0, 4] of
object ‘t3’ with type ‘size_t’ {aka ‘unsigned int’}
[-Werror=array-bounds]
   memcpy (out, &t3, SIZE);                      \
   ~~~~~~~^~~~~~~~~~~~~~~~
../../src-trunk/gcc/sort.cc:112:5: note: in expansion of macro ‘REORDER_45’
     REORDER_45 (8, 8, 0);
     ^~~~~~~~~~
../../src-trunk/gcc/sort.cc:97:22: note: ‘t3’ declared here
   size_t t0, t1, t2, t3;                        \
                      ^~
../../src-trunk/gcc/sort.cc:112:5: note: in expansion of macro ‘REORDER_45’
     REORDER_45 (8, 8, 0);
     ^~~~~~~~~~

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 1/2] gcc_qsort: source code changes
  2018-05-13 23:56   ` H.J. Lu
@ 2018-05-14  8:44     ` Alexander Monakov
  2018-05-14  9:08       ` Richard Biener
  0 siblings, 1 reply; 24+ messages in thread
From: Alexander Monakov @ 2018-05-14  8:44 UTC (permalink / raw)
  To: H.J. Lu; +Cc: GCC Patches

[-- Attachment #1: Type: text/plain, Size: 5510 bytes --]

On Sun, 13 May 2018, H.J. Lu wrote:
> This breaks bootstrap on Fedora 28/i686:
> 
> https://gcc.gnu.org/ml/gcc-regression/2018-05/msg00088.html
> 
> ../../src-trunk/gcc/sort.cc:112:5: note: in expansion of macro ‘REORDER_45’
>      REORDER_45 (8, 8, 0);
>      ^~~~~~~~~~
> ../../src-trunk/gcc/sort.cc:100:10: error: ‘void* memcpy(void*, const
> void*, size_t)’ forming offset [5, 8] is out of the bounds [0, 4] of
> object ‘t2’ with type ‘size_t’ {aka ‘unsigned int’}
> [-Werror=array-bounds]
>    memcpy (&t2, e2 + OFFSET, SIZE);              \
>    ~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~

Hm, on 32-bit this is trivially dead code, I wonder why we issue the warning?

In any case, due to PR 85757 it's desirable to use types with sizes matching
the memcpy size; is the following OK to apply? Bootstrapped on 32-bit x86.

	* sort.cc (REORDER_23): Pass the type for the temporaries instead of
        intended memcpy size.
        (REORDER_45): Likewise.
---
 gcc/sort.cc | 72 ++++++++++++++++++++++++++++++-------------------------------
 1 file changed, 36 insertions(+), 36 deletions(-)

diff --git a/gcc/sort.cc b/gcc/sort.cc
index 4faf6d45dc6..c41683c91dd 100644
--- a/gcc/sort.cc
+++ b/gcc/sort.cc
@@ -62,29 +62,29 @@ struct sort_ctx
 static void
 reorder23 (sort_ctx *c, char *e0, char *e1, char *e2)
 {
-#define REORDER_23(SIZE, STRIDE, OFFSET)        \
-do {                                            \
-  size_t t0, t1;                                \
-  memcpy (&t0, e0 + OFFSET, SIZE);              \
-  memcpy (&t1, e1 + OFFSET, SIZE);              \
-  char *out = c->out + OFFSET;                  \
-  if (likely (c->n == 3))                       \
-    memcpy (out + 2*STRIDE, e2 + OFFSET, SIZE); \
-  memcpy (out, &t0, SIZE); out += STRIDE;       \
-  memcpy (out, &t1, SIZE);                      \
+#define REORDER_23(TYPE, STRIDE, OFFSET)                 \
+do {                                                     \
+  TYPE t0, t1;                                           \
+  memcpy (&t0, e0 + OFFSET, sizeof (TYPE));              \
+  memcpy (&t1, e1 + OFFSET, sizeof (TYPE));              \
+  char *out = c->out + OFFSET;                           \
+  if (likely (c->n == 3))                                \
+    memcpy (out + 2*STRIDE, e2 + OFFSET, sizeof (TYPE)); \
+  memcpy (out, &t0, sizeof (TYPE)); out += STRIDE;       \
+  memcpy (out, &t1, sizeof (TYPE));                      \
 } while (0)
 
-  if (sizeof (size_t) == 8 && likely (c->size == 8))
-    REORDER_23 (8, 8, 0);
-  else if (likely (c->size == 4))
-    REORDER_23 (4, 4, 0);
+  if (likely (c->size == sizeof (size_t)))
+    REORDER_23 (size_t, sizeof (size_t), 0);
+  else if (likely (c->size == sizeof (int)))
+    REORDER_23 (int, sizeof (int), 0);
   else
     {
       size_t offset = 0, step = sizeof (size_t);
       for (; offset + step <= c->size; offset += step)
-	REORDER_23 (step, c->size, offset);
+	REORDER_23 (size_t, c->size, offset);
       for (; offset < c->size; offset++)
-	REORDER_23 (1, c->size, offset);
+	REORDER_23 (char, c->size, offset);
     }
 }
 
@@ -92,33 +92,33 @@ do {                                            \
 static void
 reorder45 (sort_ctx *c, char *e0, char *e1, char *e2, char *e3, char *e4)
 {
-#define REORDER_45(SIZE, STRIDE, OFFSET)        \
-do {                                            \
-  size_t t0, t1, t2, t3;                        \
-  memcpy (&t0, e0 + OFFSET, SIZE);              \
-  memcpy (&t1, e1 + OFFSET, SIZE);              \
-  memcpy (&t2, e2 + OFFSET, SIZE);              \
-  memcpy (&t3, e3 + OFFSET, SIZE);              \
-  char *out = c->out + OFFSET;                  \
-  if (likely (c->n == 5))                       \
-    memcpy (out + 4*STRIDE, e4 + OFFSET, SIZE); \
-  memcpy (out, &t0, SIZE); out += STRIDE;       \
-  memcpy (out, &t1, SIZE); out += STRIDE;       \
-  memcpy (out, &t2, SIZE); out += STRIDE;       \
-  memcpy (out, &t3, SIZE);                      \
+#define REORDER_45(TYPE, STRIDE, OFFSET)                 \
+do {                                                     \
+  TYPE t0, t1, t2, t3;                                   \
+  memcpy (&t0, e0 + OFFSET, sizeof (TYPE));              \
+  memcpy (&t1, e1 + OFFSET, sizeof (TYPE));              \
+  memcpy (&t2, e2 + OFFSET, sizeof (TYPE));              \
+  memcpy (&t3, e3 + OFFSET, sizeof (TYPE));              \
+  char *out = c->out + OFFSET;                           \
+  if (likely (c->n == 5))                                \
+    memcpy (out + 4*STRIDE, e4 + OFFSET, sizeof (TYPE)); \
+  memcpy (out, &t0, sizeof (TYPE)); out += STRIDE;       \
+  memcpy (out, &t1, sizeof (TYPE)); out += STRIDE;       \
+  memcpy (out, &t2, sizeof (TYPE)); out += STRIDE;       \
+  memcpy (out, &t3, sizeof (TYPE));                      \
 } while (0)
 
-  if (sizeof (size_t) == 8 && likely (c->size == 8))
-    REORDER_45 (8, 8, 0);
-  else if (likely(c->size == 4))
-    REORDER_45 (4, 4, 0);
+  if (likely (c->size == sizeof (size_t)))
+    REORDER_45 (size_t, sizeof (size_t), 0);
+  else if (likely(c->size == sizeof (int)))
+    REORDER_45 (int,  sizeof (int), 0);
   else
     {
       size_t offset = 0, step = sizeof (size_t);
       for (; offset + step <= c->size; offset += step)
-	REORDER_45 (step, c->size, offset);
+	REORDER_45 (size_t, c->size, offset);
       for (; offset < c->size; offset++)
-	REORDER_45 (1, c->size, offset);
+	REORDER_45 (char, c->size, offset);
     }
 }
 
-- 
2.13.3

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 1/2] gcc_qsort: source code changes
  2018-05-14  8:44     ` Alexander Monakov
@ 2018-05-14  9:08       ` Richard Biener
  0 siblings, 0 replies; 24+ messages in thread
From: Richard Biener @ 2018-05-14  9:08 UTC (permalink / raw)
  To: Alexander Monakov; +Cc: H.J. Lu, GCC Patches

On Mon, May 14, 2018 at 8:37 AM, Alexander Monakov <amonakov@ispras.ru> wrote:
> On Sun, 13 May 2018, H.J. Lu wrote:
>> This breaks bootstrap on Fedora 28/i686:
>>
>> https://gcc.gnu.org/ml/gcc-regression/2018-05/msg00088.html
>>
>> ../../src-trunk/gcc/sort.cc:112:5: note: in expansion of macro ‘REORDER_45’
>>      REORDER_45 (8, 8, 0);
>>      ^~~~~~~~~~
>> ../../src-trunk/gcc/sort.cc:100:10: error: ‘void* memcpy(void*, const
>> void*, size_t)’ forming offset [5, 8] is out of the bounds [0, 4] of
>> object ‘t2’ with type ‘size_t’ {aka ‘unsigned int’}
>> [-Werror=array-bounds]
>>    memcpy (&t2, e2 + OFFSET, SIZE);              \
>>    ~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~
>
> Hm, on 32-bit this is trivially dead code, I wonder why we issue the warning?
>
> In any case, due to PR 85757 it's desirable to use types with sizes matching
> the memcpy size; is the following OK to apply? Bootstrapped on 32-bit x86.

OK.

Richard.

>         * sort.cc (REORDER_23): Pass the type for the temporaries instead of
>         intended memcpy size.
>         (REORDER_45): Likewise.
> ---
>  gcc/sort.cc | 72 ++++++++++++++++++++++++++++++-------------------------------
>  1 file changed, 36 insertions(+), 36 deletions(-)
>
> diff --git a/gcc/sort.cc b/gcc/sort.cc
> index 4faf6d45dc6..c41683c91dd 100644
> --- a/gcc/sort.cc
> +++ b/gcc/sort.cc
> @@ -62,29 +62,29 @@ struct sort_ctx
>  static void
>  reorder23 (sort_ctx *c, char *e0, char *e1, char *e2)
>  {
> -#define REORDER_23(SIZE, STRIDE, OFFSET)        \
> -do {                                            \
> -  size_t t0, t1;                                \
> -  memcpy (&t0, e0 + OFFSET, SIZE);              \
> -  memcpy (&t1, e1 + OFFSET, SIZE);              \
> -  char *out = c->out + OFFSET;                  \
> -  if (likely (c->n == 3))                       \
> -    memcpy (out + 2*STRIDE, e2 + OFFSET, SIZE); \
> -  memcpy (out, &t0, SIZE); out += STRIDE;       \
> -  memcpy (out, &t1, SIZE);                      \
> +#define REORDER_23(TYPE, STRIDE, OFFSET)                 \
> +do {                                                     \
> +  TYPE t0, t1;                                           \
> +  memcpy (&t0, e0 + OFFSET, sizeof (TYPE));              \
> +  memcpy (&t1, e1 + OFFSET, sizeof (TYPE));              \
> +  char *out = c->out + OFFSET;                           \
> +  if (likely (c->n == 3))                                \
> +    memcpy (out + 2*STRIDE, e2 + OFFSET, sizeof (TYPE)); \
> +  memcpy (out, &t0, sizeof (TYPE)); out += STRIDE;       \
> +  memcpy (out, &t1, sizeof (TYPE));                      \
>  } while (0)
>
> -  if (sizeof (size_t) == 8 && likely (c->size == 8))
> -    REORDER_23 (8, 8, 0);
> -  else if (likely (c->size == 4))
> -    REORDER_23 (4, 4, 0);
> +  if (likely (c->size == sizeof (size_t)))
> +    REORDER_23 (size_t, sizeof (size_t), 0);
> +  else if (likely (c->size == sizeof (int)))
> +    REORDER_23 (int, sizeof (int), 0);
>    else
>      {
>        size_t offset = 0, step = sizeof (size_t);
>        for (; offset + step <= c->size; offset += step)
> -       REORDER_23 (step, c->size, offset);
> +       REORDER_23 (size_t, c->size, offset);
>        for (; offset < c->size; offset++)
> -       REORDER_23 (1, c->size, offset);
> +       REORDER_23 (char, c->size, offset);
>      }
>  }
>
> @@ -92,33 +92,33 @@ do {                                            \
>  static void
>  reorder45 (sort_ctx *c, char *e0, char *e1, char *e2, char *e3, char *e4)
>  {
> -#define REORDER_45(SIZE, STRIDE, OFFSET)        \
> -do {                                            \
> -  size_t t0, t1, t2, t3;                        \
> -  memcpy (&t0, e0 + OFFSET, SIZE);              \
> -  memcpy (&t1, e1 + OFFSET, SIZE);              \
> -  memcpy (&t2, e2 + OFFSET, SIZE);              \
> -  memcpy (&t3, e3 + OFFSET, SIZE);              \
> -  char *out = c->out + OFFSET;                  \
> -  if (likely (c->n == 5))                       \
> -    memcpy (out + 4*STRIDE, e4 + OFFSET, SIZE); \
> -  memcpy (out, &t0, SIZE); out += STRIDE;       \
> -  memcpy (out, &t1, SIZE); out += STRIDE;       \
> -  memcpy (out, &t2, SIZE); out += STRIDE;       \
> -  memcpy (out, &t3, SIZE);                      \
> +#define REORDER_45(TYPE, STRIDE, OFFSET)                 \
> +do {                                                     \
> +  TYPE t0, t1, t2, t3;                                   \
> +  memcpy (&t0, e0 + OFFSET, sizeof (TYPE));              \
> +  memcpy (&t1, e1 + OFFSET, sizeof (TYPE));              \
> +  memcpy (&t2, e2 + OFFSET, sizeof (TYPE));              \
> +  memcpy (&t3, e3 + OFFSET, sizeof (TYPE));              \
> +  char *out = c->out + OFFSET;                           \
> +  if (likely (c->n == 5))                                \
> +    memcpy (out + 4*STRIDE, e4 + OFFSET, sizeof (TYPE)); \
> +  memcpy (out, &t0, sizeof (TYPE)); out += STRIDE;       \
> +  memcpy (out, &t1, sizeof (TYPE)); out += STRIDE;       \
> +  memcpy (out, &t2, sizeof (TYPE)); out += STRIDE;       \
> +  memcpy (out, &t3, sizeof (TYPE));                      \
>  } while (0)
>
> -  if (sizeof (size_t) == 8 && likely (c->size == 8))
> -    REORDER_45 (8, 8, 0);
> -  else if (likely(c->size == 4))
> -    REORDER_45 (4, 4, 0);
> +  if (likely (c->size == sizeof (size_t)))
> +    REORDER_45 (size_t, sizeof (size_t), 0);
> +  else if (likely(c->size == sizeof (int)))
> +    REORDER_45 (int,  sizeof (int), 0);
>    else
>      {
>        size_t offset = 0, step = sizeof (size_t);
>        for (; offset + step <= c->size; offset += step)
> -       REORDER_45 (step, c->size, offset);
> +       REORDER_45 (size_t, c->size, offset);
>        for (; offset < c->size; offset++)
> -       REORDER_45 (1, c->size, offset);
> +       REORDER_45 (char, c->size, offset);
>      }
>  }
>
> --
> 2.13.3

^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2018-05-14  8:44 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-05-10 15:57 [PATCH 0/2] Introduce gcc_qsort Alexander Monakov
2018-05-10 15:57 ` [PATCH 2/2] gcc_qsort: build system changes Alexander Monakov
2018-05-11  9:32   ` Richard Biener
2018-05-10 17:01 ` [PATCH 1/2] gcc_qsort: source code changes Alexander Monakov
2018-05-10 17:01   ` David Malcolm
2018-05-10 17:44   ` Richard Biener
2018-05-10 18:08     ` Alexander Monakov
2018-05-10 18:57       ` DJ Delorie
2018-05-11 12:03   ` Richard Biener
2018-05-11 13:12     ` Alexander Monakov
2018-05-13 23:56   ` H.J. Lu
2018-05-14  8:44     ` Alexander Monakov
2018-05-14  9:08       ` Richard Biener
2018-05-10 17:35 ` [PATCH 0/2] Introduce gcc_qsort Jakub Jelinek
2018-05-10 17:48   ` Alexander Monakov
2018-05-10 17:43 ` Richard Biener
2018-05-10 17:57   ` Alexander Monakov
2018-05-11 10:35   ` Segher Boessenkool
2018-05-11 10:44     ` Alexander Monakov
2018-05-11 12:00       ` Segher Boessenkool
2018-05-11 12:16         ` Alexander Monakov
2018-05-11 12:52           ` Segher Boessenkool
2018-05-11 16:54           ` Eric Botcazou
2018-05-11 11:17     ` Jakub Jelinek

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).