public inbox for gcc-patches@gcc.gnu.org
* [PATCH, Libatomic, Darwin] Initial libatomic port for *darwin*.
@ 2014-11-12 21:24 Iain Sandoe
  2014-11-12 22:34 ` Joseph Myers
  2014-11-13  7:41 ` Richard Henderson
  0 siblings, 2 replies; 12+ messages in thread
From: Iain Sandoe @ 2014-11-12 21:24 UTC (permalink / raw)
  To: Richard Henderson; +Cc: gcc-patches Patches, Mike Stump

[-- Attachment #1: Type: text/plain, Size: 2233 bytes --]

Hi Richard,

The posix fallback for libatomic locking of unsupported item sizes (using pthreads) may be reliable, but it is (not surprisingly) somewhat slow.

While libatomic's own testsuite passes, every Darwin platform from powerpc-darwin9 (on a G5) through x86_64-darwin14 (on Haswell) fails the gcc/exceptions.exp suite with timeouts.

So here's a platform port (we don't [yet] have ifuncs).

----------------

The patch does the following:

1) When 4-byte atomic operations are supported (which will always be the case when the library is built as part of GCC), these are used to guard accesses.

2) When atomic ops are not available, OSSpinLocks are used (available on the OS since forever).  These are also 4-byte locks on all current platforms.

3) Some heuristic fiddling with the algorithm for hashing addresses, to try to avoid cache turbulence when items are close together (not unlikely in typical code).  This might yet bear some more tweaking, but seems good enough for a first cut.

4) We use low-level Mach interfaces to give up our timeslice when blocked, to keep overheads to a minimum.

5) We allow the port to specify the minimum processor that will be available.
   E.g. x86 Darwin defaults to core2, which means that we don't fall back to
   guarding small items that can be handled natively.
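Points 1) and 4) combine into a very small locking primitive.  The sketch below is a portable approximation of that scheme, not the patch's code: it uses GCC's __atomic builtins, and POSIX sched_yield() stands in for the Darwin-only Mach thread_switch() trap; all names are illustrative.

```c
#include <sched.h>
#include <stdbool.h>
#include <stdint.h>

#define NSPINS 4  /* tries before yielding the timeslice */

/* Acquire a 4-byte lock with a bounded spin; where this sketch calls the
   POSIX sched_yield(), the patch uses the Mach thread_switch() trap.  */
static void
lock_lock (uint32_t *l)
{
  uint32_t old = 0;
  unsigned n = NSPINS;
  while (!__atomic_compare_exchange_n (l, &old, 1, true,
                                       __ATOMIC_ACQUIRE, __ATOMIC_RELAXED))
    {
      old = 0;              /* the failed CAS updated 'old'; reset it  */
      if (--n == 0)
        {
          sched_yield ();   /* give up this timeslice */
          n = NSPINS;
        }
    }
}

/* Release: a plain atomic store with release ordering suffices.  */
static void
lock_unlock (uint32_t *l)
{
  __atomic_store_n (l, 0, __ATOMIC_RELEASE);
}
```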

================

Tested across the patch for a while on darwin9 through darwin13 (and by
Dominique on darwin14).  I forced the atomic implementation off in order to
test the spinlock-based timings.

Typical runtime results for gcc atomic.exp:

[all of these time out with the trunk implementation]

Port Version		Spinlock		Atomic	Machine
=======================================================
powerpc-darwin9		~30mins			~15mins	2.5GHz G5
x86-64-darwin{12,13}	~15mins			~6mins	2.8GHz Xeon / 2.6GHz Ivy Bridge
x86-64-darwin14		-------			~3mins	Haswell (AFAIK)

This (for x86-64, Xeon) is about the same as I see on our Linux machines.

OK for trunk?

This is also functional on gcc-4.8 and gcc-4.9...
.. what would the feeling be about back-porting?

Iain

libatomic:

	* config/darwin/host-config.h: New file.
	* config/darwin/lock.c: New file.
	* configure.tgt (DEFAULT_X86_CPU): New.
	(target): New entry for Darwin.


[-- Attachment #2: darwin-libatomic.txt --]
[-- Type: text/plain, Size: 9868 bytes --]

From fdbf91b9fb20992231a370f0e5cd803085b4f69e Mon Sep 17 00:00:00 2001
From: Iain Sandoe <iain@codesourcery.com>
Date: Wed, 15 Oct 2014 10:49:40 +0100
Subject: [PATCH] Initial draft of a Darwin port for libatomic

---
 libatomic/config/darwin/host-config.h |  55 ++++++++++
 libatomic/config/darwin/lock.c        | 187 ++++++++++++++++++++++++++++++++++
 libatomic/configure.tgt               |  21 +++-
 3 files changed, 260 insertions(+), 3 deletions(-)
 create mode 100644 libatomic/config/darwin/host-config.h
 create mode 100644 libatomic/config/darwin/lock.c

diff --git a/libatomic/config/darwin/host-config.h b/libatomic/config/darwin/host-config.h
new file mode 100644
index 0000000..db55d34
--- /dev/null
+++ b/libatomic/config/darwin/host-config.h
@@ -0,0 +1,55 @@
+/* Copyright (C) 2012-2014 Free Software Foundation, Inc.
+   Contributed by Richard Henderson <rth@redhat.com>.
+
+   This file is part of the GNU Atomic Library (libatomic).
+
+   Libatomic is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   Libatomic is distributed in the hope that it will be useful, but WITHOUT ANY
+   WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+   FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+   more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+/* Included after all more target-specific host-config.h.  */
+
+
+#ifndef protect_start_end
+# ifdef HAVE_ATTRIBUTE_VISIBILITY
+#  pragma GCC visibility push(hidden)
+# endif
+
+void libat_lock_1 (void *ptr);
+void libat_unlock_1 (void *ptr);
+
+static inline UWORD
+protect_start (void *ptr)
+{
+  libat_lock_1 (ptr);
+  return 0;
+}
+
+static inline void
+protect_end (void *ptr, UWORD dummy UNUSED)
+{
+  libat_unlock_1 (ptr);
+}
+
+# define protect_start_end 1
+# ifdef HAVE_ATTRIBUTE_VISIBILITY
+#  pragma GCC visibility pop
+# endif
+#endif /* protect_start_end */
+
+#include_next <host-config.h>
diff --git a/libatomic/config/darwin/lock.c b/libatomic/config/darwin/lock.c
new file mode 100644
index 0000000..286b9df
--- /dev/null
+++ b/libatomic/config/darwin/lock.c
@@ -0,0 +1,187 @@
+/* Copyright (C) 2014 Free Software Foundation, Inc.
+   Contributed by Iain Sandoe <iain@codesourcery.com>.
+
+   This file is part of the GNU Atomic Library (libatomic).
+
+   Libatomic is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   Libatomic is distributed in the hope that it will be useful, but WITHOUT ANY
+   WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+   FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+   more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <stdint.h>
+#include "libatomic_i.h"
+
+/* For items that must be guarded by a lock, we use the following strategy:
+   If atomic support is available for a uint32_t, we use that.
+   If not, we use the Darwin OSSpinLock implementation.  */
+
+/* The target page size.  Must be no larger than the runtime page size,
+   lest locking fail with virtual address aliasing (i.e. a page mmaped
+   at two locations).  */
+#ifndef PAGE_SIZE
+#  define PAGE_SIZE 4096
+#endif
+
+/* The target cacheline size.  */
+#ifndef CACHLINE_SIZE
+#  define CACHLINE_SIZE 64
+#endif
+
+/* The granularity at which locks are applied when n > CACHLINE_SIZE.
+   We follow the posix pthreads implementation here.  */
+#ifndef WATCH_SIZE
+#  define WATCH_SIZE	CACHLINE_SIZE
+#endif
+
+#if HAVE_ATOMIC_EXCHANGE_4 && HAVE_ATOMIC_LDST_4
+
+#  include <stdatomic.h>
+#  include <mach/mach_traps.h>
+#  include <mach/thread_switch.h>
+
+#  ifndef USE_ATOMIC
+#    define USE_ATOMIC 1
+#  endif
+
+inline static void LockUnlock(uint32_t *l) {
+  __atomic_store_4((_Atomic(uint32_t)*)l, 0, __ATOMIC_RELEASE);
+}
+
+/* This is the number of tries we will make to acquire the lock before
+   giving up our time-slice (on the basis that we are guarding small
+   sections of code here, so if we don't acquire the lock quickly, the
+   current holder is probably not running).  */
+#  define NSPINS 4
+inline static void LockLock(uint32_t *l) {
+  uint32_t old = 0;
+  uint32_t n = NSPINS;
+  while (!__atomic_compare_exchange_4((_Atomic(uint32_t)*)l, &old,
+        1, true, __ATOMIC_ACQUIRE, __ATOMIC_RELAXED)) {
+     old = 0;
+    if (--n == 0) {
+      /* Give up this time-slice, no hint to the scheduler about what to pick.
+         TODO: maybe see if it's worth preserving some info about presence of
+         waiting processes - to allow a similar "give up" time-slice scheme on
+         the unlock end.  */
+      thread_switch((mach_port_name_t)0, SWITCH_OPTION_NONE,
+		    MACH_MSG_TIMEOUT_NONE);
+      n = NSPINS;
+    }
+  }
+}
+
+#  define LOCK_SIZE sizeof(uint32_t)
+#  define NLOCKS (PAGE_SIZE / LOCK_SIZE)
+static uint32_t locks[NLOCKS];
+
+#else
+
+#  include <libkern/OSAtomic.h>
+
+#  ifndef USE_ATOMIC
+#    define USE_ATOMIC 0
+#  endif
+
+#  define LOCK_SIZE sizeof(OSSpinLock)
+#  define NLOCKS		(PAGE_SIZE / LOCK_SIZE)
+static OSSpinLock locks[NLOCKS];
+
+#endif
+
+/* A hash function that assumes that entities of a given size are at least
+   aligned to that size, and tries to minimise the probability that adjacent
+   objects will end up using the same cache line in the locks.  */
+static inline uintptr_t
+addr_hash (void *ptr, size_t n)
+{
+  if (n <= CACHLINE_SIZE)
+    n = sizeof(unsigned int)*8 - __builtin_clz((unsigned int) n) -1;
+  else
+    n = 7;
+
+  uint16_t x = (((uintptr_t)ptr) >> n);
+  x ^= n;
+  x = ((x >> 8) & 0xff) | ((x << 8) & 0xff00);
+  return x % NLOCKS;
+}
+
+void
+libat_lock_1 (void *ptr)
+{
+#if USE_ATOMIC
+  LockLock (&locks[addr_hash (ptr, 1)]);
+#else
+  OSSpinLockLock(&locks[addr_hash (ptr, 1)]);
+#endif
+}
+
+void
+libat_unlock_1 (void *ptr)
+{
+#if USE_ATOMIC
+  LockUnlock (&locks[addr_hash (ptr, 1)]);
+#else
+  OSSpinLockUnlock (&locks[addr_hash (ptr, 1)]);
+#endif
+}
+
+void
+libat_lock_n (void *ptr, size_t n)
+{
+  uintptr_t h = addr_hash (ptr, n);
+
+  /* Don't lock more than all the locks we have.  */
+  if (n > PAGE_SIZE)
+    n = PAGE_SIZE;
+
+  size_t i = 0;
+  do
+    {
+#if USE_ATOMIC
+      LockLock (&locks[h]);
+#else
+      OSSpinLockLock(&locks[h]);
+#endif
+      if (++h == NLOCKS)
+	h = 0;
+      i += WATCH_SIZE;
+    }
+  while (i < n);
+}
+
+void
+libat_unlock_n (void *ptr, size_t n)
+{
+  uintptr_t h = addr_hash (ptr, n);
+
+  if (n > PAGE_SIZE)
+    n = PAGE_SIZE;
+
+  size_t i = 0;
+  do
+    {
+#if USE_ATOMIC
+      LockUnlock (&locks[h]);
+#else
+      OSSpinLockUnlock (&locks[h]);
+#endif
+      if (++h == NLOCKS)
+	h = 0;
+      i += WATCH_SIZE;
+    }
+  while (i < n);
+}
diff --git a/libatomic/configure.tgt b/libatomic/configure.tgt
index b0344d5..8283012 100644
--- a/libatomic/configure.tgt
+++ b/libatomic/configure.tgt
@@ -26,6 +26,16 @@
 # Map the target cpu to an ARCH sub-directory.  At the same time,
 # work out any special compilation flags as necessary.
 
+case "${target}" in
+  *-*-darwin*)
+    # Use the same default as GCC.
+    DEFAULT_X86_CPU=core2
+    ;;
+  *)
+    DEFAULT_X86_CPU=i486
+    ;;
+esac
+
 case "${target_cpu}" in
   alpha*)
 	# fenv.c needs this option to generate inexact exceptions.
@@ -67,7 +77,7 @@ case "${target_cpu}" in
 	    ;;
 	  *)
 	    if test -z "$with_arch"; then
-	      XCFLAGS="${XCFLAGS} -march=i486 -mtune=${target_cpu}"
+	      XCFLAGS="${XCFLAGS} -march=$DEFAULT_X86_CPU -mtune=${target_cpu}"
 	      XCFLAGS="${XCFLAGS} -fomit-frame-pointer"
 	    fi
 	esac
@@ -78,7 +88,7 @@ case "${target_cpu}" in
   x86_64)
 	case " ${CC} ${CFLAGS} " in
 	  *" -m32 "*)
-	    XCFLAGS="${XCFLAGS} -march=i486 -mtune=generic"
+	    XCFLAGS="${XCFLAGS} -march=$DEFAULT_X86_CPU -mtune=generic"
 	    XCFLAGS="${XCFLAGS} -fomit-frame-pointer"
 	    ;;
 	  *)
@@ -107,11 +117,16 @@ case "${target}" in
   *-*-linux* | *-*-gnu* | *-*-k*bsd*-gnu \
   | *-*-netbsd* | *-*-freebsd* | *-*-openbsd* \
   | *-*-solaris2* | *-*-sysv4* | *-*-irix6* | *-*-osf* | *-*-hpux11* \
-  | *-*-darwin* | *-*-aix* | *-*-cygwin*)
+  | *-*-aix* | *-*-cygwin*)
 	# POSIX system.  The OS is supported.
 	config_path="${config_path} posix"
 	;;
 
+  *-*-darwin*)
+	# Darwin system.  The OS is supported.
+	config_path="${config_path} darwin"
+	;;
+
   *-*-mingw*)
 	# OS support for atomic primitives.
         case ${target_thread_file} in
-- 
1.8.4.2


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH, Libatomic, Darwin] Initial libatomic port for *darwin*.
  2014-11-12 21:24 [PATCH, Libatomic, Darwin] Initial libatomic port for *darwin* Iain Sandoe
@ 2014-11-12 22:34 ` Joseph Myers
  2014-11-13  7:41 ` Richard Henderson
  1 sibling, 0 replies; 12+ messages in thread
From: Joseph Myers @ 2014-11-12 22:34 UTC (permalink / raw)
  To: Iain Sandoe; +Cc: Richard Henderson, gcc-patches Patches, Mike Stump

On Wed, 12 Nov 2014, Iain Sandoe wrote:

> libatomic:

	PR target/59305

(at least, I presume this fixes that bug; I don't know if there are any 
other relevant bugs in Bugzilla).

> 	* config/darwin/host-config.h New.
> 	* config/darwin/lock.c New.
> 	* configure.tgt (DEFAULT_X86_CPU): New, (target): New entry for darwin.

-- 
Joseph S. Myers
joseph@codesourcery.com


* Re: [PATCH, Libatomic, Darwin] Initial libatomic port for *darwin*.
  2014-11-12 21:24 [PATCH, Libatomic, Darwin] Initial libatomic port for *darwin* Iain Sandoe
  2014-11-12 22:34 ` Joseph Myers
@ 2014-11-13  7:41 ` Richard Henderson
  2014-11-13 20:39   ` Iain Sandoe
  1 sibling, 1 reply; 12+ messages in thread
From: Richard Henderson @ 2014-11-13  7:41 UTC (permalink / raw)
  To: Iain Sandoe; +Cc: gcc-patches Patches, Mike Stump

On 11/12/2014 10:18 PM, Iain Sandoe wrote:
> 	* config/darwin/host-config.h New.
> 	* config/darwin/lock.c New.
> 	* configure.tgt (DEFAULT_X86_CPU): New, (target): New entry for darwin.

Looks pretty good.


> +#  ifndef USE_ATOMIC
> +#    define USE_ATOMIC 1
> +#  endif

Why would USE_ATOMIC be defined previously?

> +inline static void LockUnlock(uint32_t *l) {
> +  __atomic_store_4((_Atomic(uint32_t)*)l, 0, __ATOMIC_RELEASE);
> +}

Gnu coding style, please.  All through the file here.


> +#  define LOCK_SIZE sizeof(uint32_t)
> +#  define NLOCKS (PAGE_SIZE / LOCK_SIZE)
> +static uint32_t locks[NLOCKS];

Um, surely not LOCK_SIZE, but CACHELINE_SIZE.  It's the granularity of the
target region that's at issue, not the size of the lock itself.


> +#if USE_ATOMIC
> +  LockLock (&locks[addr_hash (ptr, 1)]);
> +#else
> +  OSSpinLockLock(&locks[addr_hash (ptr, 1)]);
> +#endif

Better to #define LockLock  OSSpinLockLock within the area above, so as to
avoid the ifdefs here.
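The suggestion amounts to confining the conditional to the definitions so the call sites stay clean.  A self-contained sketch of that arrangement follows; the atomic branch and the toy addr_hash are illustrative fillers (not the patch's code), and only OSSpinLockLock/OSSpinLockUnlock are Darwin's real names:

```c
#include <stddef.h>
#include <stdint.h>

#define USE_ATOMIC 1            /* assume 4-byte atomics are available */
#define NLOCKS 1024

static uint32_t locks[NLOCKS];

#if USE_ATOMIC
/* Illustrative stand-ins for the patch's atomic lock routines.  */
static inline void
LockLock (uint32_t *l)
{
  uint32_t old = 0;
  while (!__atomic_compare_exchange_n (l, &old, 1, 1,
                                       __ATOMIC_ACQUIRE, __ATOMIC_RELAXED))
    old = 0;
}

static inline void
LockUnlock (uint32_t *l)
{
  __atomic_store_n (l, 0, __ATOMIC_RELEASE);
}
#else
# include <libkern/OSAtomic.h>   /* Darwin only */
# define LockLock   OSSpinLockLock
# define LockUnlock OSSpinLockUnlock
#endif

/* Toy hash, standing in for the patch's addr_hash.  */
static inline size_t
addr_hash (void *ptr, size_t n)
{
  return (((uintptr_t) ptr) >> n) % NLOCKS;
}

/* With the defines in place, the call sites need no conditionals.  */
void
libat_lock_1 (void *ptr)
{
  LockLock (&locks[addr_hash (ptr, 1)]);
}

void
libat_unlock_1 (void *ptr)
{
  LockUnlock (&locks[addr_hash (ptr, 1)]);
}
```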


r~


* Re: [PATCH, Libatomic, Darwin] Initial libatomic port for *darwin*.
  2014-11-13  7:41 ` Richard Henderson
@ 2014-11-13 20:39   ` Iain Sandoe
  2014-11-14  8:12     ` Richard Henderson
                       ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Iain Sandoe @ 2014-11-13 20:39 UTC (permalink / raw)
  To: Richard Henderson, Joseph Myers; +Cc: gcc-patches Patches, Mike Stump

[-- Attachment #1: Type: text/plain, Size: 4752 bytes --]

Hello Richard, Joseph,

Thanks for your reviews,

On 13 Nov 2014, at 07:40, Richard Henderson wrote:

> On 11/12/2014 10:18 PM, Iain Sandoe wrote:
> 
>> #  ifndef USE_ATOMIC
>> #    define USE_ATOMIC 1
>> #  endif
> 
> Why would USE_ATOMIC be defined previously?

This was left over from a mode in which I allowed the user to force the implementation to OSSpinLocks, in order to test the performance.

I apologise; [weak excuse follows] with the turbulence of Darwin on trunk, my trunk version of the patch had got behind my 4.9 one (most of the work has been hammered out there while we try to get bootstrap restored).

Re-synced and retested with a patched trunk that bootstraps with some band-aid.

>> inline static void LockUnlock(uint32_t *l) {
>>   __atomic_store_4((_Atomic(uint32_t)*)l, 0, __ATOMIC_RELEASE);
>> }
> 
> Gnu coding style, please.  All through the file here.

Fixed.

> #  define LOCK_SIZE sizeof(uint32_t)
>> #  define NLOCKS (PAGE_SIZE / LOCK_SIZE)
>> static uint32_t locks[NLOCKS];
> 
> Um, surely not LOCK_SIZE, but CACHELINE_SIZE.  It's the granularity of the
> target region that's at issue, not the size of the lock itself.

The algorithm I've used is intentionally different from the pthreads-based posix one; here's the rationale as I see it:

/* Algorithm motivations.

Layout Assumptions:
  o Darwin has a number of sub-targets with common atomic types that have no
    'native' in-line handling, but are smaller than a cache-line.
    E.G. PPC32 needs locking for >= 8byte quantities, X86/m32 for >=16.

  o The _Atomic alignment of a "natural type" is no greater than the type size.

  o There are no special guarantees about the alignment of _Atomic aggregates
    other than those determined by the psABI.

  o There are no guarantees that placement of an entity won't cause it to
    straddle a cache-line boundary.

  o Realistic User code will likely place several _Atomic qualified types in
    close proximity (such that they fall within the same cache-line).
    Similarly, arrays of _Atomic qualified items.

Performance Assumptions:
  o Collisions of address hashes for items (which make up the lock keys)
    constitute the largest performance issue.

  o We want to avoid unnecessary flushing of [lock table] cache-line(s) when
    items are accessed.

Implementation:
  We maintain a table of locks, each lock being 4 bytes (at present). 
  This choice of lock size gives some measure of equality in hash-collision
  statistics between the 'atomic' and 'OSSpinLock' implementations, since the
  lock size is fixed at 4 bytes for the latter.

  The table occupies one physical page, and we attempt to align it to a
  page boundary, appropriately.

  For entities that need a lock, with sizes < one cache line:
    Each entity that requires a lock chooses the lock to use from the table on
  the basis of a hash determined by its size and address.  The lower log2(size)
  address bits are discarded on the assumption that the alignment of entities
  will not be smaller than their size.
  CHECKME: this is not verified for aggregates; it might be something that
  could/should be enforced from the front ends (since _Atomic types are
  allowed to have increased alignment c.f. 'normal').

  For entities that need a lock, with sizes >= one cacheline_size:
    We assume that the entity alignment >= log2(cacheline_size) and discard
  log2(cacheline_size) bits from the address.
  We then apply size/cacheline_size locks to cover the entity.

  The idea is that this will typically result in distinct hash keys for items
  placed close together.  The keys are mangled further such that the size is
  included in the hash.

  Finally, to try to ensure that the lock table entries are accessed in a
  scattered manner, to avoid repeated cacheline flushes, the hash is
  rearranged to try to maximise the noise in the upper bits.
*/

NOTE that the CHECKME above doesn't put us in any worse position than the pthreads implementation (likely slightly better, since we have a smaller granularity with the current scheme).
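As a concrete check of the scheme described above, here is a standalone copy of the hash with the same constants as the patch (the example addresses in the note below are made up, chosen only to show adjacent 8-byte objects):

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 4096
#define CACHLINE_SIZE 64                /* spelling as in the patch */
#define LOCK_SIZE sizeof (uint32_t)
#define NLOCKS (PAGE_SIZE / LOCK_SIZE)  /* 1024 entries */

/* The same computation as the patch's addr_hash.  */
static uintptr_t
addr_hash (void *ptr, size_t n)
{
  /* For sub-cacheline items use log2(size): on the assumption that
     alignment >= size, the low log2(size) address bits carry nothing.  */
  if (n <= CACHLINE_SIZE && n > 0)
    n = sizeof (unsigned int) * 8 - __builtin_clz ((unsigned int) n) - 1;
  else
    n = 7;

  uint16_t x = ((uintptr_t) ptr) >> n;
  x ^= n;                                       /* fold the size in */
  x = ((x >> 8) & 0xff) | ((x << 8) & 0xff00);  /* byte-swap for upper-bit noise */
  return x % NLOCKS;
}
```

For two adjacent 8-byte objects at the made-up addresses 0x1000 and 0x1008 this yields indices 770 and 514, i.e. entries far apart in the lock table, which is the stated goal.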

>> #if USE_ATOMIC
>>   LockLock (&locks[addr_hash (ptr, 1)]);
>> #else
>>   OSSpinLockLock(&locks[addr_hash (ptr, 1)]);
>> #endif
> 
> Better to #define LockLock  OSSpinLockLock within the area above, so as to
> avoid the ifdefs here.
Done.

Thoughts on the rationale - or OK now?

thanks
Iain

I'm not aware of any other PRs that relate, but will do a final scan through and ask around the darwin folks.

libatomic:

	PR target/59305
	* config/darwin/host-config.h: New file.
	* config/darwin/lock.c: New file.
	* configure.tgt (DEFAULT_X86_CPU): New.
	(target): New entry for Darwin.


[-- Attachment #2: darwin-libatomic-v2.txt --]
[-- Type: text/plain, Size: 12183 bytes --]

diff --git a/libatomic/config/darwin/host-config.h b/libatomic/config/darwin/host-config.h
new file mode 100644
index 0000000..db55d34
--- /dev/null
+++ b/libatomic/config/darwin/host-config.h
@@ -0,0 +1,55 @@
+/* Copyright (C) 2012-2014 Free Software Foundation, Inc.
+   Contributed by Richard Henderson <rth@redhat.com>.
+
+   This file is part of the GNU Atomic Library (libatomic).
+
+   Libatomic is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   Libatomic is distributed in the hope that it will be useful, but WITHOUT ANY
+   WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+   FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+   more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+/* Included after all more target-specific host-config.h.  */
+
+
+#ifndef protect_start_end
+# ifdef HAVE_ATTRIBUTE_VISIBILITY
+#  pragma GCC visibility push(hidden)
+# endif
+
+void libat_lock_1 (void *ptr);
+void libat_unlock_1 (void *ptr);
+
+static inline UWORD
+protect_start (void *ptr)
+{
+  libat_lock_1 (ptr);
+  return 0;
+}
+
+static inline void
+protect_end (void *ptr, UWORD dummy UNUSED)
+{
+  libat_unlock_1 (ptr);
+}
+
+# define protect_start_end 1
+# ifdef HAVE_ATTRIBUTE_VISIBILITY
+#  pragma GCC visibility pop
+# endif
+#endif /* protect_start_end */
+
+#include_next <host-config.h>
diff --git a/libatomic/config/darwin/lock.c b/libatomic/config/darwin/lock.c
new file mode 100644
index 0000000..f0fdc7c
--- /dev/null
+++ b/libatomic/config/darwin/lock.c
@@ -0,0 +1,248 @@
+/* Copyright (C) 2014 Free Software Foundation, Inc.
+   Contributed by Iain Sandoe <iain@codesourcery.com>.
+
+   This file is part of the GNU Atomic Library (libatomic).
+
+   Libatomic is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   Libatomic is distributed in the hope that it will be useful, but WITHOUT ANY
+   WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+   FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+   more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <stdint.h>
+#include "libatomic_i.h"
+
+#include <mach/mach_traps.h>
+#include <mach/thread_switch.h>
+
+/* For items that must be guarded by a lock, we use the following strategy:
+   If atomic support is available for a uint32_t, we use that.
+   If not, we use the Darwin OSSpinLock implementation.  */
+
+/* Algorithm motivations.
+
+Layout Assumptions:
+  o Darwin has a number of sub-targets with common atomic types that have no
+    'native' in-line handling, but are smaller than a cache-line.
+    E.G. PPC32 needs locking for >= 8byte quantities, X86/m32 for >=16.
+
+  o The _Atomic alignment of a "natural type" is no greater than the type size.
+
+  o There are no special guarantees about the alignment of _Atomic aggregates
+    other than those determined by the psABI.
+
+  o There are no guarantees that placement of an entity won't cause it to
+    straddle a cache-line boundary.
+
+  o Realistic User code will likely place several _Atomic qualified types in
+    close proximity (such that they fall within the same cache-line).
+    Similarly, arrays of _Atomic qualified items.
+
+Performance Assumptions:
+  o Collisions of address hashes for items (which make up the lock keys)
+    constitute the largest performance issue.
+
+  o We want to avoid unnecessary flushing of [lock table] cache-line(s) when
+    items are accessed.
+
+Implementation:
+  We maintain a table of locks, each lock being 4 bytes (at present). 
+  This choice of lock size gives some measure of equality in hash-collision
+  statistics between the 'atomic' and 'OSSpinLock' implementations, since the
+  lock size is fixed at 4 bytes for the latter.
+
+  The table occupies one physical page, and we attempt to align it to a
+  page boundary, appropriately.
+
+  For entities that need a lock, with sizes < one cache line:
+    Each entity that requires a lock chooses the lock to use from the table on
+  the basis of a hash determined by its size and address.  The lower log2(size)
+  address bits are discarded on the assumption that the alignment of entities
+  will not be smaller than their size.
+  CHECKME: this is not verified for aggregates; it might be something that
+  could/should be enforced from the front ends (since _Atomic types are
+  allowed to have increased alignment c.f. 'normal').
+
+  For entities that need a lock, with sizes >= one cacheline_size:
+    We assume that the entity alignment >= log2(cacheline_size) and discard
+  log2(cacheline_size) bits from the address.
+  We then apply size/cacheline_size locks to cover the entity.
+
+  The idea is that this will typically result in distinct hash keys for items
+  placed close together.  The keys are mangled further such that the size is
+  included in the hash.
+
+  Finally, to try to ensure that the lock table entries are accessed in a
+  scattered manner, to avoid repeated cacheline flushes, the hash is
+  rearranged to try to maximise the noise in the upper bits.
+*/
+
+/* The target page size.  Must be no larger than the runtime page size,
+   lest locking fail with virtual address aliasing (i.e. a page mmaped
+   at two locations).  */
+#ifndef PAGE_SIZE
+#  define PAGE_SIZE 4096
+#endif
+
+/* The target cacheline size.  */
+#ifndef CACHLINE_SIZE
+#  define CACHLINE_SIZE 64
+#endif
+
+/* The granularity at which locks are applied when n > CACHLINE_SIZE.
+   We follow the posix pthreads implementation here.  */
+#ifndef WATCH_SIZE
+#  define WATCH_SIZE CACHLINE_SIZE
+#endif
+
+/* This is the number of tries we will make to acquire the lock before
+   giving up our time-slice (on the basis that we are guarding small
+   sections of code here, so if we don't acquire the lock quickly, the
+   current holder is probably not running).  */
+#define NSPINS 4
+
+#if HAVE_ATOMIC_EXCHANGE_4 && HAVE_ATOMIC_LDST_4
+
+#  define LOCK_TYPE volatile uint32_t
+
+/* So that this also works with gcc-4.8, we don't try to use _Atomic.
+   If _Atomic(uint32_t) ever gets greater alignment than 4, we'll need to
+   revise this.  */
+
+inline static void LockUnlock(LOCK_TYPE *l)
+{
+  __atomic_store_4((LOCK_TYPE *)l, 0, __ATOMIC_RELEASE);
+}
+
+inline static void LockLock(LOCK_TYPE *l)
+{
+  uint32_t old = 0;
+  unsigned n = NSPINS;
+  while (!__atomic_compare_exchange_4((LOCK_TYPE *)l, &old,
+         1, true, __ATOMIC_ACQUIRE, __ATOMIC_RELAXED))
+    {
+      old = 0;
+      if (--n == 0)
+	{
+	  /* Give up this time-slice, no hint to the scheduler about what to
+	     pick.
+	     TODO: maybe see if it's worth preserving some info about presence
+	     of waiting processes - to allow a similar "give up" time-slice
+	     scheme on the unlock end.  */
+	  thread_switch((mach_port_name_t)0, SWITCH_OPTION_NONE,
+			MACH_MSG_TIMEOUT_NONE);
+	  n = NSPINS;
+	}
+    }
+}
+
+#else
+
+#  include <libkern/OSAtomic.h>
+#  define LOCK_TYPE OSSpinLock
+
+#define LockUnlock(L) OSSpinLockUnlock(L)
+
+inline static void LockLock(LOCK_TYPE *l)
+{
+  unsigned n = NSPINS;
+  while (!OSSpinLockTry(l))
+    {
+      if (--n == 0)
+	{
+	  /* Give up this time-slice, no hint to the scheduler about what to
+	     pick. (see comment above) */
+	  thread_switch((mach_port_name_t)0, SWITCH_OPTION_NONE,
+			MACH_MSG_TIMEOUT_NONE);
+	  n = NSPINS;
+	}
+    }
+}
+
+#endif
+
+#define LOCK_SIZE sizeof(LOCK_TYPE)
+#define NLOCKS (PAGE_SIZE / LOCK_SIZE)
+/* An array of locks, that should fill one physical page.  */
+static LOCK_TYPE locks[NLOCKS] __attribute__((aligned(PAGE_SIZE)));
+
+/* A hash function that assumes that entities of a given size are at least
+   aligned to that size, and tries to minimise the probability that adjacent
+   objects will end up using the same cache line in the locks.  */
+static inline uintptr_t
+addr_hash (void *ptr, size_t n)
+{
+  if (n <= CACHLINE_SIZE && n > 0)
+    n = sizeof(unsigned int)*8 - __builtin_clz((unsigned int) n) -1;
+  else
+    n = 7;
+
+  uint16_t x = (((uintptr_t)ptr) >> n);
+  x ^= n;
+  x = ((x >> 8) & 0xff) | ((x << 8) & 0xff00);
+  return x % NLOCKS;
+}
+
+void
+libat_lock_1 (void *ptr)
+{
+  LockLock (&locks[addr_hash (ptr, 1)]);
+}
+
+void
+libat_unlock_1 (void *ptr)
+{
+  LockUnlock (&locks[addr_hash (ptr, 1)]);
+}
+
+void
+libat_lock_n (void *ptr, size_t n)
+{
+  uintptr_t h = addr_hash (ptr, n);
+
+  /* Don't lock more than all the locks we have.  */
+  if (n > PAGE_SIZE)
+    n = PAGE_SIZE;
+
+  size_t i = 0;
+  do
+    {
+      LockLock (&locks[h]);
+      if (++h == NLOCKS)
+	h = 0;
+      i += WATCH_SIZE;
+    }
+  while (i < n);
+}
+
+void
+libat_unlock_n (void *ptr, size_t n)
+{
+  uintptr_t h = addr_hash (ptr, n);
+
+  if (n > PAGE_SIZE)
+    n = PAGE_SIZE;
+
+  size_t i = 0;
+  do
+    {
+      LockUnlock (&locks[h]);
+      if (++h == NLOCKS)
+	h = 0;
+      i += WATCH_SIZE;
+    }
+  while (i < n);
+}
diff --git a/libatomic/configure.tgt b/libatomic/configure.tgt
index b0344d5..8283012 100644
--- a/libatomic/configure.tgt
+++ b/libatomic/configure.tgt
@@ -26,6 +26,16 @@
 # Map the target cpu to an ARCH sub-directory.  At the same time,
 # work out any special compilation flags as necessary.
 
+case "${target}" in
+  *-*-darwin*)
+    # Use the same default as GCC.
+    DEFAULT_X86_CPU=core2
+    ;;
+  *)
+    DEFAULT_X86_CPU=i486
+    ;;
+esac
+
 case "${target_cpu}" in
   alpha*)
 	# fenv.c needs this option to generate inexact exceptions.
@@ -67,7 +77,7 @@ case "${target_cpu}" in
 	    ;;
 	  *)
 	    if test -z "$with_arch"; then
-	      XCFLAGS="${XCFLAGS} -march=i486 -mtune=${target_cpu}"
+	      XCFLAGS="${XCFLAGS} -march=$DEFAULT_X86_CPU -mtune=${target_cpu}"
 	      XCFLAGS="${XCFLAGS} -fomit-frame-pointer"
 	    fi
 	esac
@@ -78,7 +88,7 @@ case "${target_cpu}" in
   x86_64)
 	case " ${CC} ${CFLAGS} " in
 	  *" -m32 "*)
-	    XCFLAGS="${XCFLAGS} -march=i486 -mtune=generic"
+	    XCFLAGS="${XCFLAGS} -march=$DEFAULT_X86_CPU -mtune=generic"
 	    XCFLAGS="${XCFLAGS} -fomit-frame-pointer"
 	    ;;
 	  *)
@@ -107,11 +117,16 @@ case "${target}" in
   *-*-linux* | *-*-gnu* | *-*-k*bsd*-gnu \
   | *-*-netbsd* | *-*-freebsd* | *-*-openbsd* \
   | *-*-solaris2* | *-*-sysv4* | *-*-irix6* | *-*-osf* | *-*-hpux11* \
-  | *-*-darwin* | *-*-aix* | *-*-cygwin*)
+  | *-*-aix* | *-*-cygwin*)
 	# POSIX system.  The OS is supported.
 	config_path="${config_path} posix"
 	;;
 
+  *-*-darwin*)
+	# Darwin system.  The OS is supported.
+	config_path="${config_path} darwin"
+	;;
+
   *-*-mingw*)
 	# OS support for atomic primitives.
         case ${target_thread_file} in


* Re: [PATCH, Libatomic, Darwin] Initial libatomic port for *darwin*.
  2014-11-13 20:39   ` Iain Sandoe
@ 2014-11-14  8:12     ` Richard Henderson
  2014-11-14  8:25       ` Iain Sandoe
  2014-12-12 15:46     ` Jack Howarth
  2015-01-08 21:56     ` Jack Howarth
  2 siblings, 1 reply; 12+ messages in thread
From: Richard Henderson @ 2014-11-14  8:12 UTC (permalink / raw)
  To: Iain Sandoe, Joseph Myers; +Cc: gcc-patches Patches, Mike Stump

On 11/13/2014 09:34 PM, Iain Sandoe wrote:
>> Um, surely not LOCK_SIZE, but CACHELINE_SIZE.  It's the granularity of the
>> target region that's at issue, not the size of the lock itself.
> 
> The algorithm I've used is intentionally different from the pthreads-based posix one...

All that would be fine if ...

> +/* The granularity at which locks are applied when n > CACHLINE_SIZE.
> +   We follow the posix pthreads implementation here.  */
> +#ifndef WATCH_SIZE
> +#  define WATCH_SIZE CACHLINE_SIZE
> +#endif

... you hadn't just said right here that the granularity at which you
want to lock is WATCH_SIZE,

> +#define LOCK_SIZE sizeof(LOCK_TYPE)
> +#define NLOCKS (PAGE_SIZE / LOCK_SIZE)
> +/* An array of locks, that should fill one physical page.  */
> +static LOCK_TYPE locks[NLOCKS] __attribute__((aligned(PAGE_SIZE)));

... but then go and use LOCK_SIZE instead.


r~


* Re: [PATCH, Libatomic, Darwin] Initial libatomic port for *darwin*.
  2014-11-14  8:12     ` Richard Henderson
@ 2014-11-14  8:25       ` Iain Sandoe
  2014-11-14  8:36         ` Richard Henderson
  0 siblings, 1 reply; 12+ messages in thread
From: Iain Sandoe @ 2014-11-14  8:25 UTC (permalink / raw)
  To: Richard Henderson; +Cc: Joseph Myers, gcc-patches Patches, Mike Stump

Hello Richard,

On 14 Nov 2014, at 08:01, Richard Henderson wrote:

> On 11/13/2014 09:34 PM, Iain Sandoe wrote:
>>> Um, surely not LOCK_SIZE, but CACHELINE_SIZE.  It's the granularity of the
>>> target region that's at issue, not the size of the lock itself.
>> 
>> The algorithm I've used is intentionally different from the pthreads-based posix one...
> 
> All that would be fine if ...
> 
>> +/* The granularity at which locks are applied when n > CACHLINE_SIZE.
>> +   We follow the posix pthreads implementation here.  */
>> +#ifndef WATCH_SIZE
>> +#  define WATCH_SIZE CACHLINE_SIZE
>> +#endif
> 
> ... you hadn't just said right here that the granularity at which you
> want to lock is WATCH_SIZE,

That granularity *is* applied to items >= one cache line in size.

>> +#define LOCK_SIZE sizeof(LOCK_TYPE)
>> +#define NLOCKS (PAGE_SIZE / LOCK_SIZE)
>> +/* An array of locks, that should fill one physical page.  */
>> +static LOCK_TYPE locks[NLOCKS] __attribute__((aligned(PAGE_SIZE)));
> 
> ... but then go and use LOCK_SIZE instead.

my locks are only 4 bytes [whereas they are rounded-up-to-n-cachelines(sizeof(pthreads mutex)) for the posix implementation].
The items that they are locking are of arbitrary size (at least up to one page).

hmmm .. there's something I'm not following about what you are seeing as a problem here.

In the posix implementation the granularity calculation is also used to round up the space allocated in the locks table for each pthreads mutex (i.e. it has two uses, AFAICT).

thanks
Iain


* Re: [PATCH, Libatomic, Darwin] Initial libatomic port for *darwin*.
  2014-11-14  8:25       ` Iain Sandoe
@ 2014-11-14  8:36         ` Richard Henderson
  2014-11-14  8:51           ` Iain Sandoe
  0 siblings, 1 reply; 12+ messages in thread
From: Richard Henderson @ 2014-11-14  8:36 UTC (permalink / raw)
  To: Iain Sandoe; +Cc: Joseph Myers, gcc-patches Patches, Mike Stump

On 11/14/2014 09:12 AM, Iain Sandoe wrote:
> my locks are only 4 bytes [whereas they are rounded-up-to-n-cachelines(sizeof(pthreads mutex)) for the posix implementation].
> The items that they are locking are of arbitrary size (at least up to one page).
> 
> hmmm .. there's something I'm not following about what you are seeing as a problem here.
> 
> In the posix implementation the granularity calculation is also used to
> round up the space allocated in the locks table for each pthreads mutex
> (i.e. it has two uses, AFAICT).

No, there's only one use: How large an area is *protected* by the lock.

Since we need to protect one page of these areas, we need NLOCKS = PAGE_SIZE /
WATCH_SIZE locks, which are then allocated in an array.  We do not care how
large that array is.

So if you'd like to differ from the posix implementation in protecting
4 bytes at a time, rather than one cacheline at a time, then just change
WATCH_SIZE to 4.  The fact that WATCH_SIZE happens to equal the lock size is
simply a coincidence.
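As a rough sketch of the scheme Richard describes (the names and constants here are illustrative only, not the actual libatomic code):

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 4096
#define WATCH_SIZE 64                  /* bytes *protected* per lock */
#define NLOCKS (PAGE_SIZE / WATCH_SIZE)

typedef uint32_t LOCK_TYPE;

/* One lock per WATCH_SIZE-sized region of a page.  How much memory
   the locks array itself occupies is irrelevant to the scheme; only
   the protected granularity matters.  */
static LOCK_TYPE locks[NLOCKS];

static size_t
lock_index (const void *ptr)
{
  /* Addresses within the same WATCH_SIZE region share one lock.  */
  return ((uintptr_t) ptr / WATCH_SIZE) % NLOCKS;
}
```

With WATCH_SIZE reduced to 4 this degenerates to per-4-byte protection; with it set to the cacheline size it matches the posix scheme.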


r~


* Re: [PATCH, Libatomic, Darwin] Initial libatomic port for *darwin*.
  2014-11-14  8:36         ` Richard Henderson
@ 2014-11-14  8:51           ` Iain Sandoe
  2014-11-14 10:34             ` Richard Henderson
  0 siblings, 1 reply; 12+ messages in thread
From: Iain Sandoe @ 2014-11-14  8:51 UTC (permalink / raw)
  To: Richard Henderson; +Cc: Joseph Myers, gcc-patches Patches, Mike Stump


On 14 Nov 2014, at 08:25, Richard Henderson wrote:

> On 11/14/2014 09:12 AM, Iain Sandoe wrote:
>> my locks are only 4 bytes [whereas they are rounded-up-to-n-cachelines(sizeof(pthreads mutex)) for the posix implementation].
>> The items that they are locking are of arbitrary size (at least up to one page).
>> 
>> hmmm .. there's something I'm not following about what you are seeing as a problem here.
>> 
>> In the posix implementation the granularity calculation is also used to
>> round up the space allocated in the locks table for each pthreads mutex
>> (i.e. it has two uses, AFAICT).
> 
> No, there's only one use: How large an area is *protected* by the lock.
> 
> Since we need to protect one page of these areas, we need NLOCKS = PAGE_SIZE /
> WATCH_SIZE locks, which are then allocated in an array.  We do not care how
> large that array is.
> 
> So if you'd like to differ from the posix implementation in protecting
> 4 bytes at a time, rather than one cacheline at a time, then just change
> WATCH_SIZE to 4.  The fact that WATCH_SIZE happens to equal the lock size is
> simply a coincidence.

Indeed (the use to round up the space allocated for the mutex also happens to be another coincidence).

However, my intention is to have a variable-sized area protected by each lock.
The number of locks allocated exceeds (page-size/watch-size) [unless watch-size were reduced to 4 bytes, of course].

Only when the size of the area to be protected exceeds one cache-line do I split it up into cache-line-sized chunks.

I happened to allocate one page's worth of locks (a somewhat arbitrary choice in the absence of metrics to guide otherwise), which is another source of coincidence.

Perhaps some re-naming of things would help, or do you think that a scheme to lock variable-sized chunks cannot work?

Iain


* Re: [PATCH, Libatomic, Darwin] Initial libatomic port for *darwin*.
  2014-11-14  8:51           ` Iain Sandoe
@ 2014-11-14 10:34             ` Richard Henderson
  0 siblings, 0 replies; 12+ messages in thread
From: Richard Henderson @ 2014-11-14 10:34 UTC (permalink / raw)
  To: Iain Sandoe; +Cc: Joseph Myers, gcc-patches Patches, Mike Stump

On 11/14/2014 09:44 AM, Iain Sandoe wrote:
> or do you think that a scheme to lock variable-sized chunks cannot work?

It certainly can't work while

> +void
> +libat_lock_1 (void *ptr)
> +{
> +  LockLock (&locks[addr_hash (ptr, 1)]);
> +}

doesn't have the true size.
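In other words, for variable-sized protection to be consistent, each sized entry point would need to forward its true access size into the hash rather than a constant, along these lines (a hypothetical sketch using GCC's __atomic builtins; the hash and names are assumptions, not the patch's actual code):

```c
#include <stddef.h>
#include <stdint.h>

#define NLOCKS 1024
static uint32_t locks[NLOCKS];

/* A toy size-aware hash, purely for illustration.  */
static size_t
addr_hash (const void *ptr, size_t n)
{
  return (((uintptr_t) ptr >> 2) ^ n) % NLOCKS;
}

static inline void
lock_lock (uint32_t *l)
{
  /* Spin until we atomically change the lock word from 0 to 1.  */
  while (__atomic_exchange_n (l, 1, __ATOMIC_ACQUIRE))
    ;
}

/* The sized entry point passes its real size through to the hash, so
   that locker and unlocker of an n-byte region pick the same lock.  */
void
libat_lock_n_sketch (void *ptr, size_t n)
{
  lock_lock (&locks[addr_hash (ptr, n)]);
}
```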


r~


* Re: [PATCH, Libatomic, Darwin] Initial libatomic port for *darwin*.
  2014-11-13 20:39   ` Iain Sandoe
  2014-11-14  8:12     ` Richard Henderson
@ 2014-12-12 15:46     ` Jack Howarth
  2015-01-08 21:56     ` Jack Howarth
  2 siblings, 0 replies; 12+ messages in thread
From: Jack Howarth @ 2014-12-12 15:46 UTC (permalink / raw)
  To: Iain Sandoe
  Cc: Richard Henderson, Joseph Myers, gcc-patches Patches, Mike Stump

Iain,
    What is the status of this patch?
            Jack

On Thu, Nov 13, 2014 at 3:34 PM, Iain Sandoe <iain@codesourcery.com> wrote:
> Hello Richard, Joseph,
>
> Thanks for your reviews,
>
> On 13 Nov 2014, at 07:40, Richard Henderson wrote:
>
>> On 11/12/2014 10:18 PM, Iain Sandoe wrote:
>>
>>> #  ifndef USE_ATOMIC
>>> #    define USE_ATOMIC 1
>>> #  endif
>>
>> Why would USE_ATOMIC be defined previously?
>
> This was left-over from a mode where I allowed the User to jam the mode to OSSpinLocks to test the performance.
>
> I apologise, [weak excuse follows] with the turbulence of Darwin on trunk, my trunk version of the patch had got behind my 4.9 one. (most of the work has been hammered out there while we try to get bootstrap restored).
>
> re-synced and retested with a patched trunk that bootstraps with some band-aid.
>
>>> inline static void LockUnlock(uint32_t *l) {
>>>   __atomic_store_4((_Atomic(uint32_t)*)l, 0, __ATOMIC_RELEASE);
>>> }
>>
>> Gnu coding style, please.  All through the file here.
>
> Fixed.
>
>> #  define LOCK_SIZE sizeof(uint32_t)
>>> #  define NLOCKS (PAGE_SIZE / LOCK_SIZE)
>>> static uint32_t locks[NLOCKS];
>>
>> Um, surely not LOCK_SIZE, but CACHELINE_SIZE.  It's the granularity of the
>> target region that's at issue, not the size of the lock itself.
>
> The algorithm I've used is intentionally different from the pthreads-based posix one, here's the rationale, as I see it:
>
> /* Algorithm motivations.
>
> Layout Assumptions:
>   o Darwin has a number of sub-targets with common atomic types that have no
>     'native' in-line handling, but are smaller than a cache-line.
>     E.G. PPC32 needs locking for >= 8byte quantities, X86/m32 for >=16.
>
>   o The _Atomic alignment of a "natural type" is no greater than the type size.
>
>   o There are no special guarantees about the alignment of _Atomic aggregates
>     other than those determined by the psABI.
>
>   o There are no guarantees that placement of an entity won't cause it to
>     straddle a cache-line boundary.
>
>   o Realistic User code will likely place several _Atomic qualified types in
>     close proximity (such that they fall within the same cache-line).
>     Similarly, arrays of _Atomic qualified items.
>
> Performance Assumptions:
>   o Collisions of address hashes for items (which make up the lock keys)
>     constitute the largest performance issue.
>
>   o We want to avoid unnecessary flushing of [lock table] cache-line(s) when
>     items are accessed.
>
> Implementation:
>   We maintain a table of locks, each lock being 4 bytes (at present).
>   This choice of lock size gives some measure of equality in hash-collision
>   statistics between the 'atomic' and 'OSSpinLock' implementations, since the
>   lock size is fixed at 4 bytes for the latter.
>
>   The table occupies one physical page, and we attempt to align it to a
>   page boundary, appropriately.
>
>   For entities that need a lock, with sizes < one cache line:
>     Each entity that requires a lock, chooses the lock to use from the table on
>   the basis of a hash determined by its size and address.  The lower log2(size)
>   address bits are discarded on the assumption that the alignment of entities
>   will not be smaller than their size.
>   CHECKME: this is not verified for aggregates; it might be something that
>   could/should be enforced from the front ends (since _Atomic types are
>   allowed to have increased alignment c.f. 'normal').
>
>   For entities that need a lock, with sizes >= one cacheline_size:
>     We assume that the entity alignment >= log2(cacheline_size) and discard
>   log2(cacheline_size) bits from the address.
>   We then apply size/cacheline_size locks to cover the entity.
>
>   The idea is that this will typically result in distinct hash keys for items
>   placed close together.  The keys are mangled further such that the size is
>   included in the hash.
>
>   Finally, to attempt to make it such that the lock table entries are accessed
>   in a scattered manner, to avoid repeated cacheline flushes, the hash is
>   rearranged to attempt to maximise noise in the upper bits.
> */
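A hedged reconstruction of the hashing described in the quoted rationale above (the constants and the final mixing step are guesses for illustration; the actual patch code is not shown in this thread):

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 4096
#define CACHELINE_SIZE 64
#define NLOCKS (PAGE_SIZE / sizeof (uint32_t))

/* Discard the low log2(size) address bits (on the assumption that
   alignment >= size, capped at the cacheline), fold the size into the
   key, then mix upper bits downward so neighbours scatter across the
   lock table.  */
static size_t
addr_hash (const void *ptr, size_t size)
{
  size_t gran = size < CACHELINE_SIZE ? size : CACHELINE_SIZE;
  unsigned shift = 0;
  while ((gran >> (shift + 1)) != 0)
    shift++;                          /* shift == floor(log2(gran)) */
  uintptr_t key = ((uintptr_t) ptr >> shift) ^ (size << 4);
  key ^= key >> 16;                   /* spread upper-bit noise down */
  return key % NLOCKS;
}
```

Under this sketch, adjacent same-sized items hash to distinct slots, which is the stated aim of the design.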
>
> NOTE that the CHECKME above doesn't put us in any worse position than the pthreads implementation (likely slightly better since we have a smaller granularity with the current scheme).
>
>>> #if USE_ATOMIC
>>>   LockLock (&locks[addr_hash (ptr, 1)]);
>>> #else
>>>   OSSpinLockLock(&locks[addr_hash (ptr, 1)]);
>>> #endif
>>
>> Better to #define LockLock  OSSpinLockLock within the area above, so as to
>> avoid the ifdefs here.
> done.
>
> Thoughts on the rationale - or OK now?
>
> thanks
> Iain
>
> I'm not aware of any other PRs that relate, but will do a final scan through and ask around the darwin folks.
>
> libatomic:
>
>         PR target/59305
>         * config/darwin/host-config.h New.
>         * config/darwin/lock.c New.
>         * configure.tgt (DEFAULT_X86_CPU): New, (target): New entry for darwin.
>
>
>
>
>


* Re: [PATCH, Libatomic, Darwin] Initial libatomic port for *darwin*.
  2014-11-13 20:39   ` Iain Sandoe
  2014-11-14  8:12     ` Richard Henderson
  2014-12-12 15:46     ` Jack Howarth
@ 2015-01-08 21:56     ` Jack Howarth
  2015-01-08 22:03       ` Iain Sandoe
  2 siblings, 1 reply; 12+ messages in thread
From: Jack Howarth @ 2015-01-08 21:56 UTC (permalink / raw)
  To: Iain Sandoe
  Cc: Richard Henderson, Joseph Myers, gcc-patches Patches, Mike Stump

Iain,
     Were you planning to try to get this committed before stage4 or
will it have to wait for the next major gcc release?
                  Jack



* Re: [PATCH, Libatomic, Darwin] Initial libatomic port for *darwin*.
  2015-01-08 21:56     ` Jack Howarth
@ 2015-01-08 22:03       ` Iain Sandoe
  0 siblings, 0 replies; 12+ messages in thread
From: Iain Sandoe @ 2015-01-08 22:03 UTC (permalink / raw)
  To: Jack Howarth
  Cc: Richard Henderson, Joseph Myers, gcc-patches Patches, Mike Stump

Jack,

I haven't had time to look at the outstanding issues, and won't realistically before stage3 ends - so I don't suppose it will be eligible for GCC5 :(

Iain

On 8 Jan 2015, at 21:56, Jack Howarth wrote:

> Iain,
>     Were you planning to try to get this committed before stage4 or
> will it have to wait for the next major gcc release?
>                  Jack


end of thread, other threads:[~2015-01-08 22:03 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-11-12 21:24 [PATCH, Libatomic, Darwin] Initial libatomic port for *darwin* Iain Sandoe
2014-11-12 22:34 ` Joseph Myers
2014-11-13  7:41 ` Richard Henderson
2014-11-13 20:39   ` Iain Sandoe
2014-11-14  8:12     ` Richard Henderson
2014-11-14  8:25       ` Iain Sandoe
2014-11-14  8:36         ` Richard Henderson
2014-11-14  8:51           ` Iain Sandoe
2014-11-14 10:34             ` Richard Henderson
2014-12-12 15:46     ` Jack Howarth
2015-01-08 21:56     ` Jack Howarth
2015-01-08 22:03       ` Iain Sandoe
