public inbox for gcc-help@gcc.gnu.org
 help / color / mirror / Atom feed
* Help needed for glibc software transaction memory algorithm
@ 2022-01-04 14:12 Florian Weimer
  2022-01-04 22:14 ` Szabolcs Nagy
  2022-01-05 13:07 ` Tulio Magno Quites Machado Filho
  0 siblings, 2 replies; 4+ messages in thread
From: Florian Weimer @ 2022-01-04 14:12 UTC (permalink / raw)
  To: libc-alpha; +Cc: Tulio Magno Quites Machado Filho, Paul E. McKenney, gcc-help

[-- Attachment #1: Type: text/plain, Size: 1880 bytes --]

I've tried to implement a software transactional memory algorithm

The characteristics are:

* Single writer, multiple readers.

* Two copies of the data.

* A 64-bit version counter tracks modifications.  The lowest bit of the
  counter tells the reader which copy of the data to read.

* The writer increments the counter by 2, modifies the STM-protected
  data, and increments the counter by 1.

* The reader loads the counter, reads the STM-protected data, loads the
  counter for a second time, and retries if the counter does not match.

I've attached a model implementation.  The glibc implementation has a
wrapper around the counter updates to support 32-bit implementations as
well.  In both implementations, the writer uses release MO stores for
the version update, and the reader uses acquire MO loads.  The stores
and loads of the STM data itself are unordered (not even volatile).

It turns out that this does not work: In the reader, loads of the
STM-protected data can be reordered past the final acquire MO load of
the STM version.  As a result, the reader can observe incoherent data.
In practice, I only saw this on powerpc64le, where the *compiler*
performed the reordering:

  _dl_find_object miscompilation on powerpc64le
  <https://sourceware.org/bugzilla/show_bug.cgi?id=28745>

Emprically, my stm.c model does not exhibit this behavior.

To fix this, I think it is sufficient to add a release fence just before
the second version load in the reader.  However, from a C memory model
perspective, I don't quite see what this fence would synchronize
with. 8-/  And of course once there is one concurrency bug, there might
be others as well.  Do I need to change the writer to use
acquire-release MO for the version updates?

I think there should be a canned recipe for this scenario (single
writer, two data copies), but I couldn't find one.

Thanks,
Florian

[-- Attachment #2: stm.c --]
[-- Type: text/plain, Size: 5778 bytes --]

#include <err.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

static void
check_pthread (const char *func, int ret)
{
  if (ret != 0)
    {
      errno = ret;
      err (1, "%s", func);
    }
}

/* Parallel arrays of keys and values.  */
enum { key_count = 1024 };
struct storage
{
  unsigned long int keys[key_count];
  unsigned long int values[key_count];
};

/* Use a fixed offset from keys to values, to enable easy
   checking.  */
enum { value_offset = 0x12345 };

/* stm_version & 1 is the index into storage_versions.  */
static unsigned long int stm_version;
struct storage storage_versions[2];

enum mismatch_kind
  {
    none,                       /* No consistency problem detected.  */
    key_range,                  /* Key is >= key_count.  */
    duplicate_key,              /* The same key is present multiple times.  */
    wrong_value,                /* A key has an unexpected value. */
    missing_key,                /* A key is missing (not a permuation).  */
  };

/* These counters are just for statistics.  */
static unsigned long int verifier_read_success;
static unsigned long int verifier_read_failure;
static unsigned long int mismatch_counters[missing_key + 1];

/* Thread that continously reads the TM storage and checks for
   consistent read results.  */
static void *
verifier_thread (void *closure)
{
  while (true)
    {
      /* Start of the transactional read region.  */
      unsigned long int start_version
        = __atomic_load_n (&stm_version, __ATOMIC_ACQUIRE);

      bool seen[key_count] = { 0 };
      struct storage *st = &storage_versions[start_version & 1];
      enum mismatch_kind mismatch = none;

      for (int i = 0; i < key_count; ++i)
        {
          unsigned long int key
            = __atomic_load_n (&st->keys[i], __ATOMIC_RELAXED);

          if (key >= key_count)
            {
              mismatch = key_range;
              break;
            }

          if (seen[key])
            {
              mismatch = duplicate_key;
              break;
            }
          seen[key] = true;

           if (key + value_offset
               != __atomic_load_n (&st->values[i], __ATOMIC_RELAXED))
             {
               mismatch = wrong_value;
               break;
             }
        }

      if (mismatch == none)
        for (int i = 0; i < key_count; ++i)
          if (!seen[i])
            mismatch = missing_key;

      /* End of the transaction read region.  */
      unsigned long int end_version
        = __atomic_load_n (&stm_version, __ATOMIC_ACQUIRE);
      if (start_version == end_version)
        {
          /* If the version is unchanged, the read was successful.
             Any inconsistency is an algorithmic failure.  */
          __atomic_fetch_add (&verifier_read_success, 1, __ATOMIC_RELAXED);
          if (mismatch != none)
            errx (2, "mismatch: %d", (int) mismatch);
        }
      else
        __atomic_fetch_add (&verifier_read_failure, 1, __ATOMIC_RELAXED);
      __atomic_fetch_add (&mismatch_counters[mismatch], 1, __ATOMIC_RELAXED);
    }
}

static void *
reporter_thread (void *closure)
{
  while (true)
    {
      usleep (1000 * 1000);
      printf ("stm_version=%lu success=%lu failure=%lu\n"
              "  range=%lu duplicate=%lu value=%lu missing=%lu\n",
              __atomic_load_n (&stm_version, __ATOMIC_RELAXED),
              __atomic_load_n (&verifier_read_success, __ATOMIC_RELAXED),
              __atomic_load_n (&verifier_read_failure, __ATOMIC_RELAXED),
              __atomic_load_n (&mismatch_counters[key_range],
                               __ATOMIC_RELAXED),
              __atomic_load_n (&mismatch_counters[duplicate_key],
                               __ATOMIC_RELAXED),
              __atomic_load_n (&mismatch_counters[wrong_value],
                               __ATOMIC_RELAXED),
              __atomic_load_n (&mismatch_counters[missing_key],
                               __ATOMIC_RELAXED));
    }
  return NULL;
}

struct drand48_data rand_state;

/* Write a random permutation to *ST.  This simulates a workload that
   produces inconsistency during updates.  */
static void
update_storage (struct storage *st)
{
  /* Fisher-Yates shuffle.  Very slight bias due to %.  */
  unsigned long int permutation[key_count];
  for (int i = 0; i < key_count; ++i)
    permutation[i] = i;
  for (int i = 0; i < key_count - 1; ++i)
    {
      long int r;
      lrand48_r (&rand_state, &r);
      int j = i + (r % (key_count - i));
      unsigned long int tmp = permutation[i];
      permutation[i] = permutation[j];
      permutation[j] = tmp;
    }

  for (int i = 0; i < key_count; ++i)
    {
      st->keys[i] = permutation[i];
      st->values[i] = permutation[i] + value_offset;
    }
}

int
main (int argc, char **argv)
{
  int thread_count;
  if (argc != 2 || (thread_count = atoi (argv[1])) <= 0)
    {
      fprintf (stderr, "usage: %s THREAD-COUNT\n", argv[0]);
      return 1;
    }

  srand48_r (1, &rand_state);
  update_storage (&storage_versions[0]);

  for (int i = 0; i < thread_count; ++i)
    {
      pthread_t thr;
      check_pthread ("pthread_create",
                     pthread_create (&thr, NULL, verifier_thread, NULL));
    }

  {
    pthread_t thr;
    check_pthread ("pthread_create",
                   pthread_create (&thr, NULL, reporter_thread, NULL));
  }

  while (true)
    {
      /* Start new STM round, but do not switch versions yet.  */
      unsigned long int ver
        = __atomic_fetch_add (&stm_version, 2, __ATOMIC_RELEASE);
      update_storage (&storage_versions[(ver + 1) & 1]);
      /* Commit and switch versions. */
      __atomic_fetch_add (&stm_version, 1, __ATOMIC_RELEASE);
    }
}

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: Help needed for glibc software transaction memory algorithm
  2022-01-04 14:12 Help needed for glibc software transaction memory algorithm Florian Weimer
@ 2022-01-04 22:14 ` Szabolcs Nagy
  2022-01-05 11:38   ` Florian Weimer
  2022-01-05 13:07 ` Tulio Magno Quites Machado Filho
  1 sibling, 1 reply; 4+ messages in thread
From: Szabolcs Nagy @ 2022-01-04 22:14 UTC (permalink / raw)
  To: Florian Weimer
  Cc: libc-alpha, gcc-help, Paul E. McKenney, Tulio Magno Quites Machado Filho

The 01/04/2022 15:12, Florian Weimer via Libc-alpha wrote:
> I've tried to implement a software transactional memory algorithm
> 
> The characteristics are:
> 
> * Single writer, multiple readers.
> 
> * Two copies of the data.
> 
> * A 64-bit version counter tracks modifications.  The lowest bit of the
>   counter tells the reader which copy of the data to read.
> 
> * The writer increments the counter by 2, modifies the STM-protected
>   data, and increments the counter by 1.
> 
> * The reader loads the counter, reads the STM-protected data, loads the
>   counter for a second time, and retries if the counter does not match.
> 
> I've attached a model implementation.  The glibc implementation has a
> wrapper around the counter updates to support 32-bit implementations as
> well.  In both implementations, the writer uses release MO stores for
> the version update, and the reader uses acquire MO loads.  The stores
> and loads of the STM data itself are unordered (not even volatile).
> 
> It turns out that this does not work: In the reader, loads of the
> STM-protected data can be reordered past the final acquire MO load of
> the STM version.  As a result, the reader can observe incoherent data.
> In practice, I only saw this on powerpc64le, where the *compiler*
> performed the reordering:
> 
>   _dl_find_object miscompilation on powerpc64le
>   <https://sourceware.org/bugzilla/show_bug.cgi?id=28745>
> 
> Emprically, my stm.c model does not exhibit this behavior.
> 
> To fix this, I think it is sufficient to add a release fence just before
> the second version load in the reader.  However, from a C memory model
> perspective, I don't quite see what this fence would synchronize
> with. 8-/  And of course once there is one concurrency bug, there might
> be others as well.  Do I need to change the writer to use
> acquire-release MO for the version updates?
> 
> I think there should be a canned recipe for this scenario (single
> writer, two data copies), but I couldn't find one.

this reminds me the discussion in section 4 and 5 of

https://www.hpl.hp.com/techreports/2012/HPL-2012-68.pdf

there are no 2 data copies here but i think the reasoning
about the synchronization may apply.



^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: Help needed for glibc software transaction memory algorithm
  2022-01-04 22:14 ` Szabolcs Nagy
@ 2022-01-05 11:38   ` Florian Weimer
  0 siblings, 0 replies; 4+ messages in thread
From: Florian Weimer @ 2022-01-05 11:38 UTC (permalink / raw)
  To: Szabolcs Nagy
  Cc: libc-alpha, gcc-help, Paul E. McKenney, Tulio Magno Quites Machado Filho

* Szabolcs Nagy:

> this reminds me the discussion in section 4 and 5 of
>
> https://www.hpl.hp.com/techreports/2012/HPL-2012-68.pdf
>
> there are no 2 data copies here but i think the reasoning
> about the synchronization may apply.

Yes, I'll reference that and put in the acquire fence.  Thanks.

Florian


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: Help needed for glibc software transaction memory algorithm
  2022-01-04 14:12 Help needed for glibc software transaction memory algorithm Florian Weimer
  2022-01-04 22:14 ` Szabolcs Nagy
@ 2022-01-05 13:07 ` Tulio Magno Quites Machado Filho
  1 sibling, 0 replies; 4+ messages in thread
From: Tulio Magno Quites Machado Filho @ 2022-01-05 13:07 UTC (permalink / raw)
  To: Florian Weimer, libc-alpha; +Cc: gcc-help, Paul E. McKenney

Florian Weimer via Libc-alpha <libc-alpha@sourceware.org> writes:

> To fix this, I think it is sufficient to add a release fence just before
> the second version load in the reader.  However, from a C memory model
> perspective, I don't quite see what this fence would synchronize
> with.

AFAIU, it would synchronize with the previous relaxed load as if it were a
release operation.  Quoting N2731:

    A release fence A synchronizes with an atomic operation B that performs an
    acquire operation on an atomic object M if there exists an atomic operation
    X such that *X is sequenced before A*, X modifies M , and B reads the value
    written by X or a value written by any side effect in the hypothetical
    release sequence X would head if it were a release operation.

Where:

 - X is the read of the STM-protected data;
 - B is the load of the counter for a second time;

Notice that I fixed a typo in the original text that says "A is sequenced before
X".  Which is impossible because we would end up with:

    Fence A
    Operation X
    Operation B

Source: http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2731.pdf

> And of course once there is one concurrency bug, there might
> be others as well.  Do I need to change the writer to use
> acquire-release MO for the version updates?

I don't think this is necessary.

-- 
Tulio Magno

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2022-01-05 13:07 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-01-04 14:12 Help needed for glibc software transaction memory algorithm Florian Weimer
2022-01-04 22:14 ` Szabolcs Nagy
2022-01-05 11:38   ` Florian Weimer
2022-01-05 13:07 ` Tulio Magno Quites Machado Filho

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).