public inbox for libc-alpha@sourceware.org
 help / color / mirror / Atom feed
* [PATCH] elf: Simplify software TM implementation in _dl_find_object
@ 2022-01-10 10:10 Florian Weimer
  2022-01-10 12:29 ` Szabolcs Nagy
  0 siblings, 1 reply; 7+ messages in thread
From: Florian Weimer @ 2022-01-10 10:10 UTC (permalink / raw)
  To: libc-alpha; +Cc: Szabolcs Nagy

With the current set of fences, the version update at the start
of the TM write operation is redundant, and the version update
at the end does not need to use an atomic read-modify-write
operation.

Also use relaxed MO stores during the dlclose update, and skip any
version changes there.

Suggested-by: Szabolcs Nagy <szabolcs.nagy@arm.com>

---
v2: Use __atomic_wide_counter_add_relaxed.

This has been running over the weekend without any failures.  The
reasoning for only one version update (it becomes visible before
subsequent updates to the data that has been read) makes sense to me.

 elf/dl-find_object.c | 56 +++++++++++++++++++---------------------------------
 1 file changed, 20 insertions(+), 36 deletions(-)

diff --git a/elf/dl-find_object.c b/elf/dl-find_object.c
index 0eb8607edd..2b8df2fd67 100644
--- a/elf/dl-find_object.c
+++ b/elf/dl-find_object.c
@@ -123,9 +123,9 @@ struct dlfo_mappings_segment
 
 /* To achieve async-signal-safety, two copies of the data structure
    are used, so that a signal handler can still use this data even if
-   dlopen or dlclose modify the other copy.  The the MSB in
-   _dlfo_loaded_mappings_version determines which array element is the
-   currently active region.  */
+   dlopen or dlclose modify the other copy.  The the least significant
+   bit in _dlfo_loaded_mappings_version determines which array element
+   is the currently active region.  */
 static struct dlfo_mappings_segment *_dlfo_loaded_mappings[2];
 
 /* Returns the number of actually used elements in all segments
@@ -248,7 +248,7 @@ _dlfo_read_start_version (void)
 {
   /* Acquire MO load synchronizes with the fences at the beginning and
      end of the TM update region in _dlfo_mappings_begin_update,
-     _dlfo_mappings_end_update, _dlfo_mappings_end_update_no_switch.  */
+     _dlfo_mappings_end_update.  */
   return __atomic_wide_counter_load_acquire (&_dlfo_loaded_mappings_version);
 }
 
@@ -261,35 +261,25 @@ _dlfo_read_version_locked (void)
 }
 
 /* Update the version to reflect that an update is happening.  This
-   does not change the bit that controls the active segment chain.
-   Returns the index of the currently active segment chain.  */
-static inline unsigned int
+   does not change the bit that controls the active segment chain.  */
+static inline void
 _dlfo_mappings_begin_update (void)
 {
-  /* The store synchronizes with loads in _dlfo_read_start_version
+  /* The fence synchronizes with loads in _dlfo_read_start_version
      (also called from _dlfo_read_success).  */
   atomic_thread_fence_release ();
-  return __atomic_wide_counter_fetch_add_relaxed
-    (&_dlfo_loaded_mappings_version, 2);
 }
 
 /* Installs the just-updated version as the active version.  */
 static inline void
 _dlfo_mappings_end_update (void)
 {
-  /* The store synchronizes with loads in _dlfo_read_start_version
+  /* The fence synchronizes with loads in _dlfo_read_start_version
      (also called from _dlfo_read_success).  */
   atomic_thread_fence_release ();
-  __atomic_wide_counter_fetch_add_relaxed (&_dlfo_loaded_mappings_version, 1);
-}
-/* Completes an in-place update without switching versions.  */
-static inline void
-_dlfo_mappings_end_update_no_switch (void)
-{
-  /* The store synchronizes with loads in _dlfo_read_start_version
-     (also called from _dlfo_read_success).  */
-  atomic_thread_fence_release ();
-  __atomic_wide_counter_fetch_add_relaxed (&_dlfo_loaded_mappings_version, 2);
+  /* No atomic read-modify-write update needed because of the loader
+     lock.  */
+  __atomic_wide_counter_add_relaxed (&_dlfo_loaded_mappings_version, 1);
 }
 
 /* Return true if the read was successful, given the start
@@ -302,10 +292,11 @@ _dlfo_read_success (uint64_t start_version)
      the TM region are not ordered past the version check below.  */
   atomic_thread_fence_acquire ();
 
-  /* Synchronizes with stores in _dlfo_mappings_begin_update,
-     _dlfo_mappings_end_update, _dlfo_mappings_end_update_no_switch.
-     It is important that all stores from the last update have been
-     visible, and stores from the next updates are not.
+  /* Synchronizes with the fences in _dlfo_mappings_begin_update,
+     _dlfo_mappings_end_update.  It is important that all stores from
+     the last update have become visible, and stores from the next
+     update to this version are not before the version number is
+     updated.
 
      Unlike with seqlocks, there is no check for odd versions here
      because we have read the unmodified copy (confirmed to be
@@ -839,17 +830,10 @@ _dl_find_object_dlclose (struct link_map *map)
              issues around __libc_freeres.  */
             return;
 
-        /* The update happens in-place, but given that we do not use
-           atomic accesses on the read side, update the version around
-           the update to trigger re-validation in concurrent
-           readers.  */
-        _dlfo_mappings_begin_update ();
-
-        /* Mark as closed.  */
-        obj->map_end = obj->map_start;
-        obj->map = NULL;
-
-        _dlfo_mappings_end_update_no_switch ();
+        /* Mark as closed.  This does not change the overall data
+           structure, so no TM cycle is needed.  */
+        atomic_store_relaxed (&obj->map_end, obj->map_start);
+        atomic_store_relaxed (&obj->map, NULL);
         return;
       }
 }


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] elf: Simplify software TM implementation in _dl_find_object
  2022-01-10 10:10 [PATCH] elf: Simplify software TM implementation in _dl_find_object Florian Weimer
@ 2022-01-10 12:29 ` Szabolcs Nagy
  0 siblings, 0 replies; 7+ messages in thread
From: Szabolcs Nagy @ 2022-01-10 12:29 UTC (permalink / raw)
  To: Florian Weimer; +Cc: libc-alpha

The 01/10/2022 11:10, Florian Weimer wrote:
> With the current set of fences, the version update at the start
> of the TM write operation is redundant, and the version update
> at the end does not need to use an atomic read-modify-write
> operation.
> 
> Also use relaxed MO stores during the dlclose update, and skip any
> version changes there.
> 
> Suggested-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
> 
> ---
> v2: Use __atomic_wide_counter_add_relaxed.
> 
> This has been running over the weekend without any failures.  The
> reasoning for only one version update (it becomes visible before
> subsequent updates to the data that has been read) makes sense to me.
> 
>  elf/dl-find_object.c | 56 +++++++++++++++++++---------------------------------
>  1 file changed, 20 insertions(+), 36 deletions(-)

Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] elf: Simplify software TM implementation in _dl_find_object
  2022-01-07 16:08     ` Szabolcs Nagy
@ 2022-01-07 16:29       ` Florian Weimer
  0 siblings, 0 replies; 7+ messages in thread
From: Florian Weimer @ 2022-01-07 16:29 UTC (permalink / raw)
  To: Szabolcs Nagy; +Cc: libc-alpha

* Szabolcs Nagy:

> The 01/07/2022 16:56, Florian Weimer wrote:
>> * Szabolcs Nagy:
>> 
>> > The 01/07/2022 14:41, Florian Weimer wrote:
>> >> With the current set of fences, the version update at the start
>> >> of the TM write operation is redundant.  Also use relaxed MO stores
>> >> during the dlclose update, and skip any version changes there.
>> >> 
>> >> Suggested-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
>> >> 
>> >> ---
>> >> Initial testing looks good, but I'll keep the aarch64 and powerpc64le
>> >> tests running in a loop for a while.
>> >
>> > looks good to me except
>> >
>> >> @@ -282,15 +279,6 @@ _dlfo_mappings_end_update (void)
>> >>    atomic_thread_fence_release ();
>> >>    __atomic_wide_counter_fetch_add_relaxed (&_dlfo_loaded_mappings_version, 1);
>> >>  }
>> >
>> > i think the v += 1 does not have to be atomic so
>> >
>> >  start_version = load_relaxed (&ver);
>> >  store_relaxed (&ver, start_version + 1);
>> >
>> > or if we can change the api to pass start_version
>> > that may be clearer.
>> 
>> The 32-bit wide counter needs to be atomic.  We can't use a straight
>> store in that case.
>> 
>> Should I break abstraction and write through for the 64-bit case?
>
> i didn't think about that.
>
> but it seems there is a
>
> __atomic_wide_counter_add_relaxed
>
> which seems to do the right thing.

Good point.  I'll restart my tests with that.

Thanks,
Flroian


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] elf: Simplify software TM implementation in _dl_find_object
  2022-01-07 15:56   ` Florian Weimer
@ 2022-01-07 16:08     ` Szabolcs Nagy
  2022-01-07 16:29       ` Florian Weimer
  0 siblings, 1 reply; 7+ messages in thread
From: Szabolcs Nagy @ 2022-01-07 16:08 UTC (permalink / raw)
  To: Florian Weimer; +Cc: libc-alpha

The 01/07/2022 16:56, Florian Weimer wrote:
> * Szabolcs Nagy:
> 
> > The 01/07/2022 14:41, Florian Weimer wrote:
> >> With the current set of fences, the version update at the start
> >> of the TM write operation is redundant.  Also use relaxed MO stores
> >> during the dlclose update, and skip any version changes there.
> >> 
> >> Suggested-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
> >> 
> >> ---
> >> Initial testing looks good, but I'll keep the aarch64 and powerpc64le
> >> tests running in a loop for a while.
> >
> > looks good to me except
> >
> >> @@ -282,15 +279,6 @@ _dlfo_mappings_end_update (void)
> >>    atomic_thread_fence_release ();
> >>    __atomic_wide_counter_fetch_add_relaxed (&_dlfo_loaded_mappings_version, 1);
> >>  }
> >
> > i think the v += 1 does not have to be atomic so
> >
> >  start_version = load_relaxed (&ver);
> >  store_relaxed (&ver, start_version + 1);
> >
> > or if we can change the api to pass start_version
> > that may be clearer.
> 
> The 32-bit wide counter needs to be atomic.  We can't use a straight
> store in that case.
> 
> Should I break abstraction and write through for the 64-bit case?

i didn't think about that.

but it seems there is a

__atomic_wide_counter_add_relaxed

which seems to do the right thing.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] elf: Simplify software TM implementation in _dl_find_object
  2022-01-07 15:49 ` Szabolcs Nagy
@ 2022-01-07 15:56   ` Florian Weimer
  2022-01-07 16:08     ` Szabolcs Nagy
  0 siblings, 1 reply; 7+ messages in thread
From: Florian Weimer @ 2022-01-07 15:56 UTC (permalink / raw)
  To: Szabolcs Nagy; +Cc: libc-alpha

* Szabolcs Nagy:

> The 01/07/2022 14:41, Florian Weimer wrote:
>> With the current set of fences, the version update at the start
>> of the TM write operation is redundant.  Also use relaxed MO stores
>> during the dlclose update, and skip any version changes there.
>> 
>> Suggested-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
>> 
>> ---
>> Initial testing looks good, but I'll keep the aarch64 and powerpc64le
>> tests running in a loop for a while.
>
> looks good to me except
>
>> @@ -282,15 +279,6 @@ _dlfo_mappings_end_update (void)
>>    atomic_thread_fence_release ();
>>    __atomic_wide_counter_fetch_add_relaxed (&_dlfo_loaded_mappings_version, 1);
>>  }
>
> i think the v += 1 does not have to be atomic so
>
>  start_version = load_relaxed (&ver);
>  store_relaxed (&ver, start_version + 1);
>
> or if we can change the api to pass start_version
> that may be clearer.

The 32-bit wide counter needs to be atomic.  We can't use a straight
store in that case.

Should I break abstraction and write through for the 64-bit case?

Thanks,
Florian


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] elf: Simplify software TM implementation in _dl_find_object
  2022-01-07 13:41 Florian Weimer
@ 2022-01-07 15:49 ` Szabolcs Nagy
  2022-01-07 15:56   ` Florian Weimer
  0 siblings, 1 reply; 7+ messages in thread
From: Szabolcs Nagy @ 2022-01-07 15:49 UTC (permalink / raw)
  To: Florian Weimer; +Cc: libc-alpha

The 01/07/2022 14:41, Florian Weimer wrote:
> With the current set of fences, the version update at the start
> of the TM write operation is redundant.  Also use relaxed MO stores
> during the dlclose update, and skip any version changes there.
> 
> Suggested-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
> 
> ---
> Initial testing looks good, but I'll keep the aarch64 and powerpc64le
> tests running in a loop for a while.

looks good to me except

> @@ -282,15 +279,6 @@ _dlfo_mappings_end_update (void)
>    atomic_thread_fence_release ();
>    __atomic_wide_counter_fetch_add_relaxed (&_dlfo_loaded_mappings_version, 1);
>  }

i think the v += 1 does not have to be atomic so

 start_version = load_relaxed (&ver);
 store_relaxed (&ver, start_version + 1);

or if we can change the api to pass start_version
that may be clearer.


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH] elf: Simplify software TM implementation in _dl_find_object
@ 2022-01-07 13:41 Florian Weimer
  2022-01-07 15:49 ` Szabolcs Nagy
  0 siblings, 1 reply; 7+ messages in thread
From: Florian Weimer @ 2022-01-07 13:41 UTC (permalink / raw)
  To: libc-alpha; +Cc: Szabolcs Nagy

With the current set of fences, the version update at the start
of the TM write operation is redundant.  Also use relaxed MO stores
during the dlclose update, and skip any version changes there.

Suggested-by: Szabolcs Nagy <szabolcs.nagy@arm.com>

---
Initial testing looks good, but I'll keep the aarch64 and powerpc64le
tests running in a loop for a while.

 elf/dl-find_object.c | 50 ++++++++++++++++----------------------------------
 1 file changed, 16 insertions(+), 34 deletions(-)

diff --git a/elf/dl-find_object.c b/elf/dl-find_object.c
index 0eb8607edd..7b906c8f1d 100644
--- a/elf/dl-find_object.c
+++ b/elf/dl-find_object.c
@@ -123,9 +123,9 @@ struct dlfo_mappings_segment
 
 /* To achieve async-signal-safety, two copies of the data structure
    are used, so that a signal handler can still use this data even if
-   dlopen or dlclose modify the other copy.  The the MSB in
-   _dlfo_loaded_mappings_version determines which array element is the
-   currently active region.  */
+   dlopen or dlclose modify the other copy.  The the least significant
+   bit in _dlfo_loaded_mappings_version determines which array element
+   is the currently active region.  */
 static struct dlfo_mappings_segment *_dlfo_loaded_mappings[2];
 
 /* Returns the number of actually used elements in all segments
@@ -248,7 +248,7 @@ _dlfo_read_start_version (void)
 {
   /* Acquire MO load synchronizes with the fences at the beginning and
      end of the TM update region in _dlfo_mappings_begin_update,
-     _dlfo_mappings_end_update, _dlfo_mappings_end_update_no_switch.  */
+     _dlfo_mappings_end_update.  */
   return __atomic_wide_counter_load_acquire (&_dlfo_loaded_mappings_version);
 }
 
@@ -261,16 +261,13 @@ _dlfo_read_version_locked (void)
 }
 
 /* Update the version to reflect that an update is happening.  This
-   does not change the bit that controls the active segment chain.
-   Returns the index of the currently active segment chain.  */
-static inline unsigned int
+   does not change the bit that controls the active segment chain.  */
+static inline void
 _dlfo_mappings_begin_update (void)
 {
-  /* The store synchronizes with loads in _dlfo_read_start_version
+  /* The fence synchronizes with loads in _dlfo_read_start_version
      (also called from _dlfo_read_success).  */
   atomic_thread_fence_release ();
-  return __atomic_wide_counter_fetch_add_relaxed
-    (&_dlfo_loaded_mappings_version, 2);
 }
 
 /* Installs the just-updated version as the active version.  */
@@ -282,15 +279,6 @@ _dlfo_mappings_end_update (void)
   atomic_thread_fence_release ();
   __atomic_wide_counter_fetch_add_relaxed (&_dlfo_loaded_mappings_version, 1);
 }
-/* Completes an in-place update without switching versions.  */
-static inline void
-_dlfo_mappings_end_update_no_switch (void)
-{
-  /* The store synchronizes with loads in _dlfo_read_start_version
-     (also called from _dlfo_read_success).  */
-  atomic_thread_fence_release ();
-  __atomic_wide_counter_fetch_add_relaxed (&_dlfo_loaded_mappings_version, 2);
-}
 
 /* Return true if the read was successful, given the start
    version.  */
@@ -302,10 +290,11 @@ _dlfo_read_success (uint64_t start_version)
      the TM region are not ordered past the version check below.  */
   atomic_thread_fence_acquire ();
 
-  /* Synchronizes with stores in _dlfo_mappings_begin_update,
-     _dlfo_mappings_end_update, _dlfo_mappings_end_update_no_switch.
-     It is important that all stores from the last update have been
-     visible, and stores from the next updates are not.
+  /* Synchronizes with the fences in _dlfo_mappings_begin_update,
+     _dlfo_mappings_end_update.  It is important that all stores from
+     the last update have become visible, and stores from the next
+     update to this version are not before the version number is
+     updated.
 
      Unlike with seqlocks, there is no check for odd versions here
      because we have read the unmodified copy (confirmed to be
@@ -839,17 +828,10 @@ _dl_find_object_dlclose (struct link_map *map)
              issues around __libc_freeres.  */
             return;
 
-        /* The update happens in-place, but given that we do not use
-           atomic accesses on the read side, update the version around
-           the update to trigger re-validation in concurrent
-           readers.  */
-        _dlfo_mappings_begin_update ();
-
-        /* Mark as closed.  */
-        obj->map_end = obj->map_start;
-        obj->map = NULL;
-
-        _dlfo_mappings_end_update_no_switch ();
+        /* Mark as closed.  This does not change the overall data
+           structure, so no TM cycle is needed.  */
+        atomic_store_relaxed (&obj->map_end, obj->map_start);
+        atomic_store_relaxed (&obj->map, NULL);
         return;
       }
 }


^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2022-01-10 12:29 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-01-10 10:10 [PATCH] elf: Simplify software TM implementation in _dl_find_object Florian Weimer
2022-01-10 12:29 ` Szabolcs Nagy
  -- strict thread matches above, loose matches on Subject: below --
2022-01-07 13:41 Florian Weimer
2022-01-07 15:49 ` Szabolcs Nagy
2022-01-07 15:56   ` Florian Weimer
2022-01-07 16:08     ` Szabolcs Nagy
2022-01-07 16:29       ` Florian Weimer

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).