* [PATCH v10 3/3] x86: Make the divisor in setting `non_temporal_threshold` cpu specific
@ 2023-05-27 18:46 Noah Goldstein
2023-07-10 5:23 ` Sajan Karumanchi
0 siblings, 1 reply; 7+ messages in thread
From: Noah Goldstein @ 2023-05-27 18:46 UTC (permalink / raw)
To: libc-alpha; +Cc: goldstein.w.n, hjl.tools, carlos
Different systems prefer different divisors.
From benchmarks[1] so far the following divisors have been found:
ICX : 2
SKX : 2
BWD : 8
For Intel, we are generalizing that BWD and older prefer 8 as a
divisor, and SKL and newer prefer 2. These numbers can be further tuned
as benchmarks are run.
[1]: https://github.com/goldsteinn/memcpy-nt-benchmarks
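As a hedged sketch of the tuning described above (the enum and function names here are illustrative, not glibc internals):

```c
#include <assert.h>

/* Illustrative sketch only: pick the divisor per microarch, defaulting
   to 4, then derive the non-temporal threshold from the shared cache
   size, as this patch series does.  */
enum uarch
{
  UARCH_BDW_OR_OLDER,   /* Broadwell and older: divisor 8.  */
  UARCH_SKL_OR_NEWER,   /* Skylake/SKX/ICX and newer: divisor 2.  */
  UARCH_UNKNOWN         /* Anything else: historical default of 4.  */
};

static unsigned long
nt_divisor (enum uarch u)
{
  switch (u)
    {
    case UARCH_BDW_OR_OLDER: return 8;
    case UARCH_SKL_OR_NEWER: return 2;
    default:                 return 4;
    }
}

/* non_temporal_threshold = shared cache size / microarch divisor.  */
static unsigned long
nt_threshold (unsigned long shared_bytes, enum uarch u)
{
  return shared_bytes / nt_divisor (u);
}
```

For example, with a 32 MiB shared L3, SKL-class parts would get a 16 MiB threshold and BWD-class parts a 4 MiB one.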
---
sysdeps/x86/cpu-features.c | 31 ++++++++++++++++++++---------
sysdeps/x86/dl-cacheinfo.h | 32 ++++++++++++++++++------------
sysdeps/x86/dl-diagnostics-cpu.c | 11 ++++++----
sysdeps/x86/include/cpu-features.h | 3 +++
4 files changed, 51 insertions(+), 26 deletions(-)
diff --git a/sysdeps/x86/cpu-features.c b/sysdeps/x86/cpu-features.c
index 1b6e00c88f..325ec2b825 100644
--- a/sysdeps/x86/cpu-features.c
+++ b/sysdeps/x86/cpu-features.c
@@ -636,6 +636,7 @@ init_cpu_features (struct cpu_features *cpu_features)
unsigned int stepping = 0;
enum cpu_features_kind kind;
+ cpu_features->cachesize_non_temporal_divisor = 4;
#if !HAS_CPUID
if (__get_cpuid_max (0, 0) == 0)
{
@@ -716,13 +717,13 @@ init_cpu_features (struct cpu_features *cpu_features)
/* Bigcore/Default Tuning. */
default:
+ default_tuning:
/* Unknown family 0x06 processors. Assuming this is one
of Core i3/i5/i7 processors if AVX is available. */
if (!CPU_FEATURES_CPU_P (cpu_features, AVX))
break;
- /* Fall through. */
- case INTEL_BIGCORE_NEHALEM:
- case INTEL_BIGCORE_WESTMERE:
+
+ enable_modern_features:
/* Rep string instructions, unaligned load, unaligned copy,
and pminub are fast on Intel Core i3, i5 and i7. */
cpu_features->preferred[index_arch_Fast_Rep_String]
@@ -732,12 +733,23 @@ init_cpu_features (struct cpu_features *cpu_features)
| bit_arch_Prefer_PMINUB_for_stringop);
break;
- /*
- Default tuned Bigcore microarch.
+ case INTEL_BIGCORE_NEHALEM:
+ case INTEL_BIGCORE_WESTMERE:
+ /* Older CPUs prefer non-temporal stores at lower threshold. */
+ cpu_features->cachesize_non_temporal_divisor = 8;
+ goto enable_modern_features;
+
+ /* Older Bigcore microarch (smaller non-temporal store
+ threshold). */
case INTEL_BIGCORE_SANDYBRIDGE:
case INTEL_BIGCORE_IVYBRIDGE:
case INTEL_BIGCORE_HASWELL:
case INTEL_BIGCORE_BROADWELL:
+ cpu_features->cachesize_non_temporal_divisor = 8;
+ goto default_tuning;
+
+ /* Newer Bigcore microarch (larger non-temporal store
+ threshold). */
case INTEL_BIGCORE_SKYLAKE:
case INTEL_BIGCORE_KABYLAKE:
case INTEL_BIGCORE_COMETLAKE:
@@ -753,13 +765,14 @@ init_cpu_features (struct cpu_features *cpu_features)
case INTEL_BIGCORE_SAPPHIRERAPIDS:
case INTEL_BIGCORE_EMERALDRAPIDS:
case INTEL_BIGCORE_GRANITERAPIDS:
- */
+ cpu_features->cachesize_non_temporal_divisor = 2;
+ goto default_tuning;
- /*
- Default tuned Mixed (bigcore + atom SOC).
+ /* Default tuned Mixed (bigcore + atom SOC). */
case INTEL_MIXED_LAKEFIELD:
case INTEL_MIXED_ALDERLAKE:
- */
+ cpu_features->cachesize_non_temporal_divisor = 2;
+ goto default_tuning;
}
/* Disable TSX on some processors to avoid TSX on kernels that
diff --git a/sysdeps/x86/dl-cacheinfo.h b/sysdeps/x86/dl-cacheinfo.h
index 4a1a5423ff..8292a4a50d 100644
--- a/sysdeps/x86/dl-cacheinfo.h
+++ b/sysdeps/x86/dl-cacheinfo.h
@@ -738,19 +738,25 @@ dl_init_cacheinfo (struct cpu_features *cpu_features)
cpu_features->level3_cache_linesize = level3_cache_linesize;
cpu_features->level4_cache_size = level4_cache_size;
- /* The default setting for the non_temporal threshold is 1/4 of size
- of the chip's cache. For most Intel and AMD processors with an
- initial release date between 2017 and 2023, a thread's typical
- share of the cache is from 18-64MB. Using the 1/4 L3 is meant to
- estimate the point where non-temporal stores begin outcompeting
- REP MOVSB. As well the point where the fact that non-temporal
- stores are forced back to main memory would already occurred to the
- majority of the lines in the copy. Note, concerns about the
- entire L3 cache being evicted by the copy are mostly alleviated
- by the fact that modern HW detects streaming patterns and
- provides proper LRU hints so that the maximum thrashing
- capped at 1/associativity. */
- unsigned long int non_temporal_threshold = shared / 4;
+ unsigned long int cachesize_non_temporal_divisor
+ = cpu_features->cachesize_non_temporal_divisor;
+ if (cachesize_non_temporal_divisor <= 0)
+ cachesize_non_temporal_divisor = 4;
+
+  /* The default setting for the non_temporal threshold is [1/8, 1/2] of the
+     size of the chip's cache (depending on `cachesize_non_temporal_divisor`,
+     which is microarch specific; the default is 1/4). For most Intel and AMD
+     processors with an initial release date between 2017 and 2023, a thread's
+     typical share of the cache is from 18-64MB. Using a reasonable fraction
+     of L3 is meant to estimate the point where non-temporal stores begin
+     outcompeting REP MOVSB, as well as the point where the forced write-back
+     of non-temporal stores to main memory would already have occurred for the
+     majority of the lines in the copy. Note, concerns about the entire L3
+     cache being evicted by the copy are mostly alleviated by the fact that
+     modern HW detects streaming patterns and provides proper LRU hints so
+     that the maximum thrashing is capped at 1/associativity. */
+ unsigned long int non_temporal_threshold
+ = shared / cachesize_non_temporal_divisor;
/* If no ERMS, we use the per-thread L3 chunking. Normal cacheable stores run
a higher risk of actually thrashing the cache as they don't have a HW LRU
hint. As well, their performance in highly parallel situations is
diff --git a/sysdeps/x86/dl-diagnostics-cpu.c b/sysdeps/x86/dl-diagnostics-cpu.c
index a1578e4665..5aab63e532 100644
--- a/sysdeps/x86/dl-diagnostics-cpu.c
+++ b/sysdeps/x86/dl-diagnostics-cpu.c
@@ -113,8 +113,11 @@ _dl_diagnostics_cpu (void)
cpu_features->level3_cache_linesize);
print_cpu_features_value ("level4_cache_size",
cpu_features->level4_cache_size);
- _Static_assert (offsetof (struct cpu_features, level4_cache_size)
- + sizeof (cpu_features->level4_cache_size)
- == sizeof (*cpu_features),
- "last cpu_features field has been printed");
+ print_cpu_features_value ("cachesize_non_temporal_divisor",
+ cpu_features->cachesize_non_temporal_divisor);
+ _Static_assert (
+ offsetof (struct cpu_features, cachesize_non_temporal_divisor)
+ + sizeof (cpu_features->cachesize_non_temporal_divisor)
+ == sizeof (*cpu_features),
+ "last cpu_features field has been printed");
}
diff --git a/sysdeps/x86/include/cpu-features.h b/sysdeps/x86/include/cpu-features.h
index 40b8129d6a..c740e1a5fc 100644
--- a/sysdeps/x86/include/cpu-features.h
+++ b/sysdeps/x86/include/cpu-features.h
@@ -945,6 +945,9 @@ struct cpu_features
unsigned long int level3_cache_linesize;
/* /_SC_LEVEL4_CACHE_SIZE. */
unsigned long int level4_cache_size;
+  /* When no user non_temporal_threshold is specified, we default to
+     cachesize / cachesize_non_temporal_divisor.  */
+ unsigned long int cachesize_non_temporal_divisor;
};
/* Get a pointer to the CPU features structure. */
--
2.34.1
^ permalink raw reply [flat|nested] 7+ messages in thread
* (no subject)
2023-05-27 18:46 [PATCH v10 3/3] x86: Make the divisor in setting `non_temporal_threshold` cpu specific Noah Goldstein
@ 2023-07-10 5:23 ` Sajan Karumanchi
2023-07-10 15:58 ` Noah Goldstein
0 siblings, 1 reply; 7+ messages in thread
From: Sajan Karumanchi @ 2023-07-10 5:23 UTC (permalink / raw)
To: goldstein.w.n; +Cc: premachandra.mallappa, dj, hjl.tools, libc-alpha, carlos
Noah,
I verified your patches on the master branch that impact the non-temporal
threshold parameter on x86 CPUs. This patch modifies the non-temporal threshold
value from 24MB (3/4th of L3$) to 8MB (1/4th of L3$) on ZEN4.
From the Glibc benchmarks, we saw a significant performance drop ranging
from 15% to 99% for size ranges of 8MB to 16MB.
I also ran the new tool developed by you on all Zen architectures and the
results conclude that 3/4th L3 size holds good on AMD CPUs.
Hence the current patch degrades the performance of AMD CPUs.
We strongly recommend limiting this change to Intel CPUs only.
Thanks,
Sajan K.
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re:
2023-07-10 5:23 ` Sajan Karumanchi
@ 2023-07-10 15:58 ` Noah Goldstein
2023-07-14 2:21 ` Re: Noah Goldstein
2023-07-14 7:39 ` Re: sajan karumanchi
0 siblings, 2 replies; 7+ messages in thread
From: Noah Goldstein @ 2023-07-10 15:58 UTC (permalink / raw)
To: Sajan Karumanchi; +Cc: premachandra.mallappa, dj, hjl.tools, libc-alpha, carlos
On Mon, Jul 10, 2023 at 12:23 AM Sajan Karumanchi
<sajan.karumanchi@gmail.com> wrote:
>
> Noah,
> I verified your patches on the master branch that impacts the non-threshold
> parameter on x86 CPUs. This patch modifies the non-temporal threshold value
> from 24MB(3/4th of L3$) to 8MB(1/4th of L3$) on ZEN4.
> From the Glibc benchmarks, we saw a significant performance drop ranging
> from 15% to 99% for size ranges of 8MB to 16MB.
> I also ran the new tool developed by you on all Zen architectures and the
> results conclude that 3/4th L3 size holds good on AMD CPUs.
> Hence the current patch degrades the performance of AMD CPUs.
> We strongly recommend marking this change to Intel CPUs only.
>
So it shouldn't actually go down. I think what is missing is:
```
get_common_cache_info (&shared, &shared_per_thread, &threads, core);
```
In the AMD case shared == shared_per_thread, which shouldn't really
be the case.
The intended new calculation is: Total_L3_Size / Scale
as opposed to: (L3_Size / NThread) / Scale
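The difference between the two calculations can be made concrete with a small illustrative sketch (function names and numbers are hypothetical, not glibc code):

```c
#include <assert.h>

/* Unintended behavior: scale the per-thread share of L3.  */
static unsigned long
threshold_per_thread (unsigned long l3, unsigned long nthreads,
                      unsigned long scale)
{
  return (l3 / nthreads) / scale;
}

/* Intended behavior: scale the total L3 size.  */
static unsigned long
threshold_total (unsigned long l3, unsigned long scale)
{
  return l3 / scale;
}
```

With a 32 MiB L3 shared by 16 threads and a scale of 4, the per-thread formula yields only 512 KiB, while the intended total-L3 formula yields 8 MiB.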
Before just going with default for AMD, maybe try out the following patch?
```
---
sysdeps/x86/dl-cacheinfo.h | 1 +
1 file changed, 1 insertion(+)
diff --git a/sysdeps/x86/dl-cacheinfo.h b/sysdeps/x86/dl-cacheinfo.h
index c98fa57a7b..c1866ca898 100644
--- a/sysdeps/x86/dl-cacheinfo.h
+++ b/sysdeps/x86/dl-cacheinfo.h
@@ -717,6 +717,7 @@ dl_init_cacheinfo (struct cpu_features *cpu_features)
level3_cache_assoc = handle_amd (_SC_LEVEL3_CACHE_ASSOC);
level3_cache_linesize = handle_amd (_SC_LEVEL3_CACHE_LINESIZE);
+ get_common_cache_info (&shared, &shared_per_thread, &threads, core);
if (shared <= 0)
/* No shared L3 cache. All we have is the L2 cache. */
shared = core;
--
2.34.1
```
> Thanks,
> Sajan K.
>
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re:
2023-07-10 15:58 ` Noah Goldstein
@ 2023-07-14 2:21 ` Noah Goldstein
2023-07-14 7:39 ` Re: sajan karumanchi
1 sibling, 0 replies; 7+ messages in thread
From: Noah Goldstein @ 2023-07-14 2:21 UTC (permalink / raw)
To: Sajan Karumanchi; +Cc: premachandra.mallappa, dj, hjl.tools, libc-alpha, carlos
On Mon, Jul 10, 2023 at 10:58 AM Noah Goldstein <goldstein.w.n@gmail.com> wrote:
>
> On Mon, Jul 10, 2023 at 12:23 AM Sajan Karumanchi
> <sajan.karumanchi@gmail.com> wrote:
> >
> > Noah,
> > I verified your patches on the master branch that impacts the non-threshold
> > parameter on x86 CPUs. This patch modifies the non-temporal threshold value
> > from 24MB(3/4th of L3$) to 8MB(1/4th of L3$) on ZEN4.
> > From the Glibc benchmarks, we saw a significant performance drop ranging
> > from 15% to 99% for size ranges of 8MB to 16MB.
> > I also ran the new tool developed by you on all Zen architectures and the
> > results conclude that 3/4th L3 size holds good on AMD CPUs.
> > Hence the current patch degrades the performance of AMD CPUs.
> > We strongly recommend marking this change to Intel CPUs only.
> >
>
> So it shouldn't actually go down. I think what is missing is:
> ```
> get_common_cache_info (&shared, &shared_per_thread, &threads, core);
> ```
>
> In the AMD case shared == shared_per_thread which shouldn't really
> be the case.
>
> The intended new calculation is: Total_L3_Size / Scale
> as opposed to: (L3_Size / NThread) / Scale"
>
> Before just going with default for AMD, maybe try out the following patch?
>
> ```
> ---
> sysdeps/x86/dl-cacheinfo.h | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/sysdeps/x86/dl-cacheinfo.h b/sysdeps/x86/dl-cacheinfo.h
> index c98fa57a7b..c1866ca898 100644
> --- a/sysdeps/x86/dl-cacheinfo.h
> +++ b/sysdeps/x86/dl-cacheinfo.h
> @@ -717,6 +717,7 @@ dl_init_cacheinfo (struct cpu_features *cpu_features)
> level3_cache_assoc = handle_amd (_SC_LEVEL3_CACHE_ASSOC);
> level3_cache_linesize = handle_amd (_SC_LEVEL3_CACHE_LINESIZE);
>
> + get_common_cache_info (&shared, &shared_per_thread, &threads, core);
> if (shared <= 0)
> /* No shared L3 cache. All we have is the L2 cache. */
> shared = core;
> --
> 2.34.1
> ```
> > Thanks,
> > Sajan K.
> >
ping. 2.38 is approaching and I expect you want to get any fixes in before
that.
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re:
2023-07-10 15:58 ` Noah Goldstein
2023-07-14 2:21 ` Re: Noah Goldstein
@ 2023-07-14 7:39 ` sajan karumanchi
1 sibling, 0 replies; 7+ messages in thread
From: sajan karumanchi @ 2023-07-14 7:39 UTC (permalink / raw)
To: Noah Goldstein
Cc: premachandra.mallappa, dj, hjl.tools, libc-alpha, carlos,
Sajan Karumanchi
Noah,
On Mon, Jul 10, 2023 at 9:28 PM Noah Goldstein <goldstein.w.n@gmail.com> wrote:
>
> On Mon, Jul 10, 2023 at 12:23 AM Sajan Karumanchi
> <sajan.karumanchi@gmail.com> wrote:
> >
> > Noah,
> > I verified your patches on the master branch that impacts the non-threshold
> > parameter on x86 CPUs. This patch modifies the non-temporal threshold value
> > from 24MB(3/4th of L3$) to 8MB(1/4th of L3$) on ZEN4.
> > From the Glibc benchmarks, we saw a significant performance drop ranging
> > from 15% to 99% for size ranges of 8MB to 16MB.
> > I also ran the new tool developed by you on all Zen architectures and the
> > results conclude that 3/4th L3 size holds good on AMD CPUs.
> > Hence the current patch degrades the performance of AMD CPUs.
> > We strongly recommend marking this change to Intel CPUs only.
> >
>
> So it shouldn't actually go down. I think what is missing is:
> ```
> get_common_cache_info (&shared, &shared_per_thread, &threads, core);
> ```
>
The cache info of AMD CPUs is spread across CPUID registers:
0x80000005, 0x80000006, and 0x8000001D.
But, 'get_common_cache_info(...)' is using CPUID register 0x00000004
for enumerating cache details. This leads to an infinite loop in the
initialization stage for enumerating the cache details on AMD CPUs.
> In the AMD case shared == shared_per_thread which shouldn't really
> be the case.
>
> The intended new calculation is: Total_L3_Size / Scale
> as opposed to: (L3_Size / NThread) / Scale"
>
AMD Zen CPUs are chiplet based, so we consider only L3/CCX for
computing the nt_threshold.
* handle_amd(_SC_LEVEL3_CACHE_SIZE) initializes 'shared' variable with
'l3_cache_per_ccx' for Zen architectures and 'l3_cache_per_thread' for
pre-Zen architectures.
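In other words (a hedged sketch, not the actual glibc code; the 3/4 fraction comes from the benchmark results cited earlier in this thread):

```c
#include <assert.h>

/* On Zen, "shared" already holds the L3 per CCX, so the AMD-preferred
   non-temporal threshold is simply a large fraction of that value.
   The function name and the 3/4 fraction are illustrative.  */
static unsigned long
amd_nt_threshold (unsigned long l3_per_ccx)
{
  return l3_per_ccx / 4 * 3;   /* 3/4 of the per-CCX L3.  */
}
```

For a 32 MiB L3 per CCX this gives the 24 MB threshold Sajan measured as performing well on Zen parts.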
> Before just going with default for AMD, maybe try out the following patch?
>
Since the cache info registers and the approach to compute the cache
details on AMD are different from Intel, we cannot use the below
patch.
> ```
> ---
> sysdeps/x86/dl-cacheinfo.h | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/sysdeps/x86/dl-cacheinfo.h b/sysdeps/x86/dl-cacheinfo.h
> index c98fa57a7b..c1866ca898 100644
> --- a/sysdeps/x86/dl-cacheinfo.h
> +++ b/sysdeps/x86/dl-cacheinfo.h
> @@ -717,6 +717,7 @@ dl_init_cacheinfo (struct cpu_features *cpu_features)
> level3_cache_assoc = handle_amd (_SC_LEVEL3_CACHE_ASSOC);
> level3_cache_linesize = handle_amd (_SC_LEVEL3_CACHE_LINESIZE);
>
> + get_common_cache_info (&shared, &shared_per_thread, &threads, core);
> if (shared <= 0)
> /* No shared L3 cache. All we have is the L2 cache. */
> shared = core;
> --
> 2.34.1
> ```
> > Thanks,
> > Sajan K.
> >
^ permalink raw reply [flat|nested] 7+ messages in thread
* (no subject)
@ 2021-06-06 19:19 Davidlohr Bueso
2021-06-07 16:02 ` André Almeida
0 siblings, 1 reply; 7+ messages in thread
From: Davidlohr Bueso @ 2021-06-06 19:19 UTC (permalink / raw)
To: André Almeida
Cc: Thomas Gleixner, Ingo Molnar, Peter Zijlstra, Darren Hart,
linux-kernel, Steven Rostedt, Sebastian Andrzej Siewior, kernel,
krisman, pgriffais, z.figura12, joel, malteskarupke, linux-api,
fweimer, libc-alpha, linux-kselftest, shuah, acme, corbet,
Peter Oskolkov, Andrey Semashev, mtk.manpages
Bcc:
Subject: Re: [PATCH v4 07/15] docs: locking: futex2: Add documentation
Reply-To:
In-Reply-To: <20210603195924.361327-8-andrealmeid@collabora.com>
On Thu, 03 Jun 2021, André Almeida wrote:
>Add a new documentation file specifying both userspace API and internal
>implementation details of futex2 syscalls.
I think equally important would be to provide a manpage for each new
syscall you are introducing, and keep mkt in the loop as in the past he
extensively documented and improved futex manpages, and overall has a
lot of experience with dealing with kernel interfaces.
Thanks,
Davidlohr
>
>Signed-off-by: André Almeida <andrealmeid@collabora.com>
>---
> Documentation/locking/futex2.rst | 198 +++++++++++++++++++++++++++++++
> Documentation/locking/index.rst | 1 +
> 2 files changed, 199 insertions(+)
> create mode 100644 Documentation/locking/futex2.rst
>
>diff --git a/Documentation/locking/futex2.rst b/Documentation/locking/futex2.rst
>new file mode 100644
>index 000000000000..2f74d7c97a55
>--- /dev/null
>+++ b/Documentation/locking/futex2.rst
>@@ -0,0 +1,198 @@
>+.. SPDX-License-Identifier: GPL-2.0
>+
>+======
>+futex2
>+======
>+
>+:Author: André Almeida <andrealmeid@collabora.com>
>+
>+futex, or fast user mutex, is a set of syscalls to allow userspace to create
>+performant synchronization mechanisms, such as mutexes, semaphores and
>+condition variables, in userspace. C standard libraries, like glibc, use it
>+as a means to implement higher-level interfaces like pthreads.
>+
>+The interface
>+=============
>+
>+uAPI functions
>+--------------
>+
>+.. kernel-doc:: kernel/futex2.c
>+ :identifiers: sys_futex_wait sys_futex_wake sys_futex_waitv sys_futex_requeue
>+
>+uAPI structures
>+---------------
>+
>+.. kernel-doc:: include/uapi/linux/futex.h
>+
>+The ``flag`` argument
>+---------------------
>+
>+The flag is used to specify the size of the futex word
>+(FUTEX_[8, 16, 32, 64]). It's mandatory to define one, since there's no
>+default size.
>+
>+By default, the timeout uses a monotonic clock, but can be used as a realtime
>+one by using the FUTEX_REALTIME_CLOCK flag.
>+
>+By default, futexes are of the private type, which means that this user address
>+will be accessed by threads that share the same memory region. This allows for
>+some internal optimizations, so they are faster. However, if the address needs
>+to be shared with different processes (like using ``mmap()`` or ``shm()``), they
>+need to be defined as shared and the flag FUTEX_SHARED_FLAG is used to set that.
>+
>+By default, the operation has no NUMA-awareness, meaning that the user can't
>+choose the memory node where the kernel side futex data will be stored. The
>+user can choose the node where it wants to operate by setting the
>+FUTEX_NUMA_FLAG and using the following structure (where X can be 8, 16, 32 or
>+64)::
>+
>+ struct futexX_numa {
>+ __uX value;
>+ __sX hint;
>+ };
>+
>+This structure should be passed at the ``void *uaddr`` of futex functions. The
>+address of the structure will be used to be waited on/waken on, and the
>+``value`` will be compared to ``val`` as usual. The ``hint`` member is used to
>+define which node the futex will use. When waiting, the futex will be
>+registered on a kernel-side table stored on that node; when waking, the futex
>+will be searched for on that given table. That means that there's no redundancy
>+between tables, and the wrong ``hint`` value will lead to undesired behavior.
>+Userspace is responsible for dealing with node migrations issues that may
>+occur. ``hint`` can range from [0, MAX_NUMA_NODES), for specifying a node, or
>+-1, to use the same node the current process is using.
>+
>+When not using FUTEX_NUMA_FLAG on a NUMA system, the futex will be stored on a
>+global table allocated on the first node.
>+
>+The ``timo`` argument
>+---------------------
>+
>+As per the Y2038 work done in the kernel, new interfaces shouldn't add timeout
>+options known to be buggy. Given that, ``timo`` should be a 64-bit timeout on
>+all platforms, using an absolute timeout value.
>+
>+Implementation
>+==============
>+
>+The internal implementation follows a similar design to the original futex.
>+Given that we want to replicate the same external behavior of current futex,
>+this should be somewhat expected.
>+
>+Waiting
>+-------
>+
>+For the wait operations, they are all treated as if you want to wait on N
>+futexes, so the path for futex_wait and futex_waitv is basically the same.
>+For both syscalls, the first step is to prepare an internal list for the list
>+of futexes to wait for (using struct futexv_head). For futex_wait() calls, this
>+list will have a single object.
>+
>+We have a hash table, where waiters register themselves before sleeping. Then
>+the wake function checks this table looking for waiters at uaddr. The hash
>+bucket to be used is determined by a struct futex_key, that stores information
>+to uniquely identify an address from a given process. Given the huge address
>+space, there'll be hash collisions, so we store information to be later used on
>+collision treatment.
>+
>+First, for every futex we want to wait on, we check if (``*uaddr == val``).
>+This check is done holding the bucket lock, so we are correctly serialized with
>+any futex_wake() calls. If any waiter fails the check above, we dequeue all
>+futexes. The check (``*uaddr == val``) can fail for two reasons:
>+
>+- The values are different, and we return -EAGAIN. However, if while
>+ dequeueing we found that some futexes were awakened, we prioritize this
>+ and return success.
>+
>+- When trying to access the user address, we do so with page faults
>+ disabled because we are holding a bucket's spin lock (and can't sleep
>+ while holding a spin lock). If there's an error, it might be a page
>+ fault, or an invalid address. We release the lock, dequeue everyone
>+ (because it's illegal to sleep while there are futexes enqueued, we
>+ could lose wakeups) and try again with page fault enabled. If we
>+ succeed, this means that the address is valid, but we need to do
>+ all the work again. For serialization reasons, we need to have the
>+ spin lock when getting the user value. Additionally, for shared
>+ futexes, we also need to recalculate the hash, since the underlying
>+ mapping mechanisms could have changed when dealing with page fault.
>+ If, even with page fault enabled, we can't access the address, it
>+ means it's an invalid user address, and we return -EFAULT. For this
>+ case, we prioritize the error, even if some futexes were awakened.
>+
>+If the check is OK, they are enqueued on a linked list in our bucket, and
>+proceed to the next one. If all waiters succeed, we put the thread to sleep
>+until a futex_wake() call, timeout expires or we get a signal. After waking up,
>+we dequeue everyone, and check if some futex was awakened. This dequeue is done
>+by iteratively walking each element of the struct futex_head list.
>+
>+All enqueuing/dequeuing operations require holding the bucket lock, to avoid
>+racing while modifying the list.
>+
>+Waking
>+------
>+
>+We get the bucket that's storing the waiters at uaddr, and wake the required
>+number of waiters, checking for hash collision.
>+
>+There's an optimization that makes futex_wake() not take the bucket lock if
>+there's no one to be woken on that bucket. It checks an atomic counter that each
>+bucket has; if it reads 0, the syscall exits. In order for this to work, the
>+waiter thread increases it before taking the lock, so the wake thread will
>+correctly see that there's someone waiting and will continue the path to take
>+the bucket lock. To get the correct serialization, the waiter issues a memory
>+barrier after increasing the bucket counter and the waker issues a memory
>+barrier before checking it.
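The barrier pairing described in the quoted paragraph above can be sketched with C11 atomics (illustrative only; the kernel uses its own primitives, and the names here are hypothetical):

```c
#include <assert.h>
#include <stdatomic.h>

struct bucket
{
  atomic_uint waiters;   /* count of threads registered on this bucket */
  /* spinlock and waiter list omitted from this sketch */
};

/* Waiter side: bump the counter, then issue a full barrier, before
   taking the bucket lock and going to sleep.  */
static void
waiter_register (struct bucket *b)
{
  atomic_fetch_add_explicit (&b->waiters, 1, memory_order_relaxed);
  atomic_thread_fence (memory_order_seq_cst);
}

/* Waker side: full barrier, then check the counter; if it reads zero,
   the syscall can return without ever taking the bucket lock.  */
static int
waker_must_lock (struct bucket *b)
{
  atomic_thread_fence (memory_order_seq_cst);
  return atomic_load_explicit (&b->waiters, memory_order_relaxed) != 0;
}
```

The two fences pair up: a waker that skips the lock is guaranteed to have observed a counter value from before any still-invisible waiter's increment, so no wakeup can be lost.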
>+
>+Requeuing
>+---------
>+
>+The requeue path first checks for each struct futex_requeue and their flags.
>+Then, it will compare the expected value with the one at uaddr1::uaddr.
>+Following the same serialization explained at Waking_, we increase the atomic
>+counter for the bucket of uaddr2 before taking the lock. We need to have both
>+buckets' locks at the same time so we don't race with other futex operations. To
>+ensure the locks are taken in the same order for all threads (and thus avoiding
>+deadlocks), every requeue operation takes the "smaller" bucket first, when
>+comparing both addresses.
>+
>+If the compare with user value succeeds, we proceed by waking ``nr_wake``
>+futexes, and then requeuing ``nr_requeue`` from the bucket of uaddr1 to that of
>+uaddr2. This consists of a simple list deletion/addition and replacing the old
>+futex key with the new one.
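The "smaller bucket first" rule quoted above is the classic two-lock deadlock-avoidance pattern; a hedged sketch using pthread mutexes (not the kernel's spinlock code, and the names are illustrative):

```c
#include <assert.h>
#include <pthread.h>

struct bucket
{
  pthread_mutex_t lock;
};

/* Take both bucket locks in a stable global order (by address) so that
   two concurrent requeue operations on the same pair of buckets can
   never deadlock, whatever argument order each caller uses.  */
static void
lock_two (struct bucket *a, struct bucket *b)
{
  if (a == b)
    pthread_mutex_lock (&a->lock);
  else if (a < b)
    {
      pthread_mutex_lock (&a->lock);
      pthread_mutex_lock (&b->lock);
    }
  else
    {
      pthread_mutex_lock (&b->lock);
      pthread_mutex_lock (&a->lock);
    }
}

static void
unlock_two (struct bucket *a, struct bucket *b)
{
  pthread_mutex_unlock (&a->lock);
  if (a != b)
    pthread_mutex_unlock (&b->lock);
}
```

Because every thread acquires the pair in the same global order, the circular-wait condition for deadlock can never arise.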
>+
>+Futex keys
>+----------
>+
>+There are two types of futexes: private and shared ones. The private are futexes
>+meant to be used by threads that share the same memory space, are easier to be
>+uniquely identified and thus can have some performance optimization. The
>+elements for identifying one are: the start address of the page where the
>+address is, the address offset within the page and the current->mm pointer.
>+
>+Now, for uniquely identifying a shared futex:
>+
>+- If the page containing the user address is an anonymous page, we can
>+ just use the same data used for private futexes (the start address of
>+ the page, the address offset within the page and the current->mm
>+ pointer); that will be enough for uniquely identifying such futex. We
>+ also set one bit at the key to differentiate if a private futex is
>+ used on the same address (mixing shared and private calls does not
>+ work).
>+
>+- If the page is file-backed, current->mm maybe isn't the same one for
>+ every user of this futex, so we need to use other data: the
>+ page->index, a UUID for the struct inode and the offset within the
>+ page.
>+
>+Note that members of futex_key don't have any particular meaning after they
>+are part of the struct - they are just bytes to identify a futex. Given that,
>+we don't need to use a particular name or type that matches the original data,
>+we only need to care about the bitsize of each component and make both private
>+and shared fit in the same memory space.
>+
>+Source code documentation
>+=========================
>+
>+.. kernel-doc:: kernel/futex2.c
>+ :no-identifiers: sys_futex_wait sys_futex_wake sys_futex_waitv sys_futex_requeue
>diff --git a/Documentation/locking/index.rst b/Documentation/locking/index.rst
>index 7003bd5aeff4..9bf03c7fa1ec 100644
>--- a/Documentation/locking/index.rst
>+++ b/Documentation/locking/index.rst
>@@ -24,6 +24,7 @@ locking
> percpu-rw-semaphore
> robust-futexes
> robust-futex-ABI
>+ futex2
>
> .. only:: subproject and html
>
>--
>2.31.1
>
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re:
2021-06-06 19:19 Davidlohr Bueso
@ 2021-06-07 16:02 ` André Almeida
0 siblings, 0 replies; 7+ messages in thread
From: André Almeida @ 2021-06-07 16:02 UTC (permalink / raw)
To: Davidlohr Bueso
Cc: Thomas Gleixner, Ingo Molnar, Peter Zijlstra, Darren Hart,
linux-kernel, Steven Rostedt, Sebastian Andrzej Siewior, kernel,
krisman, pgriffais, z.figura12, joel, malteskarupke, linux-api,
fweimer, libc-alpha, linux-kselftest, shuah, acme, corbet,
Peter Oskolkov, Andrey Semashev, mtk.manpages
On 06/06/21 16:19, Davidlohr Bueso wrote:
> Bcc:
> Subject: Re: [PATCH v4 07/15] docs: locking: futex2: Add documentation
> Reply-To:
> In-Reply-To: <20210603195924.361327-8-andrealmeid@collabora.com>
>
> On Thu, 03 Jun 2021, André Almeida wrote:
>
>> Add a new documentation file specifying both userspace API and internal
>> implementation details of futex2 syscalls.
>
> I think equally important would be to provide a manpage for each new
> syscall you are introducing, and keep mkt in the loop as in the past he
> extensively documented and improved futex manpages, and overall has a
> lot of experience with dealing with kernel interfaces.
Right, I'll add the man pages in a future version and make sure to have
mkt in the loop, thanks for the tip.
>
> Thanks,
> Davidlohr
>
>>
>> Signed-off-by: André Almeida <andrealmeid@collabora.com>
>> ---
>> Documentation/locking/futex2.rst | 198 +++++++++++++++++++++++++++++++
>> Documentation/locking/index.rst | 1 +
>> 2 files changed, 199 insertions(+)
>> create mode 100644 Documentation/locking/futex2.rst
>>
>> diff --git a/Documentation/locking/futex2.rst
>> b/Documentation/locking/futex2.rst
>> new file mode 100644
>> index 000000000000..2f74d7c97a55
>> --- /dev/null
>> +++ b/Documentation/locking/futex2.rst
>> @@ -0,0 +1,198 @@
>> +.. SPDX-License-Identifier: GPL-2.0
>> +
>> +======
>> +futex2
>> +======
>> +
>> +:Author: André Almeida <andrealmeid@collabora.com>
>> +
>> +futex, or fast user mutex, is a set of syscalls to allow userspace to
>> create
>> +performant synchronization mechanisms, such as mutexes, semaphores and
>> +conditional variables in userspace. C standard libraries, like glibc,
>> uses it
>> +as a means to implement more high level interfaces like pthreads.
>> +
>> +The interface
>> +=============
>> +
>> +uAPI functions
>> +--------------
>> +
>> +.. kernel-doc:: kernel/futex2.c
>> + :identifiers: sys_futex_wait sys_futex_wake sys_futex_waitv
>> sys_futex_requeue
>> +
>> +uAPI structures
>> +---------------
>> +
>> +.. kernel-doc:: include/uapi/linux/futex.h
>> +
>> +The ``flag`` argument
>> +---------------------
>> +
>> +The flag is used to specify the size of the futex word
>> +(FUTEX_[8, 16, 32, 64]). It's mandatory to define one, since there's no
>> +default size.
>> +
>> +By default, the timeout uses a monotonic clock, but can be used as a
>> realtime
>> +one by using the FUTEX_REALTIME_CLOCK flag.
>> +
>> +By default, futexes are of the private type, that means that this
>> user address
>> +will be accessed by threads that share the same memory region. This
>> allows for
>> +some internal optimizations, so they are faster. However, if the
>> address needs
>> +to be shared with different processes (like using ``mmap()`` or
>> ``shm()``), they
>> +need to be defined as shared and the flag FUTEX_SHARED_FLAG is used
>> to set that.
>> +
>> +By default, the operation has no NUMA-awareness, meaning that the
>> user can't
>> +choose the memory node where the kernel side futex data will be
>> stored. The
>> +user can choose the node where it wants to operate by setting the
>> +FUTEX_NUMA_FLAG and using the following structure (where X can be 8,
>> 16, 32 or
>> +64)::
>> +
>> + struct futexX_numa {
>> + __uX value;
>> + __sX hint;
>> + };
>> +
>> +This structure should be passed as the ``void *uaddr`` argument of the
>> +futex functions. The address of the structure is what will be waited or
>> +woken on, and the ``value`` member will be compared to ``val`` as usual.
>> +The ``hint`` member defines which node the futex will use: when waiting,
>> +the futex will be registered on a kernel-side table stored on that node;
>> +when waking, the futex will be searched for on that given table. That
>> +means there's no redundancy between tables, and a wrong ``hint`` value
>> +will lead to undesired behavior. Userspace is responsible for dealing
>> +with node migration issues that may occur. ``hint`` can range over
>> +[0, MAX_NUMA_NODES) to specify a node, or be -1 to use the same node the
>> +current process is using.
>> +
>> +When FUTEX_NUMA_FLAG is not used on a NUMA system, the futex will be
>> +stored on a global table allocated on the first node.
>> +
>> +The ``timo`` argument
>> +---------------------
>> +
>> +As per the Y2038 work done in the kernel, new interfaces shouldn't add
>> +timeout options known to be buggy. Given that, ``timo`` should be a
>> +64-bit timeout on all platforms, using an absolute timeout value.
>> +
>> +Implementation
>> +==============
>> +
>> +The internal implementation follows a similar design to the original
>> +futex. Given that we want to replicate the same external behavior of
>> +the current futex, this should be somewhat expected.
>> +
>> +Waiting
>> +-------
>> +
>> +All wait operations are treated as if waiting on N futexes, so the path
>> +for futex_wait() and futex_waitv() is basically the same. For both
>> +syscalls, the first step is to prepare an internal list of the futexes
>> +to wait for (using struct futexv_head). For futex_wait() calls, this
>> +list will have a single object.
>> +
>> +We have a hash table, where waiters register themselves before sleeping.
>> +Then the wake function checks this table, looking for waiters at uaddr.
>> +The hash bucket to be used is determined by a struct futex_key, which
>> +stores information to uniquely identify an address from a given process.
>> +Given the huge address space, there will be hash collisions, so we store
>> +information to be used later for collision treatment.
>> +
>> +First, for every futex we want to wait on, we check if
>> +(``*uaddr == val``). This check is done while holding the bucket lock,
>> +so we are correctly serialized with any futex_wake() calls. If any
>> +waiter fails the check, we dequeue all futexes. The check
>> +(``*uaddr == val``) can fail for two reasons:
>> +
>> +- The values are different, and we return -EAGAIN. However, if while
>> +  dequeueing we find that some futexes were awakened, we prioritize
>> +  this and return success.
>> +
>> +- When trying to access the user address, we do so with page faults
>> +  disabled, because we are holding a bucket's spin lock (and can't sleep
>> +  while holding a spin lock). If there's an error, it might be a page
>> +  fault or an invalid address. We release the lock, dequeue everyone
>> +  (because it's illegal to sleep while there are futexes enqueued; we
>> +  could lose wakeups) and try again with page faults enabled. If we
>> +  succeed, this means that the address is valid, but we need to do all
>> +  the work again. For serialization reasons, we need to hold the spin
>> +  lock when getting the user value. Additionally, for shared futexes,
>> +  we also need to recalculate the hash, since the underlying mapping
>> +  mechanisms could have changed while handling the page fault. If, even
>> +  with page faults enabled, we can't access the address, it means it's
>> +  an invalid user address, and we return -EFAULT. For this case, we
>> +  prioritize the error, even if some futexes were awakened.
>> +
>> +If the check is OK, the futex is enqueued on a linked list in our
>> +bucket, and we proceed to the next one. If all waiters succeed, we put
>> +the thread to sleep until a futex_wake() call, the timeout expires or
>> +we get a signal. After waking up, we dequeue everyone and check if some
>> +futex was awakened. This dequeue is done by iteratively walking each
>> +element of the struct futexv_head list.
>> +
>> +All enqueuing/dequeuing operations require holding the bucket lock, to
>> +avoid races while modifying the list.
>> +
>> +Waking
>> +------
>> +
>> +We get the bucket that's storing the waiters at uaddr, and wake the
>> +required number of waiters, checking for hash collisions.
>> +
>> +There's an optimization that makes futex_wake() not take the bucket
>> +lock if there's no one to be woken on that bucket. It checks an atomic
>> +counter that each bucket has; if it says 0, the syscall exits. For this
>> +to work, the waiter thread increases the counter before taking the
>> +lock, so the wake thread will correctly see that there's someone
>> +waiting and will continue down the path that takes the bucket lock. To
>> +get the correct serialization, the waiter issues a memory barrier after
>> +increasing the bucket counter, and the waker issues a memory barrier
>> +before checking it.
>> +
>> +Requeuing
>> +---------
>> +
>> +The requeue path first checks the flags of each struct futex_requeue.
>> +Then, it compares the expected value with the one at uaddr1::uaddr.
>> +Following the same serialization explained at Waking_, we increase the
>> +atomic counter for the bucket of uaddr2 before taking the lock. We need
>> +to hold both bucket locks at the same time so we don't race with other
>> +futex operations. To ensure the locks are taken in the same order for
>> +all threads (and thus avoid deadlocks), every requeue operation takes
>> +the "smaller" bucket first, when comparing both addresses.
>> +
>> +If the compare with the user value succeeds, we proceed by waking
>> +``nr_wake`` futexes, and then requeuing ``nr_requeue`` from the bucket
>> +of uaddr1 to the bucket of uaddr2. This consists of a simple list
>> +deletion/addition and replacing the old futex key with the new one.
>> +
>> +Futex keys
>> +----------
>> +
>> +There are two types of futexes: private and shared. Private futexes are
>> +meant to be used by threads that share the same memory space, are
>> +easier to uniquely identify and thus can have some performance
>> +optimizations. The elements for identifying one are: the start address
>> +of the page where the address is, the address offset within the page
>> +and the current->mm pointer.
>> +
>> +Now, for uniquely identifying a shared futex:
>> +
>> +- If the page containing the user address is an anonymous page, we can
>> + just use the same data used for private futexes (the start address of
>> + the page, the address offset within the page and the current->mm
>> + pointer); that will be enough for uniquely identifying such futex. We
>> + also set one bit at the key to differentiate if a private futex is
>> + used on the same address (mixing shared and private calls does not
>> + work).
>> +
>> +- If the page is file-backed, current->mm maybe isn't the same one for
>> + every user of this futex, so we need to use other data: the
>> + page->index, a UUID for the struct inode and the offset within the
>> + page.
>> +
>> +Note that the members of futex_key don't have any particular meaning
>> +after they are part of the struct - they are just bytes to identify a
>> +futex. Given that, we don't need to use a particular name or type that
>> +matches the original data; we only need to care about the bit size of
>> +each component and make both private and shared keys fit in the same
>> +memory space.
>> +
>> +Source code documentation
>> +=========================
>> +
>> +.. kernel-doc:: kernel/futex2.c
>> +   :no-identifiers: sys_futex_wait sys_futex_wake sys_futex_waitv sys_futex_requeue
>> diff --git a/Documentation/locking/index.rst b/Documentation/locking/index.rst
>> index 7003bd5aeff4..9bf03c7fa1ec 100644
>> --- a/Documentation/locking/index.rst
>> +++ b/Documentation/locking/index.rst
>> @@ -24,6 +24,7 @@ locking
>> percpu-rw-semaphore
>> robust-futexes
>> robust-futex-ABI
>> + futex2
>>
>> .. only:: subproject and html
>>
>> --
>> 2.31.1
>>
^ permalink raw reply [flat|nested] 7+ messages in thread
[parent not found: <1481545990-7247-1-git-send-email-adhemerval.zanella@linaro.org>]
* Re:
[not found] <1481545990-7247-1-git-send-email-adhemerval.zanella@linaro.org>
@ 2016-12-19 18:18 ` Adhemerval Zanella
0 siblings, 0 replies; 7+ messages in thread
From: Adhemerval Zanella @ 2016-12-19 18:18 UTC (permalink / raw)
To: libc-alpha
Is there any other blocker for this patchset besides the ones already
addressed for ipc_priv.h on aarch64 [1] and the old definition
of semctl [2]?
I would like to include it in 2.25 (I already added it to the
desirable features).
[1] https://sourceware.org/ml/libc-alpha/2016-12/msg00610.html
[2] https://sourceware.org/ml/libc-alpha/2016-12/msg00611.html
On 12/12/2016 10:32, Adhemerval Zanella wrote:
> Subject: [PATCH v4 00/17] Consolidate Linux sysvipc implementation
>
> Changes from previous version:
>
> - Change __ASSUME_SYSVIPC_SYSCALL to __ASSUME_DIRECT_SYSVIPC_SYSCALL.
> - Remove some misplaced comments.
> - Fixed some misspelling and grammatical mistakes.
> - Adjust to use the new subdirectory for test infrastructure (commit
> c23de0aacbea).
>
> Also, I did not add the AArch64/ILP32 __IPC_64 definition because I would
> like to confirm that 1 is really the expected value for the architecture.
>
> --
>
> This patchset is a continuation of my Linux syscall consolidation
> implementation and aimed for SySV IPC (message queue, semaphore,
> and shared memory).
>
> The current Linux default implementation only defines the old ipc
> syscall method. Architectures need either to imply the generic
> syscalls.list or reimplement the syscall definition. To simplify
> and allow removing some old arch-specific implementations, I added
> the direct syscall method for all supported IPC mechanisms.
>
> Other changes are simple code reorganizations to simplify and keep all
> the compatibility required for various ports.
>
> The patchset also adds 3 simple tests that aim to check for correct
> argument passing on syscalls. The idea is not to be an extensive
> test of all supported IPC.
>
> Checked on x86_64, i686, armhf, aarch64, and powerpc64le.
>
> Adhemerval Zanella (17):
> Add __ASSUME_DIRECT_SYSVIPC_SYSCALL for Linux
> Refactor Linux ipc_priv header
> Consolidate Linux msgctl implementation
> Consolidate Linux msgrcv implementation
> Use msgsnd syscall for Linux implementation
> Use msgget syscall for Linux implementation
> Add SYSV message queue test
> Consolidate Linux semctl implementation
> Use semget syscall for Linux implementation
> Use semop syscall for Linux implementation
> Consolidate Linux semtimedop implementation
> Add SYSV semaphore test
> Use shmat syscall for Linux implementation
> Consolidate Linux shmctl implementation
> Use shmdt syscall for linux implementation
> Use shmget syscall for linux implementation
> Add SYSV shared memory test
>
> ChangeLog | 229 +++++++++++++++++++++
> support/check.h | 5 +
> sysdeps/unix/sysv/linux/aarch64/ipc_priv.h | 32 +++
> sysdeps/unix/sysv/linux/alpha/Makefile | 3 -
> sysdeps/unix/sysv/linux/alpha/ipc_priv.h | 33 ++-
> sysdeps/unix/sysv/linux/alpha/kernel-features.h | 3 +
> sysdeps/unix/sysv/linux/alpha/msgctl.c | 1 -
> sysdeps/unix/sysv/linux/alpha/semctl.c | 1 -
> sysdeps/unix/sysv/linux/alpha/shmctl.c | 1 -
> sysdeps/unix/sysv/linux/alpha/syscalls.list | 13 --
> sysdeps/unix/sysv/linux/arm/msgctl.c | 33 ---
> sysdeps/unix/sysv/linux/arm/semctl.c | 54 -----
> sysdeps/unix/sysv/linux/arm/shmctl.c | 34 ---
> sysdeps/unix/sysv/linux/arm/syscalls.list | 12 --
> sysdeps/unix/sysv/linux/generic/syscalls.list | 14 --
> sysdeps/unix/sysv/linux/hppa/syscalls.list | 14 --
> sysdeps/unix/sysv/linux/i386/kernel-features.h | 3 +
> sysdeps/unix/sysv/linux/ia64/syscalls.list | 14 --
> sysdeps/unix/sysv/linux/ipc_ops.h | 30 +++
> sysdeps/unix/sysv/linux/ipc_priv.h | 23 +--
> sysdeps/unix/sysv/linux/kernel-features.h | 4 +
> sysdeps/unix/sysv/linux/m68k/kernel-features.h | 3 +
> sysdeps/unix/sysv/linux/m68k/semtimedop.S | 69 -------
> sysdeps/unix/sysv/linux/microblaze/msgctl.c | 1 -
> sysdeps/unix/sysv/linux/microblaze/semctl.c | 1 -
> sysdeps/unix/sysv/linux/microblaze/shmctl.c | 1 -
> sysdeps/unix/sysv/linux/microblaze/syscalls.list | 12 --
> sysdeps/unix/sysv/linux/mips/ipc_priv.h | 1 -
> sysdeps/unix/sysv/linux/mips/kernel-features.h | 2 +
> sysdeps/unix/sysv/linux/mips/mips64/ipc_priv.h | 32 +++
> sysdeps/unix/sysv/linux/mips/mips64/msgctl.c | 17 +-
> sysdeps/unix/sysv/linux/mips/mips64/semctl.c | 38 +---
> sysdeps/unix/sysv/linux/mips/mips64/shmctl.c | 17 +-
> sysdeps/unix/sysv/linux/mips/mips64/syscalls.list | 13 --
> sysdeps/unix/sysv/linux/msgctl.c | 45 ++--
> sysdeps/unix/sysv/linux/msgget.c | 11 +-
> sysdeps/unix/sysv/linux/msgrcv.c | 26 +--
> sysdeps/unix/sysv/linux/msgsnd.c | 9 +-
> sysdeps/unix/sysv/linux/powerpc/ipc_priv.h | 23 +--
> sysdeps/unix/sysv/linux/powerpc/kernel-features.h | 3 +
> sysdeps/unix/sysv/linux/s390/kernel-features.h | 3 +
> sysdeps/unix/sysv/linux/s390/s390-64/syscalls.list | 14 --
> sysdeps/unix/sysv/linux/s390/semtimedop.c | 12 +-
> sysdeps/unix/sysv/linux/semctl.c | 58 +++---
> sysdeps/unix/sysv/linux/semget.c | 11 +-
> sysdeps/unix/sysv/linux/semop.c | 10 +-
> sysdeps/unix/sysv/linux/semtimedop.c | 13 +-
> sysdeps/unix/sysv/linux/sh/kernel-features.h | 3 +
> sysdeps/unix/sysv/linux/shmat.c | 17 +-
> sysdeps/unix/sysv/linux/shmctl.c | 59 +++---
> sysdeps/unix/sysv/linux/shmdt.c | 12 +-
> sysdeps/unix/sysv/linux/shmget.c | 13 +-
> sysdeps/unix/sysv/linux/sparc/kernel-features.h | 3 +
> sysdeps/unix/sysv/linux/sparc/sparc64/ipc_priv.h | 41 ++++
> sysdeps/unix/sysv/linux/sparc/sparc64/msgrcv.c | 32 ---
> sysdeps/unix/sysv/linux/sparc/sparc64/semctl.c | 54 -----
> sysdeps/unix/sysv/linux/x86_64/ipc_priv.h | 32 +++
> sysdeps/unix/sysv/linux/x86_64/syscalls.list | 12 --
> sysvipc/Makefile | 2 +
> sysvipc/test-sysvmsg.c | 128 ++++++++++++
> sysvipc/test-sysvsem.c | 116 +++++++++++
> sysvipc/test-sysvshm.c | 131 ++++++++++++
> 62 files changed, 1013 insertions(+), 643 deletions(-)
> create mode 100644 sysdeps/unix/sysv/linux/aarch64/ipc_priv.h
> delete mode 100644 sysdeps/unix/sysv/linux/alpha/msgctl.c
> delete mode 100644 sysdeps/unix/sysv/linux/alpha/semctl.c
> delete mode 100644 sysdeps/unix/sysv/linux/alpha/shmctl.c
> delete mode 100644 sysdeps/unix/sysv/linux/arm/msgctl.c
> delete mode 100644 sysdeps/unix/sysv/linux/arm/semctl.c
> delete mode 100644 sysdeps/unix/sysv/linux/arm/shmctl.c
> create mode 100644 sysdeps/unix/sysv/linux/ipc_ops.h
> delete mode 100644 sysdeps/unix/sysv/linux/m68k/semtimedop.S
> delete mode 100644 sysdeps/unix/sysv/linux/microblaze/msgctl.c
> delete mode 100644 sysdeps/unix/sysv/linux/microblaze/semctl.c
> delete mode 100644 sysdeps/unix/sysv/linux/microblaze/shmctl.c
> delete mode 100644 sysdeps/unix/sysv/linux/mips/ipc_priv.h
> create mode 100644 sysdeps/unix/sysv/linux/mips/mips64/ipc_priv.h
> delete mode 100644 sysdeps/unix/sysv/linux/mips/mips64/syscalls.list
> delete mode 100644 sysdeps/unix/sysv/linux/s390/s390-64/syscalls.list
> create mode 100644 sysdeps/unix/sysv/linux/sparc/sparc64/ipc_priv.h
> delete mode 100644 sysdeps/unix/sysv/linux/sparc/sparc64/msgrcv.c
> delete mode 100644 sysdeps/unix/sysv/linux/sparc/sparc64/semctl.c
> create mode 100644 sysdeps/unix/sysv/linux/x86_64/ipc_priv.h
> create mode 100644 sysvipc/test-sysvmsg.c
> create mode 100644 sysvipc/test-sysvsem.c
> create mode 100644 sysvipc/test-sysvshm.c
>
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re:
@ 2005-07-15 21:51 ИнфоПространство
0 siblings, 0 replies; 7+ messages in thread
From: ИнфоПространство @ 2005-07-15 21:51 UTC (permalink / raw)
To: libc-alpha
[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; format=flowed; charset="windows-1251"; reply-type=original, Size: 2487 bytes --]
=- CORPORATE EVENTS -=
=- AT OSTOZHENKA, 2000 sq.m -=
conferences, seminars, meetings
exhibitions, presentations, celebrations
banquets, buffet receptions
1. Purpose: the Center is intended for holding exhibitions, conferences, seminars, presentations, shows and festive events.
2. Location: the historic center, 300 meters from the Cathedral of Christ the Saviour, Kropotkinskaya metro, Ostozhenka district, 50 meters from Prechistenskaya embankment.
3. Technical characteristics: total area of the center 2000 sq.m, a universal transformable hall with floor space ranging from 20 to 1500 sq.m, 2 VIP halls, a cafe-pizzeria-confectionery.
4. Features:
- Special floor-to-ceiling exhibition stands in a range of colors and configurations.
- Cars can be put on display.
- Exhibition lighting built into the ceiling.
- A wide choice of furniture.
- 2 stages
- 2 entrances: main and service.
- High-quality carpeting
- Professional sound amplification equipment.
Conference package from 36 c.u. Company internal rate: 1 c.u. = 30 RUB.
Director of business development and corporate event organization:
Savrasova Natalya, tel.: 290-06-21, 290-7241, 290-0066; fax: 290-06-49
-------------------------------------------
International
information and
exhibition center
=- InfoProstranstvo -=
1st Zachatyevsky per., 4
tel. (095) 290-7-241
fax (095) 202-92-45
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2023-07-14 7:39 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-01-13 5:41 father.dominic
-- strict thread matches above, loose matches on Subject: below --
2023-05-27 18:46 [PATCH v10 3/3] x86: Make the divisor in setting `non_temporal_threshold` cpu specific Noah Goldstein
2023-07-10 5:23 ` Sajan Karumanchi
2023-07-10 15:58 ` Noah Goldstein
2023-07-14 2:21 ` Re: Noah Goldstein
2023-07-14 7:39 ` Re: sajan karumanchi
2021-06-06 19:19 Davidlohr Bueso
2021-06-07 16:02 ` André Almeida
[not found] <1481545990-7247-1-git-send-email-adhemerval.zanella@linaro.org>
2016-12-19 18:18 ` Re: Adhemerval Zanella
2005-07-15 21:51 Re: ИнфоПространство
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).