public inbox for libc-alpha@sourceware.org
 help / color / mirror / Atom feed
* [PATCH] malloc: Use __get_nprocs on arena_get2 (BZ 30945)
@ 2023-10-11 16:43 Adhemerval Zanella
  2023-10-11 18:54 ` DJ Delorie
  2023-11-21 22:41 ` DJ Delorie
  0 siblings, 2 replies; 7+ messages in thread
From: Adhemerval Zanella @ 2023-10-11 16:43 UTC (permalink / raw)
  To: libc-alpha

This restore the 2.33 semantic for arena_get2.  It was changed by
11a02b035b46 to avoid arena_get2 call malloc (back when __get_nproc
was refactored to use an scratch_buffer - 903bc7dcc2acafc).  The
__get_nproc was refactored over then and now it also avoid to call
malloc.

The 11a02b035b46 did not take in consideration any performance
implication, which should have been discussed properly.  The
__get_nprocs_sched is still used as a fallback mechanism if procfs
and sysfs is not acessible.

Checked on x86_64-linux-gnu.
---
 include/sys/sysinfo.h                 | 4 ----
 malloc/arena.c                        | 2 +-
 misc/getsysstats.c                    | 6 ------
 sysdeps/mach/getsysstats.c            | 6 ------
 sysdeps/unix/sysv/linux/getsysstats.c | 2 +-
 5 files changed, 2 insertions(+), 18 deletions(-)

diff --git a/include/sys/sysinfo.h b/include/sys/sysinfo.h
index c490561581..65742b1036 100644
--- a/include/sys/sysinfo.h
+++ b/include/sys/sysinfo.h
@@ -14,10 +14,6 @@ libc_hidden_proto (__get_nprocs_conf)
 extern int __get_nprocs (void);
 libc_hidden_proto (__get_nprocs)
 
-/* Return the number of available processors which the process can
-   be scheduled.  */
-extern int __get_nprocs_sched (void) attribute_hidden;
-
 /* Return number of physical pages of memory in the system.  */
 extern long int __get_phys_pages (void);
 libc_hidden_proto (__get_phys_pages)
diff --git a/malloc/arena.c b/malloc/arena.c
index 6f03955ff2..82b09adb47 100644
--- a/malloc/arena.c
+++ b/malloc/arena.c
@@ -820,7 +820,7 @@ arena_get2 (size_t size, mstate avoid_arena)
             narenas_limit = mp_.arena_max;
           else if (narenas > mp_.arena_test)
             {
-              int n = __get_nprocs_sched ();
+              int n = __get_nprocs ();
 
               if (n >= 1)
                 narenas_limit = NARENAS_FROM_NCORES (n);
diff --git a/misc/getsysstats.c b/misc/getsysstats.c
index 5f36adc0e8..23cc112074 100644
--- a/misc/getsysstats.c
+++ b/misc/getsysstats.c
@@ -44,12 +44,6 @@ weak_alias (__get_nprocs, get_nprocs)
 link_warning (get_nprocs, "warning: get_nprocs will always return 1")
 
 
-int
-__get_nprocs_sched (void)
-{
-  return 1;
-}
-
 long int
 __get_phys_pages (void)
 {
diff --git a/sysdeps/mach/getsysstats.c b/sysdeps/mach/getsysstats.c
index 5184e5eee1..d3834f3b69 100644
--- a/sysdeps/mach/getsysstats.c
+++ b/sysdeps/mach/getsysstats.c
@@ -62,12 +62,6 @@ __get_nprocs (void)
 libc_hidden_def (__get_nprocs)
 weak_alias (__get_nprocs, get_nprocs)
 
-int
-__get_nprocs_sched (void)
-{
-  return __get_nprocs ();
-}
-
 /* Return the number of physical pages on the system. */
 long int
 __get_phys_pages (void)
diff --git a/sysdeps/unix/sysv/linux/getsysstats.c b/sysdeps/unix/sysv/linux/getsysstats.c
index b0b6c154ac..1ea7f1f01f 100644
--- a/sysdeps/unix/sysv/linux/getsysstats.c
+++ b/sysdeps/unix/sysv/linux/getsysstats.c
@@ -29,7 +29,7 @@
 #include <sys/sysinfo.h>
 #include <sysdep.h>
 
-int
+static int
 __get_nprocs_sched (void)
 {
   enum
-- 
2.34.1


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] malloc: Use __get_nprocs on arena_get2 (BZ 30945)
  2023-10-11 16:43 [PATCH] malloc: Use __get_nprocs on arena_get2 (BZ 30945) Adhemerval Zanella
@ 2023-10-11 18:54 ` DJ Delorie
  2023-10-11 19:10   ` Adhemerval Zanella Netto
  2023-11-21 22:41 ` DJ Delorie
  1 sibling, 1 reply; 7+ messages in thread
From: DJ Delorie @ 2023-10-11 18:54 UTC (permalink / raw)
  To: Adhemerval Zanella; +Cc: libc-alpha


Part of the reason for using get_nprocs_sched() is that cpu affinity can
limit the application to fewer processors, and the logic needs to take
that into account.

If you have a 192-core system but use setaffinity to limit a process to
4 cores, there will be *way* too many arenas created.


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] malloc: Use __get_nprocs on arena_get2 (BZ 30945)
  2023-10-11 18:54 ` DJ Delorie
@ 2023-10-11 19:10   ` Adhemerval Zanella Netto
  2023-10-11 20:34     ` DJ Delorie
  2023-10-30 21:28     ` DJ Delorie
  0 siblings, 2 replies; 7+ messages in thread
From: Adhemerval Zanella Netto @ 2023-10-11 19:10 UTC (permalink / raw)
  To: DJ Delorie; +Cc: libc-alpha



On 11/10/23 15:54, DJ Delorie wrote:
> 
> Part of the reason for using get_nprocs_sched() is that cpu affinity can
> limit the application to fewer processors, and the logic needs to take
> that into account.
> 
> If you have a 192-core system but use setaffinity to limit a process to
> 4 cores, there will be *way* too many arenas created.
> 

Yes, that was your rationale on the previous attempt [1].  Maybe adding a
tunable, so if you are using affinity you can change thie behavior? 

In any case, I think we should either close the bug or fix this and work
on a better heuristic to handle process that set affinity.

[1] https://sourceware.org/pipermail/libc-alpha/2022-June/140161.html

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] malloc: Use __get_nprocs on arena_get2 (BZ 30945)
  2023-10-11 19:10   ` Adhemerval Zanella Netto
@ 2023-10-11 20:34     ` DJ Delorie
  2023-10-18 12:44       ` Adhemerval Zanella Netto
  2023-10-30 21:28     ` DJ Delorie
  1 sibling, 1 reply; 7+ messages in thread
From: DJ Delorie @ 2023-10-11 20:34 UTC (permalink / raw)
  To: Adhemerval Zanella Netto; +Cc: libc-alpha

Adhemerval Zanella Netto <adhemerval.zanella@linaro.org> writes:
> Yes, that was your rationale on the previous attempt [1].  Maybe adding a
> tunable, so if you are using affinity you can change thie behavior? 

We have a tunable to set the max arenas already.

Measuring the number of runnable CPUs seems more of a "correctness"
thing to me.  This code is only called when we need to create a new
arena, so it's not in a performance-critical path.  It's more important
to get this right so that the rest of the code maximizes performance.

> In any case, I think we should either close the bug or fix this and work
> on a better heuristic to handle process that set affinity.

It sound like we need something between get_nprocs and get_nprocs_sched
that counts the number of cores the program is assigned, not the number
of cores the thread is assigned?  I thought that's what
sched_getaffinity() did, but the man page says "process" most of the
time, but sometimes "thread".


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] malloc: Use __get_nprocs on arena_get2 (BZ 30945)
  2023-10-11 20:34     ` DJ Delorie
@ 2023-10-18 12:44       ` Adhemerval Zanella Netto
  0 siblings, 0 replies; 7+ messages in thread
From: Adhemerval Zanella Netto @ 2023-10-18 12:44 UTC (permalink / raw)
  To: DJ Delorie; +Cc: libc-alpha



On 11/10/23 17:34, DJ Delorie wrote:
> Adhemerval Zanella Netto <adhemerval.zanella@linaro.org> writes:
>> Yes, that was your rationale on the previous attempt [1].  Maybe adding a
>> tunable, so if you are using affinity you can change thie behavior? 
> 
> We have a tunable to set the max arenas already.
> 
> Measuring the number of runnable CPUs seems more of a "correctness"
> thing to me.  This code is only called when we need to create a new
> arena, so it's not in a performance-critical path.  It's more important
> to get this right so that the rest of the code maximizes performance.

I agree, my main problem with this change it was originally unintentional.
My suggestion would to revert it back, and proper change with a meaningful
commit message that it is intentional and process that use setaffinity 
and see contention increase the max arenas.

> 
>> In any case, I think we should either close the bug or fix this and work
>> on a better heuristic to handle process that set affinity.
> 
> It sound like we need something between get_nprocs and get_nprocs_sched
> that counts the number of cores the program is assigned, not the number
> of cores the thread is assigned?  I thought that's what
> sched_getaffinity() did, but the man page says "process" most of the
> time, but sometimes "thread".
> 

The sched_getaffinity on Linux is essentially pthread_getaffinity_np,
since the cpu mask is per process attribute.  And I couldn't find any
procfs way to access the AND mask of all mask withins the thread group.

And I see no easy way to accomplish it, iterating over the /proc/self/task
is way too slow and do not scale.  Best option I can think of is add
a global mask and keep track of set bits on each sched_setaffinity
successful call; but it would require at least some global locking to
avoid racy conditions.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] malloc: Use __get_nprocs on arena_get2 (BZ 30945)
  2023-10-11 19:10   ` Adhemerval Zanella Netto
  2023-10-11 20:34     ` DJ Delorie
@ 2023-10-30 21:28     ` DJ Delorie
  1 sibling, 0 replies; 7+ messages in thread
From: DJ Delorie @ 2023-10-30 21:28 UTC (permalink / raw)
  To: Adhemerval Zanella Netto; +Cc: libc-alpha

Adhemerval Zanella Netto <adhemerval.zanella@linaro.org> writes:
> On 11/10/23 15:54, DJ Delorie wrote:
>> 
>> Part of the reason for using get_nprocs_sched() is that cpu affinity can
>> limit the application to fewer processors, and the logic needs to take
>> that into account.
>> 
>> If you have a 192-core system but use setaffinity to limit a process to
>> 4 cores, there will be *way* too many arenas created.
>> 
>
> Yes, that was your rationale on the previous attempt [1].  Maybe adding a
> tunable, so if you are using affinity you can change thie behavior? 
>
> In any case, I think we should either close the bug or fix this and work
> on a better heuristic to handle process that set affinity.
>
> [1] https://sourceware.org/pipermail/libc-alpha/2022-June/140161.html

Rather than let this issue rot, I'm withdrawing my dissent.  I still
think we need to resolve this issue with something between "nprocs" and
"thread procs" - perhaps store the sched mask at startup and use that?
But that could be deferred to some new development.

I think the max-arenas tunable is more useful if you err on the side of
too many arenas, too.


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] malloc: Use __get_nprocs on arena_get2 (BZ 30945)
  2023-10-11 16:43 [PATCH] malloc: Use __get_nprocs on arena_get2 (BZ 30945) Adhemerval Zanella
  2023-10-11 18:54 ` DJ Delorie
@ 2023-11-21 22:41 ` DJ Delorie
  1 sibling, 0 replies; 7+ messages in thread
From: DJ Delorie @ 2023-11-21 22:41 UTC (permalink / raw)
  To: Adhemerval Zanella; +Cc: libc-alpha

Adhemerval Zanella <adhemerval.zanella@linaro.org> writes:
> diff --git a/sysdeps/unix/sysv/linux/getsysstats.c b/sysdeps/unix/sysv/linux/getsysstats.c
> @@ -29,7 +29,7 @@
>  #include <sys/sysinfo.h>
>  #include <sysdep.h>
>  
> -int
> +static int
>  __get_nprocs_sched (void)
>  {
>    enum

So we're making this function internal again... ok.

> diff --git a/include/sys/sysinfo.h b/include/sys/sysinfo.h
>  
> -/* Return the number of available processors which the process can
> -   be scheduled.  */
> -extern int __get_nprocs_sched (void) attribute_hidden;
> -

which means we no longer need this declaration.  Ok.

> diff --git a/malloc/arena.c b/malloc/arena.c
> @@ -820,7 +820,7 @@ arena_get2 (size_t size, mstate avoid_arena)
>              narenas_limit = mp_.arena_max;
>            else if (narenas > mp_.arena_test)
>              {
> -              int n = __get_nprocs_sched ();
> +              int n = __get_nprocs ();

And can't use it here.  Ok.

> diff --git a/misc/getsysstats.c b/misc/getsysstats.c
> @@ -44,12 +44,6 @@ weak_alias (__get_nprocs, get_nprocs)
>  link_warning (get_nprocs, "warning: get_nprocs will always return 1")
>  
>  
> -int
> -__get_nprocs_sched (void)
> -{
> -  return 1;
> -}
> -

Nor need it here.  Ok.


> diff --git a/sysdeps/mach/getsysstats.c b/sysdeps/mach/getsysstats.c
> @@ -62,12 +62,6 @@ __get_nprocs (void)
>  libc_hidden_def (__get_nprocs)
>  weak_alias (__get_nprocs, get_nprocs)
>  
> -int
> -__get_nprocs_sched (void)
> -{
> -  return __get_nprocs ();
> -}
> -

Or here.  Ok.


LGTM.
Reviewed-by: DJ Delorie <dj@redhat.com>


^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2023-11-21 22:41 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-10-11 16:43 [PATCH] malloc: Use __get_nprocs on arena_get2 (BZ 30945) Adhemerval Zanella
2023-10-11 18:54 ` DJ Delorie
2023-10-11 19:10   ` Adhemerval Zanella Netto
2023-10-11 20:34     ` DJ Delorie
2023-10-18 12:44       ` Adhemerval Zanella Netto
2023-10-30 21:28     ` DJ Delorie
2023-11-21 22:41 ` DJ Delorie

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).