* malloc: Optimize the number of arenas for better application performance
@ 2022-06-28 9:40 Yang Yanchao
2022-06-28 11:18 ` Florian Weimer
` (2 more replies)
0 siblings, 3 replies; 9+ messages in thread
From: Yang Yanchao @ 2022-06-28 9:40 UTC (permalink / raw)
To: libc-alpha
Cc: adhemerval.zanella, glebfm, ldv, Carlos O'Donell, dj,
siddhesh, linfeilong, liqingqing3
On the Kunpeng 920 platform, tpcc-mysql scores decreased by about 11.2%
between glibc-2.28 and glibc-2.36.
Comparing the code, I found that two commits cause the performance
degradation:
11a02b035b46 (misc: Add __get_nprocs_sched)
97ba273b5057 (linux: __get_nprocs_sched: do not feed CPU_COUNT_S with
garbage [BZ #28850])
These two patches modify the default behavior.
However, my machine has 96 cores and I have 91 cores bound.
This suggests that the current way of computing the number of arenas may
not be optimal, so I rolled back part of the code introduced by
11a02b035 (misc: Add __get_nprocs_sched).
---
malloc/arena.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/malloc/arena.c b/malloc/arena.c
index 0a684a720d..a1ee7928d3 100644
--- a/malloc/arena.c
+++ b/malloc/arena.c
@@ -937,7 +937,7 @@ arena_get2 (size_t size, mstate avoid_arena)
narenas_limit = mp_.arena_max;
else if (narenas > mp_.arena_test)
{
- int n = __get_nprocs_sched ();
+ int n = __get_nprocs ();
if (n >= 1)
narenas_limit = NARENAS_FROM_NCORES (n);
--
2.33.0
* Re: malloc: Optimize the number of arenas for better application performance
2022-06-28 9:40 malloc: Optimize the number of arenas for better application performance Yang Yanchao
@ 2022-06-28 11:18 ` Florian Weimer
2022-06-28 12:38 ` Siddhesh Poyarekar
2022-06-28 13:35 ` Adhemerval Zanella
2022-06-28 18:56 ` DJ Delorie
2 siblings, 1 reply; 9+ messages in thread
From: Florian Weimer @ 2022-06-28 11:18 UTC (permalink / raw)
To: Yang Yanchao via Libc-alpha; +Cc: Yang Yanchao, ldv, linfeilong, siddhesh
* Yang Yanchao via Libc-alpha:
> On the Kunpeng 920 platform, tpcc-mysql scores decreased by about 11.2%
> between glibc-2.28 and glibc-2.36.
> Comparing the code, I found that two commits cause the performance
> degradation:
> 11a02b035b46 (misc: Add __get_nprocs_sched)
> 97ba273b5057 (linux: __get_nprocs_sched: do not feed CPU_COUNT_S with
> garbage [BZ #28850])
>
> These two patches modify the default behavior.
> However, my machine has 96 cores and I have 91 cores bound.
> This suggests that the current way of computing the number of arenas may
> not be optimal, so I rolled back part of the code introduced by
> 11a02b035 (misc: Add __get_nprocs_sched).
>
> ---
> malloc/arena.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/malloc/arena.c b/malloc/arena.c
> index 0a684a720d..a1ee7928d3 100644
> --- a/malloc/arena.c
> +++ b/malloc/arena.c
> @@ -937,7 +937,7 @@ arena_get2 (size_t size, mstate avoid_arena)
> narenas_limit = mp_.arena_max;
> else if (narenas > mp_.arena_test)
> {
> - int n = __get_nprocs_sched ();
> + int n = __get_nprocs ();
>
> if (n >= 1)
> narenas_limit = NARENAS_FROM_NCORES (n);
How many threads does tpcc-mysql create?
I wonder if all threads get their own arena with the larger count, and
there is some arena sharing with the smaller count.
Thanks,
Florian
* Re: malloc: Optimize the number of arenas for better application performance
2022-06-28 11:18 ` Florian Weimer
@ 2022-06-28 12:38 ` Siddhesh Poyarekar
0 siblings, 0 replies; 9+ messages in thread
From: Siddhesh Poyarekar @ 2022-06-28 12:38 UTC (permalink / raw)
To: Florian Weimer, Yang Yanchao via Libc-alpha; +Cc: Yang Yanchao, ldv, linfeilong
On 28/06/2022 16:48, Florian Weimer wrote:
> * Yang Yanchao via Libc-alpha:
>
>> On the Kunpeng 920 platform, tpcc-mysql scores decreased by about 11.2%
>> between glibc-2.28 and glibc-2.36.
>> Comparing the code, I found that two commits cause the performance
>> degradation:
>> 11a02b035b46 (misc: Add __get_nprocs_sched)
>> 97ba273b5057 (linux: __get_nprocs_sched: do not feed CPU_COUNT_S with
>> garbage [BZ #28850])
>>
>> These two patches modify the default behavior.
>> However, my machine has 96 cores and I have 91 cores bound.
>> This suggests that the current way of computing the number of arenas may
>> not be optimal, so I rolled back part of the code introduced by
>> 11a02b035 (misc: Add __get_nprocs_sched).
>>
>> ---
>> malloc/arena.c | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/malloc/arena.c b/malloc/arena.c
>> index 0a684a720d..a1ee7928d3 100644
>> --- a/malloc/arena.c
>> +++ b/malloc/arena.c
>> @@ -937,7 +937,7 @@ arena_get2 (size_t size, mstate avoid_arena)
>> narenas_limit = mp_.arena_max;
>> else if (narenas > mp_.arena_test)
>> {
>> - int n = __get_nprocs_sched ();
>> + int n = __get_nprocs ();
>>
>> if (n >= 1)
>> narenas_limit = NARENAS_FROM_NCORES (n);
>
> How many threads does tpcc-mysql create?
>
> I wonder if all threads get their own arena with the larger count, and
> there is some arena sharing with the smaller count.
A simple test to determine this could be to repeat the test with
different values of arena_max to see if there's a trend.
Siddhesh
* Re: malloc: Optimize the number of arenas for better application performance
2022-06-28 9:40 malloc: Optimize the number of arenas for better application performance Yang Yanchao
2022-06-28 11:18 ` Florian Weimer
@ 2022-06-28 13:35 ` Adhemerval Zanella
2022-06-28 18:56 ` DJ Delorie
2 siblings, 0 replies; 9+ messages in thread
From: Adhemerval Zanella @ 2022-06-28 13:35 UTC (permalink / raw)
To: Yang Yanchao
Cc: libc-alpha, glebfm, ldv, Carlos O'Donell, dj, siddhesh,
linfeilong, liqingqing3
> On 28 Jun 2022, at 06:40, Yang Yanchao <yangyanchao6@huawei.com> wrote:
>
> On the Kunpeng 920 platform, tpcc-mysql scores decreased by about 11.2% between glibc-2.28 and glibc-2.36.
> Comparing the code, I found that two commits cause the performance degradation:
> 11a02b035b46 (misc: Add __get_nprocs_sched)
> 97ba273b5057 (linux: __get_nprocs_sched: do not feed CPU_COUNT_S with garbage [BZ #28850])
>
> These two patches modify the default behavior.
> However, my machine has 96 cores and I have 91 cores bound.
> This suggests that the current way of computing the number of arenas may not be optimal,
> so I rolled back part of the code introduced by 11a02b035 (misc: Add __get_nprocs_sched).
>
> ---
> malloc/arena.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/malloc/arena.c b/malloc/arena.c
> index 0a684a720d..a1ee7928d3 100644
> --- a/malloc/arena.c
> +++ b/malloc/arena.c
> @@ -937,7 +937,7 @@ arena_get2 (size_t size, mstate avoid_arena)
> narenas_limit = mp_.arena_max;
> else if (narenas > mp_.arena_test)
> {
> - int n = __get_nprocs_sched ();
> + int n = __get_nprocs ();
>
> if (n >= 1)
> narenas_limit = NARENAS_FROM_NCORES (n);
> --
> 2.33.0
In fact, 11a02b035b46 only changed __get_nprocs_sched to call __get_nprocs;
33099d72e41c was the one that actually changed __get_nprocs to use
sched_getaffinity.
I think it makes sense to go back to the old behavior, since this change
was motivated mainly by avoiding the malloc call in arena initialization.
__get_nprocs no longer calls malloc through opendir, so it should
be safe.
* Re: malloc: Optimize the number of arenas for better application performance
2022-06-28 9:40 malloc: Optimize the number of arenas for better application performance Yang Yanchao
2022-06-28 11:18 ` Florian Weimer
2022-06-28 13:35 ` Adhemerval Zanella
@ 2022-06-28 18:56 ` DJ Delorie
2022-06-28 19:17 ` Adhemerval Zanella
2 siblings, 1 reply; 9+ messages in thread
From: DJ Delorie @ 2022-06-28 18:56 UTC (permalink / raw)
To: Yang Yanchao
Cc: libc-alpha, adhemerval.zanella, glebfm, ldv, carlos, siddhesh,
linfeilong, liqingqing3
Yang Yanchao <yangyanchao6@huawei.com> writes:
> However, my machine has 96 cores and I have 91 cores bound.
One benchmark on one uncommon configuration is not sufficient reason to
change a core tunable. What about other platforms? Other benchmarks?
Other percentages of cores scheduled?
I would reject this patch based solely on the lack of data backing up
your claims.
> - int n = __get_nprocs_sched ();
> + int n = __get_nprocs ();
I've heard complaints about how our code leads to hundreds of arenas on
processes scheduled on only two CPUs. I think using the number of
*schedulable* cores makes more sense than using the number of *unusable*
cores.
I think this change warrants more research.
* Re: malloc: Optimize the number of arenas for better application performance
2022-06-28 18:56 ` DJ Delorie
@ 2022-06-28 19:17 ` Adhemerval Zanella
[not found] ` <1a8f10e034e7489c8e9f090e9c90b396@huawei.com>
0 siblings, 1 reply; 9+ messages in thread
From: Adhemerval Zanella @ 2022-06-28 19:17 UTC (permalink / raw)
To: DJ Delorie
Cc: Yang Yanchao, libc-alpha, glebfm, ldv, carlos, siddhesh,
linfeilong, liqingqing3
> On 28 Jun 2022, at 15:56, DJ Delorie <dj@redhat.com> wrote:
>
> Yang Yanchao <yangyanchao6@huawei.com> writes:
>> However, my machine has 96 cores and I have 91 cores bound.
>
> One benchmark on one uncommon configuration is not sufficient reason to
> change a core tunable. What about other platforms? Other benchmarks?
> Other percentages of cores scheduled?
>
> I would reject this patch based solely on the lack of data backing up
> your claims.
>
>> - int n = __get_nprocs_sched ();
>> + int n = __get_nprocs ();
>
> I've heard complaints about how our code leads to hundreds of arenas on
> processes scheduled on only two CPUs. I think using the number of
> *schedulable* cores makes more sense than using the number of *unusable*
> cores.
>
> I think this change warrants more research.
I think this patch makes sense, mainly because we switched to using the
schedulable cores without much thought either. Maybe we can revert
to the previous semantics and then investigate whether using the
schedulable number makes more sense.
Thread overview: 9+ messages
2022-06-28 9:40 malloc: Optimize the number of arenas for better application performance Yang Yanchao
2022-06-28 11:18 ` Florian Weimer
2022-06-28 12:38 ` Siddhesh Poyarekar
2022-06-28 13:35 ` Adhemerval Zanella
2022-06-28 18:56 ` DJ Delorie
2022-06-28 19:17 ` Adhemerval Zanella
[not found] ` <1a8f10e034e7489c8e9f090e9c90b396@huawei.com>
2022-06-29 2:37 ` Fwd: " Qingqing Li
2022-06-29 5:25 ` Siddhesh Poyarekar
2022-06-29 8:05 ` [PATCH] malloc: Optimize the number of arenas for better application performance [BZ# 29296] Yang Yanchao