public inbox for glibc-bugs@sourceware.org
* [Bug libc/15630] New: Fix use of cpu_set_t with sched_getaffinity when booted on a system with more than 1024 possible cpus.
@ 2013-06-14 19:31 carlos at redhat dot com
  2014-06-13 15:05 ` [Bug libc/15630] " fweimer at redhat dot com
                   ` (4 more replies)
  0 siblings, 5 replies; 6+ messages in thread
From: carlos at redhat dot com @ 2013-06-14 19:31 UTC (permalink / raw)
  To: glibc-bugs

http://sourceware.org/bugzilla/show_bug.cgi?id=15630

            Bug ID: 15630
           Summary: Fix use of cpu_set_t with sched_getaffinity when
                    booted on a system with more than 1024 possible cpus.
           Product: glibc
           Version: 2.18
            Status: NEW
          Severity: normal
          Priority: P2
         Component: libc
          Assignee: unassigned at sourceware dot org
          Reporter: carlos at redhat dot com
                CC: drepper.fsp at gmail dot com

The glibc functions sched_getaffinity and sched_setaffinity have slightly
different semantics than the kernel sched_getaffinity and sched_setaffinity
functions.

The result is that if you boot on a system with more than 1024 possible cpus
and use a fixed-size cpu_set_t with sched_getaffinity, the call never
succeeds and always returns EINVAL. The glibc manual page does not document
sched_getaffinity as returning EINVAL. The call to sched_getaffinity should
always succeed.

This is not a hypothetical problem; I am already seeing users with this
problem.

Let us talk more about the API differences and what can be done in glibc to
mitigate the problem.

The most important difference is that if you call either of the kernel routines
with a cpusetsize that is smaller than the kernel's possible cpu mask size the
kernel routines return EINVAL. The kernel previously did accounting based on
the configured maximum rather than possible cpus, leading to problems if you'd
simply compiled with NR_CPUS > 1024 instead of actually booting on a system
where the low-level firmware detected > 1024 possible CPUs.

There are 3 ways to determine the correct size of the possible cpu mask:

(a) Read it from sysfs /sys/devices/system/cpu/online, which has the actual
number of possibly online cpus.

(b) Interpret /proc/cpuinfo or /proc/stat.

(c) Call the kernel syscall sched_getaffinity with increasingly larger values
for cpusetsize in an attempt to manually determine the cpu mask size.

Methods (a) and (b) are already used by sysconf(_SC_PROCESSORS_ONLN) to
determine the value to return.

Method (c) is used by sched_setaffinity to determine the size of the kernel
mask and then reject any bits which are set outside of the mask and return
EINVAL.

Method (c) is recommended by a patched RHEL man page [1] for sched_getaffinity,
but that patch has not made it upstream to the Linux Kernel man pages project.
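
Method (c) can be sketched roughly as follows. This is only an illustration,
not glibc code: the helper name probe_kernel_mask_size, the doubling strategy
and the 1 MiB upper bound are assumptions made for the example. It uses the raw
system call so that the kernel's EINVAL behaviour is what gets observed.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: returns the first probed buffer size, in bytes,
   that the kernel accepts for the calling thread, or 0 on failure.  */
size_t
probe_kernel_mask_size (void)
{
  const size_t limit = (size_t) 1 << 20; /* arbitrary bound for this sketch */

  for (size_t size = sizeof (unsigned long); size <= limit; size *= 2)
    {
      unsigned long *buf = malloc (size);
      if (buf == NULL)
        return 0;

      /* pid 0 means the calling thread.  */
      long ret = syscall (SYS_sched_getaffinity, 0, size, buf);
      int saved_errno = errno;
      free (buf);

      if (ret >= 0)
        return size;          /* the kernel accepted this size */
      if (saved_errno != EINVAL)
        return 0;             /* unrelated failure, give up */
      /* EINVAL: buffer too small for the kernel mask, try a larger one.  */
    }
  return 0;
}
```

Because the size doubles on each attempt, the result is an upper bound on the
kernel mask size rather than its exact value.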

The goal is therefore to make using a fixed cpu_set_t work at all times, but
only support the first 1024 cpus. To support more than 1024 cpus you need to
use the dynamically sized macros and method (a) (if you want all the cpus).
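
As a rough sketch of the parsing half of method (a): the sysfs file holds a
CPU list such as "0-39" or "0-3,8-11". The helper below (cpu_list_max is a
hypothetical name, and the accepted syntax is an assumption based on that list
format) returns the highest CPU index named in such a string.

```c
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: returns the highest CPU index named in a
   sysfs-style CPU list, or -1 if the text is empty or malformed.  */
long
cpu_list_max (const char *list)
{
  long max = -1;
  const char *p = list;

  while (*p != '\0' && *p != '\n')
    {
      char *end;

      if (!isdigit ((unsigned char) *p))
        return -1;
      errno = 0;
      long first = strtol (p, &end, 10);
      if (errno != 0)
        return -1;
      long last = first;
      p = end;

      if (*p == '-')            /* a range "first-last" */
        {
          p++;
          if (!isdigit ((unsigned char) *p))
            return -1;
          errno = 0;
          last = strtol (p, &end, 10);
          if (errno != 0 || last < first)
            return -1;
          p = end;
        }

      if (last > max)
        max = last;

      if (*p == ',')
        p++;                    /* another entry follows */
      else if (*p != '\0' && *p != '\n')
        return -1;
    }
  return max;
}
```

A caller could read one line from the sysfs file with fgets, pass it to
cpu_list_max, and size a dynamic set with CPU_ALLOC (max + 1).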

In order to make a fixed cpu_set_t size work all the time the following changes
need to be made to glibc:

(1) Enhance sysconf(_SC_PROCESSORS_ONLN) to additionally use method (c) as a
last resort to determine the number of online cpus. In addition sysconf should
cache the value for the lifetime of the process. The code in sysconf should be
the only place we cache the value (currently we also cache it in
sched_setaffinity).

(2) Clean up sched_setaffinity to call sysconf to determine the number of online
cpus and use that to check if the incoming bitmask is valid. Additionally, if
possible, we should check for non-zero entries a long at a time instead of a
byte at a time.

(3) Fix sched_getaffinity and have it call sysconf to determine the number of
online cpus and use that to get the kernel cpu mask affinity values, copying
back the minimum of the sizes, either user or kernel, and zeroing the rest.
This call should never fail.
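
The core of steps (2) and (3) might look something like the sketch below. The
function names are hypothetical and this is not the glibc implementation; it
only illustrates scanning a mask one long at a time and copying back the
smaller of the two sizes while zeroing the remainder of the user buffer.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Step (2): true if any bit is set in words [first_word, nwords).
   The scan proceeds one long at a time rather than one byte at a time.  */
bool
mask_has_bits_beyond (const unsigned long *mask, size_t nwords,
                      size_t first_word)
{
  for (size_t i = first_word; i < nwords; i++)
    if (mask[i] != 0)
      return true;
  return false;
}

/* Step (3): copy min (user_size, kernel_size) bytes of the kernel mask
   into the user buffer and zero whatever is left of the user buffer.
   Returns the number of bytes copied from the kernel mask.  */
size_t
copy_affinity_truncated (void *user, size_t user_size,
                         const void *kernel, size_t kernel_size)
{
  size_t n = user_size < kernel_size ? user_size : kernel_size;

  memcpy (user, kernel, n);
  memset ((char *) user + n, 0, user_size - n);
  return n;
}
```

With a 128-byte cpu_set_t and a larger kernel mask, the copy keeps the first
1024 cpus and silently drops the rest, which is the truncation this report
proposes.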

Static applications can't easily be fixed to work around this problem. The only
solution there is to have the kernel stop returning EINVAL and instead do what
glibc does, which is to copy only the part of the buffer that the user
requested. However, doing that would break existing glibc versions that rely on
EINVAL to compute the mask size. Therefore changing the kernel semantics is
not a good solution (except on a system-by-system basis in the extreme case
where a single static application was being supported).

Step (3) ensures that using a fixed cpu_set_t size works when you are booted on
hardware that has more than 1024 possible cpus.

Unfortunately it breaks the recommended pattern of using sched_getaffinity and
looking for EINVAL to determine the size of the mask, but this was never a
method that glibc documented or supported. The patched man page has the
starting buffer size of 1024, so at least such a pattern would allow access to
the first 1024 cpus. It is strongly recommended that users use sysconf to
determine the number of possible cpus.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=974679

-- 
You are receiving this mail because:
You are on the CC list for the bug.



* [Bug libc/15630] Fix use of cpu_set_t with sched_getaffinity when booted on a system with more than 1024 possible cpus.
  2013-06-14 19:31 [Bug libc/15630] New: Fix use of cpu_set_t with sched_getaffinity when booted on a system with more than 1024 possible cpus carlos at redhat dot com
@ 2014-06-13 15:05 ` fweimer at redhat dot com
  2015-05-18 15:56 ` fweimer at redhat dot com
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: fweimer at redhat dot com @ 2014-06-13 15:05 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=15630

Florian Weimer <fweimer at redhat dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
              Flags|                            |security-


* [Bug libc/15630] Fix use of cpu_set_t with sched_getaffinity when booted on a system with more than 1024 possible cpus.
  2013-06-14 19:31 [Bug libc/15630] New: Fix use of cpu_set_t with sched_getaffinity when booted on a system with more than 1024 possible cpus carlos at redhat dot com
  2014-06-13 15:05 ` [Bug libc/15630] " fweimer at redhat dot com
@ 2015-05-18 15:56 ` fweimer at redhat dot com
  2015-05-18 18:37 ` fweimer at redhat dot com
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: fweimer at redhat dot com @ 2015-05-18 15:56 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=15630

Florian Weimer <fweimer at redhat dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |fweimer at redhat dot com

--- Comment #2 from Florian Weimer <fweimer at redhat dot com> ---
Maybe addressing this should also fix this design issue with the cpu_set_t
functions:

CPU_ALLOC returns a set which is capable of storing more CPUs than the user
requested because the request is rounded up to the next multiple of sizeof
(long) * CHAR_BIT CPUs.  This is reflected in the return value of
CPU_ALLOC_SIZE.  As a result, the sched_getaffinity call in

  set = CPU_ALLOC (count);
  sched_getaffinity (0, CPU_ALLOC_SIZE (count), set);

can succeed even if count is smaller than the maximum number of relevant CPUs. 
This defeats the purpose of the EINVAL error return code in case the specified
CPU set is not large enough.
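
The rounding can be illustrated with a small model. This is a sketch that
follows the description above, not a copy of the glibc macros, and the function
names are made up for the example.

```c
#include <limits.h>
#include <stddef.h>

/* Bytes needed for `count` CPUs when the request is rounded up to a
   whole number of longs, as described above.  */
size_t
rounded_alloc_size (size_t count)
{
  const size_t bits_per_word = sizeof (unsigned long) * CHAR_BIT;

  return (count + bits_per_word - 1) / bits_per_word
         * sizeof (unsigned long);
}

/* Number of CPUs such an allocation can actually represent.  */
size_t
rounded_capacity (size_t count)
{
  return rounded_alloc_size (count) * CHAR_BIT;
}
```

On a system where long is 64 bits, a request for 40 CPUs yields room for 64,
so a sched_getaffinity call sized for 40 CPUs can succeed on a machine where up
to 64 CPUs are relevant.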


* [Bug libc/15630] Fix use of cpu_set_t with sched_getaffinity when booted on a system with more than 1024 possible cpus.
  2013-06-14 19:31 [Bug libc/15630] New: Fix use of cpu_set_t with sched_getaffinity when booted on a system with more than 1024 possible cpus carlos at redhat dot com
  2014-06-13 15:05 ` [Bug libc/15630] " fweimer at redhat dot com
  2015-05-18 15:56 ` fweimer at redhat dot com
@ 2015-05-18 18:37 ` fweimer at redhat dot com
  2015-05-18 18:59 ` fweimer at redhat dot com
  2015-05-20  2:54 ` carlos at redhat dot com
  4 siblings, 0 replies; 6+ messages in thread
From: fweimer at redhat dot com @ 2015-05-18 18:37 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=15630

--- Comment #3 from Florian Weimer <fweimer at redhat dot com> ---
Additional details:

Sadly, sysconf(_SC_PROCESSORS_ONLN) is not the number of CPUs which is relevant
to the sched_getaffinity system call.  I have a system which reports
_NPROCESSORS_ONLN and _NPROCESSORS_CONF as 40 (and /proc/cpuinfo and /proc/stat
match that), yet calling sched_getaffinity with small arguments fails:

[pid  3420] sched_getaffinity(0, 8, 0x146b010) = -1 EINVAL (Invalid argument)
[pid  3420] sched_getaffinity(0, 16, 0x146b010) = -1 EINVAL (Invalid argument)
[pid  3420] sched_getaffinity(0, 32, {ffffffffff, 0, 0, 0}) = 32

The kernel seems to operate with a nr_cpu_ids value of 240:

kernel: setup_percpu: NR_CPUS:5120 nr_cpumask_bits:240 nr_cpu_ids:240
nr_node_ids:2
kernel:         RCU restricting CPUs from NR_CPUS=5120 to nr_cpu_ids=240.

nr_cpu_ids is not directly exposed to user space, I think, so the EINVAL
behavior is pretty much required.
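
The arithmetic behind the trace above can be sketched as follows, assuming the
kernel rejects any buffer that holds fewer than nr_cpu_ids bits or whose length
is not a multiple of the word size. That assumption matches the strace output,
but the helper itself is hypothetical.

```c
#include <stddef.h>

/* Smallest buffer length, in bytes, that can hold nr_cpu_ids bits and is
   a multiple of word_bytes (the size of unsigned long on the target).  */
size_t
min_affinity_bytes (size_t nr_cpu_ids, size_t word_bytes)
{
  size_t bytes = (nr_cpu_ids + 7) / 8;  /* bits rounded up to bytes */

  return (bytes + word_bytes - 1) / word_bytes * word_bytes;
}
```

For nr_cpu_ids=240 and 8-byte words this gives 32 bytes, which is why the
calls with 8 and 16 bytes fail and the call with 32 bytes succeeds, even though
only 40 CPUs are reported as online.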

If other systems have such inflated nr_cpu_ids values, this could become an
issue well before systems with 1024 hardware threads become common.

My laptop has nr_cpu_ids=8 (4 hardware threads), another server has
nr_cpu_ids=144, with the same number of hardware threads.  I don't think it's
prudent to assume that the nr_cpu_ids value remains constant after boot, either.

So I wonder if the premise of this bug report (we can get rid of the EINVAL
return value and truncate results) is correct.  There is no other way to obtain
the magic constant, and even if you grab the number from somewhere, it may be
racy with regards to system reconfiguration.


* [Bug libc/15630] Fix use of cpu_set_t with sched_getaffinity when booted on a system with more than 1024 possible cpus.
  2013-06-14 19:31 [Bug libc/15630] New: Fix use of cpu_set_t with sched_getaffinity when booted on a system with more than 1024 possible cpus carlos at redhat dot com
                   ` (2 preceding siblings ...)
  2015-05-18 18:37 ` fweimer at redhat dot com
@ 2015-05-18 18:59 ` fweimer at redhat dot com
  2015-05-20  2:54 ` carlos at redhat dot com
  4 siblings, 0 replies; 6+ messages in thread
From: fweimer at redhat dot com @ 2015-05-18 18:59 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=15630

--- Comment #4 from Florian Weimer <fweimer at redhat dot com> ---
A potential solution would accept smaller size arguments for sched_getaffinity,
as long as the other bits on the kernel side are cleared.  Tools such as taskset
could then restrict the CPU set to the supported CPUs for legacy applications.

This is what we do with RLIMIT_NOFILE—it is still 1024 on many systems to
prevent issues with the select function and the default FD_SETSIZE value. 
Unfortunately, it is somewhat more restrictive.


From: fweimer at redhat dot com @ 2015-05-18 19:14 UTC
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=15630

--- Comment #5 from Florian Weimer <fweimer at redhat dot com> ---
Created attachment 8322
  --> https://sourceware.org/bugzilla/attachment.cgi?id=8322&action=edit
Test case which reflects the truncation assumption (does not currently work)


* [Bug libc/15630] Fix use of cpu_set_t with sched_getaffinity when booted on a system with more than 1024 possible cpus.
  2013-06-14 19:31 [Bug libc/15630] New: Fix use of cpu_set_t with sched_getaffinity when booted on a system with more than 1024 possible cpus carlos at redhat dot com
                   ` (3 preceding siblings ...)
  2015-05-18 18:59 ` fweimer at redhat dot com
@ 2015-05-20  2:54 ` carlos at redhat dot com
  4 siblings, 0 replies; 6+ messages in thread
From: carlos at redhat dot com @ 2015-05-20  2:54 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=15630

--- Comment #6 from Carlos O'Donell <carlos at redhat dot com> ---
(In reply to Florian Weimer from comment #2)
> Maybe addressing this should also fix this design issue with the cpu_set_t
> functions:
> 
> CPU_ALLOC returns a set which is capable of storing more CPUs than the user
> requested because the request is rounded up to the next multiple of sizeof
> (long) * CHAR_BIT CPUs.  This is reflected in the return value of
> CPU_ALLOC_SIZE.  As a result, the sched_getaffinity call in
> 
>   set = CPU_ALLOC (count);
>   sched_getaffinity (0, CPU_ALLOC_SIZE (count), set);
> 
> can succeed even if count is smaller than the maximum number of relevant
> CPUs.  This defeats the purpose of the EINVAL error return code in case the
> specified CPU set is not large enough.

Agreed. That is a flaw. The allocation should certainly be large enough to hold
all possible CPUs ever (nr_cpumask_bits in kernel speak).

(In reply to Florian Weimer from comment #3)
> Additional details:
> 
> Sadly, sysconf(_SC_PROCESSORS_ONLN) is not the number of CPUs which is relevant
> to the sched_getaffinity system call.  I have a system which reports
> _NPROCESSORS_ONLN and _NPROCESSORS_CONF as 40 (and /proc/cpuinfo and
> /proc/stat match that), yet calling sched_getaffinity with small arguments
> fails:
> 
> [pid  3420] sched_getaffinity(0, 8, 0x146b010) = -1 EINVAL (Invalid argument)
> [pid  3420] sched_getaffinity(0, 16, 0x146b010) = -1 EINVAL (Invalid
> argument)
> [pid  3420] sched_getaffinity(0, 32, {ffffffffff, 0, 0, 0}) = 32
> 
> The kernel seems to operate with a nr_cpu_ids value of 240:
> 
> kernel: setup_percpu: NR_CPUS:5120 nr_cpumask_bits:240 nr_cpu_ids:240
> nr_node_ids:2
> kernel:         RCU restricting CPUs from NR_CPUS=5120 to nr_cpu_ids=240.
> 
> nr_cpu_ids is not directly exposed to user space, I think, so the EINVAL
> behavior is pretty much required.

It is not. We need to get access to nr_cpumask_bits, and likely the only way to
do that is to read it from proc like some of the code does today.

See this thread:
https://sourceware.org/ml/libc-alpha/2013-07/msg00288.html

