public inbox for libc-alpha@sourceware.org
* Update mmap() flags and errors lists
@ 2024-05-10 18:59 DJ Delorie
  2024-06-04 22:16 ` Florian Weimer
  0 siblings, 1 reply; 20+ messages in thread
From: DJ Delorie @ 2024-05-10 18:59 UTC
  To: libc-alpha


[DJ - information taken from various sources, including man pages
(which I read, summarized in my notes, ignored for a while, then
rewrote from my notes and kernel sources - "how to take advantage of
bad memory" ;) and kernel sources (linux and hurd).  I contemplated
adding a table cross-referencing each flag with the kernels that
support them and versions introduced, but decided that was too much
work and detail for the results desired.]

[patch starts here]

Extend the list of MAP_* macros to include all macros available
to the average program (gcc -E -dM | grep MAP_*)

Extend the list of errno codes.

diff --git a/manual/llio.texi b/manual/llio.texi
index fae49d1433..2086e04afd 100644
--- a/manual/llio.texi
+++ b/manual/llio.texi
@@ -1573,10 +1573,15 @@ permitted.  They include @code{PROT_READ}, @code{PROT_WRITE}, and
 of address space for future use.  The @code{mprotect} function can be
 used to change the protection flags.  @xref{Memory Protection}.
 
-@var{flags} contains flags that control the nature of the map.
-One of @code{MAP_SHARED} or @code{MAP_PRIVATE} must be specified.
+@var{flags} contains flags that control the nature of the map.  One of
+@code{MAP_SHARED}, @code{MAP_SHARED_VALIDATE}, or @code{MAP_PRIVATE}
+must be specified.  Additional flags may be bitwise OR'd to further
+define the mapping.
 
-They include:
+Note that, aside from @code{MAP_PRIVATE} and @code{MAP_SHARED}, not
+all flags are supported on all versions of all operating systems.
+Consult the kernel-specific documenation for details.  The flags
+include:
 
 @vtable @code
 @item MAP_PRIVATE
@@ -1598,9 +1603,18 @@ Note that actual writing may take place at any time.  You need to use
 @code{msync}, described below, if it is important that other processes
 using conventional I/O get a consistent view of the file.
 
+@item MAP_SHARED_VALIDATE
+Similar to @code{MAP_SHARED} except that additional flags will be
+validated by the kernel, and the call will fail if an unrecognized
+flag is provided.  With @code{MAP_SHARED} using a flag on a kernel
+that doesn't support it causes the flag to be ignored.
+@code{MAP_SHARED_VALIDATE} should be used when the behavior of all
+flags is required.
+
 @item MAP_FIXED
 This forces the system to use the exact mapping address specified in
-@var{address} and fail if it can't.
+@var{address} and fail if it can't.  Note that if the new mapping
+would overlap an existing mapping, the existing map is unmapped.
 
 @c One of these is official - the other is obviously an obsolete synonym
 @c Which is which?
@@ -1638,13 +1652,79 @@ Not all file systems support mappings with an increased page size.
 
 The @code{MAP_HUGETLB} flag is specific to Linux.
 
-@c There is a mechanism to select different hugepage sizes; see
-@c include/uapi/asm-generic/hugetlb_encode.h in the kernel sources.
-
-@c Linux has some other MAP_ options, which I have not discussed here.
-@c MAP_DENYWRITE, MAP_EXECUTABLE and MAP_GROWSDOWN don't seem applicable to
-@c user programs (and I don't understand the last two).  MAP_LOCKED does
-@c not appear to be implemented.
+@item MAP_HUGE_16KB
+@dots{}
+@item MAP_HUGE_16GB
+Some architectures support more than one size of ``huge'' pages for
+@code{MAP_HUGETLB}.  These flags allow the caller to choose amongst
+them.  Note that while the ABI allows the caller to specify arbitrary
+page sizes, not all sizes have corresponding defined macros, and not
+all defined macros correspond to sizes supported by the kernel.  It is
+up to the programmer to only ask for huge page sizes that are known to
+be supported.
+
+@item MAP_32BIT
+Require addresses that can be accessed with a 32 bit pointer, i.e.,
+within the first 4 GiB.  Ignored if MAP_FIXED is specified.
+
+@item MAP_DENYWRITE
+@item MAP_EXECUTABLE
+@item MAP_FILE
+
+Provided for compatibility.  Ignored by the Linux kernel.
+
+@item MAP_FIXED_NOREPLACE
+Similar to @code{MAP_FIXED} except the call will fail with
+@code{EEXIST} if the new mapping would overwrite an existing mapping.
+
+@item MAP_GROWSDOWN
+This flag is used to make stacks, and is typically only needed inside
+the program loader to set up the main stack and thread stacks for the
+running process.  The mapping is created according to the other flags,
+except an additional page just prior to the mapping is marked as a
+``guard page''.  If a write is attempted inside this guard page, that
+page is mapped, the mapping is extended, and a new guard page is
+created.  Thus, the mapping continues to grow towards lower addresses
+until it encounters some other mapping.
+
+@item MAP_LOCKED
+Requests that mapped pages are locked in memory (i.e. not paged out).
+Note that this is a request and not a requirement; use @code{mlock} if
+locking is mandatory.
+
+@item MAP_POPULATE
+@item MAP_NONBLOCK
+These two are opposites.  @code{MAP_POPULATE} requests that the kernel
+read-ahead a file-backed mapping, causing more pages to be mapped
+before they're needed.  @code{MAP_NONBLOCK} requests that the kernel
+@emph{not} attempt such, only mapping pages when they're actually
+needed.
+
+@item MAP_NORESERVE
+Asks the kernel to not reserve physical backing for a mapping.  This
+would be useful for, for example, a very large but sparsely used
+mapping which need not be limited in span by available RAM or swap.
+Note that writes to such a mapping may cause a @code{SIGSEGV} if the
+amount of backing required eventualy exceeds system resources.
+
+On Linux, this flag's behavior may be overwridden by
+@code{/proc/sys/vm/overcommit_memory} as documented in swap(5).
+
+@item MAP_STACK
+Ensures that the resulting mapping is suitable for use as a program
+stack.  For example, the use of huge pages might be precluded.
+
+@item MAP_SYNC
+This flag is used to map persistent memory devices into the running
+program in such a way that writes to the mapping are immediately
+written to the device as well.  Unlike most other flags, this one will
+fail unless @code{MAP_SHARED_VALIDATE} is also given.
+
+@item MAP_UNINITIALIZED
+This flag allows the kernel to map anonymous pages without zeroing
+them out first.  This is, of course, a security risk, and will only
+work if the kernel is built to allow it (typically, on single-process
+embedded systems).
 
 @end vtable
 
@@ -1655,6 +1735,24 @@ Possible errors include:
 
 @table @code
 
+@item EACCES
+
+@var{filedes} was not open for the type of access specified in @var{protect}.
+
+@item EAGAIN
+
+Either the underlying file is locked, or the system has temporarily
+run out of resources.
+
+@item EBADF
+
+The @var{fd} passes is invalid, and a valid file descriptor is required.
+
+@item EEXIST
+
+@code{MAP_FIXED_NOREPLACE} was specified and an existing mapping was
+found in the requested address range.
+
 @item EINVAL
 
 Either @var{address} was unusable (because it is not a multiple of the
@@ -1663,28 +1761,35 @@ applicable page size), or inconsistent @var{flags} were given.
 If @code{MAP_HUGETLB} was specified, the file or system does not support
 large page sizes.
 
-@item EACCES
+@item ENFILE
 
-@var{filedes} was not open for the type of access specified in @var{protect}.
+There are too many open files in the system.
+
+@item ENODEV
+
+This file is of a type that doesn't support mapping.
 
 @item ENOMEM
 
 Either there is not enough memory for the operation, or the process is
 out of address space.
 
-@item ENODEV
-
-This file is of a type that doesn't support mapping.
-
 @item ENOEXEC
 
 The file is on a filesystem that doesn't support mapping.
 
-@c On Linux, EAGAIN will appear if the file has a conflicting mandatory lock.
-@c However mandatory locks are not discussed in this manual.
-@c
-@c Similarly, ETXTBSY will occur if the MAP_DENYWRITE flag (not documented
-@c here) is used and the file is already open for writing.
+@item EPERM
+
+@item EOVERFLOW
+
+Either the offset into the file causes the page counts to exceed the
+range of a 32 bit number, or the offset requested exceeds the length
+of the file.
+
+@item ETXTBSY
+
+@code{MAP_DENYWRITE} was specified, but the file descriptor given was
+open for writing.
 
 @end table
 



* Re: Update mmap() flags and errors lists
  2024-05-10 18:59 Update mmap() flags and errors lists DJ Delorie
@ 2024-06-04 22:16 ` Florian Weimer
  2024-06-05  4:10   ` DJ Delorie
  2024-06-05  4:11   ` [v2] " DJ Delorie
  0 siblings, 2 replies; 20+ messages in thread
From: Florian Weimer @ 2024-06-04 22:16 UTC
  To: DJ Delorie; +Cc: libc-alpha

* DJ Delorie:

> [DJ - information taken from various sources, including man pages
> (which I read, summarized in my notes, ignored for a while, then
> rewrote from my notes and kernel sources - "how to take advantage of
> bad memory" ;) and kernel sources (linux and hurd).  I contemplated
> adding a table cross-referencing each flag with the kernels that
> support them and versions introduced, but decided that was too much
> work and detail for the results desired.]
>
> [patch starts here]
>
> Extend the list of MAP_* macros to include all macros available
> to the average program (gcc -E -dM | grep MAP_*)
>
> Extend the list of errno codes.
>
> diff --git a/manual/llio.texi b/manual/llio.texi
> index fae49d1433..2086e04afd 100644
> --- a/manual/llio.texi
> +++ b/manual/llio.texi
> @@ -1573,10 +1573,15 @@ permitted.  They include @code{PROT_READ}, @code{PROT_WRITE}, and
>  of address space for future use.  The @code{mprotect} function can be
>  used to change the protection flags.  @xref{Memory Protection}.
>  
> -@var{flags} contains flags that control the nature of the map.
> -One of @code{MAP_SHARED} or @code{MAP_PRIVATE} must be specified.
> +@var{flags} contains flags that control the nature of the map.  One of
> +@code{MAP_SHARED}, @code{MAP_SHARED_VALIDATE}, or @code{MAP_PRIVATE}
> +must be specified.  Additional flags may be bitwise OR'd to further
> +define the mapping.

While you are adding this, please avoid starting a sentence with @var,
so something like:

  [The] @var{flags} [parameter] contains …

> -They include:
> +Note that, aside from @code{MAP_PRIVATE} and @code{MAP_SHARED}, not
> +all flags are supported on all versions of all operating systems.
> +Consult the kernel-specific documenation for details.  The flags
> +include:

typo: documen[t]ation

> +@item MAP_SHARED_VALIDATE
> +Similar to @code{MAP_SHARED} except that additional flags will be
> +validated by the kernel, and the call will fail if an unrecognized
> +flag is provided.  With @code{MAP_SHARED} using a flag on a kernel
> +that doesn't support it causes the flag to be ignored.
> +@code{MAP_SHARED_VALIDATE} should be used when the behavior of all
> +flags is required.

This raises the question of what to do if you want this checking
behavior with MAP_PRIVATE instead of MAP_SHARED.

> +
>  @item MAP_FIXED
>  This forces the system to use the exact mapping address specified in
> -@var{address} and fail if it can't.
> +@var{address} and fail if it can't.  Note that if the new mapping
> +would overlap an existing mapping, the existing map is unmapped.

This is misleading, I believe.  The overlapping part is replaced with
the new mapping.  If the overlap is incomplete, part of the previous
mapping remains.
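
The partial-overlap case can be checked with a small experiment.  The
sketch below is an illustration only, assuming Linux semantics and
anonymous private mappings; the function name fixed_overlap_demo is made
up for this example.  It maps two pages and fills them, then maps one
fresh page over the second half with MAP_FIXED, and reports whether the
first page kept its contents while the replaced page reads back as zero.

```c
#ifndef _GNU_SOURCE
# define _GNU_SOURCE
#endif
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical demo: returns 1 if only the overlapping part of the old
   mapping was replaced, 0 if not, and -1 if a mapping call failed.  */
static int
fixed_overlap_demo (void)
{
  size_t page = (size_t) sysconf (_SC_PAGESIZE);

  /* Old mapping: two pages, filled with a recognizable byte.  */
  char *old = mmap (NULL, 2 * page, PROT_READ | PROT_WRITE,
                    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (old == MAP_FAILED)
    return -1;
  memset (old, 'A', 2 * page);

  /* New mapping: one page placed exactly over the second old page.  */
  char *fresh = mmap (old + page, page, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
  if (fresh == MAP_FAILED)
    {
      munmap (old, 2 * page);
      return -1;
    }

  /* The first page should be untouched; the second should be a new,
     zero-filled anonymous page.  */
  int ok = (fresh == old + page
            && old[0] == 'A' && old[page - 1] == 'A'
            && fresh[0] == 0);

  munmap (old, 2 * page);
  return ok;
}
```

A result of 1 matches the description above: the non-overlapping part of
the previous mapping remains in place.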

> +@item MAP_HUGE_16KB
> +@dots{}
> +@item MAP_HUGE_16GB
> +Some architectures support more than one size of ``huge'' pages for
> +@code{MAP_HUGETLB}.  These flags allow the caller to choose amongst
> +them.  Note that while the ABI allows the caller to specify arbitrary
> +page sizes, not all sizes have corresponding defined macros, and not
> +all defined macros correspond to sizes supported by the kernel.  It is
> +up to the programmer to only ask for huge page sizes that are known to
> +be supported.

These we do not support?  (We probably should.)

> +@item MAP_32BIT
> +Require addresses that can be accessed with a 32 bit pointer, i.e.,
> +within the first 4 GiB.  Ignored if MAP_FIXED is specified.
> +
> +@item MAP_DENYWRITE
> +@item MAP_EXECUTABLE
> +@item MAP_FILE
> +
> +Provided for compatibility.  Ignored by the Linux kernel.

I thought that some corner cases still handle MAP_DENYWRITE?

> +@item MAP_FIXED_NOREPLACE
> +Similar to @code{MAP_FIXED} except the call will fail with
> +@code{EEXIST} if the new mapping would overwrite an existing mapping.

How does this interact with MAP_SHARED_VALIDATE above?  Can it be
combined with MAP_FIXED?

> +@item MAP_GROWSDOWN
> +This flag is used to make stacks, and is typically only needed inside
> +the program loader to set up the main stack and thread stacks for the
> +running process.  The mapping is created according to the other flags,
> +except an additional page just prior to the mapping is marked as a
> +``guard page''.  If a write is attempted inside this guard page, that
> +page is mapped, the mapping is extended, and a new guard page is
> +created.  Thus, the mapping continues to grow towards lower addresses
> +until it encounters some other mapping.

Maybe reference -fstack-clash-protection, and note that @theglibc{} does
not use this for thread stacks?

> +@item MAP_LOCKED
> +Requests that mapped pages are locked in memory (i.e. not paged out).
> +Note that this is a request and not a requirement; use @code{mlock} if
> +locking is mandatory.
> +
> +@item MAP_POPULATE
> +@item MAP_NONBLOCK
> +These two are opposites.  @code{MAP_POPULATE} requests that the kernel
> +read-ahead a file-backed mapping, causing more pages to be mapped
> +before they're needed.  @code{MAP_NONBLOCK} requests that the kernel
> +@emph{not} attempt such, only mapping pages when they're actually
> +needed.

MAP_POPULATE is just a hint, right?  And even with mlockall, or
MAP_LOCKED, it does not guarantee the absence of future page faults.

> +@item MAP_NORESERVE
> +Asks the kernel to not reserve physical backing for a mapping.  This
> +would be useful for, for example, a very large but sparsely used
> +mapping which need not be limited in span by available RAM or swap.
> +Note that writes to such a mapping may cause a @code{SIGSEGV} if the
> +amount of backing required eventualy exceeds system resources.
> +
> +On Linux, this flag's behavior may be overwridden by
> +@code{/proc/sys/vm/overcommit_memory} as documented in swap(5).

This should @xref the man-pages section added in the other patch.
However, swap(5) does not appear to exist?

> +@item MAP_STACK
> +Ensures that the resulting mapping is suitable for use as a program
> +stack.  For example, the use of huge pages might be precluded.
> +
> +@item MAP_SYNC
> +This flag is used to map persistent memory devices into the running
> +program in such a way that writes to the mapping are immediately
> +written to the device as well.  Unlike most other flags, this one will
> +fail unless @code{MAP_SHARED_VALIDATE} is also given.

Is this about DAX?

> +@item MAP_UNINITIALIZED
> +This flag allows the kernel to map anonymous pages without zeroing
> +them out first.  This is, of course, a security risk, and will only
> +work if the kernel is built to allow it (typically, on single-process
> +embedded systems).
>  
>  @end vtable
>  
> @@ -1655,6 +1735,24 @@ Possible errors include:
>  
>  @table @code
>  
> +@item EACCES
> +
> +@var{filedes} was not open for the type of access specified in @var{protect}.
> +
> +@item EAGAIN
> +
> +Either the underlying file is locked, or the system has temporarily
> +run out of resources.

See below, I think the reference about locking is spurious.

> +@item EBADF
> +
> +The @var{fd} passes is invalid, and a valid file descriptor is required.

Is a file descriptor ever required?

> +@item EEXIST
> +
> +@code{MAP_FIXED_NOREPLACE} was specified and an existing mapping was
> +found in the requested address range.

See my comment above for MAP_FIXED_NOREPLACE.

>  @item EINVAL
>  
>  Either @var{address} was unusable (because it is not a multiple of the
> @@ -1663,28 +1761,35 @@ applicable page size), or inconsistent @var{flags} were given.
>  If @code{MAP_HUGETLB} was specified, the file or system does not support
>  large page sizes.
>  
> -@item EACCES
> +@item ENFILE
>  
> -@var{filedes} was not open for the type of access specified in @var{protect}.
> +There are too many open files in the system.

Can this error actually happen?  It's a bit surprising.

> +@item ENODEV
> +
> +This file is of a type that doesn't support mapping.
>  
>  @item ENOMEM
>  
>  Either there is not enough memory for the operation, or the process is
>  out of address space.

This should probably reference vm.max_map_count.

> -@c On Linux, EAGAIN will appear if the file has a conflicting mandatory lock.
> -@c However mandatory locks are not discussed in this manual.

Mandatory locks are disabled in pretty much all kernels out there, no?

> +@item EOVERFLOW
> +
> +Either the offset into the file causes the page counts to exceed the
> +range of a 32 bit number, or the offset requested exceeds the length
> +of the file.

The reference to page size may be incorrect.  I think it's a fixed
offset regardless of page size on systems that can't pass a 64-bit file
offset.

> +@item ETXTBSY
> +
> +@code{MAP_DENYWRITE} was specified, but the file descriptor given was
> +open for writing.

This seems to contradict the earlier suggestion that MAP_DENYWRITE is
ignored.

Thanks,
Florian



* Re: Update mmap() flags and errors lists
  2024-06-04 22:16 ` Florian Weimer
@ 2024-06-05  4:10   ` DJ Delorie
  2024-06-05  6:38     ` Florian Weimer
  2024-06-05  4:11   ` [v2] " DJ Delorie
  1 sibling, 1 reply; 20+ messages in thread
From: DJ Delorie @ 2024-06-05  4:10 UTC
  To: Florian Weimer; +Cc: libc-alpha


v2 will follow...

Florian Weimer <fweimer@redhat.com> writes:
> While you are adding this, please avoid starting a sentence with @var,
> so something like:
>
>   [The] @var{flags} [parameter] contains …

Fixed.

>> -They include:
>> +Note that, aside from @code{MAP_PRIVATE} and @code{MAP_SHARED}, not
>> +all flags are supported on all versions of all operating systems.
>> +Consult the kernel-specific documenation for details.  The flags
>> +include:
>
> typo: documen[t]ation

Fixed.

>> +@item MAP_SHARED_VALIDATE
>> +Similar to @code{MAP_SHARED} except that additional flags will be
>> +validated by the kernel, and the call will fail if an unrecognized
>> +flag is provided.  With @code{MAP_SHARED} using a flag on a kernel
>> +that doesn't support it causes the flag to be ignored.
>> +@code{MAP_SHARED_VALIDATE} should be used when the behavior of all
>> +flags is required.
>
> This raises the question of what to do if you want this checking
> behavior with MAP_PRIVATE instead of MAP_SHARED.

I didn't write the spec ;-)

>>  @item MAP_FIXED
>>  This forces the system to use the exact mapping address specified in
>> -@var{address} and fail if it can't.
>> +@var{address} and fail if it can't.  Note that if the new mapping
>> +would overlap an existing mapping, the existing map is unmapped.
>
> This is misleading, I believe.  The overlapping part is replaced with
> the new mapping.  If the overlap is incomplete, part of the previous
> mapping remains.

Reworded.

>> +@item MAP_HUGE_16KB
>> +@dots{}
>> +@item MAP_HUGE_16GB
>> +Some architectures support more than one size of ``huge'' pages for
>> +@code{MAP_HUGETLB}.  These flags allow the caller to choose amongst
>> +them.  Note that while the ABI allows the caller to specify arbitrary
>> +page sizes, not all sizes have corresponding defined macros, and not
>> +all defined macros correspond to sizes supported by the kernel.  It is
>> +up to the programmer to only ask for huge page sizes that are known to
>> +be supported.
>
> These we do not support?  (We probably should.)

The ABI is a 6-bit bitfield giving the biased bit width of the page
size.  Not all combinations have macros, and not all combinations are
honored by the kernel.  We do have macros for the combinations that the
kernel honors.  So if the kernel can do it, it works, but if the kernel
can't do it, either you get a runtime error or a compile time error ;-)

I suppose we could list all 64 possible macros in our headers, but at
the moment, we don't.
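
As a minimal sketch of that encoding (not taken from glibc): in the
Linux headers the predefined macros place the base-2 logarithm of the
page size in the bitfield, so MAP_HUGE_2MB is 21 << MAP_HUGE_SHIFT.  The
helper name huge_page_flag is hypothetical, and the fallback values for
MAP_HUGE_SHIFT (26) and MAP_HUGE_MASK (0x3f) are assumptions based on
the Linux asm-generic layout, used only if the headers lack them.

```c
#ifndef _GNU_SOURCE
# define _GNU_SOURCE
#endif
#include <sys/mman.h>

/* Fallbacks for older headers; values assume the Linux generic layout.  */
#ifndef MAP_HUGE_SHIFT
# define MAP_HUGE_SHIFT 26
#endif
#ifndef MAP_HUGE_MASK
# define MAP_HUGE_MASK 0x3f
#endif

/* Hypothetical helper: return the flag bits that select a huge page of
   SIZE bytes, or -1 if SIZE is not a power of two or its logarithm
   does not fit in the 6-bit field.  The result is meant to be OR'd
   with MAP_HUGETLB; whether the kernel honors it is a separate
   question.  */
static long long
huge_page_flag (unsigned long long size)
{
  unsigned int log2 = 0;

  if (size == 0 || (size & (size - 1)) != 0)
    return -1;
  while ((size >> log2) != 1)
    log2++;
  if (log2 > MAP_HUGE_MASK)
    return -1;
  return (long long) log2 << MAP_HUGE_SHIFT;
}
```

With this, huge_page_flag (2ULL << 20) yields the same bits as
MAP_HUGE_2MB, and a size with no predefined macro can still be encoded,
although the kernel may reject it at run time.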

>> +@item MAP_32BIT
>> +Require addresses that can be accessed with a 32 bit pointer, i.e.,
>> +within the first 4 GiB.  Ignored if MAP_FIXED is specified.
>> +
>> +@item MAP_DENYWRITE
>> +@item MAP_EXECUTABLE
>> +@item MAP_FILE
>> +
>> +Provided for compatibility.  Ignored by the Linux kernel.
>
> I thought that some corner cases still handle MAP_DENYWRITE?

Nope, completely ignored by the kernel.

>> +@item MAP_FIXED_NOREPLACE
>> +Similar to @code{MAP_FIXED} except the call will fail with
>> +@code{EEXIST} if the new mapping would overwrite an existing mapping.
>
> How does this interact with MAP_SHARED_VALIDATE above?  Can it be
> combined with MAP_FIXED?

Superset of MAP_FIXED, so it's internally *always* combined:
        /* force arch specific MAP_FIXED handling in get_unmapped_area */
        if (flags & MAP_FIXED_NOREPLACE)
                flags |= MAP_FIXED;

I would assume it interacts with MAP_SHARED_VALIDATE exactly as
documented: it creates a shared fixed mapping, unless the kernel doesn't
support MAP_FIXED_NOREPLACE, in which case the call fails.
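
The EEXIST behavior can be probed from user space.  The following is a
rough sketch under stated assumptions, not a definitive test: it uses
anonymous private mappings only, the name noreplace_probe is invented
here, and the fallback value 0x100000 for MAP_FIXED_NOREPLACE is the
Linux value, assumed for headers that predate the flag.

```c
#ifndef _GNU_SOURCE
# define _GNU_SOURCE
#endif
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MAP_FIXED_NOREPLACE
# define MAP_FIXED_NOREPLACE 0x100000 /* Linux value, assumed.  */
#endif

/* Hypothetical probe.  Returns 1 if the kernel refused to replace an
   existing mapping and set EEXIST, 0 if the flag was apparently
   treated as a plain address hint (the mapping landed elsewhere), and
   -1 on any other outcome.  */
static int
noreplace_probe (void)
{
  size_t len = (size_t) sysconf (_SC_PAGESIZE);
  int result;

  void *first = mmap (NULL, len, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (first == MAP_FAILED)
    return -1;

  /* Ask for the very same range again without permission to replace.  */
  void *second = mmap (first, len, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED_NOREPLACE,
                       -1, 0);
  if (second == MAP_FAILED)
    result = errno == EEXIST ? 1 : -1;
  else
    {
      /* A kernel that does not know the flag uses FIRST only as a
         hint, so the new mapping should be at a different address.  */
      result = second == first ? -1 : 0;
      munmap (second, len);
    }

  munmap (first, len);
  return result;
}
```

The second outcome is why callers on older kernels are usually advised
to compare the returned address with the requested one.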

>> +@item MAP_GROWSDOWN
>> +This flag is used to make stacks, and is typically only needed inside
>> +the program loader to set up the main stack and thread stacks for the
>> +running process.  The mapping is created according to the other flags,
>> +except an additional page just prior to the mapping is marked as a
>> +``guard page''.  If a write is attempted inside this guard page, that
>> +page is mapped, the mapping is extended, and a new guard page is
>> +created.  Thus, the mapping continues to grow towards lower addresses
>> +until it encounters some other mapping.
>
> Maybe reference -fstack-clash-protection, and note that @theglibc{} does
> not use this for thread stacks?

I took out the thread stack text.

Added text about -fstack-clash-protection.

>> +@item MAP_LOCKED
>> +Requests that mapped pages are locked in memory (i.e. not paged out).
>> +Note that this is a request and not a requirement; use @code{mlock} if
>> +locking is mandatory.
>> +
>> +@item MAP_POPULATE
>> +@item MAP_NONBLOCK
>> +These two are opposites.  @code{MAP_POPULATE} requests that the kernel
>> +read-ahead a file-backed mapping, causing more pages to be mapped
>> +before they're needed.  @code{MAP_NONBLOCK} requests that the kernel
>> +@emph{not} attempt such, only mapping pages when they're actually
>> +needed.
>
> MAP_POPULATE is just a hint, right?  And even with mlockall, or
> MAP_LOCKED, it does not guarantee the absence of future page faults.

Correct, which is why I said "requests" but I'll add better text.

>> +@item MAP_NORESERVE
>> +Asks the kernel to not reserve physical backing for a mapping.  This
>> +would be useful for, for example, a very large but sparsely used
>> +mapping which need not be limited in span by available RAM or swap.
>> +Note that writes to such a mapping may cause a @code{SIGSEGV} if the
>> +amount of backing required eventualy exceeds system resources.
>> +
>> +On Linux, this flag's behavior may be overwridden by
>> +@code{/proc/sys/vm/overcommit_memory} as documented in swap(5).
>
> This should @xref the man-pages section added in the other patch.
> However, swap(5) does not appear to exist?

Should be proc(5).  I tweaked the wording to not need a reference, I
think.  We do *not* want to accidentally include-by-reference
documentation on /proc or /sys, just the system calls.

>> +@item MAP_SYNC
>> +This flag is used to map persistent memory devices into the running
>> +program in such a way that writes to the mapping are immediately
>> +written to the device as well.  Unlike most other flags, this one will
>> +fail unless @code{MAP_SHARED_VALIDATE} is also given.
>
> Is this about DAX?

Yes.

>> +@item EAGAIN
>> +
>> +Either the underlying file is locked, or the system has temporarily
>> +run out of resources.
>
> See below, I think the reference about locking is spurious.

Based on kernel code:

        if (!mlock_future_ok(mm, vm_flags, len))
                return -EAGAIN;

>> +@item EBADF
>> +
>> +The @var{fd} passes is invalid, and a valid file descriptor is required.
>
> Is a file descriptor ever required?

If mapping a file, yes.  That's the default ;-)
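
A short sketch of the distinction, assuming Linux behavior: a
file-backed request made with a descriptor of -1 is expected to fail
with EBADF, while the same request with MAP_ANONYMOUS needs no
descriptor at all.  The helper name mmap_errno is made up for
illustration.

```c
#ifndef _GNU_SOURCE
# define _GNU_SOURCE
#endif
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: try to map one readable page with FLAGS and a
   file descriptor of -1.  Returns 0 on success, otherwise the errno
   value reported by mmap.  */
static int
mmap_errno (int flags)
{
  size_t page = (size_t) sysconf (_SC_PAGESIZE);
  void *p = mmap (NULL, page, PROT_READ, flags, -1, 0);

  if (p == MAP_FAILED)
    return errno;
  munmap (p, page);
  return 0;
}
```

Here mmap_errno (MAP_PRIVATE) should report EBADF, whereas
mmap_errno (MAP_PRIVATE | MAP_ANONYMOUS) should succeed.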

>> +@item EEXIST
>> +
>> +@code{MAP_FIXED_NOREPLACE} was specified and an existing mapping was
>> +found in the requested address range.
>
> See my comment above for MAP_FIXED_NOREPLACE.

Tweaked the wording.

>>  @item EINVAL
>>  
>>  Either @var{address} was unusable (because it is not a multiple of the
>> @@ -1663,28 +1761,35 @@ applicable page size), or inconsistent @var{flags} were given.
>>  If @code{MAP_HUGETLB} was specified, the file or system does not support
>>  large page sizes.
>>  
>> -@item EACCES
>> +@item ENFILE
>>  
>> -@var{filedes} was not open for the type of access specified in @var{protect}.
>> +There are too many open files in the system.
>
> Can this error actually happen?  It's a bit surprising.

There is no direct mention in the kernel sources, but the man page
documents it.  Removed.

>> +@item ENODEV
>> +
>> +This file is of a type that doesn't support mapping.
>>  
>>  @item ENOMEM
>>  
>>  Either there is not enough memory for the operation, or the process is
>>  out of address space.
>
> This should probably reference vm.max_map_count.

Noted.

>> -@c On Linux, EAGAIN will appear if the file has a conflicting mandatory lock.
>> -@c However mandatory locks are not discussed in this manual.
>
> Mandatory locks are disabled in pretty much all kernels out there, no?

I wouldn't think we'd want to write documentation based on config
options; if you get this error, obviously the config option was enabled ;-)

But I removed this comment because I mentioned locking (in general) in
the added EAGAIN entry.

>> +@item EOVERFLOW
>> +
>> +Either the offset into the file causes the page counts to exceed the
>> +range of a 32 bit number, or the offset requested exceeds the length
>> +of the file.
>
> The reference to page size may be incorrect.  I think it's a fixed
> offset regardless of page size on systems that can't pass a 64-bit file
> offset.

The code I was looking at was talking about having 2^32 *pages* mapped.

>> +@item ETXTBSY
>> +
>> +@code{MAP_DENYWRITE} was specified, but the file descriptor given was
>> +open for writing.
>
> This seems to contradict the earlier suggestion that MAP_DENYWRITE is
> ignored.

Kernel source says this is still returned if you try to map a swap file
for writing...  rewritten.



* [v2] Update mmap() flags and errors lists
  2024-06-04 22:16 ` Florian Weimer
  2024-06-05  4:10   ` DJ Delorie
@ 2024-06-05  4:11   ` DJ Delorie
  2024-06-05  7:44     ` Andreas Schwab
  1 sibling, 1 reply; 20+ messages in thread
From: DJ Delorie @ 2024-06-05  4:11 UTC
  To: Florian Weimer; +Cc: libc-alpha


Extend the list of MAP_* macros to include all macros available
to the average program (gcc -E -dM | grep MAP_*)

Extend the list of errno codes.

diff --git a/manual/llio.texi b/manual/llio.texi
index fe1807a849..03cdd622bb 100644
--- a/manual/llio.texi
+++ b/manual/llio.texi
@@ -1573,10 +1573,15 @@ permitted.  They include @code{PROT_READ}, @code{PROT_WRITE}, and
 of address space for future use.  The @code{mprotect} function can be
 used to change the protection flags.  @xref{Memory Protection}.
 
-@var{flags} contains flags that control the nature of the map.
-One of @code{MAP_SHARED} or @code{MAP_PRIVATE} must be specified.
+The @var{flags} parameter contains flags that control the nature of
+the map.  One of @code{MAP_SHARED}, @code{MAP_SHARED_VALIDATE}, or
+@code{MAP_PRIVATE} must be specified.  Additional flags may be bitwise
+OR'd to further define the mapping.
 
-They include:
+Note that, aside from @code{MAP_PRIVATE} and @code{MAP_SHARED}, not
+all flags are supported on all versions of all operating systems.
+Consult the kernel-specific documentation for details.  The flags
+include:
 
 @vtable @code
 @item MAP_PRIVATE
@@ -1598,9 +1603,19 @@ Note that actual writing may take place at any time.  You need to use
 @code{msync}, described below, if it is important that other processes
 using conventional I/O get a consistent view of the file.
 
+@item MAP_SHARED_VALIDATE
+Similar to @code{MAP_SHARED} except that additional flags will be
+validated by the kernel, and the call will fail if an unrecognized
+flag is provided.  With @code{MAP_SHARED} using a flag on a kernel
+that doesn't support it causes the flag to be ignored.
+@code{MAP_SHARED_VALIDATE} should be used when the behavior of all
+flags is required.
+
 @item MAP_FIXED
 This forces the system to use the exact mapping address specified in
-@var{address} and fail if it can't.
+@var{address} and fail if it can't.  Note that if the new mapping
+would overlap an existing mapping, the overlapping portion of the
+existing map is unmapped.
 
 @c One of these is official - the other is obviously an obsolete synonym
 @c Which is which?
@@ -1638,13 +1653,84 @@ Not all file systems support mappings with an increased page size.
 
 The @code{MAP_HUGETLB} flag is specific to Linux.
 
-@c There is a mechanism to select different hugepage sizes; see
-@c include/uapi/asm-generic/hugetlb_encode.h in the kernel sources.
-
-@c Linux has some other MAP_ options, which I have not discussed here.
-@c MAP_DENYWRITE, MAP_EXECUTABLE and MAP_GROWSDOWN don't seem applicable to
-@c user programs (and I don't understand the last two).  MAP_LOCKED does
-@c not appear to be implemented.
+@item MAP_HUGE_16KB
+@dots{}
+@item MAP_HUGE_16GB
+Some architectures support more than one size of ``huge'' pages for
+@code{MAP_HUGETLB}.  These flags allow the caller to choose amongst
+them.  Note that while the ABI allows the caller to specify arbitrary
+page sizes, not all sizes have corresponding defined macros, and not
+all defined macros correspond to sizes supported by the kernel.  It is
+up to the programmer to only ask for huge page sizes that are known to
+be supported.
+
+@item MAP_32BIT
+Require addresses that can be accessed with a 32-bit pointer, i.e.,
+within the first 4 GiB.  Ignored if @code{MAP_FIXED} is specified.
+
+@item MAP_DENYWRITE
+@item MAP_EXECUTABLE
+@item MAP_FILE
+
+Provided for compatibility.  Ignored by the Linux kernel.
+
+@item MAP_FIXED_NOREPLACE
+Similar to @code{MAP_FIXED} except the call will fail with
+@code{EEXIST} if the new mapping would overwrite an existing mapping.
+
+@item MAP_GROWSDOWN
+This flag is used to make stacks, and is typically only needed inside
+the program loader to set up the main stack for the running process.
+The mapping is created according to the other flags, except an
+additional page just prior to the mapping is marked as a ``guard
+page''.  If a write is attempted inside this guard page, that page is
+mapped, the mapping is extended, and a new guard page is created.
+Thus, the mapping continues to grow towards lower addresses until it
+encounters some other mapping.
+
+Note that accessing memory beyond the guard page will not trigger this
+feature.  In gcc, use @code{-fstack-clash-protection} to ensure the
+guard page is always touched.
+
+@item MAP_LOCKED
+A hint asking the kernel to lock the mapped pages in memory (i.e., so
+they are not paged out).  The kernel may not honor it; use
+@code{mlock} if locking is mandatory.
+
+@item MAP_POPULATE
+@item MAP_NONBLOCK
+These two are opposites.  @code{MAP_POPULATE} is a hint asking the
+kernel to read ahead in a file-backed mapping, so that more pages are
+mapped before they're needed.  @code{MAP_NONBLOCK} is a hint asking
+the kernel @emph{not} to attempt this, so that pages are mapped only
+when they're actually needed.
+
+@item MAP_NORESERVE
+Asks the kernel to not reserve physical backing for a mapping.  This
+would be useful for, for example, a very large but sparsely used
+mapping which need not be limited in span by available RAM or swap.
+Note that writes to such a mapping may cause a @code{SIGSEGV} if the
+amount of backing required eventually exceeds system resources.
+
+On Linux, this flag's behavior may be overridden by
+@code{/proc/sys/vm/overcommit_memory} as documented in the proc(5) man
+page.
+
+@item MAP_STACK
+Ensures that the resulting mapping is suitable for use as a program
+stack.  For example, the use of huge pages might be precluded.
+
+@item MAP_SYNC
+This flag is used to map persistent memory devices into the running
+program in such a way that writes to the mapping are immediately
+written to the device as well.  Unlike most other flags, this one will
+fail unless @code{MAP_SHARED_VALIDATE} is also given.
+
+@item MAP_UNINITIALIZED
+This flag allows the kernel to map anonymous pages without zeroing
+them out first.  This is, of course, a security risk, and will only
+work if the kernel is built to allow it (typically, on single-process
+embedded systems).
 
 @end vtable
 
@@ -1655,6 +1741,25 @@ Possible errors include:
 
 @table @code
 
+@item EACCES
+
+@var{filedes} was not open for the type of access specified in @var{protect}.
+
+@item EAGAIN
+
+Either the underlying file is locked, or the system has temporarily
+run out of resources.
+
+@item EBADF
+
+The @var{fd} passed is invalid, and a valid file descriptor is
+required (i.e. mapping a file).
+
+@item EEXIST
+
+@code{MAP_FIXED_NOREPLACE} was specified and an existing mapping was
+found overlapping the requested address range.
+
 @item EINVAL
 
 Either @var{address} was unusable (because it is not a multiple of the
@@ -1663,28 +1768,35 @@ applicable page size), or inconsistent @var{flags} were given.
 If @code{MAP_HUGETLB} was specified, the file or system does not support
 large page sizes.
 
-@item EACCES
+@item ENODEV
 
-@var{filedes} was not open for the type of access specified in @var{protect}.
+This file is of a type that doesn't support mapping.
 
 @item ENOMEM
 
-Either there is not enough memory for the operation, or the process is
-out of address space.
-
-@item ENODEV
-
-This file is of a type that doesn't support mapping.
+There is not enough memory for the operation, the process is out of
+address space, or there are too many mappings.  On Linux, the maximum
+number of mappings can be controlled via
+@code{/proc/sys/vm/max_map_count}.
 
 @item ENOEXEC
 
 The file is on a filesystem that doesn't support mapping.
 
-@c On Linux, EAGAIN will appear if the file has a conflicting mandatory lock.
-@c However mandatory locks are not discussed in this manual.
-@c
-@c Similarly, ETXTBSY will occur if the MAP_DENYWRITE flag (not documented
-@c here) is used and the file is already open for writing.
+@item EPERM
+
+@item EOVERFLOW
+
+Either the offset into the file causes the page counts to exceed the
+range of a 32 bit number, or the offset requested exceeds the length
+of the file.
+
+@item ETXTBSY
+
+Returned by older kernels, which honored @code{MAP_DENYWRITE}, when
+that flag was specified but the file descriptor given was open for
+writing.  Also returned if you try to mmap a swap file with
+@code{PROT_WRITE}.
 
 @end table
 



* Re: Update mmap() flags and errors lists
  2024-06-05  4:10   ` DJ Delorie
@ 2024-06-05  6:38     ` Florian Weimer
  2024-06-05 18:42       ` DJ Delorie
  2024-06-05 18:50       ` [v3] " DJ Delorie
  0 siblings, 2 replies; 20+ messages in thread
From: Florian Weimer @ 2024-06-05  6:38 UTC (permalink / raw)
  To: DJ Delorie; +Cc: libc-alpha

* DJ Delorie:

>>> +@item MAP_HUGE_16KB
>>> +@dots{}
>>> +@item MAP_HUGE_16GB
>>> +Some architectures support more than one size of ``huge'' pages for
>>> +@code{MAP_HUGETLB}.  These flags allow the caller to choose amongst
>>> +them.  Note that while the ABI allows the caller to specify arbitrary
>>> +page sizes, not all sizes have corresponding defined macros, and not
>>> +all defined macros correspond to sizes supported by the kernel.  It is
>>> +up to the programmer to only ask for huge page sizes that are known to
>>> +be supported.
>>
>> These we do not support?  (We probably should.)
>
> The ABI is a 6-bit bitfield giving the biased bit width of the page
> size.  Not all combinations have macros, and not all combinations are
> honored by the kernel.  We do have macros for the combinations that the
> kernel honors.  So if the kernel can do it, it works, but if the kernel
> can't do it, either you get a runtime error or a compile time error ;-)
>
> I suppose we could list all 64 possible macros in our headers, but at
> the moment, we don't.

What I meant is that they aren't part of <sys/mman.h>.

>>> +@item MAP_FIXED_NOREPLACE
>>> +Similar to @code{MAP_FIXED} except the call will fail with
>>> +@code{EEXIST} if the new mapping would overwrite an existing mapping.
>>
>> How does this interact with MAP_SHARED_VALIDATE above?  Can it be
>> combined with MAP_FIXED?
>
> Superset of MAP_FIXED, so it's internally *always* combined:
>         /* force arch specific MAP_FIXED handling in get_unmapped_area */
>         if (flags & MAP_FIXED_NOREPLACE)
>                 flags |= MAP_FIXED;
>
> I would assume it interacts with MAP_SHARED_VALIDATE exactly as
> documented.  Creates a shared fixed mapping, unless the kernel doesn't
> support MAP_FIXED_NOREPLACE, then errors.

The question is if MAP_FIXED_NOREPLACE can be silently treated as
MAP_FIXED.

>>> +@item EAGAIN
>>> +
>>> +Either the underlying file is locked, or the system has temporarily
>>> +run out of resources.
>>
>> See below, I think the reference about locking is spurious.
>
> Based on kernel code:
>
>         if (!mlock_future_ok(mm, vm_flags, len))
>                 return -EAGAIN;

Ahh, this refers to mlock/MAP_LOCKED?  Please say so.  We shouldn't
discuss mandatory locking support.

>>> +@item EBADF
>>> +
>>> +The @var{fd} passed is invalid, and a valid file descriptor is required.
>>
>> Is a file descriptor ever required?
>
> If mapping a file, yes.  That's the default ;-)

Ah, right.  Do we say anywhere that the fd argument is ignored for
MAP_ANONYMOUS?

>>> -@c On Linux, EAGAIN will appear if the file has a conflicting mandatory lock.
>>> -@c However mandatory locks are not discussed in this manual.
>>
>> Mandatory locks are disabled in pretty much all kernels out there, no?
>
> I wouldn't think we'd want to write documentation based on config
> options; if you get this error, obviously the config option was set on ;-)
>
> But I removed this comment because I mentioned locking (in general) in
> the added EAGAIN entry.

Different kind of locking, see above.

Mandatory locking is generally considered a very bad idea.  I doubt any
distribution kernels enable it.  I'm worried that discussing it would be
misleading at best.

>>> +@item EOVERFLOW
>>> +
>>> +Either the offset into the file causes the page counts to exceed the
>>> +range of a 32 bit number, or the offset requested exceeds the length
>>> +of the file.
>>
>> The reference to page size may be incorrect.  I think it's a fixed
>> offset regardless of page size on systems that can't pass a 64-bit file
>> offset.
>
> The code I was looking at was talking about having 2^32 *pages* mapped.

See MMAP2_PAGE_UNIT.  On most architectures, it's fixed at 4096,
regardless of page size.

And it turns out that we do not check that the offset argument fits into
32 bits after dividing by MMAP2_PAGE_UNIT (on targets where this
matters).  The documentation implies we should return EOVERFLOW in this
case.

Thanks,
Florian



* Re: [v2] Update mmap() flags and errors lists
  2024-06-05  4:11   ` [v2] " DJ Delorie
@ 2024-06-05  7:44     ` Andreas Schwab
  2024-06-05 18:42       ` DJ Delorie
  0 siblings, 1 reply; 20+ messages in thread
From: Andreas Schwab @ 2024-06-05  7:44 UTC (permalink / raw)
  To: DJ Delorie; +Cc: Florian Weimer, libc-alpha

On Jun 05 2024, DJ Delorie wrote:

> +@item MAP_HUGE_16KB
> +@dots{}
> +@item MAP_HUGE_16GB

You need to use @itemx on subsequent lines.  That won't work for the
@dots line, since it isn't supposed to be indexed (and it should not be
enclosed in @code), so this needs to be formulated differently.

> +@item MAP_DENYWRITE
> +@item MAP_EXECUTABLE
> +@item MAP_FILE

@itemx

> +@item MAP_POPULATE
> +@item MAP_NONBLOCK

@itemx

-- 
Andreas Schwab, SUSE Labs, schwab@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."


* Re: Update mmap() flags and errors lists
  2024-06-05  6:38     ` Florian Weimer
@ 2024-06-05 18:42       ` DJ Delorie
  2024-06-14  8:14         ` Florian Weimer
  2024-06-05 18:50       ` [v3] " DJ Delorie
  1 sibling, 1 reply; 20+ messages in thread
From: DJ Delorie @ 2024-06-05 18:42 UTC (permalink / raw)
  To: Florian Weimer; +Cc: libc-alpha

Florian Weimer <fweimer@redhat.com> writes:
> What I meant is that they aren't part of <sys/mman.h>.

So they aren't.  We define the shift macros, though...

Removed.
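
For reference, a minimal sketch of how a caller could build the size
bits by hand from the shift macro.  It assumes the Linux encoding is
log2 of the page size placed in a 6-bit field; EX_HUGE_SHIFT,
EX_HUGE_MASK, and encode_huge_page_size are illustrative names, not
glibc API, and the values are assumed to mirror the generic Linux
MAP_HUGE_SHIFT and MAP_HUGE_MASK:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values mirroring the generic Linux MAP_HUGE_SHIFT and
   MAP_HUGE_MASK; real programs should take them from <sys/mman.h>.  */
#define EX_HUGE_SHIFT 26
#define EX_HUGE_MASK  0x3fU

/* Encode a huge page size, given as log2 of its size in bytes, into
   the bits that would be OR'd into the mmap flags next to
   MAP_HUGETLB.  */
static uint32_t
encode_huge_page_size (unsigned int log2_size)
{
  /* The field is only six bits wide.  */
  assert (log2_size <= EX_HUGE_MASK);
  return (uint32_t) log2_size << EX_HUGE_SHIFT;
}
```

With this, a 2 MiB page (log2 = 21) would be requested as
MAP_HUGETLB | encode_huge_page_size (21), assuming the kernel and
architecture support that size.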

>> I would assume it interacts with MAP_SHARED_VALIDATE exactly as
>> documented.  Creates a shared fixed mapping, unless the kernel doesn't
>> support MAP_FIXED_NOREPLACE, then errors.
>
> The question is if MAP_FIXED_NOREPLACE can be silently treated as
> MAP_FIXED.

I wouldn't think so.  If the kernel supports it, it automatically
includes MAP_FIXED.  If the kernel doesn't support it, it's ignored.
You'd have to pass MAP_FIXED | MAP_FIXED_NOREPLACE to get the "silent"
treatment.

If you needed MAP_PRIVATE | MAP_FIXED_NOREPLACE I suppose you'd have to
do a test mapping with MAP_SHARED_VALIDATE | MAP_FIXED_NOREPLACE first,
to see if the running kernel supports it.  But this is the case for
*any* flag.

> Ahh, this refers to mlock/MAP_LOCKED?  Please say so.  We shouldn't
> discuss mandatory locking support.

Ok, I'll take out mentions of these locks.

I assume this is different than the mlock() and MAP_LOCKED locks, yes?

>>>> +@item EBADF
>>>> +
>>>> +The @var{fd} passed is invalid, and a valid file descriptor is required.
>>>
>>> Is a file descriptor ever required?
>>
>> If mapping a file, yes.  That's the default ;-)
>
> Ah, right.  Do we say anywhere that the fd argument is ignored for
> MAP_ANONYMOUS?

Yes.  I reworded it a bit.

@item MAP_ANONYMOUS
@itemx MAP_ANON
This flag tells the system to create an anonymous mapping, not connected
to a file.  @var{filedes} and @var{offset} are ignored, and the region is
initialized with zeros.

>>>> +@item EOVERFLOW
>>>> +
>>>> +Either the offset into the file causes the page counts to exceed the
>>>> +range of a 32 bit number, or the offset requested exceeds the length
>>>> +of the file.
>>>
>>> The reference to page size may be incorrect.  I think it's a fixed
>>> offset regardless of page size on systems that can't pass a 64-bit file
>>> offset.
>>
>> The code I was looking at was talking about having 2^32 *pages* mapped.
>
> See MMAP2_PAGE_UNIT.  On most architectures, it's fixed at 4096,
> regardless of page size.
>
> And it turns out that we do not check that the offset argument fits into
> 32 bits after dividing by MMAP2_PAGE_UNIT (on targets where this
> matters).  The documentation implies we should return EOVERFLOW in this
> case.


	/* offset overflow? */
	if ((pgoff + (len >> PAGE_SHIFT)) < pgoff)
		return -EOVERFLOW;

The man pages describe it thusly:

       EOVERFLOW
              On 32-bit architecture together with the large file
              extension (i.e., using 64-bit off_t): the number of pages
              used for length plus number of pages used for offset would
              overflow unsigned long (32 bits).

I reworded it to be a bit more correct and at the same time vague ;-)
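
To make the wraparound in that kernel check concrete, here is a small
user-space sketch.  It models unsigned long as a 32-bit integer and
assumes a 4096-byte page; EX_PAGE_SHIFT and mapping_would_overflow
are hypothetical names for illustration, not kernel or glibc code:

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed page shift (4096-byte pages) for this illustration.  */
#define EX_PAGE_SHIFT 12

/* Model of the kernel test on a 32-bit target: PGOFF is the file
   offset in pages, LEN the mapping length in bytes.  The sum wraps
   modulo 2^32, so a result smaller than PGOFF means overflow.  */
static bool
mapping_would_overflow (uint32_t pgoff, uint32_t len)
{
  uint32_t end = pgoff + (len >> EX_PAGE_SHIFT);
  return end < pgoff;
}
```

For example, a page offset of 0xFFFFFFFF with a two-page length wraps
around to 1, which is what the check catches.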



* Re: [v2] Update mmap() flags and errors lists
  2024-06-05  7:44     ` Andreas Schwab
@ 2024-06-05 18:42       ` DJ Delorie
  0 siblings, 0 replies; 20+ messages in thread
From: DJ Delorie @ 2024-06-05 18:42 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: fweimer, libc-alpha

Andreas Schwab <schwab@suse.de> writes:
> You need to use @itemx on subsequent lines.  That won't work for the
> @dots line, since it isn't supposed to be indexed (and it should not be
> enclosed in @code), so this needs to be formulated differently.

Thanks, fixed.



* [v3] Update mmap() flags and errors lists
  2024-06-05  6:38     ` Florian Weimer
  2024-06-05 18:42       ` DJ Delorie
@ 2024-06-05 18:50       ` DJ Delorie
  2024-06-14  8:21         ` Florian Weimer
  1 sibling, 1 reply; 20+ messages in thread
From: DJ Delorie @ 2024-06-05 18:50 UTC (permalink / raw)
  To: libc-alpha; +Cc: Florian Weimer


Extend the list of MAP_* macros to include all macros available
to the average program (gcc -E -dM | grep MAP_*)

Extend the list of errno codes.

diff --git a/manual/llio.texi b/manual/llio.texi
index fe1807a849..fae808c090 100644
--- a/manual/llio.texi
+++ b/manual/llio.texi
@@ -1573,10 +1573,15 @@ permitted.  They include @code{PROT_READ}, @code{PROT_WRITE}, and
 of address space for future use.  The @code{mprotect} function can be
 used to change the protection flags.  @xref{Memory Protection}.
 
-@var{flags} contains flags that control the nature of the map.
-One of @code{MAP_SHARED} or @code{MAP_PRIVATE} must be specified.
+The @var{flags} parameter contains flags that control the nature of
+the map.  One of @code{MAP_SHARED}, @code{MAP_SHARED_VALIDATE}, or
+@code{MAP_PRIVATE} must be specified.  Additional flags may be bitwise
+OR'd to further define the mapping.
 
-They include:
+Note that, aside from @code{MAP_PRIVATE} and @code{MAP_SHARED}, not
+all flags are supported on all versions of all operating systems.
+Consult the kernel-specific documentation for details.  The flags
+include:
 
 @vtable @code
 @item MAP_PRIVATE
@@ -1598,9 +1603,19 @@ Note that actual writing may take place at any time.  You need to use
 @code{msync}, described below, if it is important that other processes
 using conventional I/O get a consistent view of the file.
 
+@item MAP_SHARED_VALIDATE
+Similar to @code{MAP_SHARED} except that additional flags will be
+validated by the kernel, and the call will fail if an unrecognized
+flag is provided.  With @code{MAP_SHARED} using a flag on a kernel
+that doesn't support it causes the flag to be ignored.
+@code{MAP_SHARED_VALIDATE} should be used when the behavior of all
+flags is required.
+
 @item MAP_FIXED
 This forces the system to use the exact mapping address specified in
-@var{address} and fail if it can't.
+@var{address} and fail if it can't.  Note that if the new mapping
+would overlap an existing mapping, the overlapping portion of the
+existing map is unmapped.
 
 @c One of these is official - the other is obviously an obsolete synonym
 @c Which is which?
@@ -1641,10 +1656,73 @@ The @code{MAP_HUGETLB} flag is specific to Linux.
 @c There is a mechanism to select different hugepage sizes; see
 @c include/uapi/asm-generic/hugetlb_encode.h in the kernel sources.
 
-@c Linux has some other MAP_ options, which I have not discussed here.
-@c MAP_DENYWRITE, MAP_EXECUTABLE and MAP_GROWSDOWN don't seem applicable to
-@c user programs (and I don't understand the last two).  MAP_LOCKED does
-@c not appear to be implemented.
+@item MAP_32BIT
+Require addresses that can be accessed with a 32 bit pointer, i.e.,
+within the first 4 GiB.  Ignored if MAP_FIXED is specified.
+
+@item MAP_DENYWRITE
+@itemx MAP_EXECUTABLE
+@itemx MAP_FILE
+
+Provided for compatibility.  Ignored by the Linux kernel.
+
+@item MAP_FIXED_NOREPLACE
+Similar to @code{MAP_FIXED} except the call will fail with
+@code{EEXIST} if the new mapping would overwrite an existing mapping.
+
+@item MAP_GROWSDOWN
+This flag is used to make stacks, and is typically only needed inside
+the program loader to set up the main stack for the running process.
+The mapping is created according to the other flags, except an
+additional page just prior to the mapping is marked as a ``guard
+page''.  If a write is attempted inside this guard page, that page is
+mapped, the mapping is extended, and a new guard page is created.
+Thus, the mapping continues to grow towards lower addresses until it
+encounters some other mapping.
+
+Note that accessing memory beyond the guard page will not trigger this
+feature.  In gcc, use @code{-fstack-clash-protection} to ensure the
+guard page is always touched.
+
+@item MAP_LOCKED
+A hint that requests that mapped pages are locked in memory (i.e. not
+paged out).  Note that this is a request and not a requirement; use
+@code{mlock} if locking is required.
+
+@item MAP_POPULATE
+@itemx MAP_NONBLOCK
+These two are opposites.  @code{MAP_POPULATE} is a hint that requests
+that the kernel read-ahead a file-backed mapping, causing more pages
+to be mapped before they're needed.  @code{MAP_NONBLOCK} is a hint
+that requests that the kernel @emph{not} attempt such, only mapping
+pages when they're actually needed.
+
+@item MAP_NORESERVE
+Asks the kernel to not reserve physical backing for a mapping.  This
+would be useful for, for example, a very large but sparsely used
+mapping which need not be limited in span by available RAM or swap.
+Note that writes to such a mapping may cause a @code{SIGSEGV} if the
+amount of backing required eventually exceeds system resources.
+
+On Linux, this flag's behavior may be overridden by
+@code{/proc/sys/vm/overcommit_memory} as documented in the proc(5) man
+page.
+
+@item MAP_STACK
+Ensures that the resulting mapping is suitable for use as a program
+stack.  For example, the use of huge pages might be precluded.
+
+@item MAP_SYNC
+This flag is used to map persistent memory devices into the running
+program in such a way that writes to the mapping are immediately
+written to the device as well.  Unlike most other flags, this one will
+fail unless @code{MAP_SHARED_VALIDATE} is also given.
+
+@item MAP_UNINITIALIZED
+This flag allows the kernel to map anonymous pages without zeroing
+them out first.  This is, of course, a security risk, and will only
+work if the kernel is built to allow it (typically, on single-process
+embedded systems).
 
 @end vtable
 
@@ -1655,6 +1733,24 @@ Possible errors include:
 
 @table @code
 
+@item EACCES
+
+@var{filedes} was not open for the type of access specified in @var{protect}.
+
+@item EAGAIN
+
+The system has temporarily run out of resources.
+
+@item EBADF
+
+The @var{fd} passed is invalid, and a valid file descriptor is
+required (i.e. MAP_ANONYMOUS was not specified).
+
+@item EEXIST
+
+@code{MAP_FIXED_NOREPLACE} was specified and an existing mapping was
+found overlapping the requested address range.
+
 @item EINVAL
 
 Either @var{address} was unusable (because it is not a multiple of the
@@ -1663,23 +1759,32 @@ applicable page size), or inconsistent @var{flags} were given.
 If @code{MAP_HUGETLB} was specified, the file or system does not support
 large page sizes.
 
-@item EACCES
+@item ENODEV
 
-@var{filedes} was not open for the type of access specified in @var{protect}.
+This file is of a type that doesn't support mapping.
 
 @item ENOMEM
 
-Either there is not enough memory for the operation, or the process is
-out of address space.
-
-@item ENODEV
-
-This file is of a type that doesn't support mapping.
+There is not enough memory for the operation, the process is out of
+address space, or there are too many mappings.  On Linux, the maximum
+number of mappings can be controlled via
+@code{/proc/sys/vm/max_map_count}.
 
 @item ENOEXEC
 
 The file is on a filesystem that doesn't support mapping.
 
+@item EPERM
+
+@code{PROT_EXEC} was requested but the file is on a filesystem that
+was mounted with execution denied.
+
+@item EOVERFLOW
+
+Either the offset into the file plus the length of the mapping causes
+internal page counts to overflow, or the offset requested exceeds the
+length of the file.
+
 @c On Linux, EAGAIN will appear if the file has a conflicting mandatory lock.
 @c However mandatory locks are not discussed in this manual.
 @c



* Re: Update mmap() flags and errors lists
  2024-06-05 18:42       ` DJ Delorie
@ 2024-06-14  8:14         ` Florian Weimer
  2024-06-14 16:40           ` DJ Delorie
  0 siblings, 1 reply; 20+ messages in thread
From: Florian Weimer @ 2024-06-14  8:14 UTC (permalink / raw)
  To: DJ Delorie; +Cc: libc-alpha

* DJ Delorie:

> Florian Weimer <fweimer@redhat.com> writes:
>> What I meant is that they aren't part of <sys/mman.h>.
>
> So they aren't.  We define the shift macros, though...
>
> Removed.
>
>>> I would assume it interacts with MAP_SHARED_VALIDATE exactly as
>>> documented.  Creates a shared fixed mapping, unless the kernel doesn't
>>> support MAP_FIXED_NOREPLACE, then errors.
>>
>> The question is if MAP_FIXED_NOREPLACE can be silently treated as
>> MAP_FIXED.
>
> I wouldn't think so.  If the kernel supports it, it automatically
> includes MAP_FIXED.  If the kernel doesn't support it, it's ignored.
> You'd have to pass MAP_FIXED | MAP_FIXED_NOREPLACE to get the "silent"
> treatment.

Ahh, so the application has to check if the returned address changed
and unmap?

>>> The code I was looking at was talking about having 2^32 *pages* mapped.
>>
>> See MMAP2_PAGE_UNIT.  On most architectures, it's fixed at 4096,
>> regardless of page size.
>>
>> And it turns out that we do not check that the offset argument fits into
>> 32 bits after dividing by MMAP2_PAGE_UNIT (on targets where this
>> matters).  The documentation implies we should return EOVERFLOW in this
>> case.
>
>
> 	/* offset overflow? */
> 	if ((pgoff + (len >> PAGE_SHIFT)) < pgoff)
> 		return -EOVERFLOW;

This is kernel code.  We have a shift in userspace on some
architectures, too.

Thanks,
Florian



* Re: [v3] Update mmap() flags and errors lists
  2024-06-05 18:50       ` [v3] " DJ Delorie
@ 2024-06-14  8:21         ` Florian Weimer
  2024-06-14 18:19           ` DJ Delorie
  2024-06-14 18:46           ` [v4] " DJ Delorie
  0 siblings, 2 replies; 20+ messages in thread
From: Florian Weimer @ 2024-06-14  8:21 UTC (permalink / raw)
  To: DJ Delorie; +Cc: libc-alpha

* DJ Delorie:

> +@item MAP_32BIT
> +Require addresses that can be accessed with a 32 bit pointer, i.e.,
> +within the first 4 GiB.  Ignored if MAP_FIXED is specified.

Isn't MAP_32BIT mapping within the first 2 GiB?

> +@item MAP_FIXED_NOREPLACE
> +Similar to @code{MAP_FIXED} except the call will fail with
> +@code{EEXIST} if the new mapping would overwrite an existing mapping.

Maybe note that if the kernel does not support MAP_FIXED_NOREPLACE,
the mapping can result at a different address, with no overlap?

> +@item MAP_POPULATE
> +@itemx MAP_NONBLOCK
> +These two are opposites.  @code{MAP_POPULATE} is a hint that requests
> +that the kernel read-ahead a file-backed mapping, causing more pages
> +to be mapped before they're needed.  @code{MAP_NONBLOCK} is a hint
> +that requests that the kernel @emph{not} attempt such, only mapping
> +pages when they're actually needed.

Maybe mention that even with mlockall or MAP_LOCKED, MAP_POPULATE does
not avoid future page faults, and mlock still needs to be used?

> +@code{/proc/sys/vm/max_map_count}.

Either use @file{/proc/sys/vm/max_map_count}, or something like “the
@code{vm.max_map_count} sysctl setting”.

Thanks,
Florian



* Re: Update mmap() flags and errors lists
  2024-06-14  8:14         ` Florian Weimer
@ 2024-06-14 16:40           ` DJ Delorie
  0 siblings, 0 replies; 20+ messages in thread
From: DJ Delorie @ 2024-06-14 16:40 UTC (permalink / raw)
  To: Florian Weimer; +Cc: libc-alpha

Florian Weimer <fweimer@redhat.com> writes:
>>> The question is if MAP_FIXED_NOREPLACE can be silently treated as
>>> MAP_FIXED.
>>
>> I wouldn't think so.  If the kernel supports it, it automatically
>> includes MAP_FIXED.  If the kernel doesn't support it, it's ignored.
>> You'd have to pass MAP_FIXED | MAP_FIXED_NOREPLACE to get the "silent"
>> treatment.
>
> Ahh, so the application has to check if the returned address changed
> and unmap?

I suppose that's one way of auto-detecting it, yes.

>>>> The code I was looking at was talking about having 2^32 *pages* mapped.
>>>
>>> See MMAP2_PAGE_UNIT.  On most architectures, it's fixed at 4096,
>>> regardless of page size.
>>>
>>> And it turns out that we do not check that the offset argument fits into
>>> 32 bits after dividing by MMAP2_PAGE_UNIT (on targets where this
>>> matters).  The documentation implies we should return EOVERFLOW in this
>>> case.
>>
>>
>> 	/* offset overflow? */
>> 	if ((pgoff + (len >> PAGE_SHIFT)) < pgoff)
>> 		return -EOVERFLOW;
>
> This is kernel code.  We have a shift in userspace on some
> architectures, too.

That shouldn't affect the kernel's computation of that error code,
though?



* Re: [v3] Update mmap() flags and errors lists
  2024-06-14  8:21         ` Florian Weimer
@ 2024-06-14 18:19           ` DJ Delorie
  2024-06-14 18:46           ` [v4] " DJ Delorie
  1 sibling, 0 replies; 20+ messages in thread
From: DJ Delorie @ 2024-06-14 18:19 UTC (permalink / raw)
  To: Florian Weimer; +Cc: libc-alpha

Florian Weimer <fweimer@redhat.com> writes:
> Isn't MAP_32BIT mapping within the first 2 GiB?

So it is.  Fixed:

  @item MAP_32BIT
  Require addresses that can be accessed with a signed 32 bit pointer,
  i.e., within the first 2 GiB.  Ignored if MAP_FIXED is specified.

>> +@item MAP_FIXED_NOREPLACE
>> +Similar to @code{MAP_FIXED} except the call will fail with
>> +@code{EEXIST} if the new mapping would overwrite an existing mapping.
>
> Maybe note that if the kernel does not support MAP_FIXED_NOREPLACE,
> the mapping can result at a different address, with no overlap?

Noted:

  @item MAP_FIXED_NOREPLACE
  Similar to @code{MAP_FIXED} except the call will fail with
  @code{EEXIST} if the new mapping would overwrite an existing mapping.
  To test for this, specify MAP_FIXED_NOREPLACE without MAP_FIXED, and
  check the actual address returned.  If it does not match the address
  passed, then this flag is not supported.
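
A rough sketch of such a probe, for illustration only.  It assumes
that the test must target an address range that is already occupied,
since otherwise a kernel that ignores the flag could still return the
requested address.  probe_fixed_noreplace is a hypothetical helper,
and the fallback value for MAP_FIXED_NOREPLACE is an assumption that
holds for most, but not all, Linux architectures:

```c
#ifndef _GNU_SOURCE
# define _GNU_SOURCE
#endif
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MAP_FIXED_NOREPLACE
/* Fallback for older headers; assumed value, check kernel headers.  */
# define MAP_FIXED_NOREPLACE 0x100000
#endif

/* Return 1 if the kernel appears to honor MAP_FIXED_NOREPLACE, 0 if
   it appears to ignore it, and -1 if the probe could not be run.  */
static int
probe_fixed_noreplace (void)
{
  size_t len = (size_t) sysconf (_SC_PAGESIZE);

  /* Occupy one page so the second request is guaranteed to overlap.  */
  void *first = mmap (NULL, len, PROT_NONE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (first == MAP_FAILED)
    return -1;

  /* Deliberately without MAP_FIXED: an old kernel then treats the
     address as a hint and places the mapping somewhere else.  */
  void *second = mmap (first, len, PROT_NONE,
                       MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED_NOREPLACE,
                       -1, 0);
  int result;
  if (second == MAP_FAILED)
    result = (errno == EEXIST) ? 1 : -1;
  else
    {
      /* The mapping succeeded elsewhere, so the flag was ignored.  */
      result = 0;
      munmap (second, len);
    }

  munmap (first, len);
  return result;
}
```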

>> +@item MAP_POPULATE
>> +@itemx MAP_NONBLOCK
>> +These two are opposites.  @code{MAP_POPULATE} is a hint that requests
>> +that the kernel read-ahead a file-backed mapping, causing more pages
>> +to be mapped before they're needed.  @code{MAP_NONBLOCK} is a hint
>> +that requests that the kernel @emph{not} attempt such, only mapping
>> +pages when they're actually needed.
>
> Maybe mention that even with mlockall or MAP_LOCKED, MAP_POPULATE does
> not avoid future page faults, and mlock still needs to be used?

Mentioned:

  @item MAP_POPULATE
  @itemx MAP_NONBLOCK
  . . .  Note that neither of these hints affects future paging
  activity; use @code{mlock} if such needs to be controlled.

>> +@code{/proc/sys/vm/max_map_count}.
>
> Either use @file{/proc/sys/vm/max_map_count}, or something like “the
> @code{vm.max_map_count} sysctl setting”.

Noted:

  There is not enough memory for the operation, the process is out of
  address space, or there are too many mappings.  On Linux, the maximum
  number of mappings can be controlled via
  @file{/proc/sys/vm/max_map_count} or, if your OS supports it, via
  the @code{vm.max_map_count} @code{sysctl} setting.
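
As an illustration of the first option, a program could read the
current limit roughly like this.  This is only a sketch;
read_max_map_count is a hypothetical helper, and it reports -1
wherever the file is missing or unreadable:

```c
#include <stdio.h>

/* Return the current limit on the number of mappings per process as
   reported by Linux, or -1 if it cannot be determined.  */
static long
read_max_map_count (void)
{
  FILE *f = fopen ("/proc/sys/vm/max_map_count", "r");
  if (f == NULL)
    return -1;

  long value;
  if (fscanf (f, "%ld", &value) != 1)
    value = -1;

  fclose (f);
  return value;
}
```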



* [v4] Update mmap() flags and errors lists
  2024-06-14  8:21         ` Florian Weimer
  2024-06-14 18:19           ` DJ Delorie
@ 2024-06-14 18:46           ` DJ Delorie
  2024-06-18 20:13             ` Mathieu Desnoyers
  2024-06-19  7:16             ` Florian Weimer
  1 sibling, 2 replies; 20+ messages in thread
From: DJ Delorie @ 2024-06-14 18:46 UTC (permalink / raw)
  To: libc-alpha


[v4: tweaked text on MAP_FIXED_NOREPLACE, MAP_POPULATE, MAP_32BIT, and
ENOMEM]

Extend the list of MAP_* macros to include all macros available
to the average program (gcc -E -dM | grep MAP_*)

Extend the list of errno codes.

diff --git a/manual/llio.texi b/manual/llio.texi
index 0d1a32e3e1..7edec3e8d7 100644
--- a/manual/llio.texi
+++ b/manual/llio.texi
@@ -1574,10 +1574,15 @@ permitted.  They include @code{PROT_READ}, @code{PROT_WRITE}, and
 of address space for future use.  The @code{mprotect} function can be
 used to change the protection flags.  @xref{Memory Protection}.
 
-@var{flags} contains flags that control the nature of the map.
-One of @code{MAP_SHARED} or @code{MAP_PRIVATE} must be specified.
+The @var{flags} parameter contains flags that control the nature of
+the map.  One of @code{MAP_SHARED}, @code{MAP_SHARED_VALIDATE}, or
+@code{MAP_PRIVATE} must be specified.  Additional flags may be bitwise
+OR'd to further define the mapping.
 
-They include:
+Note that, aside from @code{MAP_PRIVATE} and @code{MAP_SHARED}, not
+all flags are supported on all versions of all operating systems.
+Consult the kernel-specific documentation for details.  The flags
+include:
 
 @vtable @code
 @item MAP_PRIVATE
@@ -1599,9 +1604,19 @@ Note that actual writing may take place at any time.  You need to use
 @code{msync}, described below, if it is important that other processes
 using conventional I/O get a consistent view of the file.
 
+@item MAP_SHARED_VALIDATE
+Similar to @code{MAP_SHARED} except that additional flags will be
+validated by the kernel, and the call will fail if an unrecognized
+flag is provided.  With @code{MAP_SHARED} using a flag on a kernel
+that doesn't support it causes the flag to be ignored.
+@code{MAP_SHARED_VALIDATE} should be used when the behavior of all
+flags is required.
+
 @item MAP_FIXED
 This forces the system to use the exact mapping address specified in
-@var{address} and fail if it can't.
+@var{address} and fail if it can't.  Note that if the new mapping
+would overlap an existing mapping, the overlapping portion of the
+existing map is unmapped.
 
 @c One of these is official - the other is obviously an obsolete synonym
 @c Which is which?
@@ -1642,10 +1657,78 @@ The @code{MAP_HUGETLB} flag is specific to Linux.
 @c There is a mechanism to select different hugepage sizes; see
 @c include/uapi/asm-generic/hugetlb_encode.h in the kernel sources.
 
-@c Linux has some other MAP_ options, which I have not discussed here.
-@c MAP_DENYWRITE, MAP_EXECUTABLE and MAP_GROWSDOWN don't seem applicable to
-@c user programs (and I don't understand the last two).  MAP_LOCKED does
-@c not appear to be implemented.
+@item MAP_32BIT
+Require addresses that can be accessed with a signed 32 bit pointer,
+i.e., within the first 2 GiB.  Ignored if MAP_FIXED is specified.
+
+@item MAP_DENYWRITE
+@itemx MAP_EXECUTABLE
+@itemx MAP_FILE
+
+Provided for compatibility.  Ignored by the Linux kernel.
+
+@item MAP_FIXED_NOREPLACE
+Similar to @code{MAP_FIXED} except the call will fail with
+@code{EEXIST} if the new mapping would overwrite an existing mapping.
+To test for this, specify MAP_FIXED_NOREPLACE without MAP_FIXED, and
+check the actual address returned.  If it does not match the address
+passed, then this flag is not supported.
+
+@item MAP_GROWSDOWN
+This flag is used to make stacks, and is typically only needed inside
+the program loader to set up the main stack for the running process.
+The mapping is created according to the other flags, except an
+additional page just prior to the mapping is marked as a ``guard
+page''.  If a write is attempted inside this guard page, that page is
+mapped, the mapping is extended, and a new guard page is created.
+Thus, the mapping continues to grow towards lower addresses until it
+encounters some other mapping.
+
+Note that accessing memory beyond the guard page will not trigger this
+feature.  In gcc, use @code{-fstack-clash-protection} to ensure the
+guard page is always touched.
+
+@item MAP_LOCKED
+A hint that requests that mapped pages are locked in memory (i.e. not
+paged out).  Note that this is a request and not a requirement; use
+@code{mlock} if locking is required.
+
+@item MAP_POPULATE
+@itemx MAP_NONBLOCK
+These two are opposites.  @code{MAP_POPULATE} is a hint that requests
+that the kernel read-ahead a file-backed mapping, causing more pages
+to be mapped before they're needed.  @code{MAP_NONBLOCK} is a hint
+that requests that the kernel @emph{not} attempt such, only mapping
+pages when they're actually needed.  Note that neither of these hints
+affects future paging activity, use @code{mlock} if such needs to be
+controlled.
+
+@item MAP_NORESERVE
+Asks the kernel to not reserve physical backing for a mapping.  This
+would be useful for, for example, a very large but sparsely used
+mapping which need not be limited in span by available RAM or swap.
+Note that writes to such a mapping may cause a @code{SIGSEGV} if the
+amount of backing required eventually exceeds system resources.
+
+On Linux, this flag's behavior may be overridden by
+@file{/proc/sys/vm/overcommit_memory} as documented in the proc(5) man
+page.
+
+@item MAP_STACK
+Ensures that the resulting mapping is suitable for use as a program
+stack.  For example, the use of huge pages might be precluded.
+
+@item MAP_SYNC
+This flag is used to map persistent memory devices into the running
+program in such a way that writes to the mapping are immediately
+written to the device as well.  Unlike most other flags, this one will
+fail unless @code{MAP_SHARED_VALIDATE} is also given.
+
+@item MAP_UNINITIALIZED
+This flag allows the kernel to map anonymous pages without zeroing
+them out first.  This is, of course, a security risk, and will only
+work if the kernel is built to allow it (typically, on single-process
+embedded systems).
 
 @end vtable
 
@@ -1656,6 +1739,24 @@ Possible errors include:
 
 @table @code
 
+@item EACCES
+
+@var{filedes} was not open for the type of access specified in @var{protect}.
+
+@item EAGAIN
+
+The system has temporarily run out of resources.
+
+@item EBADF
+
+The @var{fd} passes is invalid, and a valid file descriptor is
+required (i.e. MAP_ANONYMOUS was not specified).
+
+@item EEXIST
+
+@code{MAP_FIXED_NOREPLACE} was specified and an existing mapping was
+found overlapping the requested address range.
+
 @item EINVAL
 
 Either @var{address} was unusable (because it is not a multiple of the
@@ -1664,23 +1765,33 @@ applicable page size), or inconsistent @var{flags} were given.
 If @code{MAP_HUGETLB} was specified, the file or system does not support
 large page sizes.
 
-@item EACCES
+@item ENODEV
 
-@var{filedes} was not open for the type of access specified in @var{protect}.
+This file is of a type that doesn't support mapping.
 
 @item ENOMEM
 
-Either there is not enough memory for the operation, or the process is
-out of address space.
-
-@item ENODEV
-
-This file is of a type that doesn't support mapping.
+There is not enough memory for the operation, the process is out of
+address space, or there are too many mappings.  On Linux, the maximum
+number of mappings can be controlled via
+@file{/proc/sys/vm/max_map_count} or, if your OS supports it, via
+the @code{vm.max_map_count} @code{sysctl} setting.
 
 @item ENOEXEC
 
 The file is on a filesystem that doesn't support mapping.
 
+@item EPERM
+
+@code{PROT_EXEC} was requested but the file is on a filesystem that
+was mounted with execution denied.
+
+@item EOVERFLOW
+
+Either the offset into the file plus the length of the mapping causes
+internal page counts to overflow, or the offset requested exceeds the
+length of the file.
+
 @c On Linux, EAGAIN will appear if the file has a conflicting mandatory lock.
 @c However mandatory locks are not discussed in this manual.
 @c


* Re: [v4] Update mmap() flags and errors lists
  2024-06-14 18:46           ` [v4] " DJ Delorie
@ 2024-06-18 20:13             ` Mathieu Desnoyers
  2024-06-18 20:57               ` DJ Delorie
  2024-06-19  7:16             ` Florian Weimer
  1 sibling, 1 reply; 20+ messages in thread
From: Mathieu Desnoyers @ 2024-06-18 20:13 UTC (permalink / raw)
  To: DJ Delorie; +Cc: libc-alpha

On 14-Jun-2024 02:46:05 PM, DJ Delorie wrote:
> [v4: tweaked text on MAP_FIXED_NOREPLACE, MAP_POPULATE, MAP_32BIT, and
> ENOMEM]
> 
> Extend the list of MAP_* macros to include all macros available
> to the average program (gcc -E -dM | grep MAP_*)
> 
> Extend the list of errno codes.

I will review this patch based on my understanding of the Linux mmap(2)
manual from Linux man-pages 6.03 2023-02-05. It is very much possible
that your intent is not to match that specific syscall man page, in
which case feel free to dismiss my concerns.

> 
> diff --git a/manual/llio.texi b/manual/llio.texi
> index 0d1a32e3e1..7edec3e8d7 100644
> --- a/manual/llio.texi
> +++ b/manual/llio.texi
> @@ -1574,10 +1574,15 @@ permitted.  They include @code{PROT_READ}, @code{PROT_WRITE}, and
>  of address space for future use.  The @code{mprotect} function can be
>  used to change the protection flags.  @xref{Memory Protection}.
>  
> -@var{flags} contains flags that control the nature of the map.
> -One of @code{MAP_SHARED} or @code{MAP_PRIVATE} must be specified.
> +The @var{flags} parameter contains flags that control the nature of
> +the map.  One of @code{MAP_SHARED}, @code{MAP_SHARED_VALIDATE}, or
> +@code{MAP_PRIVATE} must be specified.  Additional flags may be bitwise
> +OR'd to further define the mapping.

OK

>  
> -They include:
> +Note that, aside from @code{MAP_PRIVATE} and @code{MAP_SHARED}, not
> +all flags are supported on all versions of all operating systems.
> +Consult the kernel-specific documentation for details.  The flags
> +include:
>  
>  @vtable @code
>  @item MAP_PRIVATE
> @@ -1599,9 +1604,19 @@ Note that actual writing may take place at any time.  You need to use
>  @code{msync}, described below, if it is important that other processes
>  using conventional I/O get a consistent view of the file.
>  
> +@item MAP_SHARED_VALIDATE
> +Similar to @code{MAP_SHARED} except that additional flags will be
> +validated by the kernel, and the call will fail if an unrecognized
> +flag is provided.  With @code{MAP_SHARED} using a flag on a kernel
> +that doesn't support it causes the flag to be ignored.
> +@code{MAP_SHARED_VALIDATE} should be used when the behavior of all
> +flags is required.

OK

> +
>  @item MAP_FIXED
>  This forces the system to use the exact mapping address specified in
> -@var{address} and fail if it can't.
> +@var{address} and fail if it can't.  Note that if the new mapping
> +would overlap an existing mapping, the overlapping portion of the
> +existing map is unmapped.
>  
>  @c One of these is official - the other is obviously an obsolete synonym
>  @c Which is which?
> @@ -1642,10 +1657,78 @@ The @code{MAP_HUGETLB} flag is specific to Linux.
>  @c There is a mechanism to select different hugepage sizes; see
>  @c include/uapi/asm-generic/hugetlb_encode.h in the kernel sources.
>  
> -@c Linux has some other MAP_ options, which I have not discussed here.
> -@c MAP_DENYWRITE, MAP_EXECUTABLE and MAP_GROWSDOWN don't seem applicable to
> -@c user programs (and I don't understand the last two).  MAP_LOCKED does
> -@c not appear to be implemented.
> +@item MAP_32BIT
> +Require addresses that can be accessed with a signed 32 bit pointer,
> +i.e., within the first 2 GiB.  Ignored if MAP_FIXED is specified.
> +
> +@item MAP_DENYWRITE
> +@itemx MAP_EXECUTABLE
> +@itemx MAP_FILE
> +
> +Provided for compatibility.  Ignored by the Linux kernel.
> +
> +@item MAP_FIXED_NOREPLACE
> +Similar to @code{MAP_FIXED} except the call will fail with
> +@code{EEXIST} if the new mapping would overwrite an existing mapping.
> +To test for this, specify MAP_FIXED_NOREPLACE without MAP_FIXED, and
> +check the actual address returned.  If it does not match the address
> +passed, then this flag is not supported.

mmap(2) states that older kernels fall back to non-MAP_FIXED behavior if
the mapping would overwrite an existing mapping, which requires
handling the return value carefully. Is this backward-compatibility
handling somehow abstracted within the libc wrapper ?

> +
> +@item MAP_GROWSDOWN
> +This flag is used to make stacks, and is typically only needed inside
> +the program loader to set up the main stack for the running process.
> +The mapping is created according to the other flags, except an
> +additional page just prior to the mapping is marked as a ``guard
> +page''.  If a write is attempted inside this guard page, that page is
> +mapped, the mapping is extended, and a new guard page is created.
> +Thus, the mapping continues to grow towards lower addresses until it
> +encounters some other mapping.
> +
> +Note that accessing memory beyond the guard page will not trigger this
> +feature.  In gcc, use @code{-fstack-clash-protection} to ensure the
> +guard page is always touched.

OK

> +
> +@item MAP_LOCKED
> +A hint that requests that mapped pages are locked in memory (i.e. not
> +paged out).  Note that this is a request and not a requirement; use
> +@code{mlock} if locking is required.
> +
> +@item MAP_POPULATE
> +@itemx MAP_NONBLOCK
> +These two are opposites.  @code{MAP_POPULATE} is a hint that requests
> +that the kernel read-ahead a file-backed mapping, causing more pages
> +to be mapped before they're needed.  @code{MAP_NONBLOCK} is a hint
> +that requests that the kernel @emph{not} attempt such, only mapping
> +pages when they're actually needed.  Note that neither of these hints
> +affects future paging activity, use @code{mlock} if such needs to be
> +controlled.

This explanation does not match my understanding of the mmap(2) man
page. MAP_NONBLOCK appears to be only meaningful in conjunction _with_
MAP_POPULATE. I suspect the goal here when those are combined is to
opportunistically populate the page table entries when those do not
require read-ahead from a file (AFAIU).
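
A minimal sketch of the combination described above, in C.  The helper
name map_prefaulted is hypothetical, and an anonymous mapping is used
only so the example needs no file.  Whether MAP_NONBLOCK changes what
MAP_POPULATE does depends on the kernel version, so both flags are
best treated as hints.

```c
#ifndef _GNU_SOURCE
# define _GNU_SOURCE 1   /* for MAP_ANONYMOUS and the Linux-specific flags */
#endif
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical helper: map LEN bytes of zero-filled memory and ask the
   kernel to populate the page tables up front.  If NONBLOCK is nonzero,
   MAP_NONBLOCK is added as well, so the kernel may skip any work that
   would block.  Returns MAP_FAILED on error.  */
void *
map_prefaulted (size_t len, int nonblock)
{
  int flags = MAP_PRIVATE | MAP_ANONYMOUS;
#ifdef MAP_POPULATE
  flags |= MAP_POPULATE;
#endif
#ifdef MAP_NONBLOCK
  if (nonblock)
    flags |= MAP_NONBLOCK;
#endif
  return mmap (NULL, len, PROT_READ | PROT_WRITE, flags, -1, 0);
}
```

The mapping is usable either way; the flags only influence when the
page table entries get filled in.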

> +
> +@item MAP_NORESERVE
> +Asks the kernel to not reserve physical backing for a mapping.

What is "physical backing" ? I guess that you mean not backed by a swap
block device (or anything that requires I/O), but I am not sure that
"physical backing" conveys this clearly.

> This
> +would be useful for, for example, a very large but sparsely used
> +mapping which need not be limited in span by available RAM or swap.

I don't understand the meaning of this. How does not reserving swap
have anything to do with the virtual mapping size and its sparseness ?

> +Note that writes to such a mapping may cause a @code{SIGSEGV} if the
> +amount of backing required eventually exceeds system resources.

It could be clarified that here "backing" does _not_ refer to physical
backing.

> +
> +On Linux, this flag's behavior may be overridden by
> +@file{/proc/sys/vm/overcommit_memory} as documented in the proc(5) man
> +page.

OK

> +
> +@item MAP_STACK
> +Ensures that the resulting mapping is suitable for use as a program
> +stack.  For example, the use of huge pages might be precluded.

OK

> +
> +@item MAP_SYNC
> +This flag is used to map persistent memory devices into the running
> +program in such a way that writes to the mapping are immediately
> +written to the device as well.  Unlike most other flags, this one will
> +fail unless @code{MAP_SHARED_VALIDATE} is also given.

Note that this wording is misleading. Users of persistent memory devices
need to issue explicit "flush" instructions to ensure that writes are
made persistent to the device. The MAP_SYNC merely guarantees that
memory mappings within a file on a dax-enabled filesystem will appear
at the same file offset after a crash/reboot. It does not guarantee
anything about write persistence.

> +
> +@item MAP_UNINITIALIZED
> +This flag allows the kernel to map anonymous pages without zeroing
> +them out first.  This is, of course, a security risk, and will only
> +work if the kernel is built to allow it (typically, on single-process
> +embedded systems).

OK

>  
>  @end vtable
>  
> @@ -1656,6 +1739,24 @@ Possible errors include:
>  
>  @table @code
>  
> +@item EACCES
> +
> +@var{filedes} was not open for the type of access specified in @var{protect}.
> +
> +@item EAGAIN
> +
> +The system has temporarily run out of resources.

or file has been locked.

> +
> +@item EBADF
> +
> +The @var{fd} passes is invalid, and a valid file descriptor is

passes -> passed ?

> +required (i.e. MAP_ANONYMOUS was not specified).
> +
> +@item EEXIST
> +
> +@code{MAP_FIXED_NOREPLACE} was specified and an existing mapping was
> +found overlapping the requested address range.
> +
>  @item EINVAL
>  
>  Either @var{address} was unusable (because it is not a multiple of the
> @@ -1664,23 +1765,33 @@ applicable page size), or inconsistent @var{flags} were given.
>  If @code{MAP_HUGETLB} was specified, the file or system does not support
>  large page sizes.
>  
> -@item EACCES
> +@item ENODEV
>  
> -@var{filedes} was not open for the type of access specified in @var{protect}.
> +This file is of a type that doesn't support mapping.
>  
>  @item ENOMEM
>  
> -Either there is not enough memory for the operation, or the process is
> -out of address space.
> -
> -@item ENODEV
> -
> -This file is of a type that doesn't support mapping.
> +There is not enough memory for the operation, the process is out of
> +address space, or there are too many mappings.  On Linux, the maximum
> +number of mappings can be controlled via
> +@file{/proc/sys/vm/max_map_count} or, if your OS supports it, via
> +the @code{vm.max_map_count} @code{sysctl} setting.

Also getrlimit(2) RLIMIT_DATA exceeded, or @addr exceeds virtual address
space of the CPU.

>  
>  @item ENOEXEC
>  
>  The file is on a filesystem that doesn't support mapping.
>  
> +@item EPERM
> +
> +@code{PROT_EXEC} was requested but the file is on a filesystem that
> +was mounted with execution denied.

Also operation was prevented by a file seal (fcntl(2)).
Also MAP_HUGETLB flag was specified, but the caller was not privileged.

> +
> +@item EOVERFLOW
> +
> +Either the offset into the file plus the length of the mapping causes
> +internal page counts to overflow, or the offset requested exceeds the
> +length of the file.
> +

Thanks,

Mathieu


>  @c On Linux, EAGAIN will appear if the file has a conflicting mandatory lock.
>  @c However mandatory locks are not discussed in this manual.
>  @c

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

* Re: [v4] Update mmap() flags and errors lists
  2024-06-18 20:13             ` Mathieu Desnoyers
@ 2024-06-18 20:57               ` DJ Delorie
  2024-06-21 13:02                 ` Mathieu Desnoyers
  0 siblings, 1 reply; 20+ messages in thread
From: DJ Delorie @ 2024-06-18 20:57 UTC (permalink / raw)
  To: Mathieu Desnoyers; +Cc: libc-alpha

Mathieu Desnoyers <mathieu.desnoyers@efficios.com> writes:
> I will review this patch based on my understanding of the Linux mmap(2)
> manual from Linux man-pages 6.03 2023-02-05. It is very much possible
> that your intent is not to match that specific syscall man page, in
> which case feel free to dismiss my concerns.

My intent was to not cut-n-paste the man pages, as the licenses do not
allow that.  So I summarized the man pages into my notes, and at a later
date, confirmed those notes against the kernel sources, and wrote my
text.

So there's no intent to match or not match the man pages, just intent to
keep the work separated enough to not cause copyright problems.

>> +@item MAP_FIXED_NOREPLACE
>> +Similar to @code{MAP_FIXED} except the call will fail with
>> +@code{EEXIST} if the new mapping would overwrite an existing mapping.
>> +To test for this, specify MAP_FIXED_NOREPLACE without MAP_FIXED, and
>> +check the actual address returned.  If it does not match the address
>> +passed, then this flag is not supported.
>
> mmap(2) states that older kernels fall back to non-MAP_FIXED behavior if
> the mapping would overwrite an existing mapping, which requires
> handling the return value carefully. Is this backward-compatibility
> handling somehow abstracted within the libc wrapper ?

The man page says that older kernels which do not support
MAP_FIXED_NOREPLACE would act as if that flag were just omitted, which
means no MAP_FIXED either.  It does not imply that there's an older
kernel that *does* support MAP_FIXED_NOREPLACE but acts differently than
today.  In either case, testing the returned address to see if it
matches the desired one is the correct response.

There's no special backwards compatibility for mmap() in glibc, other
than to call mmap2 instead of mmap if available.
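
The "test the returned address" check can be sketched in C as below.
This is only an illustration under stated assumptions: the helper name
map_exactly_at is hypothetical, the mapping is anonymous for
simplicity, and reporting EEXIST for the "flag was ignored" case is a
choice made here, not something glibc does.

```c
#ifndef _GNU_SOURCE
# define _GNU_SOURCE 1   /* for MAP_ANONYMOUS and MAP_FIXED_NOREPLACE */
#endif
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical helper: map LEN anonymous bytes exactly at ADDR without
   replacing an existing mapping.  Returns ADDR on success.  Returns
   MAP_FAILED with errno set if the range is occupied, or if the kernel
   ignored the flag and placed the mapping somewhere else.  */
void *
map_exactly_at (void *addr, size_t len)
{
#ifdef MAP_FIXED_NOREPLACE
  void *p = mmap (addr, len, PROT_READ | PROT_WRITE,
                  MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED_NOREPLACE, -1, 0);
  if (p == MAP_FAILED)
    return MAP_FAILED;
  if (p != addr)
    {
      /* The flag was treated as absent, so ADDR was only a hint.
         Undo the mapping and report the range as unavailable.  */
      munmap (p, len);
      errno = EEXIST;
      return MAP_FAILED;
    }
  return p;
#else
  /* The headers do not provide the flag at all.  */
  (void) addr;
  (void) len;
  errno = ENOSYS;
  return MAP_FAILED;
#endif
}
```

The same comparison covers both kernels that support the flag and
those that predate it, which is why no separate version check is
needed.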

>> +@item MAP_POPULATE
>> +@itemx MAP_NONBLOCK
>> +These two are opposites.  @code{MAP_POPULATE} is a hint that requests
>> +that the kernel read-ahead a file-backed mapping, causing more pages
>> +to be mapped before they're needed.  @code{MAP_NONBLOCK} is a hint
>> +that requests that the kernel @emph{not} attempt such, only mapping
>> +pages when they're actually needed.  Note that neither of these hints
>> +affects future paging activity, use @code{mlock} if such needs to be
>> +controlled.
>
> This explanation does not match my understanding of the mmap(2) man
> page. MAP_NONBLOCK appears to be only meaningful in conjunction _with_
> MAP_POPULATE. I suspect the goal here when those are combined is to
> opportunistically populate the page table entries when those do not
> require read-ahead from a file (AFAIU).

Sigh ;-)

>> +
>> +@item MAP_NORESERVE
>> +Asks the kernel to not reserve physical backing for a mapping.
>
> What is "physical backing" ? I guess that you mean not backed by a swap
> block device (or anything that requires I/O), but I am not sure that
> "physical backing" conveys this clearly.

I'll reword this some.
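
For reference, the use case the patch text has in mind looks roughly
like the C sketch below: a large anonymous region of which only a few
pages are ever touched.  The helper name reserve_sparse is
hypothetical, and whether the kernel actually skips the reservation
depends on the overcommit settings.

```c
#ifndef _GNU_SOURCE
# define _GNU_SOURCE 1   /* for MAP_ANONYMOUS and MAP_NORESERVE */
#endif
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical helper: obtain LEN bytes of zero-filled address space
   and ask the kernel not to reserve swap for it.  Pages only consume
   memory once they are written.  Returns MAP_FAILED on error.  */
void *
reserve_sparse (size_t len)
{
  int flags = MAP_PRIVATE | MAP_ANONYMOUS;
#ifdef MAP_NORESERVE
  flags |= MAP_NORESERVE;
#endif
  return mmap (NULL, len, PROT_READ | PROT_WRITE, flags, -1, 0);
}
```

A caller would typically write to only a handful of pages in the
region; untouched pages read back as zero.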

>> +@item MAP_SYNC
>> +This flag is used to map persistent memory devices into the running
>> +program in such a way that writes to the mapping are immediately
>> +written to the device as well.  Unlike most other flags, this one will
>> +fail unless @code{MAP_SHARED_VALIDATE} is also given.
>
> Note that this wording is misleading. Users of persistent memory devices
> need to issue explicit "flush" instructions to ensure that writes are
> made persistent to the device. The MAP_SYNC merely guarantees that
> memory mappings within a file on a dax-enabled filesystem will appear
> at the same file offset after a crash/reboot. It does not guarantee
> anything about write persistence.

  "This flag is supported only for files supporting DAX (direct mapping
   of persistent memory)"

  "it will be visible in the same file at the same offset even after the
   system crashes or is rebooted."

That sounds like persistence to me?

>> +@item EAGAIN
>> +
>> +The system has temporarily run out of resources.
>
> or file has been locked.

I was asked to avoid mentioning certain types of locks that glibc
chooses not to support...

>> +@item EBADF
>> +
>> +The @var{fd} passes is invalid, and a valid file descriptor is
>
> passes -> passed ?

Sigh ;-)
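
As an aside, the EBADF case is easy to reproduce.  A minimal C sketch,
with a hypothetical helper name (map_readonly), that returns the errno
value instead of MAP_FAILED so the caller can switch on it:

```c
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical helper: map LEN bytes of FD read-only.  Returns 0 and
   stores the mapping in *OUT on success; otherwise returns the errno
   value reported by mmap and leaves *OUT untouched.  */
int
map_readonly (int fd, size_t len, void **out)
{
  void *p = mmap (NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
  if (p == MAP_FAILED)
    return errno;
  *out = p;
  return 0;
}
```

Passing -1 as the descriptor without MAP_ANONYMOUS yields EBADF on
Linux.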

>> -@item ENODEV
>> -
>> -This file is of a type that doesn't support mapping.
>> +There is not enough memory for the operation, the process is out of
>> +address space, or there are too many mappings.  On Linux, the maximum
>> +number of mappings can be controlled via
>> +@file{/proc/sys/vm/max_map_count} or, if your OS supports it, via
>> +the @code{vm.max_map_count} @code{sysctl} setting.
>
> Also getrlimit(2) RLIMIT_DATA exceeded, or @addr exceeds virtual address
> space of the CPU.

Noted.


>> +@item EPERM
>> +
>> +@code{PROT_EXEC} was requested but the file is on a filesystem that
>> +was mounted with execution denied.
>
> Also operation was prevented by a file seal (fcntl(2)).
> Also MAP_HUGETLB flag was specified, but the caller was not privileged.

Noted.


* Re: [v4] Update mmap() flags and errors lists
  2024-06-14 18:46           ` [v4] " DJ Delorie
  2024-06-18 20:13             ` Mathieu Desnoyers
@ 2024-06-19  7:16             ` Florian Weimer
  1 sibling, 0 replies; 20+ messages in thread
From: Florian Weimer @ 2024-06-19  7:16 UTC (permalink / raw)
  To: DJ Delorie; +Cc: libc-alpha

* DJ Delorie:

> +@item MAP_UNINITIALIZED
> +This flag allows the kernel to map anonymous pages without zeroing
> +them out first.  This is, of course, a security risk, and will only
> +work if the kernel is built to allow it (typically, on single-process
> +embedded systems).

This isn't currently part of our headers, I think.

Thanks,
Florian


* Re: [v4] Update mmap() flags and errors lists
  2024-06-18 20:57               ` DJ Delorie
@ 2024-06-21 13:02                 ` Mathieu Desnoyers
  2024-06-21 16:17                   ` DJ Delorie
  0 siblings, 1 reply; 20+ messages in thread
From: Mathieu Desnoyers @ 2024-06-21 13:02 UTC (permalink / raw)
  To: DJ Delorie; +Cc: libc-alpha

On 2024-06-18 16:57, DJ Delorie wrote:
> Mathieu Desnoyers <mathieu.desnoyers@efficios.com> writes:
[...]
> 
>>> +@item MAP_FIXED_NOREPLACE
>>> +Similar to @code{MAP_FIXED} except the call will fail with
>>> +@code{EEXIST} if the new mapping would overwrite an existing mapping.
>>> +To test for this, specify MAP_FIXED_NOREPLACE without MAP_FIXED, and
>>> +check the actual address returned.  If it does not match the address
>>> +passed, then this flag is not supported.
>>
>> mmap(2) states that older kernels fall back to non-MAP_FIXED behavior if
>> the mapping would overwrite an existing mapping, which requires
>> handling the return value carefully. Is this backward-compatibility
>> handling somehow abstracted within the libc wrapper ?
> 
> The man page says that older kernels which do not support
> MAP_FIXED_NOREPLACE would act as if that flag were just omitted, which
> means no MAP_FIXED either.  It does not imply that there's an older
> kernel that *does* support MAP_FIXED_NOREPLACE but acts differently than
> today.  In either case, testing the returned address to see if it
> matches the desired one is the correct response.
> 
> There's no special backwards compatibility for mmap() in glibc, other
> than to call mmap2 instead of mmap if available.

OK

[...]

> 
>>> +@item MAP_SYNC
>>> +This flag is used to map persistent memory devices into the running
>>> +program in such a way that writes to the mapping are immediately
>>> +written to the device as well.  Unlike most other flags, this one will
>>> +fail unless @code{MAP_SHARED_VALIDATE} is also given.
>>
>> Note that this wording is misleading. Users of persistent memory devices
>> need to issue explicit "flush" instructions to ensure that writes are
>> made persistent to the device. The MAP_SYNC merely guarantees that
>> memory mappings within a file on a dax-enabled filesystem will appear
>> at the same file offset after a crash/reboot. It does not guarantee
>> anything about write persistence.
> 
>    "This flag is supported only for files supporting DAX (direct mapping
>     of persistent memory)"
> 
>    "it will be visible in the same file at the same offset even after the
>     system crashes or is rebooted."
> 
> That sounds like persistence to me?

AFAIU it states persistence of the mapping location within the file, but
it does not guarantee anything about persistence of the writes to that 
mapping.

In many cases, writes to the mapping are only made persistent after
specific cache-flushing instructions have been issued.

See this man page for reference:

https://manpages.debian.org/experimental/libpmem-dev/pmem_flush.3

> 
>>> +@item EAGAIN
>>> +
>>> +The system has temporarily run out of resources.
>>
>> or file has been locked.
> 
> I was asked to avoid mentioning certain types of locks that glibc
> chooses not to support...

OK

[...]

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com


* Re: [v4] Update mmap() flags and errors lists
  2024-06-21 13:02                 ` Mathieu Desnoyers
@ 2024-06-21 16:17                   ` DJ Delorie
  2024-06-21 16:20                     ` Mathieu Desnoyers
  0 siblings, 1 reply; 20+ messages in thread
From: DJ Delorie @ 2024-06-21 16:17 UTC (permalink / raw)
  To: Mathieu Desnoyers; +Cc: libc-alpha

Mathieu Desnoyers <mathieu.desnoyers@efficios.com> writes:
>>>> +@item MAP_SYNC
>>>> +This flag is used to map persistent memory devices into the running
>>>> +program in such a way that writes to the mapping are immediately
>>>> +written to the device as well.  Unlike most other flags, this one will
>>>> +fail unless @code{MAP_SHARED_VALIDATE} is also given.
>>>
>>> Note that this wording is misleading. Users of persistent memory devices
>>> need to issue explicit "flush" instructions to ensure that writes are
>>> made persistent to the device. The MAP_SYNC merely guarantees that
>>> memory mappings within a file on a dax-enabled filesystem will appear
>>> at the same file offset after a crash/reboot. It does not guarantee
>>> anything about write persistence.
>> 
>>    "This flag is supported only for files supporting DAX (direct mapping
>>     of persistent memory)"
>> 
>>    "it will be visible in the same file at the same offset even after the
>>     system crashes or is rebooted."
>> 
>> That sounds like persistence to me?
>
> AFAIU it states persistence of the mapping location within the file, but
> it does not guarantee anything about persistence of the writes to that 
> mapping.
>
> In many cases, writes to the mapping are only made persistent after
> specific cache-flushing instructions have been issued.
>
> See this man page for reference:
>
> https://manpages.debian.org/experimental/libpmem-dev/pmem_flush.3

How about if I just say "This is a special flag for DAX devices." and
not bother trying to be helpful?  ;-)

All the kernel does is write out dirty metadata whenever dirty data is
written out, when the flag is set.
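
To make the flag combination concrete, here is a hedged C sketch of
how a program might request MAP_SYNC and fall back when it is not
available.  The helper name map_shared_try_sync and the fallback
policy are assumptions made for illustration.  Note that even when the
MAP_SYNC mapping succeeds, this does not by itself flush CPU caches;
the program still has to do that separately.

```c
#ifndef _GNU_SOURCE
# define _GNU_SOURCE 1   /* for MAP_SYNC and MAP_SHARED_VALIDATE */
#endif
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical helper: map LEN bytes of FD shared and writable, asking
   for MAP_SYNC when the headers provide it.  *SYNCED is set to 1 only
   if the MAP_SYNC mapping succeeded.  Falls back to plain MAP_SHARED
   if the kernel or file system rejects the request.  */
void *
map_shared_try_sync (int fd, size_t len, int *synced)
{
  *synced = 0;
#if defined MAP_SYNC && defined MAP_SHARED_VALIDATE
  void *p = mmap (NULL, len, PROT_READ | PROT_WRITE,
                  MAP_SHARED_VALIDATE | MAP_SYNC, fd, 0);
  if (p != MAP_FAILED)
    {
      *synced = 1;
      return p;
    }
  /* EOPNOTSUPP: the file does not support DAX.  EINVAL: the kernel
     does not know these flags.  Any other error is reported as-is.  */
  if (errno != EOPNOTSUPP && errno != EINVAL)
    return MAP_FAILED;
#endif
  return mmap (NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
}
```

MAP_SHARED_VALIDATE is what makes the first attempt fail cleanly
instead of silently dropping MAP_SYNC.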


* Re: [v4] Update mmap() flags and errors lists
  2024-06-21 16:17                   ` DJ Delorie
@ 2024-06-21 16:20                     ` Mathieu Desnoyers
  0 siblings, 0 replies; 20+ messages in thread
From: Mathieu Desnoyers @ 2024-06-21 16:20 UTC (permalink / raw)
  To: DJ Delorie; +Cc: libc-alpha

On 2024-06-21 12:17, DJ Delorie wrote:
> Mathieu Desnoyers <mathieu.desnoyers@efficios.com> writes:
>>>>> +@item MAP_SYNC
>>>>> +This flag is used to map persistent memory devices into the running
>>>>> +program in such a way that writes to the mapping are immediately
>>>>> +written to the device as well.  Unlike most other flags, this one will
>>>>> +fail unless @code{MAP_SHARED_VALIDATE} is also given.
>>>>
>>>> Note that this wording is misleading. Users of persistent memory devices
>>>> need to issue explicit "flush" instructions to ensure that writes are
>>>> made persistent to the device. The MAP_SYNC merely guarantees that
>>>> memory mappings within a file on a dax-enabled filesystem will appear
>>>> at the same file offset after a crash/reboot. It does not guarantee
>>>> anything about write persistence.
>>>
>>>     "This flag is supported only for files supporting DAX (direct mapping
>>>      of persistent memory)"
>>>
>>>     "it will be visible in the same file at the same offset even after the
>>>      system crashes or is rebooted."
>>>
>>> That sounds like persistence to me?
>>
>> AFAIU it states persistence of the mapping location within the file, but
>> it does not guarantee anything about persistence of the writes to that
>> mapping.
>>
>> In many cases, writes to the mapping are only made persistent after
>> specific cache-flushing instructions have been issued.
>>
>> See this man page for reference:
>>
>> https://manpages.debian.org/experimental/libpmem-dev/pmem_flush.3
> 
> How about if I just say "This is a special flag for DAX devices." and
> not bother trying to be helpful?  ;-)
> 

That works for me.

> All the kernel does is write out dirty metadata whenever dirty data is
> written out, when the flag is set.

Exactly, it's affecting how the page fault handler populates the file
metadata before allowing user-space to see the newly populated page
AFAIU.

Thanks,

Mathieu


-- 
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com


Thread overview: 20+ messages
2024-05-10 18:59 Update mmap() flags and errors lists DJ Delorie
2024-06-04 22:16 ` Florian Weimer
2024-06-05  4:10   ` DJ Delorie
2024-06-05  6:38     ` Florian Weimer
2024-06-05 18:42       ` DJ Delorie
2024-06-14  8:14         ` Florian Weimer
2024-06-14 16:40           ` DJ Delorie
2024-06-05 18:50       ` [v3] " DJ Delorie
2024-06-14  8:21         ` Florian Weimer
2024-06-14 18:19           ` DJ Delorie
2024-06-14 18:46           ` [v4] " DJ Delorie
2024-06-18 20:13             ` Mathieu Desnoyers
2024-06-18 20:57               ` DJ Delorie
2024-06-21 13:02                 ` Mathieu Desnoyers
2024-06-21 16:17                   ` DJ Delorie
2024-06-21 16:20                     ` Mathieu Desnoyers
2024-06-19  7:16             ` Florian Weimer
2024-06-05  4:11   ` [v2] " DJ Delorie
2024-06-05  7:44     ` Andreas Schwab
2024-06-05 18:42       ` DJ Delorie
