public inbox for glibc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug dynamic-link/31076] New: Extra struct vm_area_struct with ---p created when PAGE_SIZE < max-page-size
@ 2023-11-18 18:48 jyescas at google dot com
  2023-11-18 18:50 ` [Bug dynamic-link/31076] " jyescas at google dot com
                   ` (23 more replies)
  0 siblings, 24 replies; 25+ messages in thread
From: jyescas at google dot com @ 2023-11-18 18:48 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=31076

            Bug ID: 31076
           Summary: Extra struct vm_area_struct with ---p created when
                    PAGE_SIZE < max-page-size
           Product: glibc
           Version: unspecified
            Status: UNCONFIRMED
          Severity: normal
          Priority: P2
         Component: dynamic-link
          Assignee: unassigned at sourceware dot org
          Reporter: jyescas at google dot com
  Target Milestone: ---

Created attachment 15224
  --> https://sourceware.org/bugzilla/attachment.cgi?id=15224&action=edit
Contains the C files, headers and makefile to reproduce the issue.

# The story of the extra struct vm_area_struct and the loader

## Problem

When the page size is `4096` and shared libraries and binaries are
`16384/65536` elf aligned with the flag `-Wl,-z,max-page-size=[16384|65536]`
the dynamic linker (loader) creates an extra vm_area_struct for each segment.

This happens because the loader allocates a memory area big enough to
map the shared library and then mprotect every segment at
page size boundaries instead of a `p_align` boundary.

The size of `struct vm_area_struct` is 152 bytes in the `Linux 6.3.13` kernel.
The increase in the number of vm_area_struct is an issue for low end memory
devices (mobile devices, etc). We have seen an increase of 30MBs of
unreclaimable kernel memory on those devices due the extra vm_area_struct.

## Linux and gcc details

```
 $ lsb_release  -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 20.04.1 LTS
Release:        20.04
Codename:       focal

 $ gcc --version
gcc (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0
```

## How to reproduce it (See attachments)

In a `4096` page size kernel, compile a shared library with 16384 or 65536 elf
alignment.

In the `sample` directory, there are these files:

```
 $ tree sample/
sample/
├── arithmetic.c
├── arithmetic.h
├── main.c
└── Makefile
```

To compile the shared library and main, run:

```
$ cd sample

# Build shared library and main
$ make build
```

To load and execute a program, there are 2 possible ways:

1. Ask the Linux kernel to load the program and dynamic linker, once that the
dynamic linker is loaded, it will load the shared libraries, in this case
`libarithmetic.so`.

```
$ make run
```

2. Ask the kernel to load the dynamic linker and then the dynamic linker will
load the program and shared libraries, in this case `libarithmetic.so`.

```
$ make runloader
```

### Linux kernel loads the program and dynamic linker

When the `Linux kernel` loads the program and dynamic linker, the dynamic
linker
loads the shared libraries, we can see that an extra vm_area_struct is used
with
the permissions `---p` in the shared libraries that are 16384/65536 elf
aligned.

```
 $ make run
LD_LIBRARY_PATH=./build ./build/main

# In another console, run
 $ cat /proc/1717808/maps
564d8e90f000-564d8e910000 r--p 00000000 08:05 2519504        /sample/build/main
564d8e91f000-564d8e920000 r-xp 00010000 08:05 2519504        /sample/build/main
564d8e92f000-564d8e930000 r--p 00020000 08:05 2519504        /sample/build/main
564d8e93f000-564d8e940000 r--p 00020000 08:05 2519504        /sample/build/main
564d8e940000-564d8e941000 rw-p 00021000 08:05 2519504        /sample/build/main
564d9055d000-564d9057e000 rw-p 00000000 00:00 0              [heap]
7ff6417ba000-7ff6417bd000 rw-p 00000000 00:00 0
7ff6417bd000-7ff6417df000 r--p 00000000 08:05 2623619       
/usr/lib/x86_64-linux-gnu/libc-2.31.so
7ff6417df000-7ff641957000 r-xp 00022000 08:05 2623619       
/usr/lib/x86_64-linux-gnu/libc-2.31.so
7ff641957000-7ff6419a5000 r--p 0019a000 08:05 2623619       
/usr/lib/x86_64-linux-gnu/libc-2.31.so
7ff6419a5000-7ff6419a9000 r--p 001e7000 08:05 2623619       
/usr/lib/x86_64-linux-gnu/libc-2.31.so
7ff6419a9000-7ff6419ab000 rw-p 001eb000 08:05 2623619       
/usr/lib/x86_64-linux-gnu/libc-2.31.so
7ff6419ab000-7ff6419af000 rw-p 00000000 00:00 0
7ff6419c0000-7ff6419c1000 r--p 00000000 08:05 2519503       
/sample/build/libarithmetic.so
7ff6419c1000-7ff6419d0000 ---p 00001000 08:05 2519503       
/sample/build/libarithmetic.so
7ff6419d0000-7ff6419d1000 r-xp 00010000 08:05 2519503       
/sample/build/libarithmetic.so
7ff6419d1000-7ff6419e0000 ---p 00011000 08:05 2519503       
/sample/build/libarithmetic.so
7ff6419e0000-7ff6419e1000 r--p 00020000 08:05 2519503       
/sample/build/libarithmetic.so
7ff6419e1000-7ff6419f0000 ---p 00021000 08:05 2519503       
/sample/build/libarithmetic.so
7ff6419f0000-7ff6419f1000 r--p 00020000 08:05 2519503       
/sample/build/libarithmetic.so
7ff6419f1000-7ff6419f2000 rw-p 00021000 08:05 2519503       
/sample/build/libarithmetic.so
7ff6419f2000-7ff6419f4000 rw-p 00000000 00:00 0
7ff6419f4000-7ff6419f5000 r--p 00000000 08:05 2623613       
/usr/lib/x86_64-linux-gnu/ld-2.31.so
7ff6419f5000-7ff641a18000 r-xp 00001000 08:05 2623613       
/usr/lib/x86_64-linux-gnu/ld-2.31.so
7ff641a18000-7ff641a20000 r--p 00024000 08:05 2623613       
/usr/lib/x86_64-linux-gnu/ld-2.31.so
7ff641a21000-7ff641a22000 r--p 0002c000 08:05 2623613       
/usr/lib/x86_64-linux-gnu/ld-2.31.so
7ff641a22000-7ff641a23000 rw-p 0002d000 08:05 2623613       
/usr/lib/x86_64-linux-gnu/ld-2.31.so
7ff641a23000-7ff641a24000 rw-p 00000000 00:00 0
7ffdfd748000-7ffdfd769000 rw-p 00000000 00:00 0              [stack]
7ffdfd77c000-7ffdfd77f000 r--p 00000000 00:00 0              [vvar]
7ffdfd77f000-7ffdfd780000 r-xp 00000000 00:00 0              [vdso]
ffffffffff600000-ffffffffff601000 --xp 00000000 00:00 0      [vsyscall]
```

> Note that in `main` there is not a vm_area_struct with `---p` permissions due
> the Linux kernel loads the program.

### Linux kernel ONLY loads the dynamic linker

When the Linux kernel `ONLY` loads the dynamic linker, and the dynamic linker
loads the shared libraries and program, we can see that an extra vm_area_struct
is used with the permissions `---p` in the shared libraries and program that
are 16384/65536 elf aligned.

```
 $ make runloader
LD_LIBRARY_PATH=./build /lib64/ld-linux-x86-64.so.2 ./build/main

# In another console, run
 $ cat /proc/1711495/maps
555557048000-555557069000 rw-p 00000000 00:00 0             [heap]
7f58a13c0000-7f58a13c3000 rw-p 00000000 00:00 0
7f58a13c3000-7f58a13e5000 r--p 00000000 08:05 2623619      
/usr/lib/x86_64-linux-gnu/libc-2.31.so
7f58a13e5000-7f58a155d000 r-xp 00022000 08:05 2623619      
/usr/lib/x86_64-linux-gnu/libc-2.31.so
7f58a155d000-7f58a15ab000 r--p 0019a000 08:05 2623619      
/usr/lib/x86_64-linux-gnu/libc-2.31.so
7f58a15ab000-7f58a15af000 r--p 001e7000 08:05 2623619      
/usr/lib/x86_64-linux-gnu/libc-2.31.so
7f58a15af000-7f58a15b1000 rw-p 001eb000 08:05 2623619      
/usr/lib/x86_64-linux-gnu/libc-2.31.so
7f58a15b1000-7f58a15b5000 rw-p 00000000 00:00 0
7f58a15c6000-7f58a15c7000 r--p 00000000 08:05 2519503      
/sample/build/libarithmetic.so
7f58a15c7000-7f58a15d6000 ---p 00001000 08:05 2519503      
/sample/build/libarithmetic.so
7f58a15d6000-7f58a15d7000 r-xp 00010000 08:05 2519503      
/sample/build/libarithmetic.so
7f58a15d7000-7f58a15e6000 ---p 00011000 08:05 2519503      
/sample/build/libarithmetic.so
7f58a15e6000-7f58a15e7000 r--p 00020000 08:05 2519503      
/sample/build/libarithmetic.so
7f58a15e7000-7f58a15f6000 ---p 00021000 08:05 2519503      
/sample/build/libarithmetic.so
7f58a15f6000-7f58a15f7000 r--p 00020000 08:05 2519503      
/sample/build/libarithmetic.so
7f58a15f7000-7f58a15f8000 rw-p 00021000 08:05 2519503      
/sample/build/libarithmetic.so
7f58a15f8000-7f58a15fa000 rw-p 00000000 00:00 0
7f58a15fa000-7f58a15fb000 r--p 00000000 08:05 2519504       /sample/build/main
7f58a15fb000-7f58a160a000 ---p 00001000 08:05 2519504       /sample/build/main
7f58a160a000-7f58a160b000 r-xp 00010000 08:05 2519504       /sample/build/main
7f58a160b000-7f58a161a000 ---p 00011000 08:05 2519504       /sample/build/main
7f58a161a000-7f58a161b000 r--p 00020000 08:05 2519504       /sample/build/main
7f58a161b000-7f58a162a000 ---p 00021000 08:05 2519504       /sample/build/main
7f58a162a000-7f58a162b000 r--p 00020000 08:05 2519504       /sample/build/main
7f58a162b000-7f58a162c000 rw-p 00021000 08:05 2519504       /sample/build/main
7f58a162c000-7f58a162d000 r--p 00000000 08:05 2623613      
/usr/lib/x86_64-linux-gnu/ld-2.31.so
7f58a162d000-7f58a1650000 r-xp 00001000 08:05 2623613      
/usr/lib/x86_64-linux-gnu/ld-2.31.so
7f58a1650000-7f58a1658000 r--p 00024000 08:05 2623613      
/usr/lib/x86_64-linux-gnu/ld-2.31.so
7f58a1659000-7f58a165a000 r--p 0002c000 08:05 2623613      
/usr/lib/x86_64-linux-gnu/ld-2.31.so
7f58a165a000-7f58a165b000 rw-p 0002d000 08:05 2623613      
/usr/lib/x86_64-linux-gnu/ld-2.31.so
7f58a165b000-7f58a165c000 rw-p 00000000 00:00 0
7ffe19e5c000-7ffe19e7d000 rw-p 00000000 00:00 0             [stack]
7ffe19f7e000-7ffe19f81000 r--p 00000000 00:00 0             [vvar]
7ffe19f81000-7ffe19f82000 r-xp 00000000 00:00 0             [vdso]
ffffffffff600000-ffffffffff601000 --xp 00000000 00:00 0     [vsyscall]
```

### Linux kernel page size is 4096 and the elf alignment is 4096

When a shared library and executable are linked using the flag
`-Wl,-z,max-page-size=4096`
and loaded by the dynamic linker, we can see that there is `NOT` an extra
vm_area_struct with
permissions `---p`.

```
 $ make runloader
LD_LIBRARY_PATH=./build /lib64/ld-linux-x86-64.so.2 ./build/main

# In another console, run
 $ cat /proc/1721958/maps
555556cef000-555556d10000 rw-p 00000000 00:00 0            [heap]
7f70bb246000-7f70bb249000 rw-p 00000000 00:00 0
7f70bb249000-7f70bb26b000 r--p 00000000 08:05 2623619     
/usr/lib/x86_64-linux-gnu/libc-2.31.so
7f70bb26b000-7f70bb3e3000 r-xp 00022000 08:05 2623619     
/usr/lib/x86_64-linux-gnu/libc-2.31.so
7f70bb3e3000-7f70bb431000 r--p 0019a000 08:05 2623619     
/usr/lib/x86_64-linux-gnu/libc-2.31.so
7f70bb431000-7f70bb435000 r--p 001e7000 08:05 2623619     
/usr/lib/x86_64-linux-gnu/libc-2.31.so
7f70bb435000-7f70bb437000 rw-p 001eb000 08:05 2623619     
/usr/lib/x86_64-linux-gnu/libc-2.31.so
7f70bb437000-7f70bb43b000 rw-p 00000000 00:00 0
7f70bb44c000-7f70bb44d000 r--p 00000000 08:05 2519503     
/sample/build/libarithmetic.so
7f70bb44d000-7f70bb44e000 r-xp 00001000 08:05 2519503     
/sample/build/libarithmetic.so
7f70bb44e000-7f70bb44f000 r--p 00002000 08:05 2519503     
/sample/build/libarithmetic.so
7f70bb44f000-7f70bb450000 r--p 00002000 08:05 2519503     
/sample/build/libarithmetic.so
7f70bb450000-7f70bb451000 rw-p 00003000 08:05 2519503     
/sample/build/libarithmetic.so
7f70bb451000-7f70bb453000 rw-p 00000000 00:00 0
7f70bb453000-7f70bb454000 r--p 00000000 08:05 2519504      /sample/build/main
7f70bb454000-7f70bb455000 r-xp 00001000 08:05 2519504      /sample/build/main
7f70bb455000-7f70bb456000 r--p 00002000 08:05 2519504      /sample/build/main
7f70bb456000-7f70bb457000 r--p 00002000 08:05 2519504      /sample/build/main
7f70bb457000-7f70bb458000 rw-p 00003000 08:05 2519504      /sample/build/main
7f70bb458000-7f70bb459000 r--p 00000000 08:05 2623613     
/usr/lib/x86_64-linux-gnu/ld-2.31.so
7f70bb459000-7f70bb47c000 r-xp 00001000 08:05 2623613     
/usr/lib/x86_64-linux-gnu/ld-2.31.so
7f70bb47c000-7f70bb484000 r--p 00024000 08:05 2623613     
/usr/lib/x86_64-linux-gnu/ld-2.31.so
7f70bb485000-7f70bb486000 r--p 0002c000 08:05 2623613     
/usr/lib/x86_64-linux-gnu/ld-2.31.so
7f70bb486000-7f70bb487000 rw-p 0002d000 08:05 2623613     
/usr/lib/x86_64-linux-gnu/ld-2.31.so
7f70bb487000-7f70bb488000 rw-p 00000000 00:00 0
7ffe42e67000-7ffe42e88000 rw-p 00000000 00:00 0            [stack]
7ffe42ebe000-7ffe42ec1000 r--p 00000000 00:00 0            [vvar]
7ffe42ec1000-7ffe42ec2000 r-xp 00000000 00:00 0            [vdso]
ffffffffff600000-ffffffffff601000 --xp 00000000 00:00 0    [vsyscall]
```

## Solution 1 - Modify the dynamic linker

When the dynamic linker loads the shared libraries, extend the `vm_area_struct`
to be at a `p_align` boundary. This is done by passing to `mprotect()` the
right address space ranges.

## Solution 2 - Modify the static linker

When the static linker creates the program segments, make sure that the
`p_filesz` and `p_memsz` are extended to a `p_align` boundary. This can be done
by inserting a new ELF section at every PT_LOAD ELF segment that will serve as
padding.

-- 
You are receiving this mail because:
You are on the CC list for the bug.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [Bug dynamic-link/31076] Extra struct vm_area_struct with ---p created when PAGE_SIZE < max-page-size
  2023-11-18 18:48 [Bug dynamic-link/31076] New: Extra struct vm_area_struct with ---p created when PAGE_SIZE < max-page-size jyescas at google dot com
@ 2023-11-18 18:50 ` jyescas at google dot com
  2023-11-21 13:42 ` carlos at redhat dot com
                   ` (22 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: jyescas at google dot com @ 2023-11-18 18:50 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=31076

Juan Yescas <jyescas at google dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jyescas at google dot com

--- Comment #1 from Juan Yescas <jyescas at google dot com> ---
As per my conversation with Carlos O'Donell, he or Siddehesh Poyarekar can take
a look at the issue.

-- 
You are receiving this mail because:
You are on the CC list for the bug.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [Bug dynamic-link/31076] Extra struct vm_area_struct with ---p created when PAGE_SIZE < max-page-size
  2023-11-18 18:48 [Bug dynamic-link/31076] New: Extra struct vm_area_struct with ---p created when PAGE_SIZE < max-page-size jyescas at google dot com
  2023-11-18 18:50 ` [Bug dynamic-link/31076] " jyescas at google dot com
@ 2023-11-21 13:42 ` carlos at redhat dot com
  2023-11-22  0:39 ` i at maskray dot me
                   ` (21 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: carlos at redhat dot com @ 2023-11-21 13:42 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=31076

Carlos O'Donell <carlos at redhat dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |carlos at redhat dot com
     Ever confirmed|0                           |1
   Last reconfirmed|                            |2023-11-21
             Status|UNCONFIRMED                 |NEW

--- Comment #2 from Carlos O'Donell <carlos at redhat dot com> ---
Juan, Thank you for filling this issue! It was great to meet you at LPC 2023
and discuss the problem. I'll have a look at the details and think about
possible solutions. I'm not sure yet if this is something the loader can do or
if the static linker has arrange the PT_LOAD segments in the right way.

-- 
You are receiving this mail because:
You are on the CC list for the bug.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [Bug dynamic-link/31076] Extra struct vm_area_struct with ---p created when PAGE_SIZE < max-page-size
  2023-11-18 18:48 [Bug dynamic-link/31076] New: Extra struct vm_area_struct with ---p created when PAGE_SIZE < max-page-size jyescas at google dot com
  2023-11-18 18:50 ` [Bug dynamic-link/31076] " jyescas at google dot com
  2023-11-21 13:42 ` carlos at redhat dot com
@ 2023-11-22  0:39 ` i at maskray dot me
  2023-11-22  4:33 ` kaleshsingh at google dot com
                   ` (20 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: i at maskray dot me @ 2023-11-22  0:39 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=31076

Fangrui Song <i at maskray dot me> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |i at maskray dot me

-- 
You are receiving this mail because:
You are on the CC list for the bug.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [Bug dynamic-link/31076] Extra struct vm_area_struct with ---p created when PAGE_SIZE < max-page-size
  2023-11-18 18:48 [Bug dynamic-link/31076] New: Extra struct vm_area_struct with ---p created when PAGE_SIZE < max-page-size jyescas at google dot com
                   ` (2 preceding siblings ...)
  2023-11-22  0:39 ` i at maskray dot me
@ 2023-11-22  4:33 ` kaleshsingh at google dot com
  2023-11-22 18:19 ` jyescas at google dot com
                   ` (19 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: kaleshsingh at google dot com @ 2023-11-22  4:33 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=31076

Kalesh Singh <kaleshsingh at google dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |kaleshsingh at google dot com

-- 
You are receiving this mail because:
You are on the CC list for the bug.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [Bug dynamic-link/31076] Extra struct vm_area_struct with ---p created when PAGE_SIZE < max-page-size
  2023-11-18 18:48 [Bug dynamic-link/31076] New: Extra struct vm_area_struct with ---p created when PAGE_SIZE < max-page-size jyescas at google dot com
                   ` (3 preceding siblings ...)
  2023-11-22  4:33 ` kaleshsingh at google dot com
@ 2023-11-22 18:19 ` jyescas at google dot com
  2023-11-23 11:42 ` sam at gentoo dot org
                   ` (18 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: jyescas at google dot com @ 2023-11-22 18:19 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=31076

--- Comment #3 from Juan Yescas <jyescas at google dot com> ---
Thanks Carlos, It was nice to meet you too. I am looking forward to
collaborating with you.

-- 
You are receiving this mail because:
You are on the CC list for the bug.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [Bug dynamic-link/31076] Extra struct vm_area_struct with ---p created when PAGE_SIZE < max-page-size
  2023-11-18 18:48 [Bug dynamic-link/31076] New: Extra struct vm_area_struct with ---p created when PAGE_SIZE < max-page-size jyescas at google dot com
                   ` (4 preceding siblings ...)
  2023-11-22 18:19 ` jyescas at google dot com
@ 2023-11-23 11:42 ` sam at gentoo dot org
  2023-11-24 17:40 ` adhemerval.zanella at linaro dot org
                   ` (17 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: sam at gentoo dot org @ 2023-11-23 11:42 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=31076

Sam James <sam at gentoo dot org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |sam at gentoo dot org

-- 
You are receiving this mail because:
You are on the CC list for the bug.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [Bug dynamic-link/31076] Extra struct vm_area_struct with ---p created when PAGE_SIZE < max-page-size
  2023-11-18 18:48 [Bug dynamic-link/31076] New: Extra struct vm_area_struct with ---p created when PAGE_SIZE < max-page-size jyescas at google dot com
                   ` (5 preceding siblings ...)
  2023-11-23 11:42 ` sam at gentoo dot org
@ 2023-11-24 17:40 ` adhemerval.zanella at linaro dot org
  2023-11-27 15:11 ` fweimer at redhat dot com
                   ` (16 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: adhemerval.zanella at linaro dot org @ 2023-11-24 17:40 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=31076

Adhemerval Zanella <adhemerval.zanella at linaro dot org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |adhemerval.zanella at linaro dot o
                   |                            |rg

--- Comment #4 from Adhemerval Zanella <adhemerval.zanella at linaro dot org> ---
In ancient times, ldso used to unmap the disjoined regions from PT_LOAD.  It
was changed decades ago (22930c9bf21ea15d0da1477a379029e2de259b69), most likely
due some kernel VMA limitation since it done between Linux 1.3 to 2.0 release. 
Changing back to unmap should remove the extra VMA ranges:

diff --git a/elf/dl-load.h b/elf/dl-load.h
index 1d5207694b..f53983fd1f 100644
--- a/elf/dl-load.h
+++ b/elf/dl-load.h
@@ -124,6 +124,8 @@ static const char *_dl_map_segments (struct link_map *l,
int fd,
    guaranteed to have translations.  */
 #define DL_MAP_SEGMENTS_ERROR_MAP_SEGMENT \
   N_("failed to map segment from shared object")
+#define DL_MAP_SEGMENTS_ERROR_UNMAP_SEGMENT \
+  N_("failed to unmap segment from shared object")
 #define DL_MAP_SEGMENTS_ERROR_MPROTECT \
   N_("cannot change memory protections")
 #define DL_MAP_SEGMENTS_ERROR_MAP_ZERO_FILL \
diff --git a/elf/dl-map-segments.h b/elf/dl-map-segments.h
index ac10182d58..7ecb1d917b 100644
--- a/elf/dl-map-segments.h
+++ b/elf/dl-map-segments.h
@@ -115,11 +115,10 @@ _dl_map_segments (struct link_map *l, int fd,
          if (__glibc_unlikely (loadcmds[nloadcmds - 1].mapstart <
                                c->mapend))
            return N_("ELF load command address/offset not page-aligned");
-          if (__glibc_unlikely
-              (__mprotect ((caddr_t) (l->l_addr + c->mapend),
-                           loadcmds[nloadcmds - 1].mapstart - c->mapend,
-                           PROT_NONE) < 0))
-            return DL_MAP_SEGMENTS_ERROR_MPROTECT;
+         if (__glibc_unlikely
+             (__munmap ((caddr_t) (l->l_addr + c->mapend),
+                        loadcmds[nloadcmds - 1].mapstart - c->mapend) < 0))
+            return DL_MAP_SEGMENTS_ERROR_UNMAP_SEGMENT;
         }

       l->l_contiguous = 1;

It would be somewhat more coslty, since kernel will need to remap and split the
VMA range.

However, it is not really clear to me what is the advantage of mprotect/munmap
the scenario where the PT_LOAD might have 'holes' (essentially where the system
page size is lower than the required alignment).  I can understand the munmap,
since it might have some sense back on 32 bit deployments were common and VMA
range was scarse in some situation (specially due fragmentation).

But the mprotect does not relly bring any advantage here, it consumes more
kernel metadata, do not improve the process VMA range usage, nor it was done as
a hardening scheme.

-- 
You are receiving this mail because:
You are on the CC list for the bug.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [Bug dynamic-link/31076] Extra struct vm_area_struct with ---p created when PAGE_SIZE < max-page-size
  2023-11-18 18:48 [Bug dynamic-link/31076] New: Extra struct vm_area_struct with ---p created when PAGE_SIZE < max-page-size jyescas at google dot com
                   ` (6 preceding siblings ...)
  2023-11-24 17:40 ` adhemerval.zanella at linaro dot org
@ 2023-11-27 15:11 ` fweimer at redhat dot com
  2023-11-27 15:22 ` fweimer at redhat dot com
                   ` (15 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: fweimer at redhat dot com @ 2023-11-27 15:11 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=31076

Florian Weimer <fweimer at redhat dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |fweimer at redhat dot com

--- Comment #5 from Florian Weimer <fweimer at redhat dot com> ---
I don't see how a 152 byte struct results in 30 MB of unreclaimable kernel
memory.  Wouldn't that need ~200,000 instances? That seems really large. I've
only got ~70,000 lines in /proc/*/maps on this desktop system, and not all
these mappings will exhibit this issue.

Would it help if we use MAP_FIXED with PROT_NONE to map over these unused
parts? But as far as I understand it, these tails have not been written to, so
it shouldn't matter if the underlying memory needs to be preserved by the
kernel or not.

Regarding not doing the mprotect altogether, I believe this would result in a
loss of functionality. Today, you can use the current behavior to get as few
gadgets as possible in the process image on systems with smaller page sizes,
while still maintaining run-time compatibility with larger page sizes and
avoiding on-disk padding (which would increase file size). If we stop doing the
mprotect, then the gadgets would become visible even with smaller page sizes.
At least in principle, it should be possible for a link editor to produce
objects that do not require tail adjustment because the load segments are
usable as-is (but I understand that there are link editor limitations in this
area today).

-- 
You are receiving this mail because:
You are on the CC list for the bug.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [Bug dynamic-link/31076] Extra struct vm_area_struct with ---p created when PAGE_SIZE < max-page-size
  2023-11-18 18:48 [Bug dynamic-link/31076] New: Extra struct vm_area_struct with ---p created when PAGE_SIZE < max-page-size jyescas at google dot com
                   ` (7 preceding siblings ...)
  2023-11-27 15:11 ` fweimer at redhat dot com
@ 2023-11-27 15:22 ` fweimer at redhat dot com
  2023-11-27 16:27 ` adhemerval.zanella at linaro dot org
                   ` (14 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: fweimer at redhat dot com @ 2023-11-27 15:22 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=31076

--- Comment #6 from Florian Weimer <fweimer at redhat dot com> ---
I forgot to mention that we have assumptions in the code base that shared
objects *we* map are contiguous, so Adhemerval's patch isn't correct as -is.
(We tolerate non-contiguous mappings created by the kernel, both for the main
program and ld.so.)

-- 
You are receiving this mail because:
You are on the CC list for the bug.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [Bug dynamic-link/31076] Extra struct vm_area_struct with ---p created when PAGE_SIZE < max-page-size
  2023-11-18 18:48 [Bug dynamic-link/31076] New: Extra struct vm_area_struct with ---p created when PAGE_SIZE < max-page-size jyescas at google dot com
                   ` (8 preceding siblings ...)
  2023-11-27 15:22 ` fweimer at redhat dot com
@ 2023-11-27 16:27 ` adhemerval.zanella at linaro dot org
  2023-11-27 17:19 ` fweimer at redhat dot com
                   ` (13 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: adhemerval.zanella at linaro dot org @ 2023-11-27 16:27 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=31076

--- Comment #7 from Adhemerval Zanella <adhemerval.zanella at linaro dot org> ---
(In reply to Florian Weimer from comment #5)
> I don't see how a 152 byte struct results in 30 MB of unreclaimable kernel
> memory.  Wouldn't that need ~200,000 instances? That seems really large.
> I've only got ~70,000 lines in /proc/*/maps on this desktop system, and not
> all these mappings will exhibit this issue.
> 
> Would it help if we use MAP_FIXED with PROT_NONE to map over these unused
> parts? But as far as I understand it, these tails have not been written to,
> so it shouldn't matter if the underlying memory needs to be preserved by the
> kernel or not.
> 
> Regarding not doing the mprotect altogether, I believe this would result in
> a loss of functionality. Today, you can use the current behavior to get as
> few gadgets as possible in the process image on systems with smaller page
> sizes, while still maintaining run-time compatibility with larger page sizes
> and avoiding on-disk padding (which would increase file size). If we stop
> doing the mprotect, then the gadgets would become visible even with smaller
> page sizes. At least in principle, it should be possible for a link editor
> to produce objects that do not require tail adjustment because the load
> segments are usable as-is (but I understand that there are link editor
> limitations in this area today).

So the mprotect is essentially a hardening feature, assuming that the dynamic
object padding/holes might contain gadgets.  It still does not happen for
loader and main program itself, since normally they would be mapped by the
kernel and its does do anything with holes, and IMHO it should up to the static
linker to fill the padding with NOP/trap instruction to avoid such issues. 

So I am not fully convinced that the mprotect is really helping much here.

-- 
You are receiving this mail because:
You are on the CC list for the bug.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [Bug dynamic-link/31076] Extra struct vm_area_struct with ---p created when PAGE_SIZE < max-page-size
  2023-11-18 18:48 [Bug dynamic-link/31076] New: Extra struct vm_area_struct with ---p created when PAGE_SIZE < max-page-size jyescas at google dot com
                   ` (9 preceding siblings ...)
  2023-11-27 16:27 ` adhemerval.zanella at linaro dot org
@ 2023-11-27 17:19 ` fweimer at redhat dot com
  2023-11-27 17:39 ` adhemerval.zanella at linaro dot org
                   ` (12 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: fweimer at redhat dot com @ 2023-11-27 17:19 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=31076

--- Comment #8 from Florian Weimer <fweimer at redhat dot com> ---
(In reply to Adhemerval Zanella from comment #7)
> So the mprotect is essentially a hardening feature, assuming that the
> dynamic object padding/holes might contain gadgets.  It still does not
> happen for loader and main program itself, since normally they would be
> mapped by the kernel and its does do anything with holes,

There's no expectation that these are contiguous in memory, yes. Such an
expectation exists for shared objects loaded by glibc.

> and IMHO it should
> up to the static linker to fill the padding with NOP/trap instruction to
> avoid such issues. 

That requires padding to the maximum page size on disk, though.

-- 
You are receiving this mail because:
You are on the CC list for the bug.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [Bug dynamic-link/31076] Extra struct vm_area_struct with ---p created when PAGE_SIZE < max-page-size
  2023-11-18 18:48 [Bug dynamic-link/31076] New: Extra struct vm_area_struct with ---p created when PAGE_SIZE < max-page-size jyescas at google dot com
                   ` (10 preceding siblings ...)
  2023-11-27 17:19 ` fweimer at redhat dot com
@ 2023-11-27 17:39 ` adhemerval.zanella at linaro dot org
  2023-11-27 17:45 ` fweimer at redhat dot com
                   ` (11 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: adhemerval.zanella at linaro dot org @ 2023-11-27 17:39 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=31076

--- Comment #9 from Adhemerval Zanella <adhemerval.zanella at linaro dot org> ---
(In reply to Florian Weimer from comment #8)
> (In reply to Adhemerval Zanella from comment #7)
> > So the mprotect is essentially a hardening feature, assuming that the
> > dynamic object padding/holes might contain gadgets.  It still does not
> > happen for loader and main program itself, since normally they would be
> > mapped by the kernel and its does do anything with holes,
> 
> There's no expectation that these are contiguous in memory, yes. Such an
> expectation exists for shared objects loaded by glibc.

Right, but my point is if this is a hardening feature that glibc aims to
provide it does not help if the kernel still does not provide it for the loader
(if this is built with a different page size) and the main program itself.
Should we push for kernel to implement a similar handling?

And the holes seem to be always from the initial read-only segment, which makes
the whole gadget avoidance argument moot. This would make sense back when RELRO
was not wildly used/deployed (even with the current somewhat broken status),
but it still does not make much sense to me.

> 
> > and IMHO it should
> > up to the static linker to fill the padding with NOP/trap instruction to
> > avoid such issues. 
> 
> That requires padding to the maximum page size on disk, though.

But this is what binutils does [1], and while not being optimized it seems that
it would be somewhat hard to fix it (at least with Fangrui Song suggestion).

[1] https://sourceware.org/bugzilla/show_bug.cgi?id=30612

-- 
You are receiving this mail because:
You are on the CC list for the bug.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [Bug dynamic-link/31076] Extra struct vm_area_struct with ---p created when PAGE_SIZE < max-page-size
  2023-11-18 18:48 [Bug dynamic-link/31076] New: Extra struct vm_area_struct with ---p created when PAGE_SIZE < max-page-size jyescas at google dot com
                   ` (11 preceding siblings ...)
  2023-11-27 17:39 ` adhemerval.zanella at linaro dot org
@ 2023-11-27 17:45 ` fweimer at redhat dot com
  2023-11-27 17:58 ` adhemerval.zanella at linaro dot org
                   ` (10 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: fweimer at redhat dot com @ 2023-11-27 17:45 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=31076

--- Comment #10 from Florian Weimer <fweimer at redhat dot com> ---
(In reply to Adhemerval Zanella from comment #9)
> (In reply to Florian Weimer from comment #8)
> > (In reply to Adhemerval Zanella from comment #7)
> > > So the mprotect is essentially a hardening feature, assuming that the
> > > dynamic object padding/holes might contain gadgets.  It still does not
> > > happen for loader and main program itself, since normally they would be
> > > mapped by the kernel and its does do anything with holes,
> > 
> > There's no expectation that these are contiguous in memory, yes. Such an
> > expectation exists for shared objects loaded by glibc.
> 
> Right, but my point is if this is a hardening feature that glibc aims to
> provide it does not help if the kernel still does not provide it for the
> loader (if this is built with a different page size) and the main program
> itself. Should we push for kernel to implement a similar handling?

The kernel effectively implements the munmap behavior, I think. It creates
unmapped holes, not PROT_MEM reserved holes.

-- 
You are receiving this mail because:
You are on the CC list for the bug.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [Bug dynamic-link/31076] Extra struct vm_area_struct with ---p created when PAGE_SIZE < max-page-size
  2023-11-18 18:48 [Bug dynamic-link/31076] New: Extra struct vm_area_struct with ---p created when PAGE_SIZE < max-page-size jyescas at google dot com
                   ` (12 preceding siblings ...)
  2023-11-27 17:45 ` fweimer at redhat dot com
@ 2023-11-27 17:58 ` adhemerval.zanella at linaro dot org
  2023-11-27 19:47 ` jyescas at google dot com
                   ` (9 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: adhemerval.zanella at linaro dot org @ 2023-11-27 17:58 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=31076

--- Comment #11 from Adhemerval Zanella <adhemerval.zanella at linaro dot org> ---
(In reply to Florian Weimer from comment #10)
> (In reply to Adhemerval Zanella from comment #9)
> > (In reply to Florian Weimer from comment #8)
> > > (In reply to Adhemerval Zanella from comment #7)
> > > > So the mprotect is essentially a hardening feature, assuming that the
> > > > dynamic object padding/holes might contain gadgets.  It still does not
> > > > happen for loader and main program itself, since normally they would be
> > > > mapped by the kernel and its does do anything with holes,
> > > 
> > > There's no expectation that these are contiguous in memory, yes. Such an
> > > expectation exists for shared objects loaded by glibc.
> > 
> > Right, but my point is if this is a hardening feature that glibc aims to
> > provide it does not help if the kernel still does not provide it for the
> > loader (if this is built with a different page size) and the main program
> > itself. Should we push for kernel to implement a similar handling?
> 
> The kernel effectively implements the munmap behavior, I think. It creates
> unmapped holes, not PROT_MEM reserved holes.

It does indeed, which makes me realize that _dl_find_object does not work
correctly if the loader is built with -Wl,-z,max-page-size= different than the
system one.  The resulting address of the tst-dl_find_object will be:

      0x555555554000     0x555555556000     0x2000        0x0  r--p  
/home/azanella/Projects/glibc/build/x86_64-linux-gnu/elf/tst-dl_find_object
      0x555555556000     0x55555555a000     0x4000     0x2000  r-xp  
/home/azanella/Projects/glibc/build/x86_64-linux-gnu/elf/tst-dl_find_object
      0x55555555a000     0x55555555c000     0x2000     0x6000  r--p  
/home/azanella/Projects/glibc/build/x86_64-linux-gnu/elf/tst-dl_find_object
      0x55555555c000     0x55555555d000     0x1000     0x7000  r--p  
/home/azanella/Projects/glibc/build/x86_64-linux-gnu/elf/tst-dl_find_object
      0x55555555d000     0x55555555e000     0x1000     0x8000  rw-p  
/home/azanella/Projects/glibc/build/x86_64-linux-gnu/elf/tst-dl_find_object
      0x7ffff7c00000     0x7ffff7c26000    0x26000        0x0  r--p  
/home/azanella/Projects/glibc/build/x86_64-linux-gnu/libc.so
      0x7ffff7c26000     0x7ffff7d9a000   0x174000    0x26000  r-xp  
/home/azanella/Projects/glibc/build/x86_64-linux-gnu/libc.so
      0x7ffff7d9a000     0x7ffff7df1000    0x57000   0x19a000  r--p  
/home/azanella/Projects/glibc/build/x86_64-linux-gnu/libc.so
      0x7ffff7df1000     0x7ffff7df5000     0x4000   0x1f0000  r--p  
/home/azanella/Projects/glibc/build/x86_64-linux-gnu/libc.so
      0x7ffff7df5000     0x7ffff7df7000     0x2000   0x1f4000  rw-p  
/home/azanella/Projects/glibc/build/x86_64-linux-gnu/libc.so
      0x7ffff7df7000     0x7ffff7e04000     0xd000        0x0  rw-p
      0x7ffff7f9d000     0x7ffff7f9e000     0x1000        0x0  r--p  
/home/azanella/Projects/glibc/build/x86_64-linux-gnu/elf/ld.so
      0x7ffff7fad000     0x7ffff7fd3000    0x26000    0x10000  r-xp  
/home/azanella/Projects/glibc/build/x86_64-linux-gnu/elf/ld.so
      0x7ffff7fdd000     0x7ffff7fe7000     0xa000    0x40000  r--p  
/home/azanella/Projects/glibc/build/x86_64-linux-gnu/elf/ld.so
      0x7ffff7fef000     0x7ffff7ff0000     0x1000        0x0  rw-s   /dev/zero
(deleted)
      0x7ffff7ff0000     0x7ffff7ff5000     0x5000        0x0  rw-p
      0x7ffff7ff5000     0x7ffff7ff9000     0x4000        0x0  r--p   [vvar]
      0x7ffff7ff9000     0x7ffff7ffb000     0x2000        0x0  r-xp   [vdso]
      0x7ffff7ffb000     0x7ffff7ffd000     0x2000    0x4e000  r--p  
/home/azanella/Projects/glibc/build/x86_64-linux-gnu/elf/ld.so
      0x7ffff7ffd000     0x7ffff7fff000     0x2000    0x50000  rw-p  
/home/azanella/Projects/glibc/build/x86_64-linux-gnu/elf/ld.so
      0x7ffffffdd000     0x7ffffffff000    0x22000        0x0  rw-p   [stack]
  0xffffffffff600000 0xffffffffff601000     0x1000        0x0  --xp  
[vsyscall]

-- 
You are receiving this mail because:
You are on the CC list for the bug.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [Bug dynamic-link/31076] Extra struct vm_area_struct with ---p created when PAGE_SIZE < max-page-size
  2023-11-18 18:48 [Bug dynamic-link/31076] New: Extra struct vm_area_struct with ---p created when PAGE_SIZE < max-page-size jyescas at google dot com
                   ` (13 preceding siblings ...)
  2023-11-27 17:58 ` adhemerval.zanella at linaro dot org
@ 2023-11-27 19:47 ` jyescas at google dot com
  2023-11-27 19:55 ` jyescas at google dot com
                   ` (8 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: jyescas at google dot com @ 2023-11-27 19:47 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=31076

--- Comment #12 from Juan Yescas <jyescas at google dot com> ---
(In reply to Florian Weimer from comment #5)
> I don't see how a 152 byte struct results in 30 MB of unreclaimable kernel
> memory.  Wouldn't that need ~200,000 instances? That seems really large.
> I've only got ~70,000 lines in /proc/*/maps on this desktop system, and not
> all these mappings will exhibit this issue.
> 

When we compile all the shared libraries in the system with
-Wl,-z,max-page-size=65536 and use 4k page size kernel, for every PT_LOAD
segment (except the last) loaded in memory, there will be a vm_area_struct with
---p permissions.

In mobile devices we have seen an increased of 120K vm_area_structs due
-Wl,-z,max-page-size=65536. There used to be 40K vm_area_structs. This is an
increase of ~18MB (120K * 152).

-- 
You are receiving this mail because:
You are on the CC list for the bug.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [Bug dynamic-link/31076] Extra struct vm_area_struct with ---p created when PAGE_SIZE < max-page-size
  2023-11-18 18:48 [Bug dynamic-link/31076] New: Extra struct vm_area_struct with ---p created when PAGE_SIZE < max-page-size jyescas at google dot com
                   ` (14 preceding siblings ...)
  2023-11-27 19:47 ` jyescas at google dot com
@ 2023-11-27 19:55 ` jyescas at google dot com
  2023-11-28  8:48 ` rprichard at google dot com
                   ` (7 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: jyescas at google dot com @ 2023-11-27 19:55 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=31076

--- Comment #13 from Juan Yescas <jyescas at google dot com> ---
> > Right, but my point is if this is a hardening feature that glibc aims to
> > provide it does not help if the kernel still does not provide it for the
> > loader (if this is built with a different page size) and the main program
> > itself. Should we push for kernel to implement a similar handling?
> 
> The kernel effectively implements the munmap behavior, I think. It creates
> unmapped holes, not PROT_MEM reserved holes.

That is right. The kernel vm_munmap the holes for the program image and dynamic
linker image:

Code that vm_munmap the holes.
https://elixir.bootlin.com/linux/v6.6.2/source/fs/binfmt_elf.c#L397

Code that loads the program image segments.
https://elixir.bootlin.com/linux/v6.6.2/source/fs/binfmt_elf.c#L1165

Code that loads the dynamic linker segments.
https://elixir.bootlin.com/linux/v6.6.2/source/fs/binfmt_elf.c#L397

-- 
You are receiving this mail because:
You are on the CC list for the bug.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [Bug dynamic-link/31076] Extra struct vm_area_struct with ---p created when PAGE_SIZE < max-page-size
  2023-11-18 18:48 [Bug dynamic-link/31076] New: Extra struct vm_area_struct with ---p created when PAGE_SIZE < max-page-size jyescas at google dot com
                   ` (15 preceding siblings ...)
  2023-11-27 19:55 ` jyescas at google dot com
@ 2023-11-28  8:48 ` rprichard at google dot com
  2023-11-28 18:59 ` kaleshsingh at google dot com
                   ` (6 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: rprichard at google dot com @ 2023-11-28  8:48 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=31076

Ryan Prichard <rprichard at google dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rprichard at google dot com

--- Comment #14 from Ryan Prichard <rprichard at google dot com> ---
> When we compile all the shared libraries in the system with -Wl,-z,max-page-size=65536 and use 4k page size kernel, for every PT_LOAD segment (except the last) loaded in memory, there will be a vm_area_struct with ---p permissions.

> In mobile devices we have seen an increased of 120K vm_area_structs due -Wl,-z,max-page-size=65536. There used to be 40K vm_area_structs. This is an increase of ~18MB (120K * 152).

I would've expected the number of VMAs to nearly double, but it sounds like
they quadrupled instead?

-- 
You are receiving this mail because:
You are on the CC list for the bug.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [Bug dynamic-link/31076] Extra struct vm_area_struct with ---p created when PAGE_SIZE < max-page-size
  2023-11-18 18:48 [Bug dynamic-link/31076] New: Extra struct vm_area_struct with ---p created when PAGE_SIZE < max-page-size jyescas at google dot com
                   ` (16 preceding siblings ...)
  2023-11-28  8:48 ` rprichard at google dot com
@ 2023-11-28 18:59 ` kaleshsingh at google dot com
  2023-11-28 23:58 ` jyescas at google dot com
                   ` (5 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: kaleshsingh at google dot com @ 2023-11-28 18:59 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=31076

--- Comment #15 from Kalesh Singh <kaleshsingh at google dot com> ---
(In reply to Adhemerval Zanella from comment #9)
> (In reply to Florian Weimer from comment #8)
> > (In reply to Adhemerval Zanella from comment #7)
> > > So the mprotect is essentially a hardening feature, assuming that the
> > > dynamic object padding/holes might contain gadgets.  It still does not
> > > happen for loader and main program itself, since normally they would be
> > > mapped by the kernel and its does do anything with holes,
> > 
> > There's no expectation that these are contiguous in memory, yes. Such an
> > expectation exists for shared objects loaded by glibc.
> 
> Right, but my point is if this is a hardening feature that glibc aims to
> provide it does not help if the kernel still does not provide it for the
> loader (if this is built with a different page size) and the main program
> itself. Should we push for kernel to implement a similar handling?
> 
> And the holes seem to be always from the initial read-only segment, which
> makes the whole gadget avoidance argument moot. This would make sense back
> when RELRO was not wildly used/deployed (even with the current somewhat
> broken status), but it still does not make much sense to me.
> 
> > 
> > > and IMHO it should
> > > up to the static linker to fill the padding with NOP/trap instruction to
> > > avoid such issues. 


Hi folks,

I was wondering if this is really a security feature or rather a result of how
the original VA space is reserved (split VMAs from the original mapping).

AIUI if the runtime-page-size equals the max-page-size, the holes are also
mapped in as part of the segment mapping and share the same permissions. Does
this mean that on such systems, any protection it offers becomes void?

Sorry if dumb questions (I'm not too familiar with this area)

Thanks,
Kalesh


> > 
> > That requires padding to the maximum page size on disk, though.
> 
> But this is what binutils does [1], and while not being optimized it seems
> that it would be somewhat hard to fix it (at least with Fangrui Song
> suggestion).
> 
> [1] https://sourceware.org/bugzilla/show_bug.cgi?id=30612

-- 
You are receiving this mail because:
You are on the CC list for the bug.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [Bug dynamic-link/31076] Extra struct vm_area_struct with ---p created when PAGE_SIZE < max-page-size
  2023-11-18 18:48 [Bug dynamic-link/31076] New: Extra struct vm_area_struct with ---p created when PAGE_SIZE < max-page-size jyescas at google dot com
                   ` (17 preceding siblings ...)
  2023-11-28 18:59 ` kaleshsingh at google dot com
@ 2023-11-28 23:58 ` jyescas at google dot com
  2023-12-02 17:08 ` i at maskray dot me
                   ` (4 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: jyescas at google dot com @ 2023-11-28 23:58 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=31076

--- Comment #16 from Juan Yescas <jyescas at google dot com> ---
(In reply to Ryan Prichard from comment #14)
> > When we compile all the shared libraries in the system with -Wl,-z,max-page-size=65536 and use 4k page size kernel, for every PT_LOAD segment (except the last) loaded in memory, there will be a vm_area_struct with ---p permissions.
> 
> > In mobile devices we have seen an increased of 120K vm_area_structs due -Wl,-z,max-page-size=65536. There used to be 40K vm_area_structs. This is an increase of ~18MB (120K * 152).
> 
> I would've expected the number of VMAs to nearly double, but it sounds like
> they quadrupled instead?


Sorry for the confusion, I was comparing different devices.

These are results where we can see the increase in extra vmas when the
alignment is max-page-size=65536.

VMA size = 200
Loaded ELF segments = 181806
PROT_NONE (---p) gaps = 124297    <- This is the # of extra vmas
Slab Memory ELF Usage: 34 MB
Slab Memory noneprot Usage: 23 MB

The script that was used is below:

```
#!/bin/sh

vma_size=$(cat /proc/slabinfo | grep vm_area_struct | awk '{ print $4 }')
echo "VMA size = $vma_size"

nr_elf_segs=$(cat /proc/*/maps | grep '\.so' | wc -l)
echo "Loaded ELF segments = $nr_elf_segs"

nr_noneprot=$(cat /proc/*/maps | grep -A1 '\.so' | grep '\---p' | wc -l)
echo "PROT_NONE (---p) gaps = $nr_noneprot"

slab_mem_mb=$(echo "$vma_size * $nr_elf_segs / 1024 / 1024" | bc)
echo "Slab Memory ELF Usage: $slab_mem_mb MB"

slab_mem_mb=$(echo "$vma_size * $nr_noneprot / 1024 / 1024" | bc)
echo "Slab Memory PROTNONE GAP Usage: $slab_mem_mb MB"
```

-- 
You are receiving this mail because:
You are on the CC list for the bug.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [Bug dynamic-link/31076] Extra struct vm_area_struct with ---p created when PAGE_SIZE < max-page-size
  2023-11-18 18:48 [Bug dynamic-link/31076] New: Extra struct vm_area_struct with ---p created when PAGE_SIZE < max-page-size jyescas at google dot com
                   ` (18 preceding siblings ...)
  2023-11-28 23:58 ` jyescas at google dot com
@ 2023-12-02 17:08 ` i at maskray dot me
  2023-12-06 11:57 ` fweimer at redhat dot com
                   ` (3 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: i at maskray dot me @ 2023-12-02 17:08 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=31076

--- Comment #17 from Fangrui Song <i at maskray dot me> ---
Let's refer to the first two build/map maps in comment 0 for discussion.

    # In another console, run
     $ cat /proc/1717808/maps
    564d8e90f000-564d8e910000 r--p 00000000 08:05 2519504       
/sample/build/main
    564d8e91f000-564d8e920000 r-xp 00010000 08:05 2519504       
/sample/build/main
    ...

We have three choices for the gap.

1. Do not create a map for the gap (separate mmap using MAP_FIXED, or a large
mmap with explicit munmap).

Linux kernel fs/binfmt_elf.c employs this approach. However, there is a risk
that an unrelated future mmap may be placed in the gap:

    564d8e90f000-564d8e910000 r--p 00000000 08:05 2519504       
/sample/build/main
       ================ an unrelated mmap may be placed in the gap
    564d8e91f000-564d8e920000 r-xp 00010000 08:05 2519504       
/sample/build/main

If there is no map in between, there is nothing lose. Accessing the gap will
cause SIGSEGV.

Is the potential occurrence of an unrelated mmap considered a compromise in the
hardening features?
Personally, I don't think this poses a significant issue. The program does not
access the gaps.
We can even guarantee this property for direct access when input relocations to
the linker use symbols with in-bounds addends (e.g. when x is defined relative
to an input section, we know R_X86_64_PC32(x) must be in-bounds).

However, some programs may expect contiguous maps of a file (such as when glibc
link_map::l_contiguous is set to 1).
Does this choice render the program exploitable if an attacker can ensure a map
within the gap instead of outside the file?
It seems to me that they could achieve everything with a map outside of the
file.

Having said that, the presence of an unrelated map between maps associated with
a single file descriptor remains odd, so it's preferable to avoid it if
possible (Choice 3 below).

2. Map the gap with the associated fd

This approach is commonly adopted by many dynamic loaders.
It involves a large PROT_READ mmap covering the whole file followed by several
mmap instances using MAP_FIXED and the relevant flags.

The gaps may have permissions set as r--p or ---p, resulting in additional
instances of the `struct vm_area_struct`.
Within glibc/elf/dl-map-segments.h, the `has_holes` code calls mprotect to
change r--p permissions to ---p:

    564d8e90f000-564d8e910000 r--p 00000000 08:05 2519504       
/sample/build/main
    564d8e910000-564d8e91f000 ---p 00001000 08:05 2519504       
/sample/build/main
    564d8e91f000-564d8e920000 r-xp 00010000 08:05 2519504       
/sample/build/main

Although ---p might be seen as a hardening measure, personally, I don't believe
it significantly impacts the exploitability.
While there might be plenty of gadgets in r-xp sections, reducing gadgets in
r--p sections doesn't seem notably useful.
(https://isopenbsdsecu.re/mitigations/rop_removal/)

3. Extend the map end to cover the gap

    564d8e90f000-**564d8e91f000** r--p 00000000 08:05 2519504       
/sample/build/main  (the end is extended)
    564d8e91f000-564d8e920000 r-xp 00010000 08:05 2519504       
/sample/build/main

The loader code can adjust p_filesz and p_memsz when invoking mmap.

By extending the map size to a multiple of the maximum page size, it benefits
transparent huge pages or a userspace hugepage remapper.

Right now Android Bionic is exploring this choice:
https://r.android.com/2796855

---

Personally I favor "3. Extend the map end to cover the gap" the most.
I've also pondered whether this falls under the purview of linkers. Such a
change seems intrusive and unsightly.
If the linker extends the end of p_memsz to cover the gap, should it also
extend p_filesz?

* If it doesn't, we create a PT_LOAD with p_filesz/p_memsz that is not for BSS,
which is weird.
* If it does, we have an output file featuring overlapping file offset ranges,
which is weird as well.

Moreover, a PT_LOAD whose end isn't backed by a section is unusual. I'm
concerned that many binary manipulation tools may not handle this case
correctly.
Utilizing a linker script can intentionally create discontiguous address
ranges. I'm concerned that the linker might not discern such cases with
intelligent logic regarding p_filesz/p_memsz.

This feature request seems to be within the realm of loaders and specific
information, such as the page size, is only accessible to loaders.
I believe loaders are better equipped to handle this task."

-- 
You are receiving this mail because:
You are on the CC list for the bug.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [Bug dynamic-link/31076] Extra struct vm_area_struct with ---p created when PAGE_SIZE < max-page-size
  2023-11-18 18:48 [Bug dynamic-link/31076] New: Extra struct vm_area_struct with ---p created when PAGE_SIZE < max-page-size jyescas at google dot com
                   ` (19 preceding siblings ...)
  2023-12-02 17:08 ` i at maskray dot me
@ 2023-12-06 11:57 ` fweimer at redhat dot com
  2023-12-07  5:11 ` i at maskray dot me
                   ` (2 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: fweimer at redhat dot com @ 2023-12-06 11:57 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=31076

--- Comment #18 from Florian Weimer <fweimer at redhat dot com> ---
(In reply to Kalesh Singh from comment #15)
> AIUI if the runtime-page-size equals the max-page-size, the holes are also
> mapped in as part of the segment mapping and share the same permissions.
> Does this mean that on such systems, any protection it offers becomes void?

That's my understanding. The current behavior gives programmers the *option* to
avoid mapping extra code on small-page systems, while maintaining compatibility
with large-page systems. The proposed change to over-map to the next
load-segment, to fill the gap, would take away that option. Unmapping (to
create a gap) has compatibility implications.

Just to clarify, I'm opposed to a glibc change here mainly because it's not
necessary to enable the desired behavior (fewer VMAs), and it removes
functionality that people might find useful. The link editor merely needs to
over-map in the generated LOAD segments, then there won't be any gaps for the
loader to process.

-- 
You are receiving this mail because:
You are on the CC list for the bug.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [Bug dynamic-link/31076] Extra struct vm_area_struct with ---p created when PAGE_SIZE < max-page-size
  2023-11-18 18:48 [Bug dynamic-link/31076] New: Extra struct vm_area_struct with ---p created when PAGE_SIZE < max-page-size jyescas at google dot com
                   ` (20 preceding siblings ...)
  2023-12-06 11:57 ` fweimer at redhat dot com
@ 2023-12-07  5:11 ` i at maskray dot me
  2023-12-07  9:30 ` fweimer at redhat dot com
  2023-12-08  3:22 ` i at maskray dot me
  23 siblings, 0 replies; 25+ messages in thread
From: i at maskray dot me @ 2023-12-07  5:11 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=31076

--- Comment #19 from Fangrui Song <i at maskray dot me> ---
(In reply to Florian Weimer from comment #18)
> (In reply to Kalesh Singh from comment #15)
> > AIUI if the runtime-page-size equals the max-page-size, the holes are also
> > mapped in as part of the segment mapping and share the same permissions.
> > Does this mean that on such systems, any protection it offers becomes void?
> 
> That's my understanding. The current behavior gives programmers the *option*
> to avoid mapping extra code on small-page systems, while maintaining
> compatibility with large-page systems. The proposed change to over-map to
> the next load-segment, to fill the gap, would take away that option.

Can you elaborate what the option (and "functionality" below) is? Since
22930c9bf21ea15d0da1477a379029e2de259b69 (1996) rtld just calls `mprotect` (one
extra `struct vm_area_struct` instance, see `sudo grep vm_area_struct
/proc/slabinfo`), and the user has no option to control this behavior.

> Unmapping (to create a gap) has compatibility implications.

I wonder the implications are. If they are related to l_contiguous not
accounted for correctly, the user needs to be fixed.

> Just to clarify, I'm opposed to a glibc change here mainly because it's not
> necessary to enable the desired behavior (fewer VMAs), and it removes
> functionality that people might find useful.

See my question above.

> The link editor merely needs to over-map in the generated LOAD segments, then
> there won't be any gaps for the loader to process.

I have thought about this but I feel that a linker change would be intrusive
and unsightly: "I've also pondered whether this falls under the purview of
linkers." in comment 17.

-- 
You are receiving this mail because:
You are on the CC list for the bug.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [Bug dynamic-link/31076] Extra struct vm_area_struct with ---p created when PAGE_SIZE < max-page-size
  2023-11-18 18:48 [Bug dynamic-link/31076] New: Extra struct vm_area_struct with ---p created when PAGE_SIZE < max-page-size jyescas at google dot com
                   ` (21 preceding siblings ...)
  2023-12-07  5:11 ` i at maskray dot me
@ 2023-12-07  9:30 ` fweimer at redhat dot com
  2023-12-08  3:22 ` i at maskray dot me
  23 siblings, 0 replies; 25+ messages in thread
From: fweimer at redhat dot com @ 2023-12-07  9:30 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=31076

--- Comment #20 from Florian Weimer <fweimer at redhat dot com> ---
(In reply to Fangrui Song from comment #19)
> (In reply to Florian Weimer from comment #18)
> > (In reply to Kalesh Singh from comment #15)
> > > AIUI if the runtime-page-size equals the max-page-size, the holes are also
> > > mapped in as part of the segment mapping and share the same permissions.
> > > Does this mean that on such systems, any protection it offers becomes void?
> > 
> > That's my understanding. The current behavior gives programmers the *option*
> > to avoid mapping extra code on small-page systems, while maintaining
> > compatibility with large-page systems. The proposed change to over-map to
> > the next load-segment, to fill the gap, would take away that option.
> 
> Can you elaborate what the option (and "functionality" below) is? Since
> 22930c9bf21ea15d0da1477a379029e2de259b69 (1996) rtld just calls `mprotect`
> (one
> extra `struct vm_area_struct` instance, see `sudo grep vm_area_struct
> /proc/slabinfo`), and the user has no option to control this behavior.

The developer can produce a binary with a LOAD segment that does not need that
mprotect call.

> > Unmapping (to create a gap) has compatibility implications.
> 
> I wonder the implications are. If they are related to l_contiguous not
> accounted for correctly, the user needs to be fixed.

We do not set l_contiguous for the dynamic loader, and it's an internal
variable. The bug we have is that we assume (without checking) the the loader
is mapped contiguously although the kernel does not do that.

We have feedback from downstream that some tools break if the gaps are so large
that objects are mapped in the gaps of other objects. This is particularly
visible on x86-64 when objects linked with different binutils versions/flags
are loaded because the gaps can be quite large due to 2 MiB load segment
alignment in some builds. Other objects do not do that alignment and are much
smaller than 2 MiB, and so can fit comfortably into the gaps. I'm worried that
if we change loader behavior to munmap instead of mprotect, that we'll
retroactively expose these bugs more widely.

We obviously need to fix the l_contiguous assumption for ld.so in glibc, either
by making sure that ld.so has contiguous load segments, or assuming in the code
that it does not.

-- 
You are receiving this mail because:
You are on the CC list for the bug.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [Bug dynamic-link/31076] Extra struct vm_area_struct with ---p created when PAGE_SIZE < max-page-size
  2023-11-18 18:48 [Bug dynamic-link/31076] New: Extra struct vm_area_struct with ---p created when PAGE_SIZE < max-page-size jyescas at google dot com
                   ` (22 preceding siblings ...)
  2023-12-07  9:30 ` fweimer at redhat dot com
@ 2023-12-08  3:22 ` i at maskray dot me
  23 siblings, 0 replies; 25+ messages in thread
From: i at maskray dot me @ 2023-12-08  3:22 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=31076

--- Comment #21 from Fangrui Song <i at maskray dot me> ---
(In reply to Florian Weimer from comment #20)
> (In reply to Fangrui Song from comment #19)
> > (In reply to Florian Weimer from comment #18)
> > > (In reply to Kalesh Singh from comment #15)
> > > > AIUI if the runtime-page-size equals the max-page-size, the holes are also
> > > > mapped in as part of the segment mapping and share the same permissions.
> > > > Does this mean that on such systems, any protection it offers becomes void?
> > > 
> > > That's my understanding. The current behavior gives programmers the *option*
> > > to avoid mapping extra code on small-page systems, while maintaining
> > > compatibility with large-page systems. The proposed change to over-map to
> > > the next load-segment, to fill the gap, would take away that option.
> > 
> > Can you elaborate what the option (and "functionality" below) is? Since
> > 22930c9bf21ea15d0da1477a379029e2de259b69 (1996) rtld just calls `mprotect`
> > (one
> > extra `struct vm_area_struct` instance, see `sudo grep vm_area_struct
> > /proc/slabinfo`), and the user has no option to control this behavior.
> 
> The developer can produce a binary with a LOAD segment that does not need
> that mprotect call.

This is described by "## Solution 2 - Modify the static linker" in comment 1. I
find the approach intrusive.
("Moreover, a PT_LOAD whose end isn't backed by a section is unusual." in
comment 17.)

A loader alternative is "3. Extend the map end to cover the gap" in comment 17.
Instead of 

    mmap(l->l_addr+c->mapstart, c->mapend - c->mapstart, c->prot,
MAP_FIXED|MAP_COPY|MAP_FILE, fd, c->mapoff)

increase the map length

    mmap(l->l_addr+c->mapstart, c[1].mapstart - c->mapstart, c->prot,
MAP_FIXED|MAP_COPY|MAP_FILE, fd, c->mapoff)

The existing mprotect in glibc/elf/dl-map-segments.h will not be touched.

-- 
You are receiving this mail because:
You are on the CC list for the bug.

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2023-12-08  3:22 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-11-18 18:48 [Bug dynamic-link/31076] New: Extra struct vm_area_struct with ---p created when PAGE_SIZE < max-page-size jyescas at google dot com
2023-11-18 18:50 ` [Bug dynamic-link/31076] " jyescas at google dot com
2023-11-21 13:42 ` carlos at redhat dot com
2023-11-22  0:39 ` i at maskray dot me
2023-11-22  4:33 ` kaleshsingh at google dot com
2023-11-22 18:19 ` jyescas at google dot com
2023-11-23 11:42 ` sam at gentoo dot org
2023-11-24 17:40 ` adhemerval.zanella at linaro dot org
2023-11-27 15:11 ` fweimer at redhat dot com
2023-11-27 15:22 ` fweimer at redhat dot com
2023-11-27 16:27 ` adhemerval.zanella at linaro dot org
2023-11-27 17:19 ` fweimer at redhat dot com
2023-11-27 17:39 ` adhemerval.zanella at linaro dot org
2023-11-27 17:45 ` fweimer at redhat dot com
2023-11-27 17:58 ` adhemerval.zanella at linaro dot org
2023-11-27 19:47 ` jyescas at google dot com
2023-11-27 19:55 ` jyescas at google dot com
2023-11-28  8:48 ` rprichard at google dot com
2023-11-28 18:59 ` kaleshsingh at google dot com
2023-11-28 23:58 ` jyescas at google dot com
2023-12-02 17:08 ` i at maskray dot me
2023-12-06 11:57 ` fweimer at redhat dot com
2023-12-07  5:11 ` i at maskray dot me
2023-12-07  9:30 ` fweimer at redhat dot com
2023-12-08  3:22 ` i at maskray dot me

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).