* Randomize offset between program segments?
@ 2020-10-19 15:31 Topi Miettinen
2020-10-21 2:21 ` Siddhesh Poyarekar
2020-10-21 5:33 ` Florian Weimer
0 siblings, 2 replies; 13+ messages in thread
From: Topi Miettinen @ 2020-10-19 15:31 UTC (permalink / raw)
To: libc-alpha
Hi,
I'd like to improve address space randomization (ASLR) by randomizing
the offset between .text, .data and .bss segments (or, more generally,
any program segments). With the large code model (-mcmodel=large)
on AMD64, the offset could be very large, but even with the default
model, the segments could be randomized within the range of RIP-relative
accesses (+/-2GB). Currently the dynamic loader can't randomize the
segments (nor does anything tell it whether this would be OK), so it maps
them next to each other, which is predictable and boring.
For this to happen, I think the compiler would have to emit relocations
for all cross-segment accesses and probably flag the shared object
somehow. Then, when detecting the flag, the dynamic loader could load
the segments at random offsets within 2GB, or if the large model was
used in compilation (another flag), anywhere in the available virtual
address space (letting the OS map the segment anywhere by using mmap(NULL, ...)).
Perhaps if the GOT were kept within the 2GB range, other data segments could
still be placed anywhere.
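A minimal sketch of two loader-side pieces this would need on x86-64: a check that a RIP-relative reference still fits in a signed 32-bit displacement, and a page-aligned random offset inside that window. The helper names fits_rel32 and pick_segment_offset are hypothetical (not glibc interfaces); the caller is assumed to supply the random value (for example from getrandom(2)) and to have already shrunk max_span by the segment sizes so every cross-segment reference stays in range.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: can a signed 32-bit displacement, measured from the
   address of the next instruction, reach `target`?  This is the limit on
   RIP-relative addressing in the default (small) code model. */
static bool fits_rel32(uint64_t next_ip, uint64_t target)
{
    int64_t disp = (int64_t)(target - next_ip);
    return disp >= INT32_MIN && disp <= INT32_MAX;
}

/* Hypothetical helper: map a random value to a page-aligned offset in
   [-max_span, max_span).  The caller is assumed to have reduced max_span
   by the segment sizes so all cross-segment references stay in range. */
static int64_t pick_segment_offset(uint64_t rnd, uint64_t page_size,
                                   int64_t max_span)
{
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    assert(max_span > 0 && (uint64_t)max_span % page_size == 0);
    uint64_t slots = 2 * (uint64_t)max_span / page_size;
    int64_t slot = (int64_t)(rnd % slots);
    return slot * (int64_t)page_size - max_span;
}
```

With max_span close to 2GB and 4 KiB pages there are at most 2^20 slots, i.e. about 20 bits of extra entropy. The modulo reduction is only unbiased when the slot count is a power of two; a real implementation would want rejection sampling.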
There would be some slowdown because of additional relocations (and the
OS would not be happy due to increased VM fragmentation) but I think
otherwise nothing should change (the code should be identical). This
would of course be an opt-in feature, mainly for hardened systems.
Assuming that compilers get support for this feature, what would be the
preferred way for the dynamic linker to detect that further
randomization would be possible? Maybe a new tag in the dynamic section?
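If a new dynamic-section tag were used, the loader-side check could be as small as the sketch below. DT_X_SEGRAND and its two flag bits are purely hypothetical (no such tag exists in the ELF gABI or in glibc); the value is taken from the OS-specific range starting at DT_LOOS only for illustration, and the bits mirror the two flags suggested above.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>

/* HYPOTHETICAL tag and flag bits, for illustration only. */
#define DT_X_SEGRAND      (DT_LOOS + 0x100)
#define SEGRAND_REL32     0x1u  /* segments may move within +/-2GB */
#define SEGRAND_ANYWHERE  0x2u  /* built with -mcmodel=large       */

/* Walk a DT_NULL-terminated dynamic array (as located via PT_DYNAMIC)
   and return the first entry with the given tag, or NULL. */
static const Elf64_Dyn *find_dynamic_tag(const Elf64_Dyn *dyn, Elf64_Sxword tag)
{
    assert(dyn != NULL);
    for (; dyn->d_tag != DT_NULL; dyn++)
        if (dyn->d_tag == tag)
            return dyn;
    return NULL;
}

/* Return the hypothetical SEGRAND_* flags; 0 means the object was built
   without them and must keep today's contiguous layout. */
static Elf64_Xword segrand_flags(const Elf64_Dyn *dyn)
{
    const Elf64_Dyn *e = find_dynamic_tag(dyn, DT_X_SEGRAND);
    return e != NULL ? e->d_un.d_val : 0;
}
```

Defaulting to 0 keeps existing binaries unaffected, which matches the opt-in intent.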
-Topi
* Re: Randomize offset between program segments?
2020-10-19 15:31 Randomize offset between program segments? Topi Miettinen
@ 2020-10-21 2:21 ` Siddhesh Poyarekar
2020-10-21 7:40 ` Topi Miettinen
2020-10-21 5:33 ` Florian Weimer
1 sibling, 1 reply; 13+ messages in thread
From: Siddhesh Poyarekar @ 2020-10-21 2:21 UTC (permalink / raw)
To: Topi Miettinen, libc-alpha
On 10/19/20 9:01 PM, Topi Miettinen via Libc-alpha wrote:
> I'd like to improve address space randomization (ASLR) by randomizing
> the offset between .text, .data and .bss segments (or, more generally,
> any program segments). With the large code model (-mcmodel=large)
> on AMD64, the offset could be very large, but even with the default
> model, the segments could be randomized within the range of RIP-relative
> accesses (+/-2GB). Currently the dynamic loader can't randomize the
> segments (nor does anything tell it whether this would be OK), so it maps
> them next to each other, which is predictable and boring.
What is the advantage of randomizing offsets between these segments?
> For this to happen, I think the compiler would have to emit relocations
> for all cross-segment accesses and probably flag the shared object
> somehow. Then, when detecting the flag, the dynamic loader could load
> the segments at random offsets within 2GB, or if the large model was
> used in compilation (another flag), anywhere in the available virtual
> address space (letting the OS map the segment anywhere by using mmap(NULL, ...)).
The challenge is not just the pc-relative range, but also the technical
feasibility of using pc-relative loads in the first place, because the
change would involve either an additional indirection during load (like
the GOT), which could have a significant performance overhead, or a text
relocation, which would be like inviting wolves to drive out raccoons.
Of course, first we need to determine if there are raccoons.
Siddhesh
* Re: Randomize offset between program segments?
2020-10-19 15:31 Randomize offset between program segments? Topi Miettinen
2020-10-21 2:21 ` Siddhesh Poyarekar
@ 2020-10-21 5:33 ` Florian Weimer
2020-10-21 8:02 ` Topi Miettinen
1 sibling, 1 reply; 13+ messages in thread
From: Florian Weimer @ 2020-10-21 5:33 UTC (permalink / raw)
To: Topi Miettinen via Libc-alpha; +Cc: Topi Miettinen
* Topi Miettinen via Libc-alpha:
> I'd like to improve address space randomization (ASLR) by randomizing
> the offset between .text, .data and .bss segments (or, more
> generally, any program segments). With the large code model
> (-mcmodel=large) on AMD64, the offset could be very large, but even
> with the default model, the segments could be randomized within the range
> of RIP-relative accesses (+/-2GB). Currently the dynamic loader can't
> randomize the segments (nor does anything tell it whether this would be
> OK), so it maps them next to each other, which is predictable and boring.
As far as I understand it, -mcmodel=large does not really work right now
with a stock toolchain.
But what you are asking for looks more like FDPIC support to me anyway.
I do not think it's particularly useful to implement this for targets
which have an MMU, to be honest.
Thanks,
Florian
--
Red Hat GmbH, https://de.redhat.com/ , Registered seat: Grasbrunn,
Commercial register: Amtsgericht Muenchen, HRB 153243,
Managing Directors: Charles Cachera, Brian Klemm, Laurie Krebs, Michael O'Neill
* Re: Randomize offset between program segments?
2020-10-21 2:21 ` Siddhesh Poyarekar
@ 2020-10-21 7:40 ` Topi Miettinen
2020-10-21 9:14 ` Siddhesh Poyarekar
0 siblings, 1 reply; 13+ messages in thread
From: Topi Miettinen @ 2020-10-21 7:40 UTC (permalink / raw)
To: Siddhesh Poyarekar, libc-alpha
On 21.10.2020 5.21, Siddhesh Poyarekar wrote:
> On 10/19/20 9:01 PM, Topi Miettinen via Libc-alpha wrote:
>> I'd like to improve address space randomization (ASLR) by randomizing
>> the offset between .text, .data and .bss segments (or, more generally,
>> any program segments). With the large code model (-mcmodel=large)
>> on AMD64, the offset could be very large, but even with the default
>> model, the segments could be randomized within the range of RIP-relative
>> accesses (+/-2GB). Currently the dynamic loader can't randomize the
>> segments (nor does anything tell it whether this would be OK), so it maps
>> them next to each other, which is predictable and boring.
>
> What is the advantage of randomizing offsets between these segments?
If an observer learns the address of one segment, it may be impossible
(or at least more difficult) to infer the addresses of the other
segments from it. I think this is the basic idea of ASLR: to make certain
classes of attacks more difficult by making the layout less predictable.
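On current systems the fixed layout is easy to observe: the distance between a program's PT_LOAD segments comes straight from p_vaddr in its program headers, so it is identical on every run and only the load base moves. A small sketch using glibc's dl_iterate_phdr(3), whose first visited object is the main program; the helper names are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <link.h>
#include <stdint.h>
#include <stdio.h>

/* Print each PT_LOAD segment as "address = base + p_vaddr".  ASLR
   randomizes dlpi_addr; the p_vaddr offsets are fixed at link time. */
static int print_load_segments(struct dl_phdr_info *info, size_t size, void *data)
{
    int *count = data;
    (void)size;
    for (int i = 0; i < info->dlpi_phnum; i++) {
        const ElfW(Phdr) *ph = &info->dlpi_phdr[i];
        if (ph->p_type != PT_LOAD)
            continue;
        printf("%#jx = base %#jx + %#jx  %c%c%c\n",
               (uintmax_t)(info->dlpi_addr + ph->p_vaddr),
               (uintmax_t)info->dlpi_addr, (uintmax_t)ph->p_vaddr,
               (ph->p_flags & PF_R) ? 'r' : '-',
               (ph->p_flags & PF_W) ? 'w' : '-',
               (ph->p_flags & PF_X) ? 'x' : '-');
        (*count)++;
    }
    return 1;  /* non-zero stops iteration after the main program */
}

/* Returns the number of PT_LOAD segments in the main program. */
static int show_main_program_layout(void)
{
    int count = 0;
    dl_iterate_phdr(print_load_segments, &count);
    return count;
}
```

Running a PIE built with this twice should print different bases but the same p_vaddr offsets, which is why one leaked address currently gives away the others.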
>> For this to happen, I think the compiler would have to emit relocations
>> for all cross-segment accesses and probably flag the shared object
>> somehow. Then, when detecting the flag, the dynamic loader could load
>> the segments at random offsets within 2GB, or if the large model was
>> used in compilation (another flag), anywhere in the available virtual
>> address space (letting the OS map the segment anywhere by using mmap(NULL, ...)).
>
> The challenge is not just the pc-relative range, but also the technical
> feasibility of using pc-relative loads in the first place, because the
> change would involve either an additional indirection during load (like
> the GOT), which could have a significant performance overhead, or a text
> relocation, which would be like inviting wolves to drive out raccoons.
Sorry, I forgot to mention that the code is also compiled with "-fPIC
-pie", so PC-relative accesses will always be used.
$ cat f.c
static unsigned long dst;
void setter(unsigned long src) {
        dst = src;
}
unsigned long getter(void) {
        return dst;
}
$ gcc -fPIC -pie -O -c f.c
$ objdump -d f.o
f.o:     file format elf64-x86-64
Disassembly of section .text:
0000000000000000 <setter>:
   0:   48 89 3d 00 00 00 00    mov    %rdi,0x0(%rip)        # 7 <setter+0x7>
   7:   c3                      retq
0000000000000008 <getter>:
   8:   48 8b 05 00 00 00 00    mov    0x0(%rip),%rax        # f <getter+0x7>
   f:   c3                      retq
> Of course, first we need to determine if there are raccoons.
This would be an optional feature for hardened systems, where a small loss
of performance would be acceptable for possible hardening gains. With
the default code model, the performance loss would happen only at
startup due to additional relocations; after that the execution speed
should be identical, since the code itself isn't changed. Compiling
with -mcmodel=large implies further penalties but also further gains in
ASLR. For existing binaries there would be no change, since the dynamic
linker can't determine whether it's safe to randomize their segments, and
nothing prevents applying this method only to selected, most critical
programs and libraries (or to everything except performance-critical ones).
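Pointers stored in data (function pointers, string tables and so on) already get R_X86_64_RELATIVE relocations in a PIE, and today the loader simply adds one load base to each. A sketch, assuming each PT_LOAD segment instead gets its own base, of how that loop would change: both the patched location and the stored value need a segment lookup. struct seg_map, translate and apply_relative are hypothetical names, not glibc code.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical per-segment mapping: link-time [vaddr, vaddr + memsz)
   was mapped independently at run-time address `addr`. */
struct seg_map {
    uint64_t vaddr, memsz;
    uintptr_t addr;
};

/* Translate a link-time address to its run-time address; 0 if unmapped. */
static uintptr_t translate(const struct seg_map *segs, size_t nsegs, uint64_t vaddr)
{
    for (size_t i = 0; i < nsegs; i++)
        if (vaddr - segs[i].vaddr < segs[i].memsz)  /* unsigned range check */
            return segs[i].addr + (uintptr_t)(vaddr - segs[i].vaddr);
    return 0;
}

/* Apply R_X86_64_RELATIVE entries.  With one load base this would be
   *(base + r_offset) = base + r_addend; with independently placed
   segments both sides need translating. */
static void apply_relative(const struct seg_map *segs, size_t nsegs,
                           const Elf64_Rela *rela, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (ELF64_R_TYPE(rela[i].r_info) != R_X86_64_RELATIVE)
            continue;
        uintptr_t where = translate(segs, nsegs, rela[i].r_offset);
        uintptr_t value = translate(segs, nsegs, (uint64_t)rela[i].r_addend);
        assert(where != 0 && value != 0);
        uint64_t v = value;
        memcpy((void *)where, &v, sizeof v);
    }
}
```

The per-entry lookup is the extra startup work; a real loader would likely precompute per-segment deltas rather than searching linearly.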
-Topi
* Re: Randomize offset between program segments?
2020-10-21 5:33 ` Florian Weimer
@ 2020-10-21 8:02 ` Topi Miettinen
2020-10-21 8:06 ` Florian Weimer
0 siblings, 1 reply; 13+ messages in thread
From: Topi Miettinen @ 2020-10-21 8:02 UTC (permalink / raw)
To: Florian Weimer, Topi Miettinen via Libc-alpha
On 21.10.2020 8.33, Florian Weimer wrote:
> * Topi Miettinen via Libc-alpha:
>
>> I'd like to improve address space randomization (ASLR) by randomizing
>> the offset between .text, .data and .bss segments (or, more
>> generally, any program segments). With the large code model
>> (-mcmodel=large) on AMD64, the offset could be very large, but even
>> with the default model, the segments could be randomized within the range
>> of RIP-relative accesses (+/-2GB). Currently the dynamic loader can't
>> randomize the segments (nor does anything tell it whether this would be
>> OK), so it maps them next to each other, which is predictable and boring.
>
> As far as I understand it, -mcmodel=large does not really work right now
> with a stock toolchain.
At least small programs seem to work, but I'll try compiling something
more serious. I'd expect very little breakage outside of assembly, since
externally visible values (pointers) already use 64-bit absolute
addresses. But I'm not so interested in large model anyway, randomizing
the default model could be more interesting.
> But what you are asking for looks more like FDPIC support to me anyway.
> I do not think it's particularly useful to implement this for targets
> which have an MMU, to be honest.
I'm not proposing to introduce FDPIC. In the default code model, the 32-bit
RIP-relative offsets would not be hard-coded as they are now; instead there
would be relocations, handled by the dynamic linker. Even for the large
model, I don't care much how the compiler implements position independence.
-Topi
* Re: Randomize offset between program segments?
2020-10-21 8:02 ` Topi Miettinen
@ 2020-10-21 8:06 ` Florian Weimer
2020-10-21 9:28 ` Topi Miettinen
0 siblings, 1 reply; 13+ messages in thread
From: Florian Weimer @ 2020-10-21 8:06 UTC (permalink / raw)
To: Topi Miettinen; +Cc: Topi Miettinen via Libc-alpha
* Topi Miettinen:
> At least small programs seem to work, but I'll try compiling something
> more serious. I'd expect very little breakage outside of assembly,
> since externally visible values (pointers) already use 64-bit absolute
> addresses. But I'm not so interested in large model anyway,
> randomizing the default model could be more interesting.
Anything that depends on static libraries will be broken. That includes
features from libc_nonshared.a and libgcc.a.
Thanks,
Florian
* Re: Randomize offset between program segments?
2020-10-21 7:40 ` Topi Miettinen
@ 2020-10-21 9:14 ` Siddhesh Poyarekar
2020-10-21 9:34 ` Topi Miettinen
0 siblings, 1 reply; 13+ messages in thread
From: Siddhesh Poyarekar @ 2020-10-21 9:14 UTC (permalink / raw)
To: Topi Miettinen, libc-alpha
On 10/21/20 1:10 PM, Topi Miettinen wrote:
> Sorry, I forgot to mention that the code is also compiled with "-fPIC
> -pie", so PC-relative accesses will always be used.
PC-relative accesses will work only if the offset between the code and
data is known at link time. It implies that it is fixed at runtime,
which won't work for your use case since you want it to be dynamic.
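To make this concrete for the `mov %rdi,0x0(%rip)` in setter from f.o: the static linker typically resolves its R_X86_64_PC32 relocation as S + A - P (target, plus addend, minus the address of the 4-byte field), with A = -4 because the displacement is the last four bytes of the instruction. The result lands in the instruction bytes themselves, so moving .data relative to .text at load time would mean rewriting .text. pc32_value and patch_disp32 are hypothetical helpers for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* R_X86_64_PC32: the 32-bit field at address P receives S + A - P.
   For a disp32 that ends the instruction, A = -4, so the stored value is
   S minus the address of the next instruction. */
static int32_t pc32_value(uint64_t S, int64_t A, uint64_t P)
{
    int64_t v = (int64_t)(S + (uint64_t)A - P);
    assert(v >= INT32_MIN && v <= INT32_MAX);  /* otherwise: overflow error */
    return (int32_t)v;
}

/* Store disp into bytes 3..6 of "mov %rdi,disp32(%rip)" (48 89 3d ..),
   little-endian as x86-64 requires.  Doing this at run time is exactly
   a text relocation: the code page must be writable. */
static void patch_disp32(unsigned char insn[7], int32_t disp)
{
    uint32_t u = (uint32_t)disp;
    for (int i = 0; i < 4; i++)
        insn[3 + i] = (unsigned char)(u >> (8 * i));
}
```

Shifting the data target by some delta shifts the stored displacement by the same delta, so there is no way to randomize the gap without either patching code or adding an indirection.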
Siddhesh
* Re: Randomize offset between program segments?
2020-10-21 8:06 ` Florian Weimer
@ 2020-10-21 9:28 ` Topi Miettinen
0 siblings, 0 replies; 13+ messages in thread
From: Topi Miettinen @ 2020-10-21 9:28 UTC (permalink / raw)
To: Florian Weimer; +Cc: Topi Miettinen via Libc-alpha
On 21.10.2020 11.06, Florian Weimer wrote:
> * Topi Miettinen:
>
>> At least small programs seem to work, but I'll try compiling something
>> more serious. I'd expect very little breakage outside of assembly,
>> since externally visible values (pointers) already use 64-bit absolute
>> addresses. But I'm not so interested in large model anyway,
>> randomizing the default model could be more interesting.
>
> Anything that depends on static libraries will be broken. That includes
> features from libc_nonshared.a and libgcc.a.
I installed systemd built with -mcmodel=large and so far I haven't seen
any problems or performance issues. This includes the shared library
libsystemd0, used by several programs, and the systemd-internal library
libsystemd-shared-246.so. But since the dynamic linker doesn't know that
the segments could now be placed freely within the virtual address
space, the end result is just a less optimal build. Probably no static
libraries were involved.
Building libc with -mcmodel=large fails:
/bin/ld: /build/glibc-2.31/build-tree/amd64-libc/libc_pic.os.clean: relocation R_X86_64_GOTOFF64 against STT_GNU_IFUNC symbol `__GI_strlen' isn't supported
Quick attempts to simply disable IFUNC support (what's that?) seem to
break other stuff.
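Since IFUNC came up: it is a GNU extension (symbol type STT_GNU_IFUNC) where a symbol is bound, during relocation processing, to whatever address a resolver function returns; glibc uses it to select CPU-specific variants of functions such as strlen. A minimal sketch with GCC's ifunc attribute, assuming GCC or Clang on x86-64 with glibc. my_strlen and its variants are hypothetical, and the "avx2" variant just calls strlen as a stand-in for an optimized version.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef size_t strlen_fn(const char *);

/* Baseline implementation. */
static size_t my_strlen_generic(const char *s)
{
    const char *p = s;
    while (*p != '\0')
        p++;
    return (size_t)(p - s);
}

/* Stand-in for a CPU-specific variant. */
static size_t my_strlen_avx2(const char *s)
{
    return strlen(s);
}

/* Resolver: typically called by the dynamic linker (or static startup
   code) while relocations are processed, so __builtin_cpu_init() must
   be called before __builtin_cpu_supports(). */
static strlen_fn *resolve_my_strlen(void)
{
    __builtin_cpu_init();
    return __builtin_cpu_supports("avx2") ? my_strlen_avx2 : my_strlen_generic;
}

/* Calls to my_strlen go to whichever address the resolver returned. */
size_t my_strlen(const char *s) __attribute__((ifunc("resolve_my_strlen")));
```

Because the final address is only known after the resolver runs, references to such symbols need relocation types the linker can turn into IRELATIVE entries, which is what the GOTOFF64 error above is complaining about.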
-Topi
* Re: Randomize offset between program segments?
2020-10-21 9:14 ` Siddhesh Poyarekar
@ 2020-10-21 9:34 ` Topi Miettinen
2020-10-21 9:54 ` Siddhesh Poyarekar
0 siblings, 1 reply; 13+ messages in thread
From: Topi Miettinen @ 2020-10-21 9:34 UTC (permalink / raw)
To: Siddhesh Poyarekar, libc-alpha
On 21.10.2020 12.14, Siddhesh Poyarekar wrote:
> On 10/21/20 1:10 PM, Topi Miettinen wrote:
>> Sorry, I forgot to mention that the code is also compiled with "-fPIC
>> -pie", so PC-relative accesses will always be used.
>
> PC-relative accesses will work only if the offset between the code and
> data is known at link time. It implies that it is fixed at runtime,
> which won't work for your use case since you want it to be dynamic.
Why can't the dynamic linker calculate the offset?
-Topi
* Re: Randomize offset between program segments?
2020-10-21 9:34 ` Topi Miettinen
@ 2020-10-21 9:54 ` Siddhesh Poyarekar
2020-10-21 10:44 ` Topi Miettinen
0 siblings, 1 reply; 13+ messages in thread
From: Siddhesh Poyarekar @ 2020-10-21 9:54 UTC (permalink / raw)
To: Topi Miettinen, libc-alpha
On 10/21/20 3:04 PM, Topi Miettinen wrote:
> Why can't the dynamic linker calculate the offset?
>
It can calculate it, but to patch the pc-relative load instructions it
would need the executable section to also be writable, which is a really
bad idea.
The alternative (which is what PIC does for global variables) is to have
a GOT-like indirection, where instead of the single pc-relative load,
the compiler emits a load from that table and a subsequent load from the
address in GOT. Here, patching by the dynamic linker is safe since the
offset table is rw, but you will have doubled the number of instructions
needed to access your data.
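A C-level model of that indirection, reusing setter/getter from f.c: every access goes through a writable table that the loader fills in. This is only an illustration, not the code a compiler would emit; offset_table and init_offset_table are hypothetical, and note that the table itself is still reached PC-relatively.

```c
#include <assert.h>
#include <stddef.h>

/* Data that would live in an independently placed segment. */
static unsigned long dst;

/* Hypothetical writable table, filled in at load time.  Code no longer
   needs a fixed displacement to dst, only to this table. */
static unsigned long *offset_table[1];

/* Stands in for the dynamic linker writing the relocated address. */
static void init_offset_table(void)
{
    offset_table[0] = &dst;
}

/* Each access is now two dependent loads: the table slot, then the data. */
static void setter(unsigned long src)
{
    *offset_table[0] = src;
}

static unsigned long getter(void)
{
    assert(offset_table[0] != NULL);
    return *offset_table[0];
}
```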
Hence the question: how much benefit does this provide on top of what is
achieved by randomizing the base address and does it justify doubling
the number of instructions to access static variables?
To be clear, that question is not rhetorical, I am genuinely curious and
would be interested in an answer to that if you explore this further.
Siddhesh
* Re: Randomize offset between program segments?
2020-10-21 9:54 ` Siddhesh Poyarekar
@ 2020-10-21 10:44 ` Topi Miettinen
2020-10-29 8:04 ` Siddhesh Poyarekar
0 siblings, 1 reply; 13+ messages in thread
From: Topi Miettinen @ 2020-10-21 10:44 UTC (permalink / raw)
To: Siddhesh Poyarekar, libc-alpha
On 21.10.2020 12.54, Siddhesh Poyarekar wrote:
> On 10/21/20 3:04 PM, Topi Miettinen wrote:
>> Why can't the dynamic linker calculate the offset?
>>
>
> It can calculate it, but to patch the pc-relative load instructions it
> would need the executable section to also be writable, which is a really
> bad idea.
Agreed, I didn't consider that.
> The alternative (which is what PIC does for global variables) is to have
> a GOT-like indirection, where instead of the single pc-relative load,
> the compiler emits a load from that table and a subsequent load from the
> address in GOT. Here, patching by the dynamic linker is safe since the
> offset table is rw, but you will have doubled the number of instructions
> needed to access your data.
Also, the size of the GOT will increase. Otherwise this seems a better approach.
> Hence the question: how much benefit does this provide on top of what is
> achieved by randomizing the base address and does it justify doubling
> the number of instructions to access static variables?
I don't know. What would be the method to quantify such benefits? This
applies to a specific case where the attacker is able to determine an
address in one segment but needs to find an address in another segment
in order to win, and without ASLR the offset between the addresses would
always be known by the attacker (for example, because the distro and the
version of the program or library are known). Without ASLR, the chance of
winning is 100%. With ASLR, this could be related to the number of bits of
randomization. In the 32-bit offset case this would be 20 bits (assuming
4 KiB pages, i.e. 12 bits of page offset), so the chance of guessing would
be 2^-20 and brute forcing the offset would be expected to take 2^20/2
attempts.
For the large model, ASLR could use 44 - 12 = 32 bits, so the numbers
would be 2^-32 and 2^32/2.
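The arithmetic above, written out under the same assumptions (a +/-2GB window of 2^32 bytes, 4 KiB pages, 44 usable address bits for the large model, and a brute-force search that tries each candidate offset once):

```latex
\begin{aligned}
b_{\text{rel32}} &= \log_2\frac{2^{32}}{2^{12}} = 20, &
P_{\text{guess}} &= 2^{-20}, &
E[\text{attempts}] &\approx \tfrac{2^{20}}{2} = 2^{19} \\
b_{\text{large}} &= 44 - 12 = 32, &
P_{\text{guess}} &= 2^{-32}, &
E[\text{attempts}] &\approx \tfrac{2^{32}}{2} = 2^{31}
\end{aligned}
```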
-mcmodel=large increases the number of instructions by a factor of 5, so
doubling would still be an improvement:
0000000000000000 <setter>:
   0:   48 8d 05 f9 ff ff ff    lea    -0x7(%rip),%rax        # 0 <setter>
   7:   49 bb 00 00 00 00 00    movabs $0x0,%r11
   e:   00 00 00
  11:   4c 01 d8                add    %r11,%rax
  14:   48 ba 00 00 00 00 00    movabs $0x0,%rdx
  1b:   00 00 00
  1e:   48 89 3c 02             mov    %rdi,(%rdx,%rax,1)
  22:   c3                      retq
Normal model, extern variable (this should be similar to a GOT access):
0000000000000000 <setter>:
   0:   48 8b 05 00 00 00 00    mov    0x0(%rip),%rax        # 7 <setter+0x7>
   7:   48 89 38                mov    %rdi,(%rax)
   a:   c3                      retq
Normal model, static variable:
0000000000000000 <setter>:
   0:   48 89 3d 00 00 00 00    mov    %rdi,0x0(%rip)        # 7 <setter+0x7>
   7:   c3                      retq
But I suppose the extra memory access in the GOT version is worse for
performance than the extra instructions in -mcmodel=large, which don't
access memory.
-Topi
* Re: Randomize offset between program segments?
2020-10-21 10:44 ` Topi Miettinen
@ 2020-10-29 8:04 ` Siddhesh Poyarekar
2020-10-30 15:37 ` Topi Miettinen
0 siblings, 1 reply; 13+ messages in thread
From: Siddhesh Poyarekar @ 2020-10-29 8:04 UTC (permalink / raw)
To: Topi Miettinen, libc-alpha
On 10/21/20 4:14 PM, Topi Miettinen wrote:
>> The alternative (which is what PIC does for global variables) is to have
>> a GOT-like indirection, where instead of the single pc-relative load,
>> the compiler emits a load from that table and a subsequent load from the
>> address in GOT. Here, patching by the dynamic linker is safe since the
>> offset table is rw, but you will have doubled the number of instructions
>> needed to access your data.
>
> Also, the size of the GOT will increase. Otherwise this seems a better approach.
You shouldn't use GOT (because it's the *global* offset table) but a
similar idea. A rose by another name...
> I don't know. What would be the method to quantify such benefits? This
> applies to a specific case where the attacker is able to determine an
> address in one segment but needs to find an address in another segment
> in order to win, and without ASLR the offset between the addresses would
First build evidence for this possibility, i.e. how easy is it to
determine the address of one segment in a binary and how much
*incremental* effort does it take to determine the address of other
segments?
> always be known by the attacker (for example, because the distro and the
> version of the program or library are known). Without ASLR, the chance of
> winning is 100%. With ASLR, this could be related to the number of bits of
> randomization. In the 32-bit offset case this would be 20 bits (assuming
> 4 KiB pages, i.e. 12 bits of page offset), so the chance of guessing would
> be 2^-20 and brute forcing the offset would be expected to take 2^20/2
> attempts.
Question is, would it really be an additional 2^20 tries at all times?
For example, the offset table has to be at a known offset from the code
and hence could be computed once you know the address of the code
segment. From there, the final address for the data access is just an
indirection away.
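That objection, sketched as code: if the offset table sits at a link-time distance from the code, one leaked code address plus one memory read yields the randomized data address. Everything here is hypothetical: the read primitive is an assumption about the attacker, and read_local merely reads the current process's memory so the sketch can be exercised.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical arbitrary-read primitive the attacker is assumed to have. */
typedef uint64_t read64_fn(uintptr_t addr);

/* Hypothetical: recover a randomized data address from one leaked code
   address.  table_from_code is a link-time constant the attacker reads
   from the (known) binary; slot selects the table entry for the object. */
static uintptr_t data_addr_from_code_leak(uintptr_t leaked_code,
                                          intptr_t table_from_code,
                                          size_t slot, read64_fn *read64)
{
    assert(read64 != NULL);
    uintptr_t table = leaked_code + (uintptr_t)table_from_code;
    return (uintptr_t)read64(table + slot * sizeof(uint64_t));
}

/* Stand-in read primitive for local experiments: reads this process. */
static uint64_t read_local(uintptr_t addr)
{
    uint64_t v;
    memcpy(&v, (const void *)addr, sizeof v);
    return v;
}
```

No guessing is involved, so the extra randomization only helps against attackers who lack a read primitive.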
Siddhesh
* Re: Randomize offset between program segments?
2020-10-29 8:04 ` Siddhesh Poyarekar
@ 2020-10-30 15:37 ` Topi Miettinen
0 siblings, 0 replies; 13+ messages in thread
From: Topi Miettinen @ 2020-10-30 15:37 UTC (permalink / raw)
To: Siddhesh Poyarekar, libc-alpha
On 29.10.2020 10.04, Siddhesh Poyarekar wrote:
> On 10/21/20 4:14 PM, Topi Miettinen wrote:
>>> The alternative (which is what PIC does for global variables) is to have
>>> a GOT-like indirection, where instead of the single pc-relative load,
>>> the compiler emits a load from that table and a subsequent load from the
>>> address in GOT. Here, patching by the dynamic linker is safe since the
>>> offset table is rw, but you will have doubled the number of instructions
>>> needed to access your data.
>>
>> Also, the size of the GOT will increase. Otherwise this seems a better approach.
>
> You shouldn't use GOT (because it's the *global* offset table) but a
> similar idea. A rose by another name...
A table containing offsets would be used, so why not the GOT? What kind of
globalness does the name refer to? The program has its own GOT, and each
shared library has its own local GOT.
>> I don't know. What would be the method to quantify such benefits? This
>> applies to a specific case where the attacker is able to determine an
>> address in one segment but needs to find an address in another segment
>> in order to win, and without ASLR the offset between the addresses would
>
> First build evidence for this possibility, i.e. how easy is it to
> determine the address of one segment in a binary and how much
> *incremental* effort does it take to determine the address of other
> segments?
>
>> always be known by the attacker (for example, because the distro and the
>> version of the program or library are known). Without ASLR, the chance of
>> winning is 100%. With ASLR, this could be related to the number of bits of
>> randomization. In the 32-bit offset case this would be 20 bits (assuming
>> 4 KiB pages, i.e. 12 bits of page offset), so the chance of guessing would
>> be 2^-20 and brute forcing the offset would be expected to take 2^20/2
>> attempts.
>
> Question is, would it really be an additional 2^20 tries at all times?
> For example, the offset table has to be at a known offset from the code
> and hence could be computed once you know the address of the code
> segment. From there, the final address for the data access is just an
> indirection away.
Right, and the same also applies if a register were dedicated to
accessing the data areas. The compiler might be able to pick a random
register, but then it's easy for the attacker to guess it, since there
are very few registers. I don't think this approach will be useful after all.
Perhaps randomness could be increased by other means. When compiling,
each .o file could be built into its own shared object. Then these small
pieces of a program or library could be placed independently. This
probably wouldn't require any changes except to the build logic in
Makefiles etc. Now, could this be achieved without dozens of .so files,
so that all the shared objects would be contained in a single ELF file? But
this is no longer relevant to libc.
-Topi