public inbox for gnu-gabi@sourceware.org
 help / color / mirror / Atom feed
* Re: Invalid program counters and unwinding
  2018-01-01  0:00 ` Jeff Law
@ 2018-01-01  0:00   ` Florian Weimer
  2018-01-01  0:00     ` Jeff Law
  0 siblings, 1 reply; 19+ messages in thread
From: Florian Weimer @ 2018-01-01  0:00 UTC (permalink / raw)
  To: Jeff Law, GCC, GNU C Library, Binutils, gnu-gabi

On 06/28/2018 04:16 AM, Jeff Law wrote:
>> Previous discussions:
>>
>> https://gcc.gnu.org/ml/gcc/2013-05/msg00253.html
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71744
>> https://sourceware.org/ml/libc-alpha/2016-07/msg00613.html
>>    (patch with a spread lock, still not async-signal-safe)

> You might also want to look at RH BZ 1293594 which I think has pointers
> back to an issue from 2008 :(

Interesting.  That does suspiciously look like a concurrent dlclose. 
It's just that the crash handler crashes, after the application crash. 
I think this one is really NOTABUG, both technically and from user 
impact: we do not cause the crash, we just react poorly to the 
application triggering undefined behavior.

In the bug, you mentioned this code fragment for x86-64:

42        unsigned char *pc = context->ra;
43        struct sigcontext *sc;
44        long new_cfa;
45
46        /* movq __NR_rt_sigreturn, %rax ; syscall  */
47        if (*(unsigned char *)(pc+0) == 0x48
48            && *(unsigned long *)(pc+1) == 0x050f0000000fc0c7)

I'm not sure I agree that it is “dumb”, but I think it proves 
conclusively that you cannot feed random addresses to the unwinder. 8-)

Thanks,
Florian

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Invalid program counters and unwinding
@ 2018-01-01  0:00 Florian Weimer
  2018-01-01  0:00 ` Nathan Sidwell
  2018-01-01  0:00 ` Jeff Law
  0 siblings, 2 replies; 19+ messages in thread
From: Florian Weimer @ 2018-01-01  0:00 UTC (permalink / raw)
  To: GCC, GNU C Library, Binutils, gnu-gabi

I'm looking at ways to speed up _Unwind_Find_FDE when libgcc is running 
on top of glibc.  I have something (at the design level, with some of 
the code written) which allows me to get a pointer to the 
PT_GNU_EH_FRAME segment in memory in a lock-free fashion (so it would 
also be async-signal safe).

This part works also when the program counter used in the search is 
invalid and does not point to within a loaded object, even in the case 
of concurrent dlopen/dlclose.

However, it's still necessary to read the PT_GNU_EH_FRAME data itself, 
and if _Unwind_Find_FDE is not a valid program counter found on the 
stack (with in a caller, where unmapping it with dlclose would be 
invalid), it could happen that it is a random address in *another*, 
unrelated object, which then gets dlclose'd (which is valid).

The current glibc-based implementation in libgcc calls dl_iterate_phdr, 
which acquires a lock blocking dlclose for the entire duration of the 
iteration.  But I think this still doesn't support arbitrary, random PC 
values because in the worst case, the PC value looks valid, we find some 
unrelated FDE data with an associated personality routine, and end up 
calling that, with disastrous consequences.

So it looks to me that the caller of _Unwind_Find_FDE needs to ensure 
that the PC is a valid element of the call stack.  Is this a correct 
assumption?

I have some ideas how make reading the PT_GNU_EH_FRAME data safe, but 
the question is whether we actually need that.

Previous discussions:

https://gcc.gnu.org/ml/gcc/2013-05/msg00253.html
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71744
https://sourceware.org/ml/libc-alpha/2016-07/msg00613.html
   (patch with a spread lock, still not async-signal-safe)

Thanks,
Florian

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Invalid program counters and unwinding
  2018-01-01  0:00         ` Nathan Sidwell
@ 2018-01-01  0:00           ` Florian Weimer
  2018-01-01  0:00             ` Jakub Jelinek
  2018-01-01  0:00             ` Nathan Sidwell
  0 siblings, 2 replies; 19+ messages in thread
From: Florian Weimer @ 2018-01-01  0:00 UTC (permalink / raw)
  To: Nathan Sidwell, GCC, GNU C Library, Binutils, gnu-gabi

On 06/26/2018 01:35 PM, Nathan Sidwell wrote:
> On 06/26/2018 07:21 AM, Florian Weimer wrote:
> 
>> GCC doesn't do this AFAIK, but it's theoretically possible not to 
>> preserve the return address for a noreturn function.  But that would 
>> be very bad for exception handling, so let's hope compilers don't do 
>> this.
> 
> I'd forgotten about noreturn.  Such functions may terminate by throwing 
> an exception (and for our purposes I think pthread_cancel implementatio 
> is sufficiently exception-like):
> 
> from C++ std:  [dcl.attr.noreturn]/2 [ Note: The function may terminate 
> by throwing an exception. — end note ]
> 
> and from doc/extend.texi:
> 
> The @code{noreturn} keyword does not affect the exceptional path
> when that applies: a @code{noreturn}-marked function may still
> return to the caller by throwing an exception or calling
> @code{longjmp}.
> 
> IIRC, in gcc-land you have to give both noreturn and nothrow attributes 
> to make it non-unwindable.

Are you sure?  I was under the impression that GCC did not do this 
because it interferes too much with debugging.

Furthermore, glibc marks abort as nothrow and noreturn, which is a bit 
dubious, considering that it is perfectly fine to throw exception from 
synchronously delivered signals.

Thanks,
Florian

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Invalid program counters and unwinding
  2018-01-01  0:00 ` Nathan Sidwell
@ 2018-01-01  0:00   ` Florian Weimer
  2018-01-01  0:00     ` Nathan Sidwell
  2018-01-01  0:00     ` Jakub Jelinek
  0 siblings, 2 replies; 19+ messages in thread
From: Florian Weimer @ 2018-01-01  0:00 UTC (permalink / raw)
  To: Nathan Sidwell, GCC, GNU C Library, Binutils, gnu-gabi

On 06/26/2018 12:56 PM, Nathan Sidwell wrote:
> On 06/26/2018 05:26 AM, Florian Weimer wrote:
> 
>> So it looks to me that the caller of _Unwind_Find_FDE needs to ensure 
>> that the PC is a valid element of the call stack.  Is this a correct 
>> assumption?
> 
> I thought this was an (implicit?) requirement. You're unwinding a stack 
> to deliver an exception up it.  Are there use cases where that is not 
> the case?

We have something approaching this scenario.

pthread_cancel in glibc unwinds the stack using DWARF information until 
encounters a frame without unwind information, when it switches to 
longjmp to get past that obstacle.

However, at the point of transition from a valid DWARF frame into the 
wilderness (without unwind data), we should still have accurate 
information on the caller's PC, so _Unwind_Find_FDE will reliably fail 
to find any unwind data for it.  It's not a random pointer somewhere 
else, so I think even the pthread_cancel case is fully supported.

Thanks,
Florian

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Invalid program counters and unwinding
  2018-01-01  0:00 Invalid program counters and unwinding Florian Weimer
@ 2018-01-01  0:00 ` Nathan Sidwell
  2018-01-01  0:00   ` Florian Weimer
  2018-01-01  0:00 ` Jeff Law
  1 sibling, 1 reply; 19+ messages in thread
From: Nathan Sidwell @ 2018-01-01  0:00 UTC (permalink / raw)
  To: Florian Weimer, GCC, GNU C Library, Binutils, gnu-gabi

On 06/26/2018 05:26 AM, Florian Weimer wrote:

> So it looks to me that the caller of _Unwind_Find_FDE needs to ensure 
> that the PC is a valid element of the call stack.  Is this a correct 
> assumption?

I thought this was an (implicit?) requirement. You're unwinding a stack 
to deliver an exception up it.  Are there use cases where that is not 
the case?

nathan

-- 
Nathan Sidwell

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Invalid program counters and unwinding
  2018-01-01  0:00       ` Florian Weimer
@ 2018-01-01  0:00         ` Nathan Sidwell
  2018-01-01  0:00           ` Florian Weimer
  0 siblings, 1 reply; 19+ messages in thread
From: Nathan Sidwell @ 2018-01-01  0:00 UTC (permalink / raw)
  To: Florian Weimer, GCC, GNU C Library, Binutils, gnu-gabi

On 06/26/2018 07:21 AM, Florian Weimer wrote:

> GCC doesn't do this AFAIK, but it's theoretically possible not to 
> preserve the return address for a noreturn function.  But that would be 
> very bad for exception handling, so let's hope compilers don't do this.

I'd forgotten about noreturn.  Such functions may terminate by throwing 
an exception (and for our purposes I think pthread_cancel implementatio 
is sufficiently exception-like):

from C++ std:  [dcl.attr.noreturn]/2 [ Note: The function may terminate 
by throwing an exception. — end note ]

and from doc/extend.texi:

The @code{noreturn} keyword does not affect the exceptional path
when that applies: a @code{noreturn}-marked function may still
return to the caller by throwing an exception or calling
@code{longjmp}.

IIRC, in gcc-land you have to give both noreturn and nothrow attributes 
to make it non-unwindable.

nathan
-- 
Nathan Sidwell

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Invalid program counters and unwinding
  2018-01-01  0:00     ` Jeff Law
@ 2018-01-01  0:00       ` Florian Weimer
  2018-01-01  0:00       ` Michael Matz
  1 sibling, 0 replies; 19+ messages in thread
From: Florian Weimer @ 2018-01-01  0:00 UTC (permalink / raw)
  To: Jeff Law, GCC, GNU C Library, Binutils, gnu-gabi

On 06/28/2018 04:18 PM, Jeff Law wrote:
> On 06/28/2018 06:30 AM, Florian Weimer wrote:
>> On 06/28/2018 04:16 AM, Jeff Law wrote:
>>>> Previous discussions:
>>>>
>>>> https://gcc.gnu.org/ml/gcc/2013-05/msg00253.html
>>>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71744
>>>> https://sourceware.org/ml/libc-alpha/2016-07/msg00613.html
>>>>     (patch with a spread lock, still not async-signal-safe)
>>
>>> You might also want to look at RH BZ 1293594 which I think has pointers
>>> back to an issue from 2008 :(
>>
>> Interesting.  That does suspiciously look like a concurrent dlclose.
>> It's just that the crash handler crashes, after the application crash. I
>> think this one is really NOTABUG, both technically and from user impact:
>> we do not cause the crash, we just react poorly to the application
>> triggering undefined behavior.
>>
>> In the bug, you mentioned this code fragment for x86-64:
>>
>> 42        unsigned char *pc = context->ra;
>> 43        struct sigcontext *sc;
>> 44        long new_cfa;
>> 45
>> 46        /* movq __NR_rt_sigreturn, %rax ; syscall  */
>> 47        if (*(unsigned char *)(pc+0) == 0x48
>> 48            && *(unsigned long *)(pc+1) == 0x050f0000000fc0c7)
>>
>> I'm not sure I agree that it is “dumb”, but I think it proves
>> conclusively that you cannot feed random addresses to the unwinder. 8-)

> I believe "dumb" is referring to the fact that we're already in a bit of
> a weird state as evidenced by the NULL FDE.  Blindly trying to read the
> contents of the PC that we couldn't map to an FDE is, IMHO, dumb.

I think the code derives from i386, where historically it was necessary 
to unwind with the frame pointer only.  I suspect we still need this 
code there at least, to support legacy binaries.

For x86-64, we should probably not to attempt without tables at all, but 
I have no idea whether that's feasible.

Thanks,
Florian

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Invalid program counters and unwinding
  2018-01-01  0:00   ` Florian Weimer
@ 2018-01-01  0:00     ` Jeff Law
  2018-01-01  0:00       ` Florian Weimer
  2018-01-01  0:00       ` Michael Matz
  0 siblings, 2 replies; 19+ messages in thread
From: Jeff Law @ 2018-01-01  0:00 UTC (permalink / raw)
  To: Florian Weimer, GCC, GNU C Library, Binutils, gnu-gabi

On 06/28/2018 06:30 AM, Florian Weimer wrote:
> On 06/28/2018 04:16 AM, Jeff Law wrote:
>>> Previous discussions:
>>>
>>> https://gcc.gnu.org/ml/gcc/2013-05/msg00253.html
>>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71744
>>> https://sourceware.org/ml/libc-alpha/2016-07/msg00613.html
>>>    (patch with a spread lock, still not async-signal-safe)
> 
>> You might also want to look at RH BZ 1293594 which I think has pointers
>> back to an issue from 2008 :(
> 
> Interesting.  That does suspiciously look like a concurrent dlclose.
> It's just that the crash handler crashes, after the application crash. I
> think this one is really NOTABUG, both technically and from user impact:
> we do not cause the crash, we just react poorly to the application
> triggering undefined behavior.
> 
> In the bug, you mentioned this code fragment for x86-64:
> 
> 42        unsigned char *pc = context->ra;
> 43        struct sigcontext *sc;
> 44        long new_cfa;
> 45
> 46        /* movq __NR_rt_sigreturn, %rax ; syscall  */
> 47        if (*(unsigned char *)(pc+0) == 0x48
> 48            && *(unsigned long *)(pc+1) == 0x050f0000000fc0c7)
> 
> I'm not sure I agree that it is “dumb”, but I think it proves
> conclusively that you cannot feed random addresses to the unwinder. 8-)
I believe "dumb" is referring to the fact that we're already in a bit of
a weird state as evidenced by the NULL FDE.  Blindly trying to read the
contents of the PC that we couldn't map to an FDE is, IMHO, dumb.

One might even be able to argue in this day and age that we should have
suitable descriptors for everything.  If no suitable descriptor is found
then backtracing should stop.  Lack of suitable descriptors in any code
would be considered a bug in that scenario.

jeff

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Invalid program counters and unwinding
  2018-01-01  0:00     ` Jakub Jelinek
@ 2018-01-01  0:00       ` Florian Weimer
  0 siblings, 0 replies; 19+ messages in thread
From: Florian Weimer @ 2018-01-01  0:00 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Nathan Sidwell, GCC, GNU C Library, Binutils, gnu-gabi

On 06/26/2018 01:25 PM, Jakub Jelinek wrote:
> On Tue, Jun 26, 2018 at 01:01:06PM +0200, Florian Weimer wrote:
>> On 06/26/2018 12:56 PM, Nathan Sidwell wrote:
>>> On 06/26/2018 05:26 AM, Florian Weimer wrote:
>>>
>>>> So it looks to me that the caller of _Unwind_Find_FDE needs to
>>>> ensure that the PC is a valid element of the call stack.  Is this a
>>>> correct assumption?
>>>
>>> I thought this was an (implicit?) requirement. You're unwinding a stack
>>> to deliver an exception up it.  Are there use cases where that is not
>>> the case?
>>
>> We have something approaching this scenario.
>>
>> pthread_cancel in glibc unwinds the stack using DWARF information until
>> encounters a frame without unwind information, when it switches to longjmp
>> to get past that obstacle.
>>
>> However, at the point of transition from a valid DWARF frame into the
>> wilderness (without unwind data), we should still have accurate information
>> on the caller's PC, so _Unwind_Find_FDE will reliably fail to find any
>> unwind data for it.  It's not a random pointer somewhere else, so I think
>> even the pthread_cancel case is fully supported.
> 
> The usual ways to get bogus PCs in the frames is:
> 1) stack corruption
> 2) setcontext/swapcontext with uninitialized or corrupted ucontext_t
> 3) bogus unwind info (compiler or linker etc. bug)

But if that happens, all bets are off, and we could still get a crash 
with the current implementation.

And any approach which does not inhibit concurrent dlclose will only 
make things worse if there are such concurrent calls, which is perhaps 
an unusual combination.

> At least for unwinding, I think we don't and shouldn't care, we assume only
> valid programs.  For cases like _Unwind_Backtrace when used to print info in
> case of fatal signal or stack corruption, it is more questionable, but at
> least the current implmentation doesn't care either.

At least glibc no longer tries to print a backtrace from a corrupted stack.

Thanks,
Florian

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Invalid program counters and unwinding
  2018-01-01  0:00     ` Nathan Sidwell
@ 2018-01-01  0:00       ` Florian Weimer
  2018-01-01  0:00         ` Nathan Sidwell
  0 siblings, 1 reply; 19+ messages in thread
From: Florian Weimer @ 2018-01-01  0:00 UTC (permalink / raw)
  To: Nathan Sidwell, GCC, GNU C Library, Binutils, gnu-gabi

On 06/26/2018 01:15 PM, Nathan Sidwell wrote:
> On 06/26/2018 07:01 AM, Florian Weimer wrote:
>> On 06/26/2018 12:56 PM, Nathan Sidwell wrote:
>>> On 06/26/2018 05:26 AM, Florian Weimer wrote:
>>>
>>>> So it looks to me that the caller of _Unwind_Find_FDE needs to 
>>>> ensure that the PC is a valid element of the call stack.  Is this a 
>>>> correct assumption?
>>>
>>> I thought this was an (implicit?) requirement. You're unwinding a 
>>> stack to deliver an exception up it.  Are there use cases where that 
>>> is not the case?
>>
>> We have something approaching this scenario.
>>
>> pthread_cancel in glibc unwinds the stack using DWARF information 
>> until encounters a frame without unwind information, when it switches 
>> to longjmp to get past that obstacle.
> 
> This is a long jump to the originating pthread function at the end of 
> the stack, right?  We not only get past the obstacle, but any and all 
> DWARF frames on top of it.  (just for my understanding)

Essentially yes.  It can also be an intermediate jump buffer, used to to 
support compilation in -fno-exceptions mode.  In that case, unwinding 
tries to proceed from there, again with a valid PC.

> That sounds right.  It's a PC that you could return to if you weren't trying to unwind the stack. 

GCC doesn't do this AFAIK, but it's theoretically possible not to 
preserve the return address for a noreturn function.  But that would be 
very bad for exception handling, so let's hope compilers don't do this.

Florian

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Invalid program counters and unwinding
  2018-01-01  0:00 Invalid program counters and unwinding Florian Weimer
  2018-01-01  0:00 ` Nathan Sidwell
@ 2018-01-01  0:00 ` Jeff Law
  2018-01-01  0:00   ` Florian Weimer
  1 sibling, 1 reply; 19+ messages in thread
From: Jeff Law @ 2018-01-01  0:00 UTC (permalink / raw)
  To: Florian Weimer, GCC, GNU C Library, Binutils, gnu-gabi

On 06/26/2018 03:26 AM, Florian Weimer wrote:
> I'm looking at ways to speed up _Unwind_Find_FDE when libgcc is running
> on top of glibc.  I have something (at the design level, with some of
> the code written) which allows me to get a pointer to the
> PT_GNU_EH_FRAME segment in memory in a lock-free fashion (so it would
> also be async-signal safe).
> 
> This part works also when the program counter used in the search is
> invalid and does not point to within a loaded object, even in the case
> of concurrent dlopen/dlclose.
> 
> However, it's still necessary to read the PT_GNU_EH_FRAME data itself,
> and if _Unwind_Find_FDE is not a valid program counter found on the
> stack (with in a caller, where unmapping it with dlclose would be
> invalid), it could happen that it is a random address in *another*,
> unrelated object, which then gets dlclose'd (which is valid).
> 
> The current glibc-based implementation in libgcc calls dl_iterate_phdr,
> which acquires a lock blocking dlclose for the entire duration of the
> iteration.  But I think this still doesn't support arbitrary, random PC
> values because in the worst case, the PC value looks valid, we find some
> unrelated FDE data with an associated personality routine, and end up
> calling that, with disastrous consequences.
> 
> So it looks to me that the caller of _Unwind_Find_FDE needs to ensure
> that the PC is a valid element of the call stack.  Is this a correct
> assumption?
> 
> I have some ideas how make reading the PT_GNU_EH_FRAME data safe, but
> the question is whether we actually need that.
> 
> Previous discussions:
> 
> https://gcc.gnu.org/ml/gcc/2013-05/msg00253.html
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71744
> https://sourceware.org/ml/libc-alpha/2016-07/msg00613.html
>   (patch with a spread lock, still not async-signal-safe)
You might also want to look at RH BZ 1293594 which I think has pointers
back to an issue from 2008 :(

Jeff

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Invalid program counters and unwinding
  2018-01-01  0:00   ` Florian Weimer
@ 2018-01-01  0:00     ` Nathan Sidwell
  2018-01-01  0:00       ` Florian Weimer
  2018-01-01  0:00     ` Jakub Jelinek
  1 sibling, 1 reply; 19+ messages in thread
From: Nathan Sidwell @ 2018-01-01  0:00 UTC (permalink / raw)
  To: Florian Weimer, GCC, GNU C Library, Binutils, gnu-gabi

On 06/26/2018 07:01 AM, Florian Weimer wrote:
> On 06/26/2018 12:56 PM, Nathan Sidwell wrote:
>> On 06/26/2018 05:26 AM, Florian Weimer wrote:
>>
>>> So it looks to me that the caller of _Unwind_Find_FDE needs to ensure 
>>> that the PC is a valid element of the call stack.  Is this a correct 
>>> assumption?
>>
>> I thought this was an (implicit?) requirement. You're unwinding a 
>> stack to deliver an exception up it.  Are there use cases where that 
>> is not the case?
> 
> We have something approaching this scenario.
> 
> pthread_cancel in glibc unwinds the stack using DWARF information until 
> encounters a frame without unwind information, when it switches to 
> longjmp to get past that obstacle.

This is a long jump to the originating pthread function at the end of 
the stack, right?  We not only get past the obstacle, but any and all 
DWARF frames on top of it.  (just for my understanding)

> However, at the point of transition from a valid DWARF frame into the 
> wilderness (without unwind data), we should still have accurate 
> information on the caller's PC, so _Unwind_Find_FDE will reliably fail 
> to find any unwind data for it.  It's not a random pointer somewhere 
> else, so I think even the pthread_cancel case is fully supported.

That sounds right.  It's a PC that you could return to if you weren't 
trying to unwind the stack.

nathan

-- 
Nathan Sidwell

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Invalid program counters and unwinding
  2018-01-01  0:00   ` Florian Weimer
  2018-01-01  0:00     ` Nathan Sidwell
@ 2018-01-01  0:00     ` Jakub Jelinek
  2018-01-01  0:00       ` Florian Weimer
  1 sibling, 1 reply; 19+ messages in thread
From: Jakub Jelinek @ 2018-01-01  0:00 UTC (permalink / raw)
  To: Florian Weimer; +Cc: Nathan Sidwell, GCC, GNU C Library, Binutils, gnu-gabi

On Tue, Jun 26, 2018 at 01:01:06PM +0200, Florian Weimer wrote:
> On 06/26/2018 12:56 PM, Nathan Sidwell wrote:
> > On 06/26/2018 05:26 AM, Florian Weimer wrote:
> > 
> > > So it looks to me that the caller of _Unwind_Find_FDE needs to
> > > ensure that the PC is a valid element of the call stack.  Is this a
> > > correct assumption?
> > 
> > I thought this was an (implicit?) requirement. You're unwinding a stack
> > to deliver an exception up it.  Are there use cases where that is not
> > the case?
> 
> We have something approaching this scenario.
> 
> pthread_cancel in glibc unwinds the stack using DWARF information until
> encounters a frame without unwind information, when it switches to longjmp
> to get past that obstacle.
> 
> However, at the point of transition from a valid DWARF frame into the
> wilderness (without unwind data), we should still have accurate information
> on the caller's PC, so _Unwind_Find_FDE will reliably fail to find any
> unwind data for it.  It's not a random pointer somewhere else, so I think
> even the pthread_cancel case is fully supported.

The usual ways to get bogus PCs in the frames is:
1) stack corruption
2) setcontext/swapcontext with uninitialized or corrupted ucontext_t
3) bogus unwind info (compiler or linker etc. bug)

At least for unwinding, I think we don't and shouldn't care, we assume only
valid programs.  For cases like _Unwind_Backtrace when used to print info in
case of fatal signal or stack corruption, it is more questionable, but at
least the current implmentation doesn't care either.  There have been some
requests e.g. to use extremely slow safe accesses like syscalls from the
potential invalid memory, or mincore, or parsing of /proc/self/maps to make
it work even if everything is corrupted, but so far nothing thankfully made
it in.

	Jakub

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Invalid program counters and unwinding
  2018-01-01  0:00           ` Florian Weimer
@ 2018-01-01  0:00             ` Jakub Jelinek
  2018-01-01  0:00             ` Nathan Sidwell
  1 sibling, 0 replies; 19+ messages in thread
From: Jakub Jelinek @ 2018-01-01  0:00 UTC (permalink / raw)
  To: Florian Weimer; +Cc: Nathan Sidwell, GCC, GNU C Library, Binutils, gnu-gabi

On Tue, Jun 26, 2018 at 01:39:18PM +0200, Florian Weimer wrote:
> > The @code{noreturn} keyword does not affect the exceptional path
> > when that applies: a @code{noreturn}-marked function may still
> > return to the caller by throwing an exception or calling
> > @code{longjmp}.
> > 
> > IIRC, in gcc-land you have to give both noreturn and nothrow attributes
> > to make it non-unwindable.
> 
> Are you sure?  I was under the impression that GCC did not do this because
> it interferes too much with debugging.
> 
> Furthermore, glibc marks abort as nothrow and noreturn, which is a bit
> dubious, considering that it is perfectly fine to throw exception from
> synchronously delivered signals.

There is unwindable and unwindable.  The default on many target is
-fasynchronous-unwind-tables, where at every instruction .eh_frame contains
everything needed to unwind.  nothrow is only about .gcc_except_table if
there is some unwind region etc.  So you can still do _Unwind_Backtrace if
you have noexcept functions.
I don't see what is of interest on noreturn functions, either the calls
them, then you have unwind info as usual, or it can tailcall them (that is
what GCC usually doesn't do) and then it is like any other tail call, you
can't see the original caller, but you can see a caller of that caller (or
perhaps with a bunch of other frames missing), but still something
unwindable.

	Jakub

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Invalid program counters and unwinding
  2018-01-01  0:00           ` Florian Weimer
  2018-01-01  0:00             ` Jakub Jelinek
@ 2018-01-01  0:00             ` Nathan Sidwell
  1 sibling, 0 replies; 19+ messages in thread
From: Nathan Sidwell @ 2018-01-01  0:00 UTC (permalink / raw)
  To: Florian Weimer, GCC, GNU C Library, Binutils, gnu-gabi

On 06/26/2018 07:39 AM, Florian Weimer wrote:
> On 06/26/2018 01:35 PM, Nathan Sidwell wrote:

>> IIRC, in gcc-land you have to give both noreturn and nothrow 
>> attributes to make it non-unwindable.
> 
> Are you sure?  I was under the impression that GCC did not do this 
> because it interferes too much with debugging.

I'm not sure at all.   I remember needing to use nothrow somewhere that 
throw() didn't cut it, and could well have got the details wrong, and 
the compiler has moved on anyway.

> Furthermore, glibc marks abort as nothrow and noreturn, which is a bit 
> dubious, considering that it is perfectly fine to throw exception from 
> synchronously delivered signals.

Hm, that seems contradictory ...

nathan

-- 
Nathan Sidwell

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Invalid program counters and unwinding
  2018-01-01  0:00     ` Jeff Law
  2018-01-01  0:00       ` Florian Weimer
@ 2018-01-01  0:00       ` Michael Matz
  2018-01-01  0:00         ` Jakub Jelinek
  1 sibling, 1 reply; 19+ messages in thread
From: Michael Matz @ 2018-01-01  0:00 UTC (permalink / raw)
  To: Jeff Law; +Cc: Florian Weimer, GCC, GNU C Library, Binutils, gnu-gabi

Hi,

On Thu, 28 Jun 2018, Jeff Law wrote:

> I believe "dumb" is referring to the fact that we're already in a bit of 
> a weird state as evidenced by the NULL FDE.  Blindly trying to read the 
> contents of the PC that we couldn't map to an FDE is, IMHO, dumb.
> 
> One might even be able to argue in this day and age that we should have 
> suitable descriptors for everything.  If no suitable descriptor is found 
> then backtracing should stop.  Lack of suitable descriptors in any code 
> would be considered a bug in that scenario.

I disagree.  ASM code often lacks unwind descriptors (now less than in the 
past, but still).  My rule of thumb is always: no descriptor -> has to be 
a framepointer-using routine with standard calling sequence.  (I.e. 
declare the combination of no descriptor and no fp to be a bug).  Some of 
the callee-saved register will temporarily be wrong but unwinding can 
continue.


Ciao,
Michael.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Invalid program counters and unwinding
  2018-01-01  0:00           ` Michael Matz
@ 2018-01-01  0:00             ` Florian Weimer
  0 siblings, 0 replies; 19+ messages in thread
From: Florian Weimer @ 2018-01-01  0:00 UTC (permalink / raw)
  To: Michael Matz, Jakub Jelinek
  Cc: Jeff Law, GCC, GNU C Library, Binutils, gnu-gabi

On 07/02/2018 06:14 PM, Michael Matz wrote:
> There is no such language in the psABI, no (at least I haven't found
> anything; you had me worried for a moment :) ).  But there's stronger one:
> all functions through which unwinding is expected must provide CFI.  So,
> yes, such code isn't strictly conforming.  But there we are, there's much
> code that isn't and we still have to sensibly deal with it (if we can).
> IMHO making guesses is better than to stop unwinding.  And IMHO guessing
> that it's FP-using is better than guessing that it's leaf, especially if
> the PC in question was the result of a prior unwinding step (making it
> clear that it certainly was_not_  leaf).

Well, the previous frame could have been a signal handler frame, but I 
see your point.

Anyway, I've proposed a BoF for these topics for the next Cauldron.

Thanks,
Florian

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Invalid program counters and unwinding
  2018-01-01  0:00       ` Michael Matz
@ 2018-01-01  0:00         ` Jakub Jelinek
  2018-01-01  0:00           ` Michael Matz
  0 siblings, 1 reply; 19+ messages in thread
From: Jakub Jelinek @ 2018-01-01  0:00 UTC (permalink / raw)
  To: Michael Matz
  Cc: Jeff Law, Florian Weimer, GCC, GNU C Library, Binutils, gnu-gabi

On Mon, Jul 02, 2018 at 05:48:32PM +0200, Michael Matz wrote:
> Hi,
> 
> On Thu, 28 Jun 2018, Jeff Law wrote:
> 
> > I believe "dumb" is referring to the fact that we're already in a bit of 
> > a weird state as evidenced by the NULL FDE.  Blindly trying to read the 
> > contents of the PC that we couldn't map to an FDE is, IMHO, dumb.
> > 
> > One might even be able to argue in this day and age that we should have 
> > suitable descriptors for everything.  If no suitable descriptor is found 
> > then backtracing should stop.  Lack of suitable descriptors in any code 
> > would be considered a bug in that scenario.
> 
> I disagree.  ASM code often lacks unwind descriptors (now less than in the 
> past, but still).  My rule of thumb is always: no descriptor -> has to be 
> a framepointer-using routine with standard calling sequence.  (I.e. 
> declare the combination of no descriptor and no fp to be a bug).  Some of 
> the callee-saved register will temporarily be wrong but unwinding can 
> continue.

Doesn't that clash with the x86-64 ABI which says what kind of FDE use by
default if none is found (essentially a standard leaf routine that doesn't
change sp, nor save any registers)?

	Jakub

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Invalid program counters and unwinding
  2018-01-01  0:00         ` Jakub Jelinek
@ 2018-01-01  0:00           ` Michael Matz
  2018-01-01  0:00             ` Florian Weimer
  0 siblings, 1 reply; 19+ messages in thread
From: Michael Matz @ 2018-01-01  0:00 UTC (permalink / raw)
  To: Jakub Jelinek
  Cc: Jeff Law, Florian Weimer, GCC, GNU C Library, Binutils, gnu-gabi

Hi,

On Mon, 2 Jul 2018, Jakub Jelinek wrote:

> > I disagree.  ASM code often lacks unwind descriptors (now less than in 
> > the past, but still).  My rule of thumb is always: no descriptor -> 
> > has to be a framepointer-using routine with standard calling sequence.  
> > (I.e. declare the combination of no descriptor and no fp to be a bug).  
> > Some of the callee-saved register will temporarily be wrong but 
> > unwinding can continue.
> 
> Doesn't that clash with the x86-64 ABI which says what kind of FDE use 
> by default if none is found (essentially a standard leaf routine that 
> doesn't change sp, nor save any registers)?

There is no such language in the psABI, no (at least I haven't found 
anything; you had me worried for a moment :) ).  But there's stronger one: 
all functions through which unwinding is expected must provide CFI.  So, 
yes, such code isn't strictly conforming.  But there we are, there's much 
code that isn't and we still have to sensibly deal with it (if we can).  
IMHO making guesses is better than to stop unwinding.  And IMHO guessing 
that it's FP-using is better than guessing that it's leaf, especially if 
the PC in question was the result of a prior unwinding step (making it 
clear that it certainly was _not_ leaf).

And then there are (toy?) compilers that don't emit CFI, but do use FPs 
(totally psABI non-compliant, sure); IMHO we shouldn't pessimize them 
unduly.  Yeah, it's all a bit wonky, but why make it harder for those?


Ciao,
Michael.

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2018-07-05 19:31 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-01-01  0:00 Invalid program counters and unwinding Florian Weimer
2018-01-01  0:00 ` Nathan Sidwell
2018-01-01  0:00   ` Florian Weimer
2018-01-01  0:00     ` Nathan Sidwell
2018-01-01  0:00       ` Florian Weimer
2018-01-01  0:00         ` Nathan Sidwell
2018-01-01  0:00           ` Florian Weimer
2018-01-01  0:00             ` Jakub Jelinek
2018-01-01  0:00             ` Nathan Sidwell
2018-01-01  0:00     ` Jakub Jelinek
2018-01-01  0:00       ` Florian Weimer
2018-01-01  0:00 ` Jeff Law
2018-01-01  0:00   ` Florian Weimer
2018-01-01  0:00     ` Jeff Law
2018-01-01  0:00       ` Florian Weimer
2018-01-01  0:00       ` Michael Matz
2018-01-01  0:00         ` Jakub Jelinek
2018-01-01  0:00           ` Michael Matz
2018-01-01  0:00             ` Florian Weimer

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).