public inbox for elfutils@sourceware.org
 help / color / mirror / Atom feed
* overflows in Dwfl_Thread_Callbacks::memory_read callback
@ 2017-06-02 11:28 Milian Wolff
  2017-06-13 16:15 ` Mark Wielaard
  0 siblings, 1 reply; 4+ messages in thread
From: Milian Wolff @ 2017-06-02 11:28 UTC (permalink / raw)
  To: elfutils-devel

[-- Attachment #1: Type: text/plain, Size: 7026 bytes --]

Hey all,

in the perfparser that Ulf wrote, and to which I'm contributing too, we often 
see abnormal data being passed to the memory_read callback we define. 

I.e. our callback gets invoked with addr=0xFFFFFFFFFFFFFFFF which clearly 
isn't an expected or valid address. So far we ignored this, as things seem to 
work okayish, but I still suspect something is wrong somewhere... Out of 
interest, I would like to ask if anyone has ever seen something like that?

I can easily reproduce this problem. I recorded a session with rr and then 
investigated what is going on. Below is a "screenshot" of my rr replay 
session:

~~~~~~~~~~~~~~

warning: perfparser/memoryRead: Invalid memory read requested by dwfl 
18446744073709551615 [/home/milian/projects/kdab/rnd/hotspot/3rdparty/
perfparser/app/perfsymboltable.cpp:150/pid=31690]

Breakpoint 1, memoryRead (dwfl=0x1baaa80, addr=18446744073709551615, 
result=0x7ffe83f69fe8, arg=0x7ffe83f6c278) at /home/milian/projects/kdab/rnd/
hotspot/3rdparty/perfparser/app/perfsymboltable.cpp:151
151             ui->firstGuessedFrame = ui->frames.length();
(rr) up
#1  0x00007fd107ff7345 in expr_eval (state=state@entry=0x1bca010, 
frame=0x1bcbdf0, ops=0x7ffe83f69ff0, nops=2, 
result=result@entry=0x7ffe83f69fe8, bias=bias@entry=140092229857280)
    at /home/milian/projects/src/elfutils/libdwfl/frame_unwind.c:501
501           if (! process->callbacks->memory_read (process->dwfl, *result, 
result,
(rr) watch -l *result
Hardware watchpoint 2: -location *result
(rr) reverse-continue 
Continuing.

Hardware watchpoint 2: -location *result

Old value = 18446744073709551615
New value = 18446744073709551575
0x00007fd107ff72fe in do_pop (stack=0x7ffe83f69f20, stack=0x7ffe83f69f20, 
val=0x7ffe83f69fe8) at /home/milian/projects/src/elfutils/libdwfl/
frame_unwind.c:139
139       *val = stack->addrs[--stack->used];
(rr) print stack->used
$1 = 0
(rr) reverse-step
139       *val = stack->addrs[--stack->used];
(rr) print stack->used
$2 = 1
(rr) print stack->addrs[0]
$3 = 18446744073709551615
(rr) print stack->addrs[1]
$4 = 140535749912192
(rr) print stack->addrs[0]
$5 = 18446744073709551615
(rr) watch -l stack->addrs[0]
Hardware watchpoint 3: -location stack->addrs[0]
(rr) reverse-conQuit
(rr) del 2
(rr) reverse-continue 
Continuing.

Hardware watchpoint 3: -location stack->addrs[0]

Old value = 18446744073709551615
New value = 15
0x00007fd107ff6e8a in do_push (stack=0x7ffe83f69f20, val=18446744073709551615) 
at /home/milian/projects/src/elfutils/libdwfl/frame_unwind.c:127
127       stack->addrs[stack->used++] = val;
(rr) print val
$6 = 18446744073709551615
(rr) up
#1  0x00007fd107ff7a4f in expr_eval (state=state@entry=0x1bca010, 
frame=0x1bcbdf0, ops=0x7ffe83f69ff0, nops=2, 
result=result@entry=0x7ffe83f69fe8, bias=bias@entry=140092229857280)
    at /home/milian/projects/src/elfutils/libdwfl/frame_unwind.c:361
361               if (! pop (&val1) || ! push (val1 + op->number))
(rr) print val1
$7 = 15
(rr) print op->number
$8 = 18446744073709551600
(rr) watch -l op->number
Hardware watchpoint 4: -location op->number
(rr) del 3
(rr) reverse-continue 
Continuing.

Hardware watchpoint 4: -location op->number

Old value = 18446744073709551600
New value = 0
0x00007fd107fe06e8 in dwarf_frame_register (fs=0x1bcbdf0, regno=regno@entry=6, 
ops_mem=ops_mem@entry=0x7ffe83f69ff0, ops=ops@entry=0x7ffe83f69fd8, 
nops=nops@entry=0x7ffe83f69fe0)
    at /home/milian/projects/src/elfutils/libdw/dwarf_frame_register.c:80
80              ops_mem[(*nops)++] = (Dwarf_Op) { .atom = DW_OP_plus_uconst,
(rr) list
75
76          case reg_offset:
77          case reg_val_offset:
78            ops_mem[(*nops)++] = (Dwarf_Op) { .atom = DW_OP_call_frame_cfa 
};
79            if (reg->value != 0)
80              ops_mem[(*nops)++] = (Dwarf_Op) { .atom = DW_OP_plus_uconst,
81                                                .number = reg->value };
82            if (reg->rule == reg_val_offset)
83              /* A value, not a location.  */
84              ops_mem[(*nops)++] = (Dwarf_Op) { .atom = DW_OP_stack_value };
(rr) print reg->value
$9 = -16
(rr) watch -l reg->value
Hardware watchpoint 5: -location reg->value
(rr) reverse-continue 
Continuing.

Hardware watchpoint 4: -location op->number

Old value = 0
New value = 18446744073709551560
dwarf_frame_register (fs=0x1bcbdf0, regno=regno@entry=6, 
ops_mem=ops_mem@entry=0x7ffe83f69ff0, ops=ops@entry=0x7ffe83f69fd8, 
nops=nops@entry=0x7ffe83f69fe0)
    at /home/milian/projects/src/elfutils/libdw/dwarf_frame_register.c:80
80              ops_mem[(*nops)++] = (Dwarf_Op) { .atom = DW_OP_plus_uconst,
(rr) del 4
(rr) reverse-continue 
Continuing.

Hardware watchpoint 5: -location reg->value

Old value = -16
New value = 0
0x00007fd107fddf17 in execute_cfi (cache=cache@entry=0x1bb5b00, cie=0x1b9d100, 
state=state@entry=0x7ffe83f69ed0, program=<optimized out>, end=0x7fd108039778 
",", abi_cfi=abi_cfi@entry=false, loc=21649, 
    find_pc=23177) at /home/milian/projects/src/elfutils/libdw/cfi.c:260
260               register_rule (operand, val_offset, offset);
(rr) list
255               get_uleb128 (operand, program, end);
256               cfi_assert (program < end);
257               get_uleb128 (offset, program, end);
258               offset *= cie->data_alignment_factor;
259             val_offset:
260               register_rule (operand, val_offset, offset);
261               continue;
262
263             case DW_CFA_val_offset_sf:
264               get_uleb128 (operand, program, end);
(rr) print offset
$10 = <optimized out>
(rr) print val_offset
No symbol "val_offset" in current context.
(rr) print operand
$11 = <optimized out>
(rr) print operand
$12 = <optimized out>

~~~~~~~~~~~~~~

So from the above I notice a couple of... "interesting" behavior:

- op->number is an unsigned value, but gets the signed value -16 written to 
it, thereby producing the large value 0xFFFFFFFFFFFFFFF0

- even if the above would be signed, the expression "15 - 16" is always going 
to result in a bogus address

- has anyone a clue of what may be going wrong here? Is one of our callbacks 
returning wrong values before? Is the DWARF info broken? Is there any kind of 
DWARF validator or anything like that which I could use here to check? I 
notice that this issue often arises with frames in libc and ld.so. But, so 
far, I have not come up with a good way to write a MWE that reliably tests 
this behavior... Is there maybe something in the elfutils source code repo 
that I could use as a basis?

If anyone else wants to try to reproduce this, I guess it should also be 
doable with vanilla perf:

- compile it with `make NO_LIBUNWIND=1` to get the libdw based unwinder
- record some data with `perf record --call-graph dwarf`
- inspect the data with `perf script -vvv`, you'll notice a lot of lines of 
the form

unwind: access_mem 0xffffffffffffffd7 not inside range 
0x7ffca0a86aa8-0x7ffca0a88aa8

which is also what we see in our perfparser utility.

Thanks
-- 
Milian Wolff
mail@milianw.de
http://milianw.de

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 195 bytes --]

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: overflows in Dwfl_Thread_Callbacks::memory_read callback
  2017-06-02 11:28 overflows in Dwfl_Thread_Callbacks::memory_read callback Milian Wolff
@ 2017-06-13 16:15 ` Mark Wielaard
  2017-06-14 13:13   ` Milian Wolff
  0 siblings, 1 reply; 4+ messages in thread
From: Mark Wielaard @ 2017-06-13 16:15 UTC (permalink / raw)
  To: Milian Wolff; +Cc: elfutils-devel

On Thu, 2017-06-01 at 22:46 +0200, Milian Wolff wrote:
> in the perfparser that Ulf wrote, and to which I'm contributing too, we often 
> see abnormal data being passed to the memory_read callback we define. 
> 
> I.e. our callback gets invoked with addr=0xFFFFFFFFFFFFFFFF which clearly 
> isn't an expected or valid address. So far we ignored this, as things seem to 
> work okayish, but I still suspect something is wrong somewhere... Out of 
> interest, I would like to ask if anyone has ever seen something like that?

Yes, you sometimes see that some register or location value is zero for
some reason, but the expression parser wants to read some memory at an
offset from that register. e.g. the SP being zero and trying to read at
SP +/- offset to get some value from the stack.

Maybe the expression parser should notice this situation and not even
invoke the memory_read callback in that case since it is obvious it will
fail.

> I can easily reproduce this problem. I recorded a session with rr and then 
> investigated what is going on. Below is a "screenshot" of my rr replay 
> session:
> 
> ~~~~~~~~~~~~~~
> 
> warning: perfparser/memoryRead: Invalid memory read requested by dwfl 
> 18446744073709551615 [/home/milian/projects/kdab/rnd/hotspot/3rdparty/
> perfparser/app/perfsymboltable.cpp:150/pid=31690]
> 
> Breakpoint 1, memoryRead (dwfl=0x1baaa80, addr=18446744073709551615, 
> result=0x7ffe83f69fe8, arg=0x7ffe83f6c278) at /home/milian/projects/kdab/rnd/
> hotspot/3rdparty/perfparser/app/perfsymboltable.cpp:151
> 151             ui->firstGuessedFrame = ui->frames.length();
> (rr) up
> #1  0x00007fd107ff7345 in expr_eval (state=state@entry=0x1bca010, 
> frame=0x1bcbdf0, ops=0x7ffe83f69ff0, nops=2, 
> result=result@entry=0x7ffe83f69fe8, bias=bias@entry=140092229857280)
>     at /home/milian/projects/src/elfutils/libdwfl/frame_unwind.c:501
> 501           if (! process->callbacks->memory_read (process->dwfl, *result, 
> result,
> (rr) watch -l *result
> Hardware watchpoint 2: -location *result
> (rr) reverse-continue 
> Continuing.
> 
> Hardware watchpoint 2: -location *result
> 
> Old value = 18446744073709551615
> New value = 18446744073709551575
> 0x00007fd107ff72fe in do_pop (stack=0x7ffe83f69f20, stack=0x7ffe83f69f20, 
> val=0x7ffe83f69fe8) at /home/milian/projects/src/elfutils/libdwfl/
> frame_unwind.c:139
> 139       *val = stack->addrs[--stack->used];
> (rr) print stack->used
> $1 = 0
> (rr) reverse-step
> 139       *val = stack->addrs[--stack->used];
> (rr) print stack->used
> $2 = 1
> (rr) print stack->addrs[0]
> $3 = 18446744073709551615
> (rr) print stack->addrs[1]
> $4 = 140535749912192
> (rr) print stack->addrs[0]
> $5 = 18446744073709551615
> (rr) watch -l stack->addrs[0]
> Hardware watchpoint 3: -location stack->addrs[0]
> (rr) reverse-conQuit
> (rr) del 2
> (rr) reverse-continue 
> Continuing.
> 
> Hardware watchpoint 3: -location stack->addrs[0]
> 
> Old value = 18446744073709551615
> New value = 15
> 0x00007fd107ff6e8a in do_push (stack=0x7ffe83f69f20, val=18446744073709551615) 
> at /home/milian/projects/src/elfutils/libdwfl/frame_unwind.c:127
> 127       stack->addrs[stack->used++] = val;
> (rr) print val
> $6 = 18446744073709551615
> (rr) up
> #1  0x00007fd107ff7a4f in expr_eval (state=state@entry=0x1bca010, 
> frame=0x1bcbdf0, ops=0x7ffe83f69ff0, nops=2, 
> result=result@entry=0x7ffe83f69fe8, bias=bias@entry=140092229857280)
>     at /home/milian/projects/src/elfutils/libdwfl/frame_unwind.c:361
> 361               if (! pop (&val1) || ! push (val1 + op->number))
> (rr) print val1
> $7 = 15
> (rr) print op->number
> $8 = 18446744073709551600
> (rr) watch -l op->number
> Hardware watchpoint 4: -location op->number
> (rr) del 3
> (rr) reverse-continue 
> Continuing.
> 
> Hardware watchpoint 4: -location op->number
> 
> Old value = 18446744073709551600
> New value = 0
> 0x00007fd107fe06e8 in dwarf_frame_register (fs=0x1bcbdf0, regno=regno@entry=6, 
> ops_mem=ops_mem@entry=0x7ffe83f69ff0, ops=ops@entry=0x7ffe83f69fd8, 
> nops=nops@entry=0x7ffe83f69fe0)
>     at /home/milian/projects/src/elfutils/libdw/dwarf_frame_register.c:80
> 80              ops_mem[(*nops)++] = (Dwarf_Op) { .atom = DW_OP_plus_uconst,
> (rr) list
> 75
> 76          case reg_offset:
> 77          case reg_val_offset:
> 78            ops_mem[(*nops)++] = (Dwarf_Op) { .atom = DW_OP_call_frame_cfa 
> };
> 79            if (reg->value != 0)
> 80              ops_mem[(*nops)++] = (Dwarf_Op) { .atom = DW_OP_plus_uconst,
> 81                                                .number = reg->value };
> 82            if (reg->rule == reg_val_offset)
> 83              /* A value, not a location.  */
> 84              ops_mem[(*nops)++] = (Dwarf_Op) { .atom = DW_OP_stack_value };
> (rr) print reg->value
> $9 = -16
> (rr) watch -l reg->value
> Hardware watchpoint 5: -location reg->value
> (rr) reverse-continue 
> Continuing.
> 
> Hardware watchpoint 4: -location op->number
> 
> Old value = 0
> New value = 18446744073709551560
> dwarf_frame_register (fs=0x1bcbdf0, regno=regno@entry=6, 
> ops_mem=ops_mem@entry=0x7ffe83f69ff0, ops=ops@entry=0x7ffe83f69fd8, 
> nops=nops@entry=0x7ffe83f69fe0)
>     at /home/milian/projects/src/elfutils/libdw/dwarf_frame_register.c:80
> 80              ops_mem[(*nops)++] = (Dwarf_Op) { .atom = DW_OP_plus_uconst,
> (rr) del 4
> (rr) reverse-continue 
> Continuing.
> 
> Hardware watchpoint 5: -location reg->value
> 
> Old value = -16
> New value = 0
> 0x00007fd107fddf17 in execute_cfi (cache=cache@entry=0x1bb5b00, cie=0x1b9d100, 
> state=state@entry=0x7ffe83f69ed0, program=<optimized out>, end=0x7fd108039778 
> ",", abi_cfi=abi_cfi@entry=false, loc=21649, 
>     find_pc=23177) at /home/milian/projects/src/elfutils/libdw/cfi.c:260
> 260               register_rule (operand, val_offset, offset);
> (rr) list
> 255               get_uleb128 (operand, program, end);
> 256               cfi_assert (program < end);
> 257               get_uleb128 (offset, program, end);
> 258               offset *= cie->data_alignment_factor;
> 259             val_offset:
> 260               register_rule (operand, val_offset, offset);
> 261               continue;
> 262
> 263             case DW_CFA_val_offset_sf:
> 264               get_uleb128 (operand, program, end);
> (rr) print offset
> $10 = <optimized out>
> (rr) print val_offset
> No symbol "val_offset" in current context.
> (rr) print operand
> $11 = <optimized out>
> (rr) print operand
> $12 = <optimized out>
> 
> ~~~~~~~~~~~~~~
> 
> So from the above I notice a couple of... "interesting" behavior:
> 
> - op->number is an unsigned value, but gets the signed value -16 written to 
> it, thereby producing the large value 0xFFFFFFFFFFFFFFF0
> 
> - even if the above would be signed, the expression "15 - 16" is always going 
> to result in a bogus address
> 
> - has anyone a clue of what may be going wrong here? Is one of our callbacks 
> returning wrong values before? Is the DWARF info broken? Is there any kind of 
> DWARF validator or anything like that which I could use here to check? I 
> notice that this issue often arises with frames in libc and ld.so. But, so 
> far, I have not come up with a good way to write a MWE that reliably tests 
> this behavior... Is there maybe something in the elfutils source code repo 
> that I could use as a basis?

I am not following the above trace completely, but what is going on
seems to be that we have CFI and want to get a register value. So we
call dwarf_frame_register to determine the DWARF expression operations
that we need to execute to get the register value. dwarf_frame_register
determines that that the register is described by a register offset
value rule, so it generates operations saying see an the CFA
(DW_OP_call_frame_cfa) plus some offset (DW_OP_plus_uconst) as a value
(so read the value from cfa + offset, which is somewhere on the stack).
But then the cfa comes out as 15? That is obviously bogus. But I don't
really understand how that happened. It should be a value somewhere near
the current stack. Then reading 15 - offset (16) clearly fails.

Cheers,

Mark

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: overflows in Dwfl_Thread_Callbacks::memory_read callback
  2017-06-13 16:15 ` Mark Wielaard
@ 2017-06-14 13:13   ` Milian Wolff
  2017-06-15 20:37     ` Mark Wielaard
  0 siblings, 1 reply; 4+ messages in thread
From: Milian Wolff @ 2017-06-14 13:13 UTC (permalink / raw)
  To: elfutils-devel; +Cc: Mark Wielaard

On Dienstag, 13. Juni 2017 16:06:01 CEST Mark Wielaard wrote:
> On Thu, 2017-06-01 at 22:46 +0200, Milian Wolff wrote:
> > in the perfparser that Ulf wrote, and to which I'm contributing too, we
> > often see abnormal data being passed to the memory_read callback we
> > define.
> > 
> > I.e. our callback gets invoked with addr=0xFFFFFFFFFFFFFFFF which clearly
> > isn't an expected or valid address. So far we ignored this, as things seem
> > to work okayish, but I still suspect something is wrong somewhere... Out
> > of interest, I would like to ask if anyone has ever seen something like
> > that?
> Yes, you sometimes see that some register or location value is zero for
> some reason, but the expression parser wants to read some memory at an
> offset from that register. e.g. the SP being zero and trying to read at
> SP +/- offset to get some value from the stack.
> 
> Maybe the expression parser should notice this situation and not even
> invoke the memory_read callback in that case since it is obvious it will
> fail.

That would be good to do, I guess. Or at least document that this can happen 
and that it is expected behavior. I.e. right now I simply don't know whether 
our code is buggy and triggering this situation, or whether we are 
encountering something "normal" which just is expected to fail and then some 
fallback code will continue to do the right thing.

<snip>

> I am not following the above trace completely, but what is going on
> seems to be that we have CFI and want to get a register value. So we
> call dwarf_frame_register to determine the DWARF expression operations
> that we need to execute to get the register value. dwarf_frame_register
> determines that that the register is described by a register offset
> value rule, so it generates operations saying see an the CFA
> (DW_OP_call_frame_cfa) plus some offset (DW_OP_plus_uconst) as a value
> (so read the value from cfa + offset, which is somewhere on the stack).
> But then the cfa comes out as 15? That is obviously bogus. But I don't
> really understand how that happened. It should be a value somewhere near
> the current stack. Then reading 15 - offset (16) clearly fails.

Right, so that cfa value - is that in the DWARF info? So is the dso "broken"? 
Or is this some runtime value that we returned earlier in our register or 
memory callback implementation?

Thanks

-- 
Milian Wolff
mail@milianw.de
http://milianw.de

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: overflows in Dwfl_Thread_Callbacks::memory_read callback
  2017-06-14 13:13   ` Milian Wolff
@ 2017-06-15 20:37     ` Mark Wielaard
  0 siblings, 0 replies; 4+ messages in thread
From: Mark Wielaard @ 2017-06-15 20:37 UTC (permalink / raw)
  To: Milian Wolff; +Cc: elfutils-devel

On Tue, 2017-06-13 at 18:15 +0200, Milian Wolff wrote:
> > I am not following the above trace completely, but what is going on
> > seems to be that we have CFI and want to get a register value. So we
> > call dwarf_frame_register to determine the DWARF expression operations
> > that we need to execute to get the register value. dwarf_frame_register
> > determines that that the register is described by a register offset
> > value rule, so it generates operations saying see an the CFA
> > (DW_OP_call_frame_cfa) plus some offset (DW_OP_plus_uconst) as a value
> > (so read the value from cfa + offset, which is somewhere on the stack).
> > But then the cfa comes out as 15? That is obviously bogus. But I don't
> > really understand how that happened. It should be a value somewhere near
> > the current stack. Then reading 15 - offset (16) clearly fails.
> 
> Right, so that cfa value - is that in the DWARF info? So is the dso "broken"? 
> Or is this some runtime value that we returned earlier in our register or 
> memory callback implementation?

This is how DWARF defines the CFA:

        An area of memory that is allocated on a stack called a “call
        frame.” The call frame is identified by an address on the stack.
        We refer to this address as the Canonical Frame Address or CFA.
        Typically, the CFA is defined to be the value of the stack
        pointer at the call site in the previous frame (which may be
        different from its value on entry to the current frame).

So in practice it points inside the stack where the previous stack
pointer was when this function was called. Various things can often be
found as offset of the CFA, the return address, saved register values,
arguments to the current function etc. You start unwinding by first
calculating the CFA. I cannot immediately see why that went wrong in
this case. But either we picked the wrong CFI (Call Frame Information)
for the current pc and are just calculating something bogus. Or one of
the needed values to calculate the CFA was garbage (either a bad
register value or a bad stack value).

Cheers,

Makr

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2017-06-14 18:50 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-06-02 11:28 overflows in Dwfl_Thread_Callbacks::memory_read callback Milian Wolff
2017-06-13 16:15 ` Mark Wielaard
2017-06-14 13:13   ` Milian Wolff
2017-06-15 20:37     ` Mark Wielaard

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).