public inbox for gdb@sourceware.org
* Debugging ld.so in gdb
@ 2022-02-04 13:45 Jacob Kroon
  2022-02-04 13:58 ` Florian Weimer
  0 siblings, 1 reply; 24+ messages in thread
From: Jacob Kroon @ 2022-02-04 13:45 UTC (permalink / raw)
  To: gdb

Hi,

For the past week or two I have been seeing a segfault on my updated
Fedora 35 system. I suspect the segfault is related to a recent glibc
update.

The segfault I see happens when I run the following:

$ ldd ./mylib.so

I narrowed it down to running:

$ LD_TRACE_LOADED_OBJECTS=1 /lib64/ld-linux-x86-64.so.2 ./mylib.so

"coredumpctl info" gives me:

> Stack trace of thread 143567:
> #0  0x00007ff428f73590 n/a (n/a + 0x0)
> #1  0x00007ff428f8af0f _dl_map_object_deps (/usr/lib64/ld-linux-x86-64.so.2 + 0x3f0f)
> #2  0x00007ff428fa6970 dl_main (/usr/lib64/ld-linux-x86-64.so.2 + 0x1f970)
> #3  0x00007ff428fa2c7c _dl_sysdep_start (/usr/lib64/ld-linux-x86-64.so.2 + 0x1bc7c)
> #4  0x00007ff428fa4678 _dl_start_final (/usr/lib64/ld-linux-x86-64.so.2 + 0x1d678)
> #5  0x00007ff428fa36a8 _start (/usr/lib64/ld-linux-x86-64.so.2 + 0x1c6a8)

but inspecting in gdb using "coredumpctl debug" doesn't give me any sane
backtrace.

The .so is part of a Yocto build. If I copy the file out from its build
directory to $HOME and run ldd on it, then there is no crash. So I
suspect RUNPATH is involved somehow since it contains $ORIGIN.

Any ideas on what I can do to investigate further?

Regards Jacob


* Re: Debugging ld.so in gdb
  2022-02-04 13:45 Debugging ld.so in gdb Jacob Kroon
@ 2022-02-04 13:58 ` Florian Weimer
  2022-02-04 14:09   ` Jacob Kroon
  0 siblings, 1 reply; 24+ messages in thread
From: Florian Weimer @ 2022-02-04 13:58 UTC (permalink / raw)
  To: Jacob Kroon via Gdb

* Jacob Kroon via Gdb:

> Since a week or two I have started to see a segfault on my updated
> Fedora 35 system. I suspect the segfault is related to a recent glibc
> update.
>
> The segfault I see happens when I run the following:
>
> $ ldd ./mylib.so
>
> I narrowed it down to running:
>
> $ LD_TRACE_LOADED_OBJECTS=1 /lib64/ld-linux-x86-64.so.2 ./mylib.so
>
> "coredumpctl info" gives me:
>
>> Stack trace of thread 143567:
>> #0  0x00007ff428f73590 n/a (n/a + 0x0)
>> #1  0x00007ff428f8af0f _dl_map_object_deps (/usr/lib64/ld-linux-x86-64.so.2 + 0x3f0f)
>> #2  0x00007ff428fa6970 dl_main (/usr/lib64/ld-linux-x86-64.so.2 + 0x1f970)
>> #3  0x00007ff428fa2c7c _dl_sysdep_start (/usr/lib64/ld-linux-x86-64.so.2 + 0x1bc7c)
>> #4  0x00007ff428fa4678 _dl_start_final (/usr/lib64/ld-linux-x86-64.so.2 + 0x1d678)
>> #5  0x00007ff428fa36a8 _start (/usr/lib64/ld-linux-x86-64.so.2 + 0x1c6a8)
>
> but inspecting in gdb using "coredumpctl debug" doesn't give me any sane
> backtrace.
>
> The .so is part of a Yocto build. If I copy the file out from its build
> directory to $HOME and run ldd on it, then there is no crash. So I
> suspect RUNPATH is involved somehow since it contains $ORIGIN.
>
> Any ideas of what I can do to investigate further ?

I suggest running ld.so itself under GDB (that is, start GDB on
/lib64/ld-linux-x86-64.so.2), with

  set startup-with-shell off
  set environment LD_TRACE_LOADED_OBJECTS 1
  b _start
  run ./mylib.so
  record btrace pt
  continue

And after the crash, look at

  record instruction-history

to see how it reached the crash.  This assumes that you have an
execution environment that supports branch tracing.

Thanks,
Florian



* Re: Debugging ld.so in gdb
  2022-02-04 13:58 ` Florian Weimer
@ 2022-02-04 14:09   ` Jacob Kroon
  2022-02-04 14:22     ` Florian Weimer
  0 siblings, 1 reply; 24+ messages in thread
From: Jacob Kroon @ 2022-02-04 14:09 UTC (permalink / raw)
  To: Florian Weimer, Jacob Kroon via Gdb

Hi Florian,

On 2/4/22 14:58, Florian Weimer wrote:
> * Jacob Kroon via Gdb:
> 
>> Since a week or two I have started to see a segfault on my updated
>> Fedora 35 system. I suspect the segfault is related to a recent glibc
>> update.
>>
>> The segfault I see happens when I run the following:
>>
>> $ ldd ./mylib.so
>>
>> I narrowed it down to running:
>>
>> $ LD_TRACE_LOADED_OBJECTS=1 /lib64/ld-linux-x86-64.so.2 ./mylib.so
>>
>> "coredumpctl info" gives me:
>>
>>> Stack trace of thread 143567:
>>> #0  0x00007ff428f73590 n/a (n/a + 0x0)
>>> #1  0x00007ff428f8af0f _dl_map_object_deps (/usr/lib64/ld-linux-x86-64.so.2 + 0x3f0f)
>>> #2  0x00007ff428fa6970 dl_main (/usr/lib64/ld-linux-x86-64.so.2 + 0x1f970)
>>> #3  0x00007ff428fa2c7c _dl_sysdep_start (/usr/lib64/ld-linux-x86-64.so.2 + 0x1bc7c)
>>> #4  0x00007ff428fa4678 _dl_start_final (/usr/lib64/ld-linux-x86-64.so.2 + 0x1d678)
>>> #5  0x00007ff428fa36a8 _start (/usr/lib64/ld-linux-x86-64.so.2 + 0x1c6a8)
>>
>> but inspecting in gdb using "coredumpctl debug" doesn't give me any sane
>> backtrace.
>>
>> The .so is part of a Yocto build. If I copy the file out from its build
>> directory to $HOME and run ldd on it, then there is no crash. So I
>> suspect RUNPATH is involved somehow since it contains $ORIGIN.
>>
>> Any ideas of what I can do to investigate further ?
> 
> I suggest to run ld.so under GDB, with
> 
>   set startup-with-shell off
>   set environment LD_TRACE_LOADED_OBJECTS 1
>   b _start
>   run ./mylib.so
>   record btrace pt
>   continue
> 
> And after the crash, look at
> 
>   record instruction-history
> 
> to see how it reached the crash.  This assumes that you have an
> execution environment that supports branch tracing.
> 

This is what I get, following the instructions above:

> 171966	   0x00007ffff7fd85a0 <dfs_traversal+80>:	mov    0x0(%r13),%rax
> 171967	   0x00007ffff7fd85a4 <dfs_traversal+84>:	lea    -0x8(%rax),%rdx
> 171968	   0x00007ffff7fd85a8 <dfs_traversal+88>:	mov    %rdx,0x0(%r13)
> 171969	   0x00007ffff7fd85ac <dfs_traversal+92>:	mov    %rbp,-0x8(%rax)
> 171970	   0x00007ffff7fd85b0 <dfs_traversal+96>:	add    $0x8,%rsp
> 171971	   0x00007ffff7fd85b4 <dfs_traversal+100>:	pop    %rbx
> 171972	   0x00007ffff7fd85b5 <dfs_traversal+101>:	pop    %rbp
> 171973	   0x00007ffff7fd85b6 <dfs_traversal+102>:	pop    %r12
> 171974	   0x00007ffff7fd85b8 <dfs_traversal+104>:	pop    %r13
> 171975	   0x00007ffff7fd85ba <dfs_traversal+106>:	ret    

Does that make sense? Is there any other information I can provide? This
is with glibc-2.34-24.fc35.x86_64 on Fedora 35.

Regards Jacob


* Re: Debugging ld.so in gdb
  2022-02-04 14:09   ` Jacob Kroon
@ 2022-02-04 14:22     ` Florian Weimer
  2022-02-04 14:27       ` Jacob Kroon
  2022-02-04 14:45       ` Jacob Kroon
  0 siblings, 2 replies; 24+ messages in thread
From: Florian Weimer @ 2022-02-04 14:22 UTC (permalink / raw)
  To: Jacob Kroon; +Cc: Jacob Kroon via Gdb

* Jacob Kroon:

> This is what I get, following the instructions above:
>
>> 171966	   0x00007ffff7fd85a0 <dfs_traversal+80>:	mov    0x0(%r13),%rax
>> 171967	   0x00007ffff7fd85a4 <dfs_traversal+84>:	lea    -0x8(%rax),%rdx
>> 171968	   0x00007ffff7fd85a8 <dfs_traversal+88>:	mov    %rdx,0x0(%r13)
>> 171969	   0x00007ffff7fd85ac <dfs_traversal+92>:	mov    %rbp,-0x8(%rax)
>> 171970	   0x00007ffff7fd85b0 <dfs_traversal+96>:	add    $0x8,%rsp
>> 171971	   0x00007ffff7fd85b4 <dfs_traversal+100>:	pop    %rbx
>> 171972	   0x00007ffff7fd85b5 <dfs_traversal+101>:	pop    %rbp
>> 171973	   0x00007ffff7fd85b6 <dfs_traversal+102>:	pop    %r12
>> 171974	   0x00007ffff7fd85b8 <dfs_traversal+104>:	pop    %r13
>> 171975	   0x00007ffff7fd85ba <dfs_traversal+106>:	ret    
>
> Does that make sense ? Any other information I can provide. This is with
> glibc-2.34-24.fc35.x86_64, Fedora 35.

This doesn't really make sense.  There's probably some GDB option to get
a longer trace.

If it is crashing at the RET, it means that either code has been mapped
over, or the stack has been corrupted.  At the crash site, what does

  print *(void**)$rsp

print?

  disassemble *(void**)$rsp

could also be interesting.

Thanks,
Florian



* Re: Debugging ld.so in gdb
  2022-02-04 14:22     ` Florian Weimer
@ 2022-02-04 14:27       ` Jacob Kroon
  2022-02-04 16:09         ` Florian Weimer
  2022-02-04 14:45       ` Jacob Kroon
  1 sibling, 1 reply; 24+ messages in thread
From: Jacob Kroon @ 2022-02-04 14:27 UTC (permalink / raw)
  To: Florian Weimer; +Cc: Jacob Kroon via Gdb

On 2/4/22 15:22, Florian Weimer wrote:
> * Jacob Kroon:
> 
>> This is what I get, following the instructions above:
>>
>>> 171966	   0x00007ffff7fd85a0 <dfs_traversal+80>:	mov    0x0(%r13),%rax
>>> 171967	   0x00007ffff7fd85a4 <dfs_traversal+84>:	lea    -0x8(%rax),%rdx
>>> 171968	   0x00007ffff7fd85a8 <dfs_traversal+88>:	mov    %rdx,0x0(%r13)
>>> 171969	   0x00007ffff7fd85ac <dfs_traversal+92>:	mov    %rbp,-0x8(%rax)
>>> 171970	   0x00007ffff7fd85b0 <dfs_traversal+96>:	add    $0x8,%rsp
>>> 171971	   0x00007ffff7fd85b4 <dfs_traversal+100>:	pop    %rbx
>>> 171972	   0x00007ffff7fd85b5 <dfs_traversal+101>:	pop    %rbp
>>> 171973	   0x00007ffff7fd85b6 <dfs_traversal+102>:	pop    %r12
>>> 171974	   0x00007ffff7fd85b8 <dfs_traversal+104>:	pop    %r13
>>> 171975	   0x00007ffff7fd85ba <dfs_traversal+106>:	ret    
>>
>> Does that make sense ? Any other information I can provide. This is with
>> glibc-2.34-24.fc35.x86_64, Fedora 35.
> 
> This doesn't really make sense.  There's probably some GDB option to get
> a longer trace.
> 
> If it is crashing at the RET, it means that either code has been mapped
> over, or the stack has been corrupted.  At the crash site, what does
> 
>   print *(void**)$rsp
> 
> print?
> 

$2 = (void *) 0x7ffff7d31b70

>   disassemble *(void**)$rsp
> 
> could also be interesting.
> 

"No function contains specified address"

Let me see if I can find some gdb option to get a longer trace.

Jacob


* Re: Debugging ld.so in gdb
  2022-02-04 14:22     ` Florian Weimer
  2022-02-04 14:27       ` Jacob Kroon
@ 2022-02-04 14:45       ` Jacob Kroon
  1 sibling, 0 replies; 24+ messages in thread
From: Jacob Kroon @ 2022-02-04 14:45 UTC (permalink / raw)
  To: Florian Weimer; +Cc: Jacob Kroon via Gdb

On 2/4/22 15:22, Florian Weimer wrote:
> * Jacob Kroon:
> 
>> This is what I get, following the instructions above:
>>
>>> 171966	   0x00007ffff7fd85a0 <dfs_traversal+80>:	mov    0x0(%r13),%rax
>>> 171967	   0x00007ffff7fd85a4 <dfs_traversal+84>:	lea    -0x8(%rax),%rdx
>>> 171968	   0x00007ffff7fd85a8 <dfs_traversal+88>:	mov    %rdx,0x0(%r13)
>>> 171969	   0x00007ffff7fd85ac <dfs_traversal+92>:	mov    %rbp,-0x8(%rax)
>>> 171970	   0x00007ffff7fd85b0 <dfs_traversal+96>:	add    $0x8,%rsp
>>> 171971	   0x00007ffff7fd85b4 <dfs_traversal+100>:	pop    %rbx
>>> 171972	   0x00007ffff7fd85b5 <dfs_traversal+101>:	pop    %rbp
>>> 171973	   0x00007ffff7fd85b6 <dfs_traversal+102>:	pop    %r12
>>> 171974	   0x00007ffff7fd85b8 <dfs_traversal+104>:	pop    %r13
>>> 171975	   0x00007ffff7fd85ba <dfs_traversal+106>:	ret    
>>
>> Does that make sense ? Any other information I can provide. This is with
>> glibc-2.34-24.fc35.x86_64, Fedora 35.
> 
> This doesn't really make sense.  There's probably some GDB option to get
> a longer trace.
> 
> If it is crashing at the RET, it means that either code has been mapped
> over, or the stack has been corrupted.  At the crash site, what does
> 
>   print *(void**)$rsp
> 
> print?
> 
>   disassemble *(void**)$rsp
> 
> could also be interesting.
> 
> Thanks,
> Florian
> 

I put a breakpoint in "dfs_traversal" and each time I stop in there the
backtrace looks ok, but once the crash has happened, "bt" shows:

> #0  0x00007ffff7fad590 in ?? ()
> #1  0x00007ffff7d31b70 in ?? ()
> #2  0x00007ffff7d32830 in ?? ()
> #3  0x00007ffff7fae150 in ?? ()
> #4  0x00007ffff7fae730 in ?? ()
> #5  0x00007ffff7d32160 in ?? ()
> #6  0x00007ffff7952d30 in ?? ()
> #7  0x00007ffff79d1920 in ?? ()
> #8  0x00007ffff7d31000 in ?? ()
> #9  0x00007ffff79d1ef0 in ?? ()
> #10 0x00007ffff79d24c0 in ?? ()
> #11 0x00007ffff7952000 in ?? ()
> #12 0x00007ffff7952660 in ?? ()
> #13 0x00007ffff79537a0 in ?? ()
> #14 0x00007ffff7d31570 in ?? ()
> #15 0x00007ffff7ffda30 in _rtld_local ()
> #16 0x0000000000000001 in ?? ()
> #17 0xffffffffa5c00000 in ?? ()
> #18 0xffffeffc0b0e0000 in ?? ()
> #19 0x00007ffff795a409 in ?? ()
> #20 0x0000000000000000 in ?? ()

so maybe the stack gets corrupted..

Jacob



* Re: Debugging ld.so in gdb
  2022-02-04 14:27       ` Jacob Kroon
@ 2022-02-04 16:09         ` Florian Weimer
  2022-02-04 16:53           ` Jacob Kroon
  0 siblings, 1 reply; 24+ messages in thread
From: Florian Weimer @ 2022-02-04 16:09 UTC (permalink / raw)
  To: Jacob Kroon; +Cc: Jacob Kroon via Gdb

* Jacob Kroon:

> On 2/4/22 15:22, Florian Weimer wrote:
>> * Jacob Kroon:
>> 
>>> This is what I get, following the instructions above:
>>>
>>>> 171966	   0x00007ffff7fd85a0 <dfs_traversal+80>:	mov    0x0(%r13),%rax
>>>> 171967	   0x00007ffff7fd85a4 <dfs_traversal+84>:	lea    -0x8(%rax),%rdx
>>>> 171968	   0x00007ffff7fd85a8 <dfs_traversal+88>:	mov    %rdx,0x0(%r13)
>>>> 171969	   0x00007ffff7fd85ac <dfs_traversal+92>:	mov    %rbp,-0x8(%rax)
>>>> 171970	   0x00007ffff7fd85b0 <dfs_traversal+96>:	add    $0x8,%rsp
>>>> 171971	   0x00007ffff7fd85b4 <dfs_traversal+100>:	pop    %rbx
>>>> 171972	   0x00007ffff7fd85b5 <dfs_traversal+101>:	pop    %rbp
>>>> 171973	   0x00007ffff7fd85b6 <dfs_traversal+102>:	pop    %r12
>>>> 171974	   0x00007ffff7fd85b8 <dfs_traversal+104>:	pop    %r13
>>>> 171975	   0x00007ffff7fd85ba <dfs_traversal+106>:	ret    
>>>
>>> Does that make sense ? Any other information I can provide. This is with
>>> glibc-2.34-24.fc35.x86_64, Fedora 35.
>> 
>> This doesn't really make sense.  There's probably some GDB option to get
>> a longer trace.
>> 
>> If it is crashing at the RET, it means that either code has been mapped
>> over, or the stack has been corrupted.  At the crash site, what does
>> 
>>   print *(void**)$rsp
>> 
>> print?
>> 
>
> $2 = (void *) 0x7ffff7d31b70
>
>>   disassemble *(void**)$rsp
>> 
>> could also be interesting.
>> 
>
> "No function contains specified address"
>
> Let me see if I can find some gdb option to get a longer trace.

Looks like the code at that address has been unmapped (or the link map
is at least gone from a GDB perspective).  Maybe you can see what was at
the address before using “info files”?

Thanks,
Florian



* Re: Debugging ld.so in gdb
  2022-02-04 16:09         ` Florian Weimer
@ 2022-02-04 16:53           ` Jacob Kroon
  2022-02-04 17:04             ` Florian Weimer
  0 siblings, 1 reply; 24+ messages in thread
From: Jacob Kroon @ 2022-02-04 16:53 UTC (permalink / raw)
  To: Florian Weimer; +Cc: Jacob Kroon via Gdb

On 2/4/22 17:09, Florian Weimer wrote:
> * Jacob Kroon:
> 
>> On 2/4/22 15:22, Florian Weimer wrote:
>>> * Jacob Kroon:
>>>
>>>> This is what I get, following the instructions above:
>>>>
>>>>> 171966	   0x00007ffff7fd85a0 <dfs_traversal+80>:	mov    0x0(%r13),%rax
>>>>> 171967	   0x00007ffff7fd85a4 <dfs_traversal+84>:	lea    -0x8(%rax),%rdx
>>>>> 171968	   0x00007ffff7fd85a8 <dfs_traversal+88>:	mov    %rdx,0x0(%r13)
>>>>> 171969	   0x00007ffff7fd85ac <dfs_traversal+92>:	mov    %rbp,-0x8(%rax)
>>>>> 171970	   0x00007ffff7fd85b0 <dfs_traversal+96>:	add    $0x8,%rsp
>>>>> 171971	   0x00007ffff7fd85b4 <dfs_traversal+100>:	pop    %rbx
>>>>> 171972	   0x00007ffff7fd85b5 <dfs_traversal+101>:	pop    %rbp
>>>>> 171973	   0x00007ffff7fd85b6 <dfs_traversal+102>:	pop    %r12
>>>>> 171974	   0x00007ffff7fd85b8 <dfs_traversal+104>:	pop    %r13
>>>>> 171975	   0x00007ffff7fd85ba <dfs_traversal+106>:	ret    
>>>>
>>>> Does that make sense ? Any other information I can provide. This is with
>>>> glibc-2.34-24.fc35.x86_64, Fedora 35.
>>>
>>> This doesn't really make sense.  There's probably some GDB option to get
>>> a longer trace.
>>>
>>> If it is crashing at the RET, it means that either code has been mapped
>>> over, or the stack has been corrupted.  At the crash site, what does
>>>
>>>   print *(void**)$rsp
>>>
>>> print?
>>>
>>
>> $2 = (void *) 0x7ffff7d31b70
>>
>>>   disassemble *(void**)$rsp
>>>
>>> could also be interesting.
>>>
>>
>> "No function contains specified address"
>>
>> Let me see if I can find some gdb option to get a longer trace.
> 
> Looks like the code at that address has been unmapped (or the link map
> is at least gone from a GDB perspective).  Maybe you can see what was at
> the address before using “info files”?
> 
> Thanks,
> Florian
> 

I couldn't see that address anywhere in the output of "info files".

But I did a "full" recording and found a place where stepping a single
instruction breaks gdb's interpretation of the backtrace, if that is of
any help. This is what I do:

1. goto instruction 225037
2. print backtrace (looks sane)
3. do "disas"
4. step one instruction with "stepi"
5. print backtrace (looks garbled)
6. do "disas"

> (gdb) record goto 225037
> Go backward to insn number 225037
> #0  dfs_traversal (rpo=rpo@entry=0x7fffffffd3b0, map=0x7ffff7fad590, do_reldeps=do_reldeps@entry=0x0) at dl-sort-maps.c:175
> 175	  **rpo = map;
> (gdb) bt
> #0  dfs_traversal (rpo=rpo@entry=0x7fffffffd3b0, map=0x7ffff7fad590, do_reldeps=do_reldeps@entry=0x0) at dl-sort-maps.c:175
> #1  0x00007ffff7fd85d4 in dfs_traversal (do_reldeps=0x0, map=<optimized out>, rpo=0x7fffffffd3b0) at dl-sort-maps.c:143
> #2  dfs_traversal (rpo=rpo@entry=0x7fffffffd3b0, map=0x7ffff7fadb70, do_reldeps=do_reldeps@entry=0x0) at dl-sort-maps.c:155
> #3  0x00007ffff7fd89cd in dfs_traversal (do_reldeps=0x0, map=<optimized out>, rpo=0x7fffffffd3b0) at dl-sort-maps.c:143
> #4  _dl_sort_maps_dfs (skip=<optimized out>, for_fini=<optimized out>, nmaps=15, maps=0x7ffff7953de0) at dl-sort-maps.c:233
> #5  _dl_sort_maps (maps=maps@entry=0x7ffff7953de0, nmaps=nmaps@entry=15, skip=<optimized out>, for_fini=for_fini@entry=false) at dl-sort-maps.c:299
> #6  0x00007ffff7fcaf0f in _dl_map_object_deps (map=<optimized out>, preloads=<optimized out>, npreloads=<optimized out>, trace_mode=<optimized out>, open_mode=<optimized out>)
>     at dl-deps.c:616
> #7  0x00007ffff7fe6970 in dl_main (phdr=<optimized out>, phnum=<optimized out>, user_entry=<optimized out>, auxv=<optimized out>) at rtld.c:1968
> #8  0x00007ffff7fe2c7c in _dl_sysdep_start (start_argptr=<optimized out>, dl_main=0x7ffff7fe4bb0 <dl_main>) at ../elf/dl-sysdep.c:264
> #9  0x00007ffff7fe4678 in _dl_start_final (arg=0x7fffffffdf50) at rtld.c:493
> #10 _dl_start (arg=0x7fffffffdf50) at rtld.c:587
> #11 0x00007ffff7fe36a8 in _start ()
> (gdb) disas
> Dump of assembler code for function dfs_traversal:
>    0x00007ffff7fd8550 <+0>:	push   %r13
>    0x00007ffff7fd8552 <+2>:	mov    %rdi,%r13
>    0x00007ffff7fd8555 <+5>:	push   %r12
>    0x00007ffff7fd8557 <+7>:	mov    %rdx,%r12
>    0x00007ffff7fd855a <+10>:	push   %rbp
>    0x00007ffff7fd855b <+11>:	mov    %rsi,%rbp
>    0x00007ffff7fd855e <+14>:	push   %rbx
>    0x00007ffff7fd855f <+15>:	sub    $0x8,%rsp
>    0x00007ffff7fd8563 <+19>:	mov    0x3d0(%rsi),%rax
>    0x00007ffff7fd856a <+26>:	orb    $0x1,0x31d(%rsi)
>    0x00007ffff7fd8571 <+33>:	test   %rax,%rax
>    0x00007ffff7fd8574 <+36>:	je     0x7ffff7fd859b <dfs_traversal+75>
>    0x00007ffff7fd8576 <+38>:	mov    (%rax),%rsi
>    0x00007ffff7fd8579 <+41>:	test   %rsi,%rsi
>    0x00007ffff7fd857c <+44>:	je     0x7ffff7fd859b <dfs_traversal+75>
>    0x00007ffff7fd857e <+46>:	mov    $0x8,%ebx
>    0x00007ffff7fd8583 <+51>:	testw  $0x180,0x31c(%rsi)
>    0x00007ffff7fd858c <+60>:	je     0x7ffff7fd85c0 <dfs_traversal+112>
>    0x00007ffff7fd858e <+62>:	mov    (%rax,%rbx,1),%rsi
>    0x00007ffff7fd8592 <+66>:	add    $0x8,%rbx
>    0x00007ffff7fd8596 <+70>:	test   %rsi,%rsi
>    0x00007ffff7fd8599 <+73>:	jne    0x7ffff7fd8583 <dfs_traversal+51>
>    0x00007ffff7fd859b <+75>:	test   %r12,%r12
>    0x00007ffff7fd859e <+78>:	jne    0x7ffff7fd85e0 <dfs_traversal+144>
>    0x00007ffff7fd85a0 <+80>:	mov    0x0(%r13),%rax
>    0x00007ffff7fd85a4 <+84>:	lea    -0x8(%rax),%rdx
>    0x00007ffff7fd85a8 <+88>:	mov    %rdx,0x0(%r13)
> => 0x00007ffff7fd85ac <+92>:	mov    %rbp,-0x8(%rax)
>    0x00007ffff7fd85b0 <+96>:	add    $0x8,%rsp
>    0x00007ffff7fd85b4 <+100>:	pop    %rbx
>    0x00007ffff7fd85b5 <+101>:	pop    %rbp
>    0x00007ffff7fd85b6 <+102>:	pop    %r12
>    0x00007ffff7fd85b8 <+104>:	pop    %r13
>    0x00007ffff7fd85ba <+106>:	ret    
>    0x00007ffff7fd85bb <+107>:	nopl   0x0(%rax,%rax,1)
>    0x00007ffff7fd85c0 <+112>:	testb  $0x1,0x31d(%rsi)
>    0x00007ffff7fd85c7 <+119>:	jne    0x7ffff7fd858e <dfs_traversal+62>
>    0x00007ffff7fd85c9 <+121>:	mov    %r12,%rdx
>    0x00007ffff7fd85cc <+124>:	mov    %r13,%rdi
>    0x00007ffff7fd85cf <+127>:	call   0x7ffff7fd8550 <dfs_traversal>
>    0x00007ffff7fd85d4 <+132>:	mov    0x3d0(%rbp),%rax
>    0x00007ffff7fd85db <+139>:	jmp    0x7ffff7fd858e <dfs_traversal+62>
>    0x00007ffff7fd85dd <+141>:	nopl   (%rax)
>    0x00007ffff7fd85e0 <+144>:	mov    0x3d8(%rbp),%rax
>    0x00007ffff7fd85e7 <+151>:	test   %rax,%rax
>    0x00007ffff7fd85ea <+154>:	je     0x7ffff7fd85a0 <dfs_traversal+80>
>    0x00007ffff7fd85ec <+156>:	mov    (%rax),%ebx
>    0x00007ffff7fd85ee <+158>:	movb   $0x1,(%r12)
>    0x00007ffff7fd85f3 <+163>:	sub    $0x1,%ebx
>    0x00007ffff7fd85f6 <+166>:	js     0x7ffff7fd85a0 <dfs_traversal+80>
>    0x00007ffff7fd85f8 <+168>:	movslq %ebx,%rdx
>    0x00007ffff7fd85fb <+171>:	mov    0x8(%rax,%rdx,8),%rsi
>    0x00007ffff7fd8600 <+176>:	testw  $0x180,0x31c(%rsi)
>    0x00007ffff7fd8609 <+185>:	je     0x7ffff7fd8619 <dfs_traversal+201>
>    0x00007ffff7fd860b <+187>:	sub    $0x1,%ebx
>    0x00007ffff7fd860e <+190>:	jb     0x7ffff7fd85a0 <dfs_traversal+80>
>    0x00007ffff7fd8610 <+192>:	mov    0x3d8(%rbp),%rax
>    0x00007ffff7fd8617 <+199>:	jmp    0x7ffff7fd85f8 <dfs_traversal+168>
>    0x00007ffff7fd8619 <+201>:	testb  $0x1,0x31d(%rsi)
>    0x00007ffff7fd8620 <+208>:	jne    0x7ffff7fd860b <dfs_traversal+187>
>    0x00007ffff7fd8622 <+210>:	mov    %r12,%rdx
>    0x00007ffff7fd8625 <+213>:	mov    %r13,%rdi
>    0x00007ffff7fd8628 <+216>:	call   0x7ffff7fd8550 <dfs_traversal>
>    0x00007ffff7fd862d <+221>:	jmp    0x7ffff7fd860b <dfs_traversal+187>
> End of assembler dump.
> (gdb) stepi
> 0x00007ffff7fd85b0	176	}
> (gdb) bt
> #0  0x00007ffff7fd85b0 in dfs_traversal (rpo=rpo@entry=0x7fffffffd3b0, map=0x7ffff7fad590, do_reldeps=do_reldeps@entry=0x0) at dl-sort-maps.c:176
> #1  0x00007ffff7fd85d4 in dfs_traversal (do_reldeps=0x0, map=<optimized out>, rpo=0x7fffffffd3b0) at dl-sort-maps.c:143
> #2  dfs_traversal (rpo=0x7fffffffd3b0, map=0x7ffff7fadb70, do_reldeps=0x0) at dl-sort-maps.c:155
> #3  0x00007ffff7fad590 in ?? ()
> #4  0x00007ffff7d31b70 in ?? ()
> #5  0x00007ffff7d32830 in ?? ()
> #6  0x00007ffff7fae150 in ?? ()
> #7  0x00007ffff7fae730 in ?? ()
> #8  0x00007ffff7d32160 in ?? ()
> #9  0x00007ffff7952d30 in ?? ()
> #10 0x00007ffff79d1920 in ?? ()
> #11 0x00007ffff7d31000 in ?? ()
> #12 0x00007ffff79d1ef0 in ?? ()
> #13 0x00007ffff79d24c0 in ?? ()
> #14 0x00007ffff7952000 in ?? ()
> #15 0x00007ffff7952660 in ?? ()
> #16 0x00007ffff79537a0 in ?? ()
> #17 0x00007ffff7d31570 in ?? ()
> #18 0x00007ffff7ffda30 in _rtld_local ()
> #19 0x0000000000000001 in ?? ()
> #20 0xffffffffa5c00000 in ?? ()
> #21 0xffffeffc0b0e0000 in ?? ()
> #22 0x00007ffff795a409 in ?? ()
> #23 0x0000000000000000 in ?? ()
> (gdb) disas
> Dump of assembler code for function dfs_traversal:
>    0x00007ffff7fd8550 <+0>:	push   %r13
>    0x00007ffff7fd8552 <+2>:	mov    %rdi,%r13
>    0x00007ffff7fd8555 <+5>:	push   %r12
>    0x00007ffff7fd8557 <+7>:	mov    %rdx,%r12
>    0x00007ffff7fd855a <+10>:	push   %rbp
>    0x00007ffff7fd855b <+11>:	mov    %rsi,%rbp
>    0x00007ffff7fd855e <+14>:	push   %rbx
>    0x00007ffff7fd855f <+15>:	sub    $0x8,%rsp
>    0x00007ffff7fd8563 <+19>:	mov    0x3d0(%rsi),%rax
>    0x00007ffff7fd856a <+26>:	orb    $0x1,0x31d(%rsi)
>    0x00007ffff7fd8571 <+33>:	test   %rax,%rax
>    0x00007ffff7fd8574 <+36>:	je     0x7ffff7fd859b <dfs_traversal+75>
>    0x00007ffff7fd8576 <+38>:	mov    (%rax),%rsi
>    0x00007ffff7fd8579 <+41>:	test   %rsi,%rsi
>    0x00007ffff7fd857c <+44>:	je     0x7ffff7fd859b <dfs_traversal+75>
>    0x00007ffff7fd857e <+46>:	mov    $0x8,%ebx
>    0x00007ffff7fd8583 <+51>:	testw  $0x180,0x31c(%rsi)
>    0x00007ffff7fd858c <+60>:	je     0x7ffff7fd85c0 <dfs_traversal+112>
>    0x00007ffff7fd858e <+62>:	mov    (%rax,%rbx,1),%rsi
>    0x00007ffff7fd8592 <+66>:	add    $0x8,%rbx
>    0x00007ffff7fd8596 <+70>:	test   %rsi,%rsi
>    0x00007ffff7fd8599 <+73>:	jne    0x7ffff7fd8583 <dfs_traversal+51>
>    0x00007ffff7fd859b <+75>:	test   %r12,%r12
>    0x00007ffff7fd859e <+78>:	jne    0x7ffff7fd85e0 <dfs_traversal+144>
>    0x00007ffff7fd85a0 <+80>:	mov    0x0(%r13),%rax
>    0x00007ffff7fd85a4 <+84>:	lea    -0x8(%rax),%rdx
>    0x00007ffff7fd85a8 <+88>:	mov    %rdx,0x0(%r13)
>    0x00007ffff7fd85ac <+92>:	mov    %rbp,-0x8(%rax)
> => 0x00007ffff7fd85b0 <+96>:	add    $0x8,%rsp
>    0x00007ffff7fd85b4 <+100>:	pop    %rbx
>    0x00007ffff7fd85b5 <+101>:	pop    %rbp
>    0x00007ffff7fd85b6 <+102>:	pop    %r12
>    0x00007ffff7fd85b8 <+104>:	pop    %r13
>    0x00007ffff7fd85ba <+106>:	ret    
>    0x00007ffff7fd85bb <+107>:	nopl   0x0(%rax,%rax,1)
>    0x00007ffff7fd85c0 <+112>:	testb  $0x1,0x31d(%rsi)
>    0x00007ffff7fd85c7 <+119>:	jne    0x7ffff7fd858e <dfs_traversal+62>
>    0x00007ffff7fd85c9 <+121>:	mov    %r12,%rdx
>    0x00007ffff7fd85cc <+124>:	mov    %r13,%rdi
>    0x00007ffff7fd85cf <+127>:	call   0x7ffff7fd8550 <dfs_traversal>
>    0x00007ffff7fd85d4 <+132>:	mov    0x3d0(%rbp),%rax
>    0x00007ffff7fd85db <+139>:	jmp    0x7ffff7fd858e <dfs_traversal+62>
>    0x00007ffff7fd85dd <+141>:	nopl   (%rax)
>    0x00007ffff7fd85e0 <+144>:	mov    0x3d8(%rbp),%rax
>    0x00007ffff7fd85e7 <+151>:	test   %rax,%rax
>    0x00007ffff7fd85ea <+154>:	je     0x7ffff7fd85a0 <dfs_traversal+80>
>    0x00007ffff7fd85ec <+156>:	mov    (%rax),%ebx
>    0x00007ffff7fd85ee <+158>:	movb   $0x1,(%r12)
>    0x00007ffff7fd85f3 <+163>:	sub    $0x1,%ebx
>    0x00007ffff7fd85f6 <+166>:	js     0x7ffff7fd85a0 <dfs_traversal+80>
>    0x00007ffff7fd85f8 <+168>:	movslq %ebx,%rdx
>    0x00007ffff7fd85fb <+171>:	mov    0x8(%rax,%rdx,8),%rsi
>    0x00007ffff7fd8600 <+176>:	testw  $0x180,0x31c(%rsi)
>    0x00007ffff7fd8609 <+185>:	je     0x7ffff7fd8619 <dfs_traversal+201>
>    0x00007ffff7fd860b <+187>:	sub    $0x1,%ebx
>    0x00007ffff7fd860e <+190>:	jb     0x7ffff7fd85a0 <dfs_traversal+80>
>    0x00007ffff7fd8610 <+192>:	mov    0x3d8(%rbp),%rax
>    0x00007ffff7fd8617 <+199>:	jmp    0x7ffff7fd85f8 <dfs_traversal+168>
>    0x00007ffff7fd8619 <+201>:	testb  $0x1,0x31d(%rsi)
>    0x00007ffff7fd8620 <+208>:	jne    0x7ffff7fd860b <dfs_traversal+187>
>    0x00007ffff7fd8622 <+210>:	mov    %r12,%rdx
>    0x00007ffff7fd8625 <+213>:	mov    %r13,%rdi
>    0x00007ffff7fd8628 <+216>:	call   0x7ffff7fd8550 <dfs_traversal>
>    0x00007ffff7fd862d <+221>:	jmp    0x7ffff7fd860b <dfs_traversal+187>
> End of assembler dump.
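
One detail in the two backtraces above seems worth pointing out: the first
unresolved frame after the stepi has the same value as the "map" argument
of frame #0. A rough sketch that checks this mechanically is below. It is
only an illustration: the frame lines are copied from the output above
(shortened to a few frames), and the variable names are made up for the
example.

```python
import re

# A few frames copied from the backtrace printed after the single "stepi".
garbled_bt = """\
#0  0x00007ffff7fd85b0 in dfs_traversal (rpo=rpo@entry=0x7fffffffd3b0, map=0x7ffff7fad590, do_reldeps=do_reldeps@entry=0x0) at dl-sort-maps.c:176
#2  dfs_traversal (rpo=0x7fffffffd3b0, map=0x7ffff7fadb70, do_reldeps=0x0) at dl-sort-maps.c:155
#3  0x00007ffff7fad590 in ?? ()
#4  0x00007ffff7d31b70 in ?? ()
#5  0x00007ffff7d32830 in ?? ()
"""

# link_map pointers that gdb reports as "map" arguments in the resolved frames.
map_args = {int(m, 16) for m in re.findall(r"map=(0x[0-9a-f]+)", garbled_bt)}

# Program counters of the frames gdb could not resolve ("in ?? ()").
unresolved = [
    int(m, 16)
    for m in re.findall(r"^#\d+\s+(0x[0-9a-f]+) in \?\? \(\)", garbled_bt, re.M)
]

# Unresolved "return addresses" that are really link_map pointers.
overlap = [hex(a) for a in unresolved if a in map_args]
print(overlap)
```

If a saved return address really has been replaced by a link_map pointer,
that would be consistent with the final "ret" jumping into data instead of
code, but the sketch only shows that the values match, not why.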


/Jacob


* Re: Debugging ld.so in gdb
  2022-02-04 16:53           ` Jacob Kroon
@ 2022-02-04 17:04             ` Florian Weimer
  2022-02-04 17:11               ` Jacob Kroon
  0 siblings, 1 reply; 24+ messages in thread
From: Florian Weimer @ 2022-02-04 17:04 UTC (permalink / raw)
  To: Jacob Kroon; +Cc: Jacob Kroon via Gdb

* Jacob Kroon:

> I couldn't see that address anywhere in the output of "show files".
>
> But I did "full" recording, and found a place where just stepping a
> single instruction broke gdb interpreting the backtrace, if that is of
> any help. This is what I do:
>
> 1. goto instruction 225037
> 2. print backtrace (looks sane)
> 3. do "disas"
> 4. step one instruction with "stepi"
> 5. print backtrace (looks garbled)
> 6. do "disas"

Interesting.  Can you figure out where *rpo points right before things
go wrong?  If the debuginfo doesn't work, this should do it:

  print (void *)$rax - 8

Maybe also look at map->l_name at this point, and further up the stack,
in _dl_sort_maps, at maps and nmaps.  It looks like we run off the end of
the array and write garbage to other areas of the process. 8-(

This must be caused by something unusual in the object dependencies.

Thanks,
Florian



* Re: Debugging ld.so in gdb
  2022-02-04 17:04             ` Florian Weimer
@ 2022-02-04 17:11               ` Jacob Kroon
  2022-02-04 17:15                 ` Florian Weimer
  0 siblings, 1 reply; 24+ messages in thread
From: Jacob Kroon @ 2022-02-04 17:11 UTC (permalink / raw)
  To: Florian Weimer; +Cc: Jacob Kroon via Gdb

On 2/4/22 18:04, Florian Weimer wrote:
> * Jacob Kroon:
> 
>> I couldn't see that address anywhere in the output of "show files".
>>
>> But I did "full" recording, and found a place where just stepping a
>> single instruction broke gdb interpreting the backtrace, if that is of
>> any help. This is what I do:
>>
>> 1. goto instruction 225037
>> 2. print backtrace (looks sane)
>> 3. do "disas"
>> 4. step one instruction with "stepi"
>> 5. print backtrace (looks garbled)
>> 6. do "disas"
> 
> Interesting.  Can you figure out where *rpo points right before things
> go wrong?  If the debuginfo doesn't work, this should do it:
> 
>   print (void *)$rax - 8
> 

This is what I get:

> (gdb) record goto 225037
> Go backward to insn number 225037
> #0  dfs_traversal (rpo=rpo@entry=0x7fffffffd3b0, map=0x7ffff7fad590, do_reldeps=do_reldeps@entry=0x0) at dl-sort-maps.c:175
> 175	  **rpo = map;
> (gdb) print *rpo
> $13 = (struct link_map **) 0x7fffffffd2c8
> (gdb) print (void *)$rax - 8
> $14 = (void *) 0x7fffffffd2c8



> Maybe also look at map->l_name at this point, and further up the stack,
> in dl_sort_maps, at maps and nmaps.  It looks like we run off the end of
> the array and write garbage to other areas of the process. 8-(
> 

And this:

> (gdb) print map->l_name 
> $15 = 0x7ffff7fad500 "/tmp/ramdisk/jacob-linux-master-glibc/work/x86_64-linux/icedtea7-native/2.1.3-r1.0/icedtea-2.1.3/build/openjdk.build-boot/lib/amd64/libjava.so"

I couldn't find the other variables, maybe I need to get more acquainted
with the source code here.

> This must be caused by something unusual in the object dependencies.
> 
> Thanks,
> Florian
> 

Jacob


* Re: Debugging ld.so in gdb
  2022-02-04 17:11               ` Jacob Kroon
@ 2022-02-04 17:15                 ` Florian Weimer
  2022-02-07  8:36                   ` Jacob Kroon
  0 siblings, 1 reply; 24+ messages in thread
From: Florian Weimer @ 2022-02-04 17:15 UTC (permalink / raw)
  To: Jacob Kroon; +Cc: Jacob Kroon via Gdb

* Jacob Kroon:

>> Interesting.  Can you figure out where *rpo points right before things
>> go wrong?  If the debuginfo doesn't work, this should do it:
>> 
>>   print (void *)$rax - 8
>> 
>
> This is what I get:
>
>> (gdb) record goto 225037
>> Go backward to insn number 225037
>> #0  dfs_traversal (rpo=rpo@entry=0x7fffffffd3b0, map=0x7ffff7fad590, do_reldeps=do_reldeps@entry=0x0) at dl-sort-maps.c:175
>> 175	  **rpo = map;
>> (gdb) print *rpo
>> $13 = (struct link_map **) 0x7fffffffd2c8
>> (gdb) print (void *)$rax - 8
>> $14 = (void *) 0x7fffffffd2c8

I wonder where 0x7fffffffd2c8 is located.  Maybe on the minimal malloc
heap?
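
A quick bit of address arithmetic on the values quoted above may help here.
This is only a sketch: the constants are copied from the gdb output in this
thread, the variable names are invented for the example, and it assumes
8-byte pointers (x86-64). It does not know where the rpo array itself
starts, so it cannot prove an overflow on its own.

```python
# Values copied from the gdb session quoted above.
rpo_var_addr = 0x7fffffffd3b0  # the "rpo" argument: address of the cursor variable
write_target = 0x7fffffffd2c8  # *rpo - 1 slot, i.e. where "**rpo = map" stores
start_arg = 0x7fffffffdf50     # "arg" of _dl_start_final, another stack address
nmaps = 15                     # from the _dl_sort_maps frame
PTR_SIZE = 8                   # assumption: 64-bit pointers

# How far below the cursor variable does the store land?
distance = rpo_var_addr - write_target
slots = distance // PTR_SIZE

# Size of an array of nmaps link_map pointers.
array_bytes = nmaps * PTR_SIZE

print(f"store lands {distance} bytes ({slots} pointer slots) below the cursor variable")
print(f"an array of {nmaps} pointers occupies {array_bytes} bytes")
print(f"store lands {start_arg - write_target} bytes below _dl_start_final's arg")
```

All three addresses sit within a few KiB of each other in the
0x7fffffff.... range, which looks more like the main thread's stack than a
heap. The store is 29 pointer slots below the cursor variable while the
array only holds 15, which is suggestive but depends on a stack layout the
thread does not show.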

>> Maybe also look at map->l_name at this point, and further up the stack,
>> in dl_sort_maps, at maps and nmaps.  It looks like we run off the end of
>> the array and write garbage to other areas of the process. 8-(
>> 
>
> And this:
>
>> (gdb) print map->l_name 
>> $15 = 0x7ffff7fad500 "/tmp/ramdisk/jacob-linux-master-glibc/work/x86_64-linux/icedtea7-native/2.1.3-r1.0/icedtea-2.1.3/build/openjdk.build-boot/lib/amd64/libjava.so"
>
> I couldn't find the other variables, maybe I need to get more acquainted
>  with the sourcecode here.

I think you have to enter “up” a couple of times.

I suspect we are writing beyond the start of the array passed to
_dl_sort_maps.

Thanks,
Florian


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Debugging ld.so in gdb
  2022-02-04 17:15                 ` Florian Weimer
@ 2022-02-07  8:36                   ` Jacob Kroon
  2022-02-07 11:46                     ` Jacob Kroon
  0 siblings, 1 reply; 24+ messages in thread
From: Jacob Kroon @ 2022-02-07  8:36 UTC (permalink / raw)
  To: Florian Weimer; +Cc: Jacob Kroon via Gdb

Hi Florian,

On 2/4/22 18:15, Florian Weimer wrote:
> I suspect we are writing beyond the start of the array passed to
> _dl_sort_maps.
> 

It looks like it is writing past the beginning of the rpo[] array in
_dl_sort_maps_dfs(). The output below is right before the crash happens
(stepping one instruction garbles the backtrace):

> (gdb) bt
> #0  dfs_traversal (rpo=rpo@entry=0x7fffffffd320, map=0x7ffff7fad590, do_reldeps=do_reldeps@entry=0x0) at dl-sort-maps.c:175
> #1  0x00007ffff7fd85d4 in dfs_traversal (do_reldeps=0x0, map=<optimized out>, rpo=0x7fffffffd320) at dl-sort-maps.c:143
> #2  dfs_traversal (rpo=rpo@entry=0x7fffffffd320, map=0x7ffff7fadb70, do_reldeps=do_reldeps@entry=0x0) at dl-sort-maps.c:155
> #3  0x00007ffff7fd89cd in dfs_traversal (do_reldeps=0x0, map=<optimized out>, rpo=0x7fffffffd320) at dl-sort-maps.c:143
> #4  _dl_sort_maps_dfs (skip=<optimized out>, for_fini=<optimized out>, nmaps=15, maps=0x7ffff7953de0) at dl-sort-maps.c:233
> #5  _dl_sort_maps (maps=maps@entry=0x7ffff7953de0, nmaps=nmaps@entry=15, skip=<optimized out>, for_fini=for_fini@entry=false) at dl-sort-maps.c:299
> #6  0x00007ffff7fcaf0f in _dl_map_object_deps (map=<optimized out>, preloads=<optimized out>, npreloads=<optimized out>, trace_mode=<optimized out>, 
>     open_mode=<optimized out>) at dl-deps.c:616
> #7  0x00007ffff7fe6970 in dl_main (phdr=<optimized out>, phnum=<optimized out>, user_entry=<optimized out>, auxv=<optimized out>) at rtld.c:1968
> #8  0x00007ffff7fe2c7c in _dl_sysdep_start (start_argptr=<optimized out>, dl_main=0x7ffff7fe4bb0 <dl_main>) at ../elf/dl-sysdep.c:264
> #9  0x00007ffff7fe4678 in _dl_start_final (arg=0x7fffffffdec0) at rtld.c:493
> #10 _dl_start (arg=0x7fffffffdec0) at rtld.c:587
> #11 0x00007ffff7fe36a8 in _start ()
> (gdb) f 0
> #0  dfs_traversal (rpo=rpo@entry=0x7fffffffd320, map=0x7ffff7fad590, do_reldeps=do_reldeps@entry=0x0) at dl-sort-maps.c:175
> 175       **rpo = map;
> (gdb) print *rpo
> $62 = (struct link_map **) 0x7fffffffd238
> (gdb) f 4
> #4  _dl_sort_maps_dfs (skip=<optimized out>, for_fini=<optimized out>, nmaps=15, maps=0x7ffff7953de0) at dl-sort-maps.c:233
> 233           dfs_traversal (&rpo_head, maps[i], do_reldeps_ref);
> (gdb) print &rpo[-1]
> $63 = (struct link_map **) 0x7fffffffd238

I inspected the "maps" vector and it contains *multiple* entries for
"libjvm.so", is that allowed ? I wonder if "nmaps" is calculated
correctly, since that determines the array size. Can I verify that somehow ?

Any other ideas I should try ?

Jacob

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Debugging ld.so in gdb
  2022-02-07  8:36                   ` Jacob Kroon
@ 2022-02-07 11:46                     ` Jacob Kroon
  2022-02-07 11:55                       ` Florian Weimer
  0 siblings, 1 reply; 24+ messages in thread
From: Jacob Kroon @ 2022-02-07 11:46 UTC (permalink / raw)
  To: Florian Weimer; +Cc: Jacob Kroon via Gdb

On 2/7/22 09:36, Jacob Kroon wrote:
> Hi Florian,
> 
> On 2/4/22 18:15, Florian Weimer wrote:
>> I suspect we are writing beyond the start of the array passed to
>> _dl_sort_maps.
>>
> 
> It looks like it is writing past the beginning of the rpo[] array in
> _dl_sort_maps_dfs(). The output below is right before the crash happens
> (stepping one instruction garbles the backtrace):
> 
>> (gdb) bt
>> #0  dfs_traversal (rpo=rpo@entry=0x7fffffffd320, map=0x7ffff7fad590, do_reldeps=do_reldeps@entry=0x0) at dl-sort-maps.c:175
>> #1  0x00007ffff7fd85d4 in dfs_traversal (do_reldeps=0x0, map=<optimized out>, rpo=0x7fffffffd320) at dl-sort-maps.c:143
>> #2  dfs_traversal (rpo=rpo@entry=0x7fffffffd320, map=0x7ffff7fadb70, do_reldeps=do_reldeps@entry=0x0) at dl-sort-maps.c:155
>> #3  0x00007ffff7fd89cd in dfs_traversal (do_reldeps=0x0, map=<optimized out>, rpo=0x7fffffffd320) at dl-sort-maps.c:143
>> #4  _dl_sort_maps_dfs (skip=<optimized out>, for_fini=<optimized out>, nmaps=15, maps=0x7ffff7953de0) at dl-sort-maps.c:233
>> #5  _dl_sort_maps (maps=maps@entry=0x7ffff7953de0, nmaps=nmaps@entry=15, skip=<optimized out>, for_fini=for_fini@entry=false) at dl-sort-maps.c:299
>> #6  0x00007ffff7fcaf0f in _dl_map_object_deps (map=<optimized out>, preloads=<optimized out>, npreloads=<optimized out>, trace_mode=<optimized out>, 
>>     open_mode=<optimized out>) at dl-deps.c:616
>> #7  0x00007ffff7fe6970 in dl_main (phdr=<optimized out>, phnum=<optimized out>, user_entry=<optimized out>, auxv=<optimized out>) at rtld.c:1968
>> #8  0x00007ffff7fe2c7c in _dl_sysdep_start (start_argptr=<optimized out>, dl_main=0x7ffff7fe4bb0 <dl_main>) at ../elf/dl-sysdep.c:264
>> #9  0x00007ffff7fe4678 in _dl_start_final (arg=0x7fffffffdec0) at rtld.c:493
>> #10 _dl_start (arg=0x7fffffffdec0) at rtld.c:587
>> #11 0x00007ffff7fe36a8 in _start ()
>> (gdb) f 0
>> #0  dfs_traversal (rpo=rpo@entry=0x7fffffffd320, map=0x7ffff7fad590, do_reldeps=do_reldeps@entry=0x0) at dl-sort-maps.c:175
>> 175       **rpo = map;
>> (gdb) print *rpo
>> $62 = (struct link_map **) 0x7fffffffd238
>> (gdb) f 4
>> #4  _dl_sort_maps_dfs (skip=<optimized out>, for_fini=<optimized out>, nmaps=15, maps=0x7ffff7953de0) at dl-sort-maps.c:233
>> 233           dfs_traversal (&rpo_head, maps[i], do_reldeps_ref);
>> (gdb) print &rpo[-1]
>> $63 = (struct link_map **) 0x7fffffffd238
> 
> I inspected the "maps" vector and it contains *multiple* entries for
> "libjvm.so", is that allowed ? I wonder if "nmaps" is calculated
> correctly, since that determines the array size. Can I verify that somehow ?
> 

Actually that was not correct. "maps[]->l_name" does not contain any
"libjvm.so" at all, but the resulting rpo[] does contain several
"libjvm.so" entries.

Jacob

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Debugging ld.so in gdb
  2022-02-07 11:46                     ` Jacob Kroon
@ 2022-02-07 11:55                       ` Florian Weimer
  2022-02-07 12:15                         ` Jacob Kroon
  0 siblings, 1 reply; 24+ messages in thread
From: Florian Weimer @ 2022-02-07 11:55 UTC (permalink / raw)
  To: Jacob Kroon; +Cc: Jacob Kroon via Gdb

* Jacob Kroon:

> On 2/7/22 09:36, Jacob Kroon wrote:
>> Hi Florian,
>> 
>> On 2/4/22 18:15, Florian Weimer wrote:
>>> I suspect we are writing beyond the start of the array passed to
>>> _dl_sort_maps.
>>>
>> 
>> It looks like it is writing past the beginning of the rpo[] array in
>> _dl_sort_maps_dfs(). The output below is right before the crash happens
>> (stepping one instruction garbles the backtrace):
>> 
>>> (gdb) bt
>>> #0  dfs_traversal (rpo=rpo@entry=0x7fffffffd320, map=0x7ffff7fad590, do_reldeps=do_reldeps@entry=0x0) at dl-sort-maps.c:175
>>> #1  0x00007ffff7fd85d4 in dfs_traversal (do_reldeps=0x0, map=<optimized out>, rpo=0x7fffffffd320) at dl-sort-maps.c:143
>>> #2  dfs_traversal (rpo=rpo@entry=0x7fffffffd320, map=0x7ffff7fadb70, do_reldeps=do_reldeps@entry=0x0) at dl-sort-maps.c:155
>>> #3  0x00007ffff7fd89cd in dfs_traversal (do_reldeps=0x0, map=<optimized out>, rpo=0x7fffffffd320) at dl-sort-maps.c:143
>>> #4  _dl_sort_maps_dfs (skip=<optimized out>, for_fini=<optimized out>, nmaps=15, maps=0x7ffff7953de0) at dl-sort-maps.c:233
>>> #5  _dl_sort_maps (maps=maps@entry=0x7ffff7953de0, nmaps=nmaps@entry=15, skip=<optimized out>, for_fini=for_fini@entry=false) at dl-sort-maps.c:299
>>> #6  0x00007ffff7fcaf0f in _dl_map_object_deps (map=<optimized out>, preloads=<optimized out>, npreloads=<optimized out>, trace_mode=<optimized out>, 
>>>     open_mode=<optimized out>) at dl-deps.c:616
>>> #7  0x00007ffff7fe6970 in dl_main (phdr=<optimized out>, phnum=<optimized out>, user_entry=<optimized out>, auxv=<optimized out>) at rtld.c:1968
>>> #8  0x00007ffff7fe2c7c in _dl_sysdep_start (start_argptr=<optimized out>, dl_main=0x7ffff7fe4bb0 <dl_main>) at ../elf/dl-sysdep.c:264
>>> #9  0x00007ffff7fe4678 in _dl_start_final (arg=0x7fffffffdec0) at rtld.c:493
>>> #10 _dl_start (arg=0x7fffffffdec0) at rtld.c:587
>>> #11 0x00007ffff7fe36a8 in _start ()
>>> (gdb) f 0
>>> #0  dfs_traversal (rpo=rpo@entry=0x7fffffffd320, map=0x7ffff7fad590, do_reldeps=do_reldeps@entry=0x0) at dl-sort-maps.c:175
>>> 175       **rpo = map;
>>> (gdb) print *rpo
>>> $62 = (struct link_map **) 0x7fffffffd238
>>> (gdb) f 4
>>> #4  _dl_sort_maps_dfs (skip=<optimized out>, for_fini=<optimized out>, nmaps=15, maps=0x7ffff7953de0) at dl-sort-maps.c:233
>>> 233           dfs_traversal (&rpo_head, maps[i], do_reldeps_ref);
>>> (gdb) print &rpo[-1]
>>> $63 = (struct link_map **) 0x7fffffffd238
>> 
>> I inspected the "maps" vector and it contains *multiple* entries for
>> "libjvm.so", is that allowed ? I wonder if "nmaps" is calculated
>> correctly, since that determines the array size. Can I verify that somehow ?

Curiously, I was wondering about duplicates as well, since I couldn't sleep.

Do you use DT_FILTER or anything unusual?  What about LD_PRELOAD?

> Actually that was not correct. "maps[]->l_name" does not contain any
> "libjvm.so" at all, but the resulting rpo[] does contain several
> "libjvm.so" entries.

What I find really confusing is that this is not the result of a dlopen
call.  I definitely would expect that the maps array contains *all*
objects that are being loaded.  Clearly this is not the case here.
Somehow certain objects are missing, and then they get written into the
rpo array.

Please try to find libjvm.so among the l_initfini arrays of the objects.
It must be present somewhere.  I assume it's also on the main list,
which starts off at _rtld_global._dl_ns[0]._ns_loaded.

Thanks,
Florian


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Debugging ld.so in gdb
  2022-02-07 11:55                       ` Florian Weimer
@ 2022-02-07 12:15                         ` Jacob Kroon
  2022-02-07 12:27                           ` Florian Weimer
  0 siblings, 1 reply; 24+ messages in thread
From: Jacob Kroon @ 2022-02-07 12:15 UTC (permalink / raw)
  To: Florian Weimer; +Cc: Jacob Kroon via Gdb

On 2/7/22 12:55, Florian Weimer wrote:
> * Jacob Kroon:
> 
>> On 2/7/22 09:36, Jacob Kroon wrote:
>>> Hi Florian,
>>>
>>> On 2/4/22 18:15, Florian Weimer wrote:
>>>> I suspect we are writing beyond the start of the array passed to
>>>> _dl_sort_maps.
>>>>
>>>
>>> It looks like it is writing past the beginning of the rpo[] array in
>>> _dl_sort_maps_dfs(). The output below is right before the crash happens
>>> (stepping one instruction garbles the backtrace):
>>>
>>>> (gdb) bt
>>>> #0  dfs_traversal (rpo=rpo@entry=0x7fffffffd320, map=0x7ffff7fad590, do_reldeps=do_reldeps@entry=0x0) at dl-sort-maps.c:175
>>>> #1  0x00007ffff7fd85d4 in dfs_traversal (do_reldeps=0x0, map=<optimized out>, rpo=0x7fffffffd320) at dl-sort-maps.c:143
>>>> #2  dfs_traversal (rpo=rpo@entry=0x7fffffffd320, map=0x7ffff7fadb70, do_reldeps=do_reldeps@entry=0x0) at dl-sort-maps.c:155
>>>> #3  0x00007ffff7fd89cd in dfs_traversal (do_reldeps=0x0, map=<optimized out>, rpo=0x7fffffffd320) at dl-sort-maps.c:143
>>>> #4  _dl_sort_maps_dfs (skip=<optimized out>, for_fini=<optimized out>, nmaps=15, maps=0x7ffff7953de0) at dl-sort-maps.c:233
>>>> #5  _dl_sort_maps (maps=maps@entry=0x7ffff7953de0, nmaps=nmaps@entry=15, skip=<optimized out>, for_fini=for_fini@entry=false) at dl-sort-maps.c:299
>>>> #6  0x00007ffff7fcaf0f in _dl_map_object_deps (map=<optimized out>, preloads=<optimized out>, npreloads=<optimized out>, trace_mode=<optimized out>, 
>>>>     open_mode=<optimized out>) at dl-deps.c:616
>>>> #7  0x00007ffff7fe6970 in dl_main (phdr=<optimized out>, phnum=<optimized out>, user_entry=<optimized out>, auxv=<optimized out>) at rtld.c:1968
>>>> #8  0x00007ffff7fe2c7c in _dl_sysdep_start (start_argptr=<optimized out>, dl_main=0x7ffff7fe4bb0 <dl_main>) at ../elf/dl-sysdep.c:264
>>>> #9  0x00007ffff7fe4678 in _dl_start_final (arg=0x7fffffffdec0) at rtld.c:493
>>>> #10 _dl_start (arg=0x7fffffffdec0) at rtld.c:587
>>>> #11 0x00007ffff7fe36a8 in _start ()
>>>> (gdb) f 0
>>>> #0  dfs_traversal (rpo=rpo@entry=0x7fffffffd320, map=0x7ffff7fad590, do_reldeps=do_reldeps@entry=0x0) at dl-sort-maps.c:175
>>>> 175       **rpo = map;
>>>> (gdb) print *rpo
>>>> $62 = (struct link_map **) 0x7fffffffd238
>>>> (gdb) f 4
>>>> #4  _dl_sort_maps_dfs (skip=<optimized out>, for_fini=<optimized out>, nmaps=15, maps=0x7ffff7953de0) at dl-sort-maps.c:233
>>>> 233           dfs_traversal (&rpo_head, maps[i], do_reldeps_ref);
>>>> (gdb) print &rpo[-1]
>>>> $63 = (struct link_map **) 0x7fffffffd238
>>>
>>> I inspected the "maps" vector and it contains *multiple* entries for
>>> "libjvm.so", is that allowed ? I wonder if "nmaps" is calculated
>>> correctly, since that determines the array size. Can I verify that somehow ?
> 
> Curiously, I was wondering about duplicates as well, since I couldn't sleep.
> 
> Do you use DT_FILTER or anything unusual?  What about LD_PRELOAD?
> 

Not sure what "DT_FILTER" is, but it is not something that shows up if I
run "readelf -a" on the library in question. I don't have LD_PRELOAD set
in my environment.

>> Actually that was not correct. "maps[]->l_name" does not contain any
>> "libjvm.so" at all, but the resulting rpo[] does contain several
>> "libjvm.so" entries.
> 
> What I find really confusing is that this is not the result of a dlopen
> call.  I definitely would expect that the maps array contains *all*
> objects that are being loaded.  Clearly this is not the case here.
> Somehow certain objects are missing, and then they get written into the
> rpo array.
> 
> Please try to find libjvm.so among the l_initfini arrays of the objects.

I do find libjvm.so in a couple of the maps[]->l_initfini[]->l_name
arrays, yes.

> It must be present somewhere.  I assume it's also on the main list,
> which starts off at _rtld_global._dl_ns[0]._ns_loaded.

Hmm how do I iterate over that data structure ?

See below:
> (gdb) print _rtld_global._dl_ns[0]._ns_loaded[0]->l_name
> $186 = 0x7ffff7ff1d97 ""
> (gdb) print _rtld_global._dl_ns[0]._ns_loaded[1]->l_name
> $187 = 0x0
> (gdb) print _rtld_global._dl_ns[0]._ns_loaded[2]->l_name
> $188 = 0x0
> (gdb) print _rtld_global._dl_ns[0]._ns_loaded->l_name
> $189 = 0x7ffff7ff1d97 ""
> (gdb) print _rtld_global._dl_ns[1]._ns_loaded->l_name
> Cannot access memory at address 0x8
> (gdb) print _rtld_global._dl_ns[2]._ns_loaded->l_name
> Cannot access memory at address 0x8

Jacob

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Debugging ld.so in gdb
  2022-02-07 12:15                         ` Jacob Kroon
@ 2022-02-07 12:27                           ` Florian Weimer
  2022-02-07 12:32                             ` Jacob Kroon
  0 siblings, 1 reply; 24+ messages in thread
From: Florian Weimer @ 2022-02-07 12:27 UTC (permalink / raw)
  To: Jacob Kroon; +Cc: Jacob Kroon via Gdb

* Jacob Kroon:

>> What I find really confusing is that this is not the result of a dlopen
>> call.  I definitely would expect that the maps array contains *all*
>> objects that are being loaded.  Clearly this is not the case here.
>> Somehow certain objects are missing, and then they get written into the
>> rpo array.
>> 
>> Please try to find libjvm.so among the l_initfini arrays of the objects.
>
> I do find libjvm.so in a couple of the maps[]->l_initfini[]->l_name
> arrays, yes.

Okay, and of course there is an assumption that those make it to the
maps.  No wonder we run off the array.
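
To make that mismatch concrete, here is a minimal Python sketch. It is a
toy model, not glibc code: the LinkMap class and the missing_from_maps
helper are invented for illustration, and only the field names l_name and
l_initfini are borrowed from struct link_map. It lists objects that are
reachable through some l_initfini list but have no entry in maps:

```python
class LinkMap:
    """Toy stand-in for struct link_map (illustrative only)."""

    def __init__(self, l_name):
        self.l_name = l_name
        self.l_initfini = []  # dependency list, as in the real structure


def missing_from_maps(maps):
    """Return names of objects found in some l_initfini but not in maps."""
    present = {id(m) for m in maps}
    reported = set()
    missing = []
    for m in maps:
        for dep in m.l_initfini:
            if id(dep) not in present and id(dep) not in reported:
                reported.add(id(dep))
                missing.append(dep.l_name)
    return missing


if __name__ == "__main__":
    main = LinkMap("main")
    liba = LinkMap("liba.so")
    jvm = LinkMap("libjvm.so")
    main.l_initfini = [main, liba, jvm]
    liba.l_initfini = [liba, jvm]
    # libjvm.so is a dependency but was never placed in maps.
    print(missing_from_maps([main, liba]))
```

In this model, every name returned is an object that a traversal over
l_initfini would visit even though no slot was reserved for it.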

>> It must be present somewhere.  I assume it's also on the main list,
>> which starts off at _rtld_global._dl_ns[0]._ns_loaded.
>
> Hmm how do I iterate over that data structure ?
>
> See below:
>> (gdb) print _rtld_global._dl_ns[0]._ns_loaded[0]->l_name
>> $186 = 0x7ffff7ff1d97 ""
>> (gdb) print _rtld_global._dl_ns[0]._ns_loaded[1]->l_name
>> $187 = 0x0
>> (gdb) print _rtld_global._dl_ns[0]._ns_loaded[2]->l_name
>> $188 = 0x0
>> (gdb) print _rtld_global._dl_ns[0]._ns_loaded->l_name
>> $189 = 0x7ffff7ff1d97 ""
>> (gdb) print _rtld_global._dl_ns[1]._ns_loaded->l_name
>> Cannot access memory at address 0x8
>> (gdb) print _rtld_global._dl_ns[2]._ns_loaded->l_name
>> Cannot access memory at address 0x8

It's a list chained by l_prev/l_next.
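
Indexing _ns_loaded like an array reads whatever memory follows the first
link_map rather than the next list element, which is presumably why the
[1] and [2] lookups above show null names. Below is a toy Python model of
following l_next instead. It is not a gdb script: LinkMap is only a
stand-in for struct link_map, and link() and loaded_names() are
hypothetical helpers made up for this sketch.

```python
class LinkMap:
    """Toy stand-in for struct link_map (illustrative only)."""

    def __init__(self, l_name):
        self.l_name = l_name
        self.l_prev = None
        self.l_next = None


def link(names):
    """Build a doubly linked chain from names and return its head."""
    head = None
    prev = None
    for name in names:
        node = LinkMap(name)
        node.l_prev = prev
        if prev is None:
            head = node
        else:
            prev.l_next = node
        prev = node
    return head


def loaded_names(head):
    """Walk the chain through l_next until the end, collecting l_name."""
    names = []
    node = head
    while node is not None:
        names.append(node.l_name)
        node = node.l_next
    return names


if __name__ == "__main__":
    # The first entry has an empty name, as in the gdb output above.
    print(loaded_names(link(["", "liba.so", "libjvm.so"])))
```

In gdb the same walk means starting from _ns_loaded and repeatedly
following ->l_next until it is null.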

Thanks,
Florian


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Debugging ld.so in gdb
  2022-02-07 12:27                           ` Florian Weimer
@ 2022-02-07 12:32                             ` Jacob Kroon
  2022-02-07 13:39                               ` Jacob Kroon
  0 siblings, 1 reply; 24+ messages in thread
From: Jacob Kroon @ 2022-02-07 12:32 UTC (permalink / raw)
  To: Florian Weimer; +Cc: Jacob Kroon via Gdb

On 2/7/22 13:27, Florian Weimer wrote:
> * Jacob Kroon:
> 
>>> What I find really confusing is that this is not the result of a dlopen
>>> call.  I definitely would expect that the maps array contains *all*
>>> objects that are being loaded.  Clearly this is not the case here.
>>> Somehow certain objects are missing, and then they get written into the
>>> rpo array.
>>>
>>> Please try to find libjvm.so among the l_initfini arrays of the objects.
>>
>> I do find libjvm.so in a couple of the maps[]->l_initfini[]->l_name
>> arrays, yes.
> 
> Okay, and of course there is an assumption that those make it to the
> maps.  No wonder we run off the array.
> 
>>> It must be present somewhere.  I assume it's also on the main list,
>>> which starts off at _rtld_global._dl_ns[0]._ns_loaded.
>>
>> Hmm how do I iterate over that data structure ?
>>
>> See below:
>>> (gdb) print _rtld_global._dl_ns[0]._ns_loaded[0]->l_name
>>> $186 = 0x7ffff7ff1d97 ""
>>> (gdb) print _rtld_global._dl_ns[0]._ns_loaded[1]->l_name
>>> $187 = 0x0
>>> (gdb) print _rtld_global._dl_ns[0]._ns_loaded[2]->l_name
>>> $188 = 0x0
>>> (gdb) print _rtld_global._dl_ns[0]._ns_loaded->l_name
>>> $189 = 0x7ffff7ff1d97 ""
>>> (gdb) print _rtld_global._dl_ns[1]._ns_loaded->l_name
>>> Cannot access memory at address 0x8
>>> (gdb) print _rtld_global._dl_ns[2]._ns_loaded->l_name
>>> Cannot access memory at address 0x8
> 
> It's a list chained by l_prev/l_next.
> 

Ok, yes "libjvm.so" is there, in multiple entries.

Jacob

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Debugging ld.so in gdb
  2022-02-07 12:32                             ` Jacob Kroon
@ 2022-02-07 13:39                               ` Jacob Kroon
  2022-02-07 13:45                                 ` Jacob Kroon
  2022-02-07 14:07                                 ` Florian Weimer
  0 siblings, 2 replies; 24+ messages in thread
From: Jacob Kroon @ 2022-02-07 13:39 UTC (permalink / raw)
  To: Florian Weimer, cltang, adhemerval.zanella; +Cc: Jacob Kroon via Gdb

On 2/7/22 13:32, Jacob Kroon wrote:
> On 2/7/22 13:27, Florian Weimer wrote:
>> * Jacob Kroon:
>>
>>>> What I find really confusing is that this is not the result of a dlopen
>>>> call.  I definitely would expect that the maps array contains *all*
>>>> objects that are being loaded.  Clearly this is not the case here.
>>>> Somehow certain objects are missing, and then they get written into the
>>>> rpo array.
>>>>
>>>> Please try to find libjvm.so among the l_initfini arrays of the objects.
>>>
>>> I do find libjvm.so in a couple of the maps[]->l_initfini[]->l_name
>>> arrays, yes.
>>
>> Okay, and of course there is an assumption that those make it to the
>> maps.  No wonder we run off the array.
>>
>>>> It must be present somewhere.  I assume it's also on the main list,
>>>> which starts off at _rtld_global._dl_ns[0]._ns_loaded.
>>>
>>> Hmm how do I iterate over that data structure ?
>>>
>>> See below:
>>>> (gdb) print _rtld_global._dl_ns[0]._ns_loaded[0]->l_name
>>>> $186 = 0x7ffff7ff1d97 ""
>>>> (gdb) print _rtld_global._dl_ns[0]._ns_loaded[1]->l_name
>>>> $187 = 0x0
>>>> (gdb) print _rtld_global._dl_ns[0]._ns_loaded[2]->l_name
>>>> $188 = 0x0
>>>> (gdb) print _rtld_global._dl_ns[0]._ns_loaded->l_name
>>>> $189 = 0x7ffff7ff1d97 ""
>>>> (gdb) print _rtld_global._dl_ns[1]._ns_loaded->l_name
>>>> Cannot access memory at address 0x8
>>>> (gdb) print _rtld_global._dl_ns[2]._ns_loaded->l_name
>>>> Cannot access memory at address 0x8
>>
>> It's a list chained by l_prev/l_next.
>>
> 
> Ok, yes "libjvm.so" is there, in multiple entries.
> 

I managed to build glibc master, and yes it also crashes. Reverting the
suspicious commit:

commit 15a0c5730d1d5aeb95f50c9ec7470640084feae8
Author: Chung-Lin Tang <cltang@codesourcery.com>
Date:   Thu Oct 21 21:41:22 2021 +0800

    elf: Fix slow DSO sorting behavior in dynamic loader (BZ #17645)

fixes the crash. Adding a couple more people.

Jacob

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Debugging ld.so in gdb
  2022-02-07 13:39                               ` Jacob Kroon
@ 2022-02-07 13:45                                 ` Jacob Kroon
  2022-02-07 13:53                                   ` Adhemerval Zanella
  2022-02-07 14:07                                 ` Florian Weimer
  1 sibling, 1 reply; 24+ messages in thread
From: Jacob Kroon @ 2022-02-07 13:45 UTC (permalink / raw)
  To: Florian Weimer, cltang, adhemerval.zanella; +Cc: Jacob Kroon via Gdb

On 2/7/22 14:39, Jacob Kroon wrote:
> On 2/7/22 13:32, Jacob Kroon wrote:
>> On 2/7/22 13:27, Florian Weimer wrote:
>>> * Jacob Kroon:
>>>
>>>>> What I find really confusing is that this is not the result of a dlopen
>>>>> call.  I definitely would expect that the maps array contains *all*
>>>>> objects that are being loaded.  Clearly this is not the case here.
>>>>> Somehow certain objects are missing, and then they get written into the
>>>>> rpo array.
>>>>>
>>>>> Please try to find libjvm.so among the l_initfini arrays of the objects.
>>>>
>>>> I do find libjvm.so in a couple of the maps[]->l_initfini[]->l_name
>>>> arrays, yes.
>>>
>>> Okay, and of course there is an assumption that those make it to the
>>> maps.  No wonder we run off the array.
>>>
>>>>> It must be present somewhere.  I assume it's also on the main list,
>>>>> which starts off at _rtld_global._dl_ns[0]._ns_loaded.
>>>>
>>>> Hmm how do I iterate over that data structure ?
>>>>
>>>> See below:
>>>>> (gdb) print _rtld_global._dl_ns[0]._ns_loaded[0]->l_name
>>>>> $186 = 0x7ffff7ff1d97 ""
>>>>> (gdb) print _rtld_global._dl_ns[0]._ns_loaded[1]->l_name
>>>>> $187 = 0x0
>>>>> (gdb) print _rtld_global._dl_ns[0]._ns_loaded[2]->l_name
>>>>> $188 = 0x0
>>>>> (gdb) print _rtld_global._dl_ns[0]._ns_loaded->l_name
>>>>> $189 = 0x7ffff7ff1d97 ""
>>>>> (gdb) print _rtld_global._dl_ns[1]._ns_loaded->l_name
>>>>> Cannot access memory at address 0x8
>>>>> (gdb) print _rtld_global._dl_ns[2]._ns_loaded->l_name
>>>>> Cannot access memory at address 0x8
>>>
>>> It's a list chained by l_prev/l_next.
>>>
>>
>> Ok, yes "libjvm.so" is there, in multiple entries.
>>
> 
> I managed to build glibc master, and yes it also crashes. Reverting the
> suspicious commit:
> 
> commit 15a0c5730d1d5aeb95f50c9ec7470640084feae8
> Author: Chung-Lin Tang <cltang@codesourcery.com>
> Date:   Thu Oct 21 21:41:22 2021 +0800
> 
>     elf: Fix slow DSO sorting behavior in dynamic loader (BZ #17645)
> 
> fixes the crash. Adding a couple of more people.
> 

And yes, using master (or host) glibc and running:

$ GLIBC_TUNABLES="glibc.rtld.dynamic_sort=1" ldd mylib.so

also works without crashing.

Jacob

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Debugging ld.so in gdb
  2022-02-07 13:45                                 ` Jacob Kroon
@ 2022-02-07 13:53                                   ` Adhemerval Zanella
  2022-02-07 13:54                                     ` Jacob Kroon
  0 siblings, 1 reply; 24+ messages in thread
From: Adhemerval Zanella @ 2022-02-07 13:53 UTC (permalink / raw)
  To: Jacob Kroon, Florian Weimer, cltang; +Cc: Jacob Kroon via Gdb



On 07/02/2022 10:45, Jacob Kroon wrote:
> On 2/7/22 14:39, Jacob Kroon wrote:
>> On 2/7/22 13:32, Jacob Kroon wrote:
>>> On 2/7/22 13:27, Florian Weimer wrote:
>>>> * Jacob Kroon:
>>>>
>>>>>> What I find really confusing is that this is not the result of a dlopen
>>>>>> call.  I definitely would expect that the maps array contains *all*
>>>>>> objects that are being loaded.  Clearly this is not the case here.
>>>>>> Somehow certain objects are missing, and then they get written into the
>>>>>> rpo array.
>>>>>>
>>>>>> Please try to find libjvm.so among the l_initfini arrays of the objects.
>>>>>
>>>>> I do find libjvm.so in a couple of the maps[]->l_initfini[]->l_name
>>>>> arrays, yes.
>>>>
>>>> Okay, and of course there is an assumption that those make it to the
>>>> maps.  No wonder we run off the array.
>>>>
>>>>>> It must be present somewhere.  I assume it's also on the main list,
>>>>>> which starts off at _rtld_global._dl_ns[0]._ns_loaded.
>>>>>
>>>>> Hmm how do I iterate over that data structure ?
>>>>>
>>>>> See below:
>>>>>> (gdb) print _rtld_global._dl_ns[0]._ns_loaded[0]->l_name
>>>>>> $186 = 0x7ffff7ff1d97 ""
>>>>>> (gdb) print _rtld_global._dl_ns[0]._ns_loaded[1]->l_name
>>>>>> $187 = 0x0
>>>>>> (gdb) print _rtld_global._dl_ns[0]._ns_loaded[2]->l_name
>>>>>> $188 = 0x0
>>>>>> (gdb) print _rtld_global._dl_ns[0]._ns_loaded->l_name
>>>>>> $189 = 0x7ffff7ff1d97 ""
>>>>>> (gdb) print _rtld_global._dl_ns[1]._ns_loaded->l_name
>>>>>> Cannot access memory at address 0x8
>>>>>> (gdb) print _rtld_global._dl_ns[2]._ns_loaded->l_name
>>>>>> Cannot access memory at address 0x8
>>>>
>>>> It's a list chained by l_prev/l_next.
>>>>
>>>
>>> Ok, yes "libjvm.so" is there, in multiple entries.
>>>
>>
>> I managed to build glibc master, and yes it also crashes. Reverting the
>> suspicious commit:
>>
>> commit 15a0c5730d1d5aeb95f50c9ec7470640084feae8
>> Author: Chung-Lin Tang <cltang@codesourcery.com>
>> Date:   Thu Oct 21 21:41:22 2021 +0800
>>
>>     elf: Fix slow DSO sorting behavior in dynamic loader (BZ #17645)
>>
>> fixes the crash. Adding a couple of more people.
>>
> 
> And yes, using master (or host) glibc and running:
> 
> $ GLIBC_TUNABLES="glibc.rtld.dynamic_sort=1" ldd mylib.so
> 
> also works without crashing.

It is really hard to understand the issue you are seeing without the
context, since I was cc'ed on a cropped thread. Could you send the
full thread or the original issue?

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Debugging ld.so in gdb
  2022-02-07 13:53                                   ` Adhemerval Zanella
@ 2022-02-07 13:54                                     ` Jacob Kroon
  0 siblings, 0 replies; 24+ messages in thread
From: Jacob Kroon @ 2022-02-07 13:54 UTC (permalink / raw)
  To: Adhemerval Zanella, Florian Weimer, cltang; +Cc: Jacob Kroon via Gdb

On 2/7/22 14:53, Adhemerval Zanella wrote:
> 
> 
> On 07/02/2022 10:45, Jacob Kroon wrote:
>> On 2/7/22 14:39, Jacob Kroon wrote:
>>> On 2/7/22 13:32, Jacob Kroon wrote:
>>>> On 2/7/22 13:27, Florian Weimer wrote:
>>>>> * Jacob Kroon:
>>>>>
>>>>>>> What I find really confusing is that this is not the result of a dlopen
>>>>>>> call.  I definitely would expect that the maps array contains *all*
>>>>>>> objects that are being loaded.  Clearly this is not the case here.
>>>>>>> Somehow certain objects are missing, and then they get written into the
>>>>>>> rpo array.
>>>>>>>
>>>>>>> Please try to find libjvm.so among the l_initfini arrays of the objects.
>>>>>>
>>>>>> I do find libjvm.so in a couple of the maps[]->l_initfini[]->l_name
>>>>>> arrays, yes.
>>>>>
>>>>> Okay, and of course there is an assumption that those make it to the
>>>>> maps.  No wonder we run off the array.
>>>>>
>>>>>>> It must be present somewhere.  I assume it's also on the main list,
>>>>>>> which starts off at _rtld_global._dl_ns[0]._ns_loaded.
>>>>>>
>>>>>> Hmm how do I iterate over that data structure ?
>>>>>>
>>>>>> See below:
>>>>>>> (gdb) print _rtld_global._dl_ns[0]._ns_loaded[0]->l_name
>>>>>>> $186 = 0x7ffff7ff1d97 ""
>>>>>>> (gdb) print _rtld_global._dl_ns[0]._ns_loaded[1]->l_name
>>>>>>> $187 = 0x0
>>>>>>> (gdb) print _rtld_global._dl_ns[0]._ns_loaded[2]->l_name
>>>>>>> $188 = 0x0
>>>>>>> (gdb) print _rtld_global._dl_ns[0]._ns_loaded->l_name
>>>>>>> $189 = 0x7ffff7ff1d97 ""
>>>>>>> (gdb) print _rtld_global._dl_ns[1]._ns_loaded->l_name
>>>>>>> Cannot access memory at address 0x8
>>>>>>> (gdb) print _rtld_global._dl_ns[2]._ns_loaded->l_name
>>>>>>> Cannot access memory at address 0x8
>>>>>
>>>>> It's a list chained by l_prev/l_next.
>>>>>
>>>>
>>>> Ok, yes "libjvm.so" is there, in multiple entries.
>>>>
>>>
>>> I managed to build glibc master, and yes it also crashes. Reverting the
>>> suspicious commit:
>>>
>>> commit 15a0c5730d1d5aeb95f50c9ec7470640084feae8
>>> Author: Chung-Lin Tang <cltang@codesourcery.com>
>>> Date:   Thu Oct 21 21:41:22 2021 +0800
>>>
>>>     elf: Fix slow DSO sorting behavior in dynamic loader (BZ #17645)
>>>
>>> fixes the crash. Adding a couple of more people.
>>>
>>
>> And yes, using master (or host) glibc and running:
>>
>> $ GLIBC_TUNABLES="glibc.rtld.dynamic_sort=1" ldd mylib.so
>>
>> also works without crashing.
> 
> It is really hard to understand the issue you are seeing without the
> context, since I was cc'ed on a cropped thread. Could you send the
> full thread or the original issue?


The whole thread is archived here:
https://sourceware.org/pipermail/gdb/2022-February/049884.html

Jacob

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Debugging ld.so in gdb
  2022-02-07 13:39                               ` Jacob Kroon
  2022-02-07 13:45                                 ` Jacob Kroon
@ 2022-02-07 14:07                                 ` Florian Weimer
  2022-02-07 16:28                                   ` Florian Weimer
  1 sibling, 1 reply; 24+ messages in thread
From: Florian Weimer @ 2022-02-07 14:07 UTC (permalink / raw)
  To: Jacob Kroon; +Cc: cltang, adhemerval.zanella, Jacob Kroon via Gdb

* Jacob Kroon:

> I managed to build glibc master, and yes it also crashes. Reverting the
> suspicious commit:
>
> commit 15a0c5730d1d5aeb95f50c9ec7470640084feae8
> Author: Chung-Lin Tang <cltang@codesourcery.com>
> Date:   Thu Oct 21 21:41:22 2021 +0800
>
>     elf: Fix slow DSO sorting behavior in dynamic loader (BZ #17645)
>
> fixes the crash. Adding a couple of more people.

Sorry, that is completely expected because this is where the faulty code
was added.

I plan to stare at _dl_map_object_deps a bit, to figure out where the
discrepancy between l_initfini for the main program and the loaded
objects comes from.

Thanks,
Florian


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Debugging ld.so in gdb
  2022-02-07 14:07                                 ` Florian Weimer
@ 2022-02-07 16:28                                   ` Florian Weimer
  2022-02-07 17:04                                     ` Jacob Kroon
  0 siblings, 1 reply; 24+ messages in thread
From: Florian Weimer @ 2022-02-07 16:28 UTC (permalink / raw)
  To: Jacob Kroon; +Cc: cltang, adhemerval.zanella, Jacob Kroon via Gdb

* Florian Weimer:

> * Jacob Kroon:
>
>> I managed to build glibc master, and yes it also crashes. Reverting the
>> suspicious commit:
>>
>> commit 15a0c5730d1d5aeb95f50c9ec7470640084feae8
>> Author: Chung-Lin Tang <cltang@codesourcery.com>
>> Date:   Thu Oct 21 21:41:22 2021 +0800
>>
>>     elf: Fix slow DSO sorting behavior in dynamic loader (BZ #17645)
>>
>> fixes the crash. Adding a couple of more people.
>
> Sorry, that is completely expected because this is where the faulty code
> was added.
>
> I plan to stare at _dl_map_object_deps a bit, to figure out where the
> discrepancy between l_initfini for the main program and the loaded
> objects comes from.

I can see that we do not add l_fake objects (that failed to load) to the
main search list (and nlist is not incremented).  But we do not remove
them from the individual list of dependencies, leading to this
discrepancy.
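
To make the mismatch concrete, here is a minimal toy model in C. It is
not glibc code: struct toy_map, build_main_list, count_dangling_deps and
demo_dangling are hypothetical names made up for illustration, and the
faked flag only stands in for the l_faked bit of the real link map. The
sketch assumes the behaviour described above: objects that failed to
load are skipped when the main list is built, but stay in the
per-object dependency list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a link map; NOT glibc's struct link_map.  */
struct toy_map
{
  const char *name;
  bool faked;             /* Models l_faked: the object failed to load.  */
  struct toy_map **deps;  /* NULL-terminated dependency list, or NULL.  */
};

/* Build the main list, skipping faked objects, so the returned count
   (the toy "nlist") is not incremented for them.  */
static size_t
build_main_list (struct toy_map **all, size_t nall, struct toy_map **out)
{
  size_t nlist = 0;
  for (size_t i = 0; i < nall; i++)
    if (!all[i]->faked)
      out[nlist++] = all[i];
  assert (nlist <= nall);
  return nlist;
}

/* Count dependency entries that point at objects missing from the
   main list.  Each one is a discrepancy between the two lists.  */
static size_t
count_dangling_deps (struct toy_map **list, size_t nlist)
{
  size_t dangling = 0;
  for (size_t i = 0; i < nlist; i++)
    for (struct toy_map **d = list[i]->deps; d != NULL && *d != NULL; d++)
      {
        bool found = false;
        for (size_t j = 0; j < nlist; j++)
          if (list[j] == *d)
            found = true;
        if (!found)
          dangling++;
      }
  return dangling;
}

/* Hypothetical scenario: mylib.so depends on a libjvm.so that could
   not be loaded and on a libc that could.  */
static size_t
demo_dangling (void)
{
  struct toy_map libjvm = { "libjvm.so", true, NULL };
  struct toy_map libc = { "libc.so.6", false, NULL };
  struct toy_map *mylib_deps[] = { &libjvm, &libc, NULL };
  struct toy_map mylib = { "mylib.so", false, mylib_deps };

  struct toy_map *all[] = { &mylib, &libjvm, &libc };
  struct toy_map *main_list[3];
  size_t nlist = build_main_list (all, 3, main_list);
  return count_dangling_deps (main_list, nlist);
}
```

In this toy setup the main list ends up with two entries, while the
dependency list of mylib.so still names the faked libjvm.so. Any code
that assumes every dependency also appears in the main list would trip
over that entry.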

This would be consistent with this bug report:

  Dynamic loader DFS algorithm segfaults on missing libraries
  <https://sourceware.org/bugzilla/show_bug.cgi?id=28868>

If you run with GLIBC_TUNABLES=glibc.rtld.dynamic_sort=0, do you see
“not found” lines in the ldd output?

If yes, do these surprising libjvm.so objects have l_faked set in their
link map?

Thanks,
Florian



* Re: Debugging ld.so in gdb
  2022-02-07 16:28                                   ` Florian Weimer
@ 2022-02-07 17:04                                     ` Jacob Kroon
  0 siblings, 0 replies; 24+ messages in thread
From: Jacob Kroon @ 2022-02-07 17:04 UTC (permalink / raw)
  To: Florian Weimer; +Cc: cltang, adhemerval.zanella, Jacob Kroon via Gdb

On 2/7/22 17:28, Florian Weimer wrote:
> * Florian Weimer:
> 
>> * Jacob Kroon:
>>
>>> I managed to build glibc master, and yes it also crashes. Reverting the
>>> suspicious commit:
>>>
>>> commit 15a0c5730d1d5aeb95f50c9ec7470640084feae8
>>> Author: Chung-Lin Tang <cltang@codesourcery.com>
>>> Date:   Thu Oct 21 21:41:22 2021 +0800
>>>
>>>     elf: Fix slow DSO sorting behavior in dynamic loader (BZ #17645)
>>>
>>> fixes the crash. Adding a couple of more people.
>>
>> Sorry, that is completely expected because this is where the faulty code
>> was added.
>>
>> I plan to stare at _dl_map_object_deps a bit, to figure out where the
>> discrepancy between l_initfini for the main program and the loaded
>> objects comes from.
> 
> I can see that we do not add l_faked objects (that failed to load) to the
> main search list (and nlist is not incremented).  But we do not remove
> them from the individual list of dependencies, leading to this
> discrepancy.
> 
> This would be consistent with this bug report:
> 
>   Dynamic loader DFS algorithm segfaults on missing libraries
>   <https://sourceware.org/bugzilla/show_bug.cgi?id=28868>
> 
> If you run with GLIBC_TUNABLES=glibc.rtld.dynamic_sort=0, do you see
> “not found” lines in the ldd output?
> 

I assume you meant "glibc.rtld.dynamic_sort=1" rather than
"glibc.rtld.dynamic_sort=0", since "0" also crashes. With "1" I see no
crash, and three entries of "libjvm.so => not found".

> If yes, do these surprising libjvm.so objects have l_faked set in their
> link map?
> 

Yes, looking at the "libjvm.so" entries in rpo[] right before the crash,
I see l_faked == 1 for all three, and *only* for those three.

Jacob

