From: Luis Machado <luis.machado@arm.com>
To: John Baldwin <jhb@FreeBSD.org>, gdb-patches@sourceware.org
Subject: Re: [PATCH] Update auxv cache when there is no auxv cached data
Date: Tue, 2 Aug 2022 16:05:58 +0100
Message-ID: <f10c7cc8-e9e3-3573-7799-a326eac5e5ef@arm.com>
In-Reply-To: <ff02411c-c39e-5a25-4205-24eb4dca1643@FreeBSD.org>

On 7/25/22 20:13, John Baldwin wrote:
> On 7/25/22 11:03 AM, Luis Machado wrote:
>> On 7/25/22 17:05, John Baldwin wrote:
>>> On 7/19/22 7:45 AM, Luis Machado via Gdb-patches wrote:
>>>> While adding support for MTE corefiles and running the MTE corefile tests,
>>>> I noticed a strange situation where loading the symbol file + core file
>>>> through the command line has a different behavior compared to firing up
>>>> GDB, loading the symbol file with the "file" command and then loading the
>>>> core file with the "core" command.
>>>>
>>>> I tracked this down to gdb/auxv.c:get_auxv_inferior_data returning a valid
>>>> info struct, but with no auxv data.
>>>>
>>>> We've been doing the auxv caching for a while now, but sometime between
>>>> enabling auxv data caching and now, we turned the auxv data into an optional
>>>> value.
>>>
>>> The commit to use gdb::optional<> was this one in 2018:
>>>
>>> commit 9018be22e022e6db2ba07c4e407c7244022bc69a
>>> Author: Simon Marchi <simon.marchi@ericsson.com>
>>> Date:   Sat Apr 7 13:19:12 2018 -0400
>>>
>>>       Make target_read_alloc & al return vectors
>>>       This patch started by changing target_read_alloc_1 to return a
>>>       byte_vector, to avoid manual memory management (in target_read_alloc_1
>>>       and in the callers).  To communicate failures to the callers, it
>>>       actually returns a gdb::optional<gdb::byte_vector>.
>>>       Adjusting target_read_stralloc was a bit more tricky, since it wants to
>>>       return a buffer of char, and not gdb_byte.  Since you can't just cast a
>>>       gdb::byte_vector into a gdb::def_vector<char>, I made
>>>       target_read_alloc_1 templated, so both versions (that return vectors of
>>>       gdb_byte and char) are generated.  Since target_read_stralloc now
>>>       returns a gdb::char_vector instead of a gdb::unique_xmalloc_ptr<char>, a
>>>       few callers need to be adjusted.
>>>
>>> But looking at its diff, it was always caching negative results even
>>> before this change.  This is the relevant hunk of that commit:
>>
>> True. I think my analysis went wrong when I mixed up the data representation before the optional conversion (a pointer) and after it (an optional vector).
>>
>> So we used to access the contents of auxv through a data pointer, but now we access it through data->data ().
>>>
>>> @@ -358,9 +356,8 @@ get_auxv_inferior_data (struct target_ops *ops)
>>>      info = (struct auxv_info *) inferior_data (inf, auxv_inferior_data);
>>>      if (info == NULL)
>>>        {
>>> -      info = XCNEW (struct auxv_info);
>>> -      info->length = target_read_alloc (ops, TARGET_OBJECT_AUXV,
>>> -                    NULL, &info->data);
>>> +      info = new auxv_info;
>>> +      info->data = target_read_alloc (ops, TARGET_OBJECT_AUXV, NULL);
>>>          set_inferior_data (inf, auxv_inferior_data, info);
>>>        }
>>>
>>> So even before it seems that we would have cached a "negative" result.
>>>
>>> I believe your change will mean that we will keep asking over and over
>>> for the auxv data if a target doesn't support it and that it will only
>>> support positive caching.  I wonder if a recent change means that in
>>> your test case something is trying to fetch the AUXV vector before the
>>> target can read it, e.g. trying to fetch AT_HWCAP* when determining the
>>> target description for a core file or the like?  It seems like for
>>> the core target in corelow.c it should be fine to fetch AUXV data in
>>> the gdbarch_core_read_description hook though.  Did you try setting a
>>> breakpoint for when this function was called to see when it is being
>>> cached?
>>
>> Yes, we do try to fetch data from auxv before it exists. I think it's always been this way for this particular case.
>>
>>>
>>> I wonder if the issue is that parsing the symbol file without a core
>>> is trying to fetch hwcap actually.  Presumably a breakpoint would let
>>> you see that case?  Maybe the fetch of hwcap needs to be guarded by
>>> something like "target_has_execution" or the like?
>>>
>>
>> No, it is trying to fetch AT_ENTRY to figure out the relocation that should be applied to the executable (if any).
>>
>> When we finally load the (native) corefile, then we have access to the AUXV data through the corefile data.
>>
>> This doesn't reproduce with a gcore corefile because we now include the target description GDB was using when
>> the corefile was generated.
>>
>> Here's the backtrace:
>>
>> -- 
>>
>> #0  get_auxv_inferior_data (ops=0xaaaaad3b0760 <exec_ops>) at ../../../repos/binutils-gdb/gdb/auxv.c:359
>> #1  0x0000aaaaab28cdc8 in target_auxv_search (ops=0xaaaaad3b0760 <exec_ops>, match=0x9, valp=0xffffffffe988)
>>       at ../../../repos/binutils-gdb/gdb/auxv.c:381
>> #2  0x0000aaaaab920110 in svr4_exec_displacement (displacementp=0xffffffffebc0) at ../../../repos/binutils-gdb/gdb/solib-svr4.c:2482
>> #3  0x0000aaaaab920e4c in svr4_relocate_main_executable () at ../../../repos/binutils-gdb/gdb/solib-svr4.c:2878
>> #4  0x0000aaaaab921008 in svr4_solib_create_inferior_hook (from_tty=1) at ../../../repos/binutils-gdb/gdb/solib-svr4.c:2933
>> #5  0x0000aaaaab9288a0 in solib_create_inferior_hook (from_tty=1) at ../../../repos/binutils-gdb/gdb/solib.c:1285
>> #6  0x0000aaaaab9747cc in symbol_file_command (args=0xfffffffff6c6 "~/aarch64-mte-gcore", from_tty=1)
>>       at ../../../repos/binutils-gdb/gdb/symfile.c:1655
>> #7  0x0000aaaaab500fc4 in file_command (arg=0xfffffffff6c6 "~/aarch64-mte-gcore", from_tty=1) at ../../../repos/binutils-gdb/gdb/exec.c:555
>> #8  0x0000aaaaab3519f0 in do_simple_func (args=0xfffffffff6c6 "~/aarch64-mte-gcore", from_tty=1, c=0xaaaaad786a60)
>>       at ../../../repos/binutils-gdb/gdb/cli/cli-decode.c:95
>> #9  0x0000aaaaab3572c8 in cmd_func (cmd=0xaaaaad786a60, args=0xfffffffff6c6 "~/aarch64-mte-gcore", from_tty=1)
>>       at ../../../repos/binutils-gdb/gdb/cli/cli-decode.c:2516
>> #10 0x0000aaaaab9f0ed4 in execute_command (p=0xfffffffff6d8 "e", from_tty=1) at ../../../repos/binutils-gdb/gdb/top.c:699
>> #11 0x0000aaaaab68c75c in catch_command_errors (command=0xaaaaab9f0984 <execute_command(char const*, int)>,
>>       arg=0xfffffffff6c1 "file ~/aarch64-mte-gcore", from_tty=1, do_bp_actions=true) at ../../../repos/binutils-gdb/gdb/main.c:513
>> #12 0x0000aaaaab68c9ac in execute_cmdargs (cmdarg_vec=0xfffffffff080, file_type=CMDARG_FILE, cmd_type=CMDARG_COMMAND, ret=0xffffffffefd4)
>>       at ../../../repos/binutils-gdb/gdb/main.c:608
>> #13 0x0000aaaaab68df18 in captured_main_1 (context=0xfffffffff280) at ../../../repos/binutils-gdb/gdb/main.c:1298
>> #14 0x0000aaaaab68e0c0 in captured_main (data=0xfffffffff280) at ../../../repos/binutils-gdb/gdb/main.c:1319
>> #15 0x0000aaaaab68e11c in gdb_main (During symbol reading: const value length mismatch for 'sigall_set', got 128, expected 0
>> args=0xfffffffff280) at ../../../repos/binutils-gdb/gdb/main.c:1344
>> #16 0x0000aaaaab1655ec in main (argc=5, argv=0xfffffffff418) at ../../../repos/binutils-gdb/gdb/gdb.c:32
>>
>> -- 
>>
>> I think this was a latent bug and I just happened to run into it. It doesn't make sense to cache auxv output from a target that
>> doesn't have auxv data. But we don't want to keep requesting it over and over again either.
> 
> Hmmm.  I think in some ways what we would like is something like:
> 
> if (!target_has_all_memory ())
>     return;
> 
> before calling svr4_relocate_main_executable.  However, we don't yet have
> a target_has_all_memory() wrapper function.  Remote, process_stratum (native),
> and core file targets all have has_all_memory() overrides, but the exec
> target does not.  We can't use target_has_memory() because the exec target
> returns true for that.
> 
> It doesn't really make sense to relocate the main executable when we don't
> have an address space.  Maybe that is the better question to ask if it's
> askable?  That is, does the current inferior have a real address space, or
> is it just a plain symbol file via the exec target without a real address
> space yet.  Looking around though, there doesn't seem to be a good way to
> ask that question.

I did some more investigation on this one, and there are a few more details here.

First, we try to fetch auxv for the exec target, which either shouldn't happen or should return something that
indicates we don't have an auxv entry.

Second, the initial inferior pid is 0, a dummy value.
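
For context, here is roughly what the caching path in gdb/auxv.c looks like right now (reconstructed from memory, so treat the exact registry calls as approximate). The important bit is that a failed read still populates the cache entry:

  static auxv_info *
  get_auxv_inferior_data (struct target_ops *ops)
  {
    struct inferior *inf = current_inferior ();

    /* The cache is keyed on the inferior object, which at this point is
       still the initial inferior with pid 0.  */
    auxv_info *info = auxv_inferior_data.get (inf);
    if (info == NULL)
      {
        info = auxv_inferior_data.emplace (inf);
        /* If the read fails, info->data is left without a value, but the
           entry is cached all the same and not retried.  */
        info->data = target_read_alloc (ops, TARGET_OBJECT_AUXV, NULL);
      }

    return info;
  }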

> 
> Hmmm, I've dug a bit more though at this from a different angle in that
> attaching the core file should be flushing the auxv cache and I'm curious
> why it isn't.  The 'inferior_appeared' observer event (which clears the auxv
> cache) should be firing when a core file is attached via 'inferior_attached'
> from add_to_thread_list in corelow.c.
> 
> I wonder if the combined case on the command line doesn't trigger
> inferior_appeared for some reason when opening the core file?  If so, fixing
> that might fix a few other things as well.

Turns out we do call inferior_appeared and it invalidates the auxv cache, but there is something else going on.

> 
> In terms of the original patch, I find it a bit odd to replace the existing
> info structure rather than just replacing the data member. And if we want to
> disable negative caching, we should perhaps avoid emplacing the object if the
> fetch of the auxv data from the target fails.
> 

Here's the sequence of events for each invocation mode:

1 - Loading core file only (OK)

Invocations:
gdb -c exec.core
gdb -ex "core exec.core

In this mode, GDB will load the core file and eventually invoke the constructor in core_target_open:

   core_target *target = new core_target ();

This constructor will attempt to retrieve the target description, and it won't find an XML note in the core file because it
was generated by the Linux kernel. So eventually it will invoke aarch64_linux_core_read_description, which *will* try to
retrieve the hwcap bits.
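
For reference, that hwcap lookup goes through the auxv cache. If I remember right it uses the linux_get_hwcap* helpers, which boil down to something like this (simplified sketch, error handling trimmed):

  /* Simplified sketch of the auxv-based feature detection done while
     computing the core file target description.  */
  CORE_ADDR hwcap2 = 0;
  if (target_auxv_search (target, AT_HWCAP2, &hwcap2) != 1)
    hwcap2 = 0;

  /* HWCAP2_MTE is bit 18 on Linux.  If the cache answers with empty
     data, MTE is silently reported as absent.  */
  bool mte_p = (hwcap2 & (1 << 18)) != 0;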

In this case, it works because we don't have any auxv entries cached for the inferior with pid 0, so we proceed to fetch the corefile's
auxv section data.

Eventually we will figure out the real pid for the core file inferior and we will fetch the target description again and
things just work.


2 - Loading exec file and core file (OK)

Invocations:
gdb exec -c exec.core
gdb -ex "file exec" -c exec.core
gdb -ex "core exec.core" -ex "file exec"

Loading the exec file through the command line (not using the file command) doesn't force the svr4 relocation
functions to be called, and thus doesn't attempt to fetch the auxv (none available) and cache it (empty).

When using the file command to load the exec file, we eventually attempt to fetch the auxv (none available) and
cache it (empty).
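
That fetch is the AT_ENTRY lookup in svr4_exec_displacement (frame #2 in the backtrace above). In essence it does something like the following (heavily trimmed sketch, not the literal code):

  /* With only the exec target loaded there is no auxv to read, so the
     search fails -- but by then the empty result is already cached.  */
  CORE_ADDR entry_point;
  if (target_auxv_search (current_inferior ()->top_target (), AT_ENTRY,
                          &entry_point) <= 0)
    return false;  /* Can't compute the displacement without AT_ENTRY.  */

  displacement
    = entry_point - bfd_get_start_address (current_program_space->exec_bfd ());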

With that in mind, these invocations will always cause the corefile handling routines to be the first ones to attempt
to fetch the auxv data, so this will in fact cache valid data and things just work normally.


3 - Loading exec file and core file (Broken)

Invocation:
gdb -ex "file exec" -ex "core exec.core"

With the explanation from (2) in mind, this invocation will first attempt to fetch auxv data for the exec target and will
cache empty data, because the exec target doesn't have auxv entries. This will be cached for inferior pid 0.

Afterwards, we process the corefile, and during the construction of the core_target the aarch64 code will attempt to
fetch the auxv data, but at this point we still have inferior pid 0, since we haven't read the registers yet.

Given we cached the empty auxv data from the exec target earlier, we will get that when we attempt to fetch the auxv data
for the core file, which is incorrect.

This will throw off the target feature detection for aarch64, and so we won't register the required hooks (in particular report_signal_info).

Loading the corefile proceeds as usual and we will eventually find the pid for the corefile inferior, which causes the cached auxv entry to be
invalidated and we fetch the auxv again, but with a valid inferior pid.

What actually goes wrong is that core_gdbarch was not computed correctly due to the auxv situation, even though the regular
gdbarch (not core_gdbarch) is correct.

There are some odd things that I noticed:

- The use of pid 0 for both the exec and the core file at the same time.
- The use of core_gdbarch as opposed to the regular gdbarch. It feels like we have a couple of gdbarches when we should have only one.
- The handling of non-existent / empty auxv is not great. Should we cache an empty auxv? Should we cache a non-existent auxv value? (See the note right after this list.)
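
On that last point, note that with the current representation there are really two distinct "no data" states, and today both end up cached until one of the invalidation observers fires:

  /* info->data is a gdb::optional<gdb::byte_vector>:
       !info->data.has_value ()  -> the read failed, e.g. the target has no
                                    auxv at all (like the exec target).
       info->data->empty ()      -> the target answered with zero bytes.
     Neither case is re-fetched until an observer (e.g. inferior_appeared)
     invalidates the cache entry.  */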

An easy fix would be to not use hwcap bits for detecting corefile features and instead rely on dumped register sections. But we should be able to use
information that is available through the auxv.
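
Alternatively, keeping the hwcap-based detection, one possible shape for the cache side of the fix (just a sketch of the idea, not necessarily what should be committed) would be to not trust data that was cached while the inferior still had the dummy pid 0:

  auxv_info *info = auxv_inferior_data.get (inf);
  if (info == nullptr)
    {
      info = auxv_inferior_data.emplace (inf);
      info->data = target_read_alloc (ops, TARGET_OBJECT_AUXV, NULL);
    }
  else if (inf->pid == 0)
    {
      /* Data cached while there was no real inferior (pid 0) may have come
         from the wrong target (e.g. the exec target); re-read it now.  */
      info->data = target_read_alloc (ops, TARGET_OBJECT_AUXV, NULL);
    }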
