From: Tom de Vries <tdevries@suse.de>
To: Luis Machado <luis.machado@arm.com>, gdb-patches@sourceware.org
Subject: Re: [PATCH v3] Fix missing watchpoint hits/endless loop
Date: Tue, 20 Feb 2024 22:07:12 +0100 [thread overview]
Message-ID: <b02d76f6-74ea-48e5-9d0d-6282ee188f88@suse.de> (raw)
In-Reply-To: <36f17aa1-72a5-4b57-a15d-efce90cf8b2d@arm.com>
On 1/8/24 13:27, Luis Machado wrote:
> Hi Tom,
>
> Thanks for refreshing this patch.
>
> As I hinted on IRC, back then I ended up deciding not to pursue this approach further. Though it does
> address some of the potential failures, it is a bit convoluted and poses a fairly significant
> maintenance burden (we need to explicitly identify instructions, so we will always be behind), and it
> carries the risk of producing false positives.
>
> For SVE/SME, there are also predicated accesses, which means an instruction may read/write a discontiguous range
> of bytes depending on the predicate mask. That is trickier to handle.
>
> The nature of hardware watchpoint detection on aarch64 means the triggering behavior is dependent on a particular
> CPU spec, in a way that is hard for userspace to predict reliably. Hopefully in the future this situation will
> improve for userspace. The kernel folks are aware of it.
>
Hi Luis,
Ack, that makes sense.
> Meanwhile, given the above restrictions (and potentially restrictions of other architectures), we could teach gdb
> about imprecise hardware watchpoint triggers. Something that would allow gdb to tell userspace that we know
> a hardware watchpoint has triggered, but we don't know exactly for what range and what happened.
>
> At least this would prevent certain situations where gdb is simply stuck trying to skip over unskippable
> hardware watchpoints, because it can't tell what happened and ended up with a generic SIGTRAP.
>
> Thoughts?
>
I've submitted a two-patch series here (
https://sourceware.org/pipermail/gdb-patches/2024-February/206706.html
), where the second patch does something a bit like this: the aarch64
tdep tells gdb that a hardware watchpoint has triggered, but not
exactly for what range, and since it's a regular write watchpoint, gdb
handles this fine.
With this patch series, I'm getting rid of the last consistent FAILs I
see on M1 (not counting libcc1 stuff).
Thanks,
- Tom
> On 1/5/24 15:07, Tom de Vries wrote:
>> From: Luis Machado <luis.machado@linaro.org>
>>
>> Updates on v3:
>>
>> - Rebased to current trunk (v2 applied cleanly on gdb-12-branch).
>> - Should also work on fbsd, though I haven't tested it.
>> - Dropped gdbserver part (should be possible, I'm just not sure yet where to
>> move aarch64_stopped_data_address such that it's accessible from gdbserver).
>> - No attempt made to address any review remarks on v2.
>> https://inbox.sourceware.org/gdb-patches/D2AC3C31-5EEE-4BCA-8D75-7CEDFA75F540@arm.com/
>>
>> Updates on v2:
>>
>> - Handle more types of load/store instructions, including a catch all
>> for SVE load/store instructions.
>>
>> --
>>
>> I ran into a situation where a hardware watchpoint hit is not detected
>> correctly, misleading GDB into thinking it was a delayed breakpoint hit.
>>
>> The problem is that hardware watchpoints are not skippable on AArch64,
>> which makes GDB loop endlessly trying to run past the instruction.
>>
>> The most obvious case where this happens is when the load/store pair
>> instructions access 16 bytes of memory.
>>
>> Suppose we have an stp instruction that will write a couple of 64-bit
>> registers to address 0x10 (stp x3, x4, [x2]). It will write data from 0x10
>> up to 0x20.
>>
>> Now suppose a write watchpoint is created to monitor memory address 0x18,
>> which is the start of the second register write. It can have any length,
>> but let's assume it has length 8.
>>
>> When we execute that stp instruction, it will trap and the reported stopped
>> data address from the kernel will be 0x10 (the beginning of the memory range
>> accessed by the instruction).
>>
>> The current code won't be able to detect a valid trigger because it assumes an
>> alignment of 8 bytes for the watchpoint address. Forcing that kind of alignment
>> won't be enough to detect a 16-byte access if the trap address falls outside of
>> the 8-byte alignment window. We need to know how many bytes the instruction
>> will access, but we won't have that data unless we go parsing instructions.
>>
>> Another issue with the current code seems to be that it assumes the accesses
>> will always be 8 bytes in size, since it wants to align the watchpoint address
>> to that particular boundary. This leads to problems when we have unaligned
>> addresses and unaligned watchpoints.
>>
>> For example, suppose we have a str instruction storing 8 bytes to memory
>> address 0xf. Now suppose we have a write watchpoint at address 0x16,
>> monitoring 8 bytes.
>>
>> The trap address will be 0xf, but forcing 0x16 to 8-byte alignment yields
>> 0x10, and so GDB doesn't think this is a watchpoint hit.
>>
>> I believe the same problem can be triggered with smaller memory accesses,
>> except for single-byte accesses.
>>
>> In order to cover most of the cases correctly, we parse the load/store
>> instructions and detect how many bytes they're accessing. That will give us
>> enough information to tell if a hardware watchpoint triggered or not.
>>
>> The SVE load/store support is only a placeholder for now, as we need to
>> parse the instructions and use the variable vector length to determine the
>> memory access size (planned for the future).
>>
>> The patch also fixes these two failures in the testsuite:
>>
>> FAIL: gdb.base/watchpoint-unaligned.exp: continue (timeout)
>> FAIL: gdb.base/watchpoint-unaligned.exp: size8twice write
>>
>> Regression tested (v2) on aarch64-linux Ubuntu/20.04 and Ubuntu/18.04.
>> Regression tested (v3) on aarch64-linux fedora 39.
>>
>> I also used a very slow aarch64 hardware watchpoint stress test to validate
>> the v2 patch, but I don't think that particular test should be included in the
>> testsuite. It takes quite a while to run (20+ minutes), and shouldn't be
>> required unless we know there are problems with hardware watchpoint handling.
>>
>> PR tdep/29423
>> Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=29423
>> ---
>> gdb/aarch64-fbsd-nat.c | 7 +-
>> gdb/aarch64-linux-nat.c | 7 +-
>> gdb/aarch64-nat.c | 268 ++++++++++++++++++++++++++++++++++++++--
>> gdb/aarch64-nat.h | 3 +-
>> 4 files changed, 273 insertions(+), 12 deletions(-)
>>
>> diff --git a/gdb/aarch64-fbsd-nat.c b/gdb/aarch64-fbsd-nat.c
>> index 38fb093f139..39162fe3bb0 100644
>> --- a/gdb/aarch64-fbsd-nat.c
>> +++ b/gdb/aarch64-fbsd-nat.c
>> @@ -154,9 +154,14 @@ aarch64_fbsd_nat_target::stopped_data_address (CORE_ADDR *addr_p)
>>
>> const CORE_ADDR addr_trap = (CORE_ADDR) siginfo.si_addr;
>>
>> + struct regcache *regs = get_thread_regcache (this, inferior_ptid);
>> + CORE_ADDR trigger_pc = regcache_read_pc (regs);
>> + uint32_t insn;
>> + read_memory (trigger_pc, (gdb_byte *) &insn, 4);
>> +
>> /* Check if the address matches any watched address. */
>> state = aarch64_get_debug_reg_state (inferior_ptid.pid ());
>> - return aarch64_stopped_data_address (state, addr_trap, addr_p);
>> + return aarch64_stopped_data_address (state, insn, addr_trap, addr_p);
>> }
>>
>> /* Implement the "stopped_by_watchpoint" target_ops method. */
>> diff --git a/gdb/aarch64-linux-nat.c b/gdb/aarch64-linux-nat.c
>> index 5b4e3c2bde1..5494bb07517 100644
>> --- a/gdb/aarch64-linux-nat.c
>> +++ b/gdb/aarch64-linux-nat.c
>> @@ -962,9 +962,14 @@ aarch64_linux_nat_target::stopped_data_address (CORE_ADDR *addr_p)
>> const CORE_ADDR addr_trap
>> = gdbarch_remove_non_address_bits (gdbarch, (CORE_ADDR) siginfo.si_addr);
>>
>> + struct regcache *regs = get_thread_regcache (this, inferior_ptid);
>> + CORE_ADDR trigger_pc = regcache_read_pc (regs);
>> + uint32_t insn;
>> + read_memory (trigger_pc, (gdb_byte *) &insn, 4);
>> +
>> /* Check if the address matches any watched address. */
>> state = aarch64_get_debug_reg_state (inferior_ptid.pid ());
>> - return aarch64_stopped_data_address (state, addr_trap, addr_p);
>> + return aarch64_stopped_data_address (state, insn, addr_trap, addr_p);
>> }
>>
>> /* Implement the "stopped_by_watchpoint" target_ops method. */
>> diff --git a/gdb/aarch64-nat.c b/gdb/aarch64-nat.c
>> index ee8c5a1e21d..928bb70d644 100644
>> --- a/gdb/aarch64-nat.c
>> +++ b/gdb/aarch64-nat.c
>> @@ -22,6 +22,7 @@
>> #include "inferior.h"
>> #include "cli/cli-cmds.h"
>> #include "aarch64-nat.h"
>> +#include "arch/aarch64-insn.h"
>>
>> #include <unordered_map>
>>
>> @@ -225,27 +226,275 @@ aarch64_remove_watchpoint (CORE_ADDR addr, int len, enum target_hw_bp_type type,
>> return ret;
>> }
>>
>> -/* See aarch64-nat.h. */
>> +/* Macros to distinguish between various types of load/store instructions. */
>> +
>> +/* Regular and Advanced SIMD load/store instructions. */
>> +#define GENERAL_LOAD_STORE_P(insn) (bit (insn, 25) == 0 && bit (insn, 27) == 1)
>> +
>> +/* SVE load/store instruction. */
>> +#define SVE_LOAD_STORE_P(insn) (bits (insn, 25, 28) == 0x2 \
>> + && (bits (insn, 29, 31) == 0x4 \
>> + || bits (insn, 29, 31) == 0x5 \
>> + || bits (insn, 29, 31) == 0x6 \
>> + || (bits (insn, 29, 31) == 0x7 \
>> + && ((bit (insn, 15) == 0x0 \
>> + && (bit (insn, 13) == 0x0 \
>> + || bit (insn, 13) == 0x1)) \
>> + || (bit (insn, 15) == 0x1 \
>> + && bit (insn, 13) == 0x0) \
>> + || bits (insn, 13, 15) == 0x6 \
>> + || bits (insn, 13, 15) == 0x7))))
>> +
>> +/* Any load/store instruction (regular, Advanced SIMD or SVE). */
>> +#define LOAD_STORE_P(insn) (GENERAL_LOAD_STORE_P (insn) \
>> + || SVE_LOAD_STORE_P (insn))
>> +
>> +/* Load/Store pair (regular or vector). */
>> +#define LOAD_STORE_PAIR_P(insn) (bit (insn, 28) == 0 && bit (insn, 29) == 1)
>> +
>> +#define COMPARE_SWAP_PAIR_P(insn) (bits (insn, 30, 31) == 0 \
>> + && bits (insn, 28, 29) == 0 \
>> + && bit (insn, 26) == 0 \
>> + && bits (insn, 23, 24) == 0 \
>> + && bit (insn, 11) == 1)
>> +#define LOAD_STORE_EXCLUSIVE_PAIR_P(insn) (bit (insn, 31) == 1 \
>> + && bits (insn, 28, 29) == 0 \
>> + && bit (insn, 26) == 0 \
>> + && bits (insn, 23, 24) == 0 \
>> + && bit (insn, 11) == 1)
>> +
>> +#define LOAD_STORE_LITERAL_P(insn) (bit (insn, 28) == 1 \
>> + && bit (insn, 29) == 0 \
>> + && bit (insn, 24) == 0)
>> +
>> +/* Vector Load/Store multiple structures. */
>> +#define LOAD_STORE_MS(insn) (bits (insn, 28, 29) == 0x0 \
>> + && bit (insn, 31) == 0x0 \
>> + && bit (insn, 26) == 0x1 \
>> + && ((bits (insn, 23, 24) == 0x0 \
>> + && bits (insn, 16, 21) == 0x0) \
>> + || (bits (insn, 23, 24) == 0x1 \
>> + && bit (insn, 21) == 0x0)))
>> +
>> +/* Vector Load/Store single structure. */
>> +#define LOAD_STORE_SS(insn) (bits (insn, 28, 29) == 0x0 \
>> + && bit (insn, 31) == 0x0 \
>> + && bit (insn, 26) == 0x1 \
>> + && ((bits (insn, 23, 24) == 0x2 \
>> + && bits (insn, 16, 20) == 0x0) \
>> + || (bits (insn, 23, 24) == 0x3)))
>> +
>> +/* Assuming INSN is a load/store instruction, return the size of the
>> + memory access. The patterns are documented in the ARM Architecture
>> + Reference Manual - Index by encoding. */
>> +
>> +static unsigned int
>> +get_load_store_access_size (CORE_ADDR insn)
>> +{
>> + if (SVE_LOAD_STORE_P (insn))
>> + {
>> + /* SVE load/store instructions. */
>> +
>> + /* This is not always correct for SVE instructions, but it is a reasonable
>> + default for now. Calculating the exact memory access size for SVE
>> + load/store instructions requires parsing instructions and evaluating
>> + the vector length. */
>> + return 16;
>> + }
>> +
>> + /* Non-SVE instructions. */
>> +
>> + bool vector = (bit (insn, 26) == 1);
>> + bool ldst_pair = LOAD_STORE_PAIR_P (insn);
>> +
>> + /* Is this an Advanced SIMD load/store instruction? */
>> + if (vector)
>> + {
>> + unsigned int size = bits (insn, 30, 31);
>> +
>> + if (LOAD_STORE_LITERAL_P (insn))
>> + {
>> + /* Advanced SIMD load/store literal */
>> + /* Sizes 4, 8 and 16 bytes. */
>> + return 4 << size;
>> + }
>> + else if (LOAD_STORE_MS (insn))
>> + {
>> + size = 8 << bit (insn, 30);
>> +
>> + unsigned char opcode = bits (insn, 12, 15);
>> +
>> + if (opcode == 0x0 || opcode == 0x2)
>> + size *= 4;
>> + else if (opcode == 0x4 || opcode == 0x6)
>> + size *= 3;
>> + else if (opcode == 0x8 || opcode == 0xa)
>> + size *= 2;
>> +
>> + return size;
>> + }
>> + else if (LOAD_STORE_SS (insn))
>> + {
>> + size = 8 << bit (insn, 30);
>> + return size;
>> + }
>> + /* Advanced SIMD load/store instructions. */
>> + else if (ldst_pair)
>> + {
>> + /* Advanced SIMD load/store pair. */
>> + /* Sizes 8, 16 and 32 bytes. */
>> + return (8 << size);
>> + }
>> + else
>> + {
>> + /* Regular Advanced SIMD load/store instructions. */
>> + size = size | (bit (insn, 23) << 2);
>> + return 1 << size;
>> + }
>> + }
>> +
>> + /* This must be a regular GPR load/store instruction. */
>> +
>> + unsigned int size = bits (insn, 30, 31);
>> + bool cs_pair = COMPARE_SWAP_PAIR_P (insn);
>> + bool ldstx_pair = LOAD_STORE_EXCLUSIVE_PAIR_P (insn);
>> +
>> + if (ldst_pair)
>> + {
>> + /* Load/Store pair. */
>> + if (size == 0x1)
>> + {
>> + if (bit (insn, 22) == 0)
>> + /* STGP - 16 bytes. */
>> + return 16;
>> + else
>> + /* LDPSW - 8 bytes. */
>> + return 8;
>> + }
>> + /* Other Load/Store pair. */
>> + return (size == 0) ? 8 : 16;
>> + }
>> + else if (cs_pair || ldstx_pair)
>> + {
>> + /* Compare Swap Pair or Load Store Exclusive Pair. */
>> + /* Sizes 8 and 16 bytes. */
>> + size = bit (insn, 30);
>> + return (8 << size);
>> + }
>> + else if (LOAD_STORE_LITERAL_P (insn))
>> + {
>> + /* Load/Store literal. */
>> + /* Sizes between 4 and 8 bytes. */
>> + if (size == 0x2)
>> + return 4;
>> +
>> + return 4 << size;
>> + }
>> + else
>> + {
>> + /* General load/store instructions, with memory access sizes between
>> + 1 and 8 bytes. */
>> + return (1 << size);
>> + }
>> +}
>> +
>> +/* Return true if the regions [mem_addr, mem_addr + mem_len) and
>> + [watch_addr, watch_addr + watch_len) overlap. False otherwise. */
>> +
>> +static bool
>> +hw_watch_regions_overlap (CORE_ADDR mem_addr, size_t mem_len,
>> + CORE_ADDR watch_addr, size_t watch_len)
>> +{
>> + /* Quick check for non-overlapping case. */
>> + if (watch_addr >= (mem_addr + mem_len)
>> + || mem_addr >= (watch_addr + watch_len))
>> + return false;
>> +
>> + CORE_ADDR start = std::max (mem_addr, watch_addr);
>> + CORE_ADDR end = std::min (mem_addr + mem_len, watch_addr + watch_len);
>> +
>> + return ((end - start) > 0);
>> +}
>> +
>> +/* Check if a hardware watchpoint has triggered. If a trigger is detected,
>> + return true and update ADDR_P with the stopped data address.
>> + Otherwise return false.
>> +
>> + STATE is the debug register's state, INSN is the instruction the inferior
>> + stopped at and ADDR_TRAP is the reported stopped data address. */
>>
>> bool
>> aarch64_stopped_data_address (const struct aarch64_debug_reg_state *state,
>> - CORE_ADDR addr_trap, CORE_ADDR *addr_p)
>> + CORE_ADDR insn, CORE_ADDR addr_trap,
>> + CORE_ADDR *addr_p)
>> {
>> - int i;
>> + /* There are 6 variations of watchpoint range and memory access
>> + range positioning:
>> +
>> + - W is the byte in the watchpoint range only.
>> +
>> + - B is the byte in the memory access range only.
>> +
>> + - O is the byte in the overlapping region of the watchpoint range and
>> + the memory access range.
>> +
>> + 1 - Non-overlapping, no triggers.
>> +
>> + [WWWWWWWW]...[BBBBBBBB]
>> +
>> + 2 - Non-overlapping, no triggers.
>> +
>> + [BBBBBBBB]...[WWWWWWWW]
>> +
>> + 3 - Overlapping partially, triggers.
>> +
>> + [BBBBOOOOWWWW]
>> +
>> + 4 - Overlapping partially, triggers.
>> +
>> + [WWWWOOOOBBBB]
>> +
>> + 5 - Memory access contained in watchpoint range, triggers.
>> +
>> + [WWWWOOOOOOOOWWWW]
>> +
>> + 6 - Memory access containing watchpoint range, triggers.
>> +
>> + [BBBBOOOOOOOOBBBB]
>> + */
>> +
>> + /* If there are no load/store instructions at PC, this must be a hardware
>> + breakpoint hit. Return early.
>> +
>> + If a hardware breakpoint is placed at a PC containing a load/store
>> + instruction, we will go through the memory access size check anyway, but
>> + will eventually figure out there are no watchpoints matching such an
>> + address.
>> +
>> + There is one corner case though, which is having a hardware breakpoint and
>> + a hardware watchpoint at PC, when PC contains a load/store
>> + instruction. That is an ambiguous case that is hard to differentiate.
>> +
>> + There's not much we can do about that unless the kernel sends us enough
>> + information to tell them apart. */
>> + if (!LOAD_STORE_P (insn))
>> + return false;
>> +
>> + /* Get the memory access size for the instruction at PC. */
>> + unsigned int memory_access_size = get_load_store_access_size (insn);
>>
>> - for (i = aarch64_num_wp_regs - 1; i >= 0; --i)
>> + for (int i = aarch64_num_wp_regs - 1; i >= 0; --i)
>> {
>> const unsigned int offset
>> = aarch64_watchpoint_offset (state->dr_ctrl_wp[i]);
>> const unsigned int len = aarch64_watchpoint_length (state->dr_ctrl_wp[i]);
>> const CORE_ADDR addr_watch = state->dr_addr_wp[i] + offset;
>> - const CORE_ADDR addr_watch_aligned = align_down (state->dr_addr_wp[i], 8);
>> const CORE_ADDR addr_orig = state->dr_addr_orig_wp[i];
>>
>> - if (state->dr_ref_count_wp[i]
>> - && DR_CONTROL_ENABLED (state->dr_ctrl_wp[i])
>> - && addr_trap >= addr_watch_aligned
>> - && addr_trap < addr_watch + len)
>> + if ((state->dr_ref_count_wp[i]
>> + && DR_CONTROL_ENABLED (state->dr_ctrl_wp[i]))
>> + && hw_watch_regions_overlap (addr_trap, memory_access_size,
>> + addr_watch, len))
>> {
>> /* ADDR_TRAP reports the first address of the memory range
>> accessed by the CPU, regardless of what was the memory
>> @@ -270,6 +519,7 @@ aarch64_stopped_data_address (const struct aarch64_debug_reg_state *state,
>> }
>> }
>>
>> + /* No hardware watchpoint hits detected. */
>> return false;
>> }
>>
>> diff --git a/gdb/aarch64-nat.h b/gdb/aarch64-nat.h
>> index fee6bda2577..bc0e6297f40 100644
>> --- a/gdb/aarch64-nat.h
>> +++ b/gdb/aarch64-nat.h
>> @@ -51,7 +51,8 @@ void aarch64_remove_debug_reg_state (pid_t pid);
>> *ADDR_P. */
>>
>> bool aarch64_stopped_data_address (const struct aarch64_debug_reg_state *state,
>> - CORE_ADDR addr_trap, CORE_ADDR *addr_p);
>> + CORE_ADDR insn, CORE_ADDR addr_trap,
>> + CORE_ADDR *addr_p);
>>
>> /* Helper functions used by aarch64_nat_target below. See their
>> definitions. */
>>
>> base-commit: e78337d5780c9d837e186c22c11eb8f9ed5bac62
>
Thread overview: 4+ messages
2024-01-05 15:07 Tom de Vries
2024-01-08 12:27 ` Luis Machado
2024-02-20 21:07 ` Tom de Vries [this message]
2024-03-06 13:36 ` Luis Machado