From: Tom de Vries <tdevries@suse.de>
To: Luis Machado <luis.machado@arm.com>, gdb-patches@sourceware.org
Subject: Re: [PATCH v3] Fix missing watchpoint hits/endless loop
Date: Tue, 20 Feb 2024 22:07:12 +0100 [thread overview]
Message-ID: <b02d76f6-74ea-48e5-9d0d-6282ee188f88@suse.de> (raw)
In-Reply-To: <36f17aa1-72a5-4b57-a15d-efce90cf8b2d@arm.com>
On 1/8/24 13:27, Luis Machado wrote:
> Hi Tom,
>
> Thanks for refreshing this patch.
>
> As I hinted on IRC, back then I ended up deciding not to pursue this approach further. Though it does
> address some of the potential failures, it is a bit convoluted and poses a fairly significant
> maintenance burden (we need to explicitly identify instructions, so we will always be behind), and it
> carries the risk of producing false positives.
>
> For SVE/SME, there are also predicated accesses, which means an instruction may read/write a discontiguous range
> of bytes depending on the predicate mask. That is trickier to handle.
>
> The nature of hardware watchpoint detection on aarch64 means the triggering behavior is dependent on a particular
> CPU spec, in a way that is hard for userspace to predict reliably. Hopefully in the future this situation will
> improve for userspace. The kernel folks are aware of it.
>
Hi Luis,
Ack, that makes sense.
> Meanwhile, given the above restrictions (and potentially restrictions of other architectures), we could teach gdb
> about imprecise hardware watchpoint triggers. Something that would allow gdb to tell userspace that we know
> a hardware watchpoint has triggered, but we don't know exactly for what range and what happened.
>
> At least this would prevent certain situations where gdb is simply stuck trying to skip over unskippable
> hardware watchpoints, because it can't tell what happened and ended up with a generic SIGTRAP.
>
> Thoughts?
>
I've submitted a two-patch series here (
https://sourceware.org/pipermail/gdb-patches/2024-February/206706.html
), where the second patch does something a bit like this: the aarch64
tdep tells gdb that a hardware watchpoint has triggered, but not
exactly for what range, and since it's a regular write watchpoint, gdb
handles this fine.
With this patch series, I'm getting rid of the last consistent FAILs I
see on M1 (not counting libcc1 stuff).
Thanks,
- Tom
> On 1/5/24 15:07, Tom de Vries wrote:
>> From: Luis Machado <luis.machado@linaro.org>
>>
>> Updates on v3:
>>
>> - Rebased to current trunk (v2 applied cleanly on gdb-12-branch).
>> - Should also work on fbsd, though I haven't tested it.
>> - Dropped gdbserver part (should be possible, I'm just not sure yet where to
>> move aarch64_stopped_data_address such that it's accessible from gdbserver).
>> - No attempt made to address any review remarks on v2.
>> https://inbox.sourceware.org/gdb-patches/D2AC3C31-5EEE-4BCA-8D75-7CEDFA75F540@arm.com/
>>
>> Updates on v2:
>>
>> - Handle more types of load/store instructions, including a catch all
>> for SVE load/store instructions.
>>
>> --
>>
>> I ran into a situation where a hardware watchpoint hit is not detected
>> correctly, misleading GDB into thinking it was a delayed breakpoint hit.
>>
>> The problem is that hardware watchpoints are not skippable on AArch64,
>> which makes GDB loop endlessly trying to run past the instruction.
>>
>> The most obvious case where this happens is when the load/store pair
>> instructions access 16 bytes of memory.
>>
>> Suppose we have an stp instruction that will write a couple of 64-bit
>> registers to address 0x10 (stp x3, x4, [x2]). It will write data from 0x10
>> up to 0x20.
>>
>> Now suppose a write watchpoint is created to monitor memory address 0x18,
>> which is the start of the second register write. It can have any length,
>> but let's assume it has length 8.
>>
>> When we execute that stp instruction, it will trap and the reported stopped
>> data address from the kernel will be 0x10 (the beginning of the memory range
>> accessed by the instruction).
>>
>> The current code won't be able to detect a valid trigger because it assumes an
>> alignment of 8 bytes for the watchpoint address. Forcing that kind of alignment
>> won't be enough to detect a 16-byte access if the trap address falls outside of
>> the 8-byte alignment window. We need to know how many bytes the instruction
>> will access, but we won't have that data unless we go parsing instructions.
>>
>> Another issue with the current code seems to be that it assumes the accesses
>> will always be 8 bytes in size, since it wants to align the watchpoint address
>> to that particular boundary. This leads to problems when we have unaligned
>> addresses and unaligned watchpoints.
>>
>> For example, suppose we have a str instruction storing 8 bytes to memory
>> address 0xf. Now suppose we have a write watchpoint at address 0x16,
>> monitoring 8 bytes.
>>
>> The trap address will be 0xf, but forcing 0x16 to 8-byte alignment yields
>> 0x10, and so GDB doesn't think this is a watchpoint hit.
>>
>> I believe the same problem can be triggered with smaller memory accesses,
>> except for single-byte accesses.
>>
>> In order to cover most of the cases correctly, we parse the load/store
>> instructions and detect how many bytes they're accessing. That will give us
>> enough information to tell if a hardware watchpoint triggered or not.
>>
>> The SVE load/store support is only a placeholder for now, as we need to
>> parse the instructions and use the variable vector length to determine the
>> memory access size (planned for the future).
>>
>> The patch also fixes these two failures in the testsuite:
>>
>> FAIL: gdb.base/watchpoint-unaligned.exp: continue (timeout)
>> FAIL: gdb.base/watchpoint-unaligned.exp: size8twice write
>>
>> Regression tested (v2) on aarch64-linux Ubuntu/20.04 and Ubuntu/18.04.
>> Regression tested (v3) on aarch64-linux fedora 39.
>>
>> I also used a very slow aarch64 hardware watchpoint stress test to validate
>> the v2 patch, but I don't think that particular test should be included in the
>> testsuite. It takes quite a while to run (20+ minutes), and shouldn't be
>> required unless we know there are problems with hardware watchpoint handling.
>>
>> PR tdep/29423
>> Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=29423
>> ---
>> gdb/aarch64-fbsd-nat.c | 7 +-
>> gdb/aarch64-linux-nat.c | 7 +-
>> gdb/aarch64-nat.c | 268 ++++++++++++++++++++++++++++++++++++++--
>> gdb/aarch64-nat.h | 3 +-
>> 4 files changed, 273 insertions(+), 12 deletions(-)
>>
>> diff --git a/gdb/aarch64-fbsd-nat.c b/gdb/aarch64-fbsd-nat.c
>> index 38fb093f139..39162fe3bb0 100644
>> --- a/gdb/aarch64-fbsd-nat.c
>> +++ b/gdb/aarch64-fbsd-nat.c
>> @@ -154,9 +154,14 @@ aarch64_fbsd_nat_target::stopped_data_address (CORE_ADDR *addr_p)
>>
>> const CORE_ADDR addr_trap = (CORE_ADDR) siginfo.si_addr;
>>
>> + struct regcache *regs = get_thread_regcache (this, inferior_ptid);
>> + CORE_ADDR trigger_pc = regcache_read_pc (regs);
>> + uint32_t insn;
>> + read_memory (trigger_pc, (gdb_byte *) &insn, 4);
>> +
>> /* Check if the address matches any watched address. */
>> state = aarch64_get_debug_reg_state (inferior_ptid.pid ());
>> - return aarch64_stopped_data_address (state, addr_trap, addr_p);
>> + return aarch64_stopped_data_address (state, insn, addr_trap, addr_p);
>> }
>>
>> /* Implement the "stopped_by_watchpoint" target_ops method. */
>> diff --git a/gdb/aarch64-linux-nat.c b/gdb/aarch64-linux-nat.c
>> index 5b4e3c2bde1..5494bb07517 100644
>> --- a/gdb/aarch64-linux-nat.c
>> +++ b/gdb/aarch64-linux-nat.c
>> @@ -962,9 +962,14 @@ aarch64_linux_nat_target::stopped_data_address (CORE_ADDR *addr_p)
>> const CORE_ADDR addr_trap
>> = gdbarch_remove_non_address_bits (gdbarch, (CORE_ADDR) siginfo.si_addr);
>>
>> + struct regcache *regs = get_thread_regcache (this, inferior_ptid);
>> + CORE_ADDR trigger_pc = regcache_read_pc (regs);
>> + uint32_t insn;
>> + read_memory (trigger_pc, (gdb_byte *) &insn, 4);
>> +
>> /* Check if the address matches any watched address. */
>> state = aarch64_get_debug_reg_state (inferior_ptid.pid ());
>> - return aarch64_stopped_data_address (state, addr_trap, addr_p);
>> + return aarch64_stopped_data_address (state, insn, addr_trap, addr_p);
>> }
>>
>> /* Implement the "stopped_by_watchpoint" target_ops method. */
>> diff --git a/gdb/aarch64-nat.c b/gdb/aarch64-nat.c
>> index ee8c5a1e21d..928bb70d644 100644
>> --- a/gdb/aarch64-nat.c
>> +++ b/gdb/aarch64-nat.c
>> @@ -22,6 +22,7 @@
>> #include "inferior.h"
>> #include "cli/cli-cmds.h"
>> #include "aarch64-nat.h"
>> +#include "arch/aarch64-insn.h"
>>
>> #include <unordered_map>
>>
>> @@ -225,27 +226,275 @@ aarch64_remove_watchpoint (CORE_ADDR addr, int len, enum target_hw_bp_type type,
>> return ret;
>> }
>>
>> -/* See aarch64-nat.h. */
>> +/* Macros to distinguish between various types of load/store instructions. */
>> +
>> +/* Regular and Advanced SIMD load/store instructions. */
>> +#define GENERAL_LOAD_STORE_P(insn) (bit (insn, 25) == 0 && bit (insn, 27) == 1)
>> +
>> +/* SVE load/store instruction. */
>> +#define SVE_LOAD_STORE_P(insn) (bits (insn, 25, 28) == 0x2 \
>> + && (bits (insn, 29, 31) == 0x4 \
>> + || bits (insn, 29, 31) == 0x5 \
>> + || bits (insn, 29, 31) == 0x6 \
>> + || (bits (insn, 29, 31) == 0x7 \
>> + && ((bit (insn, 15) == 0x0 \
>> + && (bit (insn, 13) == 0x0 \
>> + || bit (insn, 13) == 0x1)) \
>> + || (bit (insn, 15) == 0x1 \
>> + && bit (insn, 13) == 0x0) \
>> + || bits (insn, 13, 15) == 0x6 \
>> + || bits (insn, 13, 15) == 0x7))))
>> +
>> +/* Any load/store instruction (regular, Advanced SIMD or SVE). */
>> +#define LOAD_STORE_P(insn) (GENERAL_LOAD_STORE_P (insn) \
>> + || SVE_LOAD_STORE_P (insn))
>> +
>> +/* Load/Store pair (regular or vector). */
>> +#define LOAD_STORE_PAIR_P(insn) (bit (insn, 28) == 0 && bit (insn, 29) == 1)
>> +
>> +#define COMPARE_SWAP_PAIR_P(insn) (bits (insn, 30, 31) == 0 \
>> + && bits (insn, 28, 29) == 0 \
>> + && bit (insn, 26) == 0 \
>> + && bits (insn, 23, 24) == 0 \
>> + && bit (insn, 11) == 1)
>> +#define LOAD_STORE_EXCLUSIVE_PAIR_P(insn) (bit (insn, 31) == 1 \
>> + && bits (insn, 28, 29) == 0 \
>> + && bit (insn, 26) == 0 \
>> + && bits (insn, 23, 24) == 0 \
>> + && bit (insn, 11) == 1)
>> +
>> +#define LOAD_STORE_LITERAL_P(insn) (bit (insn, 28) == 1 \
>> + && bit (insn, 29) == 0 \
>> + && bit (insn, 24) == 0)
>> +
>> +/* Vector Load/Store multiple structures. */
>> +#define LOAD_STORE_MS(insn) (bits (insn, 28, 29) == 0x0 \
>> + && bit (insn, 31) == 0x0 \
>> + && bit (insn, 26) == 0x1 \
>> + && ((bits (insn, 23, 24) == 0x0 \
>> + && bits (insn, 16, 21) == 0x0) \
>> + || (bits (insn, 23, 24) == 0x1 \
>> + && bit (insn, 21) == 0x0)))
>> +
>> +/* Vector Load/Store single structure. */
>> +#define LOAD_STORE_SS(insn) (bits (insn, 28, 29) == 0x0 \
>> + && bit (insn, 31) == 0x0 \
>> + && bit (insn, 26) == 0x1 \
>> + && ((bits (insn, 23, 24) == 0x2 \
>> + && bits (insn, 16, 20) == 0x0) \
>> + || (bits (insn, 23, 24) == 0x3)))
>> +
>> +/* Assuming INSN is a load/store instruction, return the size of the
>> + memory access. The patterns are documented in the ARM Architecture
>> + Reference Manual - Index by encoding. */
>> +
>> +static unsigned int
>> +get_load_store_access_size (CORE_ADDR insn)
>> +{
>> + if (SVE_LOAD_STORE_P (insn))
>> + {
>> + /* SVE load/store instructions. */
>> +
>> + /* This is not always correct for SVE instructions, but it is a reasonable
>> + default for now. Calculating the exact memory access size for SVE
>> + load/store instructions requires parsing instructions and evaluating
>> + the vector length. */
>> + return 16;
>> + }
>> +
>> + /* Non-SVE instructions. */
>> +
>> + bool vector = (bit (insn, 26) == 1);
>> + bool ldst_pair = LOAD_STORE_PAIR_P (insn);
>> +
>> + /* Is this an Advanced SIMD load/store instruction? */
>> + if (vector)
>> + {
>> + unsigned int size = bits (insn, 30, 31);
>> +
>> + if (LOAD_STORE_LITERAL_P (insn))
>> + {
>> + /* Advanced SIMD load/store literal */
>> + /* Sizes 4, 8 and 16 bytes. */
>> + return 4 << size;
>> + }
>> + else if (LOAD_STORE_MS (insn))
>> + {
>> + size = 8 << bit (insn, 30);
>> +
>> + unsigned char opcode = bits (insn, 12, 15);
>> +
>> + if (opcode == 0x0 || opcode == 0x2)
>> + size *= 4;
>> + else if (opcode == 0x4 || opcode == 0x6)
>> + size *= 3;
>> + else if (opcode == 0x8 || opcode == 0xa)
>> + size *= 2;
>> +
>> + return size;
>> + }
>> + else if (LOAD_STORE_SS (insn))
>> + {
>> + size = 8 << bit (insn, 30);
>> + return size;
>> + }
>> + /* Advanced SIMD load/store instructions. */
>> + else if (ldst_pair)
>> + {
>> + /* Advanced SIMD load/store pair. */
>> + /* Sizes 8, 16 and 32 bytes. */
>> + return (8 << size);
>> + }
>> + else
>> + {
>> + /* Regular Advanced SIMD load/store instructions. */
>> + size = size | (bit (insn, 23) << 2);
>> + return 1 << size;
>> + }
>> + }
>> +
>> + /* This must be a regular GPR load/store instruction. */
>> +
>> + unsigned int size = bits (insn, 30, 31);
>> + bool cs_pair = COMPARE_SWAP_PAIR_P (insn);
>> + bool ldstx_pair = LOAD_STORE_EXCLUSIVE_PAIR_P (insn);
>> +
>> + if (ldst_pair)
>> + {
>> + /* Load/Store pair. */
>> + if (size == 0x1)
>> + {
>> + if (bit (insn, 22) == 0)
>> + /* STGP - 16 bytes. */
>> + return 16;
>> + else
>> + /* LDPSW - 8 bytes. */
>> + return 8;
>> + }
>> + /* Other Load/Store pair. */
>> + return (size == 0) ? 8 : 16;
>> + }
>> + else if (cs_pair || ldstx_pair)
>> + {
>> + /* Compare Swap Pair or Load Store Exclusive Pair. */
>> + /* Sizes 8 and 16 bytes. */
>> + size = bit (insn, 30);
>> + return (8 << size);
>> + }
>> + else if (LOAD_STORE_LITERAL_P (insn))
>> + {
>> + /* Load/Store literal. */
>> + /* Sizes between 4 and 8 bytes. */
>> + if (size == 0x2)
>> + return 4;
>> +
>> + return 4 << size;
>> + }
>> + else
>> + {
>> + /* General load/store instructions, with memory access sizes between
>> + 1 and 8 bytes. */
>> + return (1 << size);
>> + }
>> +}
>> +
>> +/* Return true if the regions [mem_addr, mem_addr + mem_len) and
>> + [watch_addr, watch_addr + watch_len) overlap. False otherwise. */
>> +
>> +static bool
>> +hw_watch_regions_overlap (CORE_ADDR mem_addr, size_t mem_len,
>> + CORE_ADDR watch_addr, size_t watch_len)
>> +{
>> + /* Quick check for non-overlapping case. */
>> + if (watch_addr >= (mem_addr + mem_len)
>> + || mem_addr >= (watch_addr + watch_len))
>> + return false;
>> +
>> + CORE_ADDR start = std::max (mem_addr, watch_addr);
>> + CORE_ADDR end = std::min (mem_addr + mem_len, watch_addr + watch_len);
>> +
>> + return ((end - start) > 0);
>> +}
>> +
>> +/* Check if a hardware watchpoint has triggered. If a trigger is detected,
>> + return true and update ADDR_P with the stopped data address.
>> + Otherwise return false.
>> +
>> + STATE is the debug register's state, INSN is the instruction the inferior
>> + stopped at and ADDR_TRAP is the reported stopped data address. */
>>
>> bool
>> aarch64_stopped_data_address (const struct aarch64_debug_reg_state *state,
>> - CORE_ADDR addr_trap, CORE_ADDR *addr_p)
>> + CORE_ADDR insn, CORE_ADDR addr_trap,
>> + CORE_ADDR *addr_p)
>> {
>> - int i;
>> + /* There are 6 variations of watchpoint range and memory access
>> + range positioning:
>> +
>> + - W is the byte in the watchpoint range only.
>> +
>> + - B is the byte in the memory access range only.
>> +
>> + - O is the byte in the overlapping region of the watchpoint range and
>> + the memory access range.
>> +
>> + 1 - Non-overlapping, no triggers.
>> +
>> + [WWWWWWWW]...[BBBBBBBB]
>> +
>> + 2 - Non-overlapping, no triggers.
>> +
>> + [BBBBBBBB]...[WWWWWWWW]
>> +
>> + 3 - Overlapping partially, triggers.
>> +
>> + [BBBBOOOOWWWW]
>> +
>> + 4 - Overlapping partially, triggers.
>> +
>> + [WWWWOOOOBBBB]
>> +
>> + 5 - Memory access contained in watchpoint range, triggers.
>> +
>> + [WWWWOOOOOOOOWWWW]
>> +
>> + 6 - Memory access containing watchpoint range, triggers.
>> +
>> + [BBBBOOOOOOOOBBBB]
>> + */
>> +
>> + /* If there are no load/store instructions at PC, this must be a hardware
>> + breakpoint hit. Return early.
>> +
>> + If a hardware breakpoint is placed at a PC containing a load/store
>> + instruction, we will go through the memory access size check anyway, but
>> + will eventually figure out there are no watchpoints matching such an
>> + address.
>> +
>> + There is one corner case though, which is having a hardware breakpoint and
>> + a hardware watchpoint at PC, when PC contains a load/store
>> + instruction. That is an ambiguous case that is hard to differentiate.
>> +
>> + There's not much we can do about that unless the kernel sends us enough
>> + information to tell them apart. */
>> + if (!LOAD_STORE_P (insn))
>> + return false;
>> +
>> + /* Get the memory access size for the instruction at PC. */
>> + unsigned int memory_access_size = get_load_store_access_size (insn);
>>
>> - for (i = aarch64_num_wp_regs - 1; i >= 0; --i)
>> + for (int i = aarch64_num_wp_regs - 1; i >= 0; --i)
>> {
>> const unsigned int offset
>> = aarch64_watchpoint_offset (state->dr_ctrl_wp[i]);
>> const unsigned int len = aarch64_watchpoint_length (state->dr_ctrl_wp[i]);
>> const CORE_ADDR addr_watch = state->dr_addr_wp[i] + offset;
>> - const CORE_ADDR addr_watch_aligned = align_down (state->dr_addr_wp[i], 8);
>> const CORE_ADDR addr_orig = state->dr_addr_orig_wp[i];
>>
>> - if (state->dr_ref_count_wp[i]
>> - && DR_CONTROL_ENABLED (state->dr_ctrl_wp[i])
>> - && addr_trap >= addr_watch_aligned
>> - && addr_trap < addr_watch + len)
>> + if ((state->dr_ref_count_wp[i]
>> + && DR_CONTROL_ENABLED (state->dr_ctrl_wp[i]))
>> + && hw_watch_regions_overlap (addr_trap, memory_access_size,
>> + addr_watch, len))
>> {
>> /* ADDR_TRAP reports the first address of the memory range
>> accessed by the CPU, regardless of what was the memory
>> @@ -270,6 +519,7 @@ aarch64_stopped_data_address (const struct aarch64_debug_reg_state *state,
>> }
>> }
>>
>> + /* No hardware watchpoint hits detected. */
>> return false;
>> }
>>
>> diff --git a/gdb/aarch64-nat.h b/gdb/aarch64-nat.h
>> index fee6bda2577..bc0e6297f40 100644
>> --- a/gdb/aarch64-nat.h
>> +++ b/gdb/aarch64-nat.h
>> @@ -51,7 +51,8 @@ void aarch64_remove_debug_reg_state (pid_t pid);
>> *ADDR_P. */
>>
>> bool aarch64_stopped_data_address (const struct aarch64_debug_reg_state *state,
>> - CORE_ADDR addr_trap, CORE_ADDR *addr_p);
>> + CORE_ADDR insn, CORE_ADDR addr_trap,
>> + CORE_ADDR *addr_p);
>>
>> /* Helper functions used by aarch64_nat_target below. See their
>> definitions. */
>>
>> base-commit: e78337d5780c9d837e186c22c11eb8f9ed5bac62
>
Thread overview: 4+ messages
2024-01-05 15:07 Tom de Vries
2024-01-08 12:27 ` Luis Machado
2024-02-20 21:07 ` Tom de Vries [this message]
2024-03-06 13:36 ` Luis Machado