public inbox for gdb-patches@sourceware.org
 help / color / mirror / Atom feed
* [PATCH][AArch64] Fix missing watchpoint hits/endless loop
@ 2021-06-02 14:27 Luis Machado
  2021-06-02 16:57 ` Alan Hayward
  2021-07-23 13:24 ` [PATCH, v2][AArch64] " Luis Machado
  0 siblings, 2 replies; 6+ messages in thread
From: Luis Machado @ 2021-06-02 14:27 UTC (permalink / raw)
  To: gdb-patches; +Cc: alan.hayward, david.spickett

I ran into a situation where a hardware watchpoint hit is not detected
correctly, misleading GDB into thinking it was a delayed breakpoint hit.

The problem is that hardware watchpoints are not skippable on AArch64, so
that makes GDB loop endlessly trying to run past the instruction.

The most obvious case where this happens is when the load/store pair
instructions access 16 bytes of memory.

Suppose we have a stp instruction that will write a couple 64-bit registers
to address 0x10 (stp x3,x4 [x2]). It will write data from 0x10 up to 0x20.

Now suppose a write watchpoint is created to monitor memory address 0x18,
which is the start of the second register write. It can have whatever length,
but let's assume it has length 8.

When we execute that stp instruction, it will trap and the reported stopped
data address from the kernel will be 0x10 (the beginning of the memory range
accessed by the instruction).

The current code won't be able to detect a valid trigger because it assumes an
alignment of 8 bytes for the watchpoint address. Forcing that kind of alignment
won't be enough to detect a 16-byte access if the trap address falls outside of
the 8-byte alignment window. We need to know how many bytes the instruction
will access, but we won't have that data unless we go parsing instructions.

Another issue with the current code seems to be that it assumes the accesses
will always be 8 bytes in size, since it wants to align the watchpoint address
to that particular boundary. This leads to problems when we have unaligned
addresses and unaligned watchpoints.

For example, suppose we have a str instruction storing 8 bytes to memory
address 0xf. Now suppose we have a write watchpoint at address 0x16,
monitoring 8 bytes.

The trap address will be 0xf, but forcing 0x16 to 8-byte alignment yields
0x10, and so GDB doesn't think this is a watchpoint hit.

I believe you can trigger the same problem with smaller memory accesses,
except one that accesses a single byte.

As I said earlier, ideally we'd go parsing instructions to figure out how many
bytes they access. That is a bit complex though, so meanwhile I think we should
go with a simpler solution.

We keep the assumption that the instructions access 8 bytes, except when we
detect one of the instructions that access 16 bytes.

For the trigger detection, instead of forcing an alignment and then checking
if the trap address falls within the range of the watchpoint, we just check
if the memory access and the watchpoint range overlap. If they do, then we
have a watchpoint hit.

This still has potential to give false positives if we choose the right
combination of memory address, access size and watchpoint address. But it
should match the old code in this regard, while giving better results for
accesses of 16 bytes and unaligned accesses.

It also fixes these two failures in the testsuite:

FAIL: gdb.base/watchpoint-unaligned.exp: continue (timeout)
FAIL: gdb.base/watchpoint-unaligned.exp: size8twice write

Regression tested on aarch64-linux Ubuntu/20.04.

gdb/ChangeLog:

YYYY-MM-DD  Luis Machado  <luis.machado@linaro.org>

	* aarch64-linux-nat.c
	(aarch64_linux_nat_target::stopped_data_address): Refactor.
	* nat/aarch64-linux-hw-point.c (hw_watch_is_16b_ldst)
	(hw_watch_regions_overlap, hw_watch_detect_trigger): New functions.
	* nat/aarch64-linux-hw-point.h (LDP_STP_MASK, STP_OPC_64)
	(LDP_OPC_64): New constants.
	(hw_watch_is_16b_ldst, hw_watch_regions_overlap)
	(hw_watch_detect_trigger): New prototypes.

gdbserver/ChangeLog:

YYYY-MM-DD  Luis Machado  <luis.machado@linaro.org>

	* linux-aarch64-low.cc
	(aarch64_target::low_stopped_data_address): Refactor.
---
 gdb/aarch64-linux-nat.c          |  45 ++----------
 gdb/nat/aarch64-linux-hw-point.c | 120 +++++++++++++++++++++++++++++++
 gdb/nat/aarch64-linux-hw-point.h |  16 +++++
 gdbserver/linux-aarch64-low.cc   |  49 ++++---------
 4 files changed, 154 insertions(+), 76 deletions(-)

diff --git a/gdb/aarch64-linux-nat.c b/gdb/aarch64-linux-nat.c
index 61224022f6a..de405a5e351 100644
--- a/gdb/aarch64-linux-nat.c
+++ b/gdb/aarch64-linux-nat.c
@@ -951,7 +951,6 @@ bool
 aarch64_linux_nat_target::stopped_data_address (CORE_ADDR *addr_p)
 {
   siginfo_t siginfo;
-  int i;
   struct aarch64_debug_reg_state *state;
 
   if (!linux_nat_get_siginfo (inferior_ptid, &siginfo))
@@ -969,46 +968,14 @@ aarch64_linux_nat_target::stopped_data_address (CORE_ADDR *addr_p)
   const CORE_ADDR addr_trap
     = address_significant (gdbarch, (CORE_ADDR) siginfo.si_addr);
 
+  struct regcache *regs = get_thread_regcache (this, inferior_ptid);
+  CORE_ADDR trigger_pc = regcache_read_pc (regs);
+  uint32_t insn;
+  read_memory (trigger_pc, (gdb_byte *) &insn, 4);
+
   /* Check if the address matches any watched address.  */
   state = aarch64_get_debug_reg_state (inferior_ptid.pid ());
-  for (i = aarch64_num_wp_regs - 1; i >= 0; --i)
-    {
-      const unsigned int offset
-	= aarch64_watchpoint_offset (state->dr_ctrl_wp[i]);
-      const unsigned int len = aarch64_watchpoint_length (state->dr_ctrl_wp[i]);
-      const CORE_ADDR addr_watch = state->dr_addr_wp[i] + offset;
-      const CORE_ADDR addr_watch_aligned = align_down (state->dr_addr_wp[i], 8);
-      const CORE_ADDR addr_orig = state->dr_addr_orig_wp[i];
-
-      if (state->dr_ref_count_wp[i]
-	  && DR_CONTROL_ENABLED (state->dr_ctrl_wp[i])
-	  && addr_trap >= addr_watch_aligned
-	  && addr_trap < addr_watch + len)
-	{
-	  /* ADDR_TRAP reports the first address of the memory range
-	     accessed by the CPU, regardless of what was the memory
-	     range watched.  Thus, a large CPU access that straddles
-	     the ADDR_WATCH..ADDR_WATCH+LEN range may result in an
-	     ADDR_TRAP that is lower than the
-	     ADDR_WATCH..ADDR_WATCH+LEN range.  E.g.:
-
-	     addr: |   4   |   5   |   6   |   7   |   8   |
-				   |---- range watched ----|
-		   |----------- range accessed ------------|
-
-	     In this case, ADDR_TRAP will be 4.
-
-	     To match a watchpoint known to GDB core, we must never
-	     report *ADDR_P outside of any ADDR_WATCH..ADDR_WATCH+LEN
-	     range.  ADDR_WATCH <= ADDR_TRAP < ADDR_ORIG is a false
-	     positive on kernels older than 4.10.  See PR
-	     external/20207.  */
-	  *addr_p = addr_orig;
-	  return true;
-	}
-    }
-
-  return false;
+  return hw_watch_detect_trigger (state, insn, addr_trap, addr_p);
 }
 
 /* Implement the "stopped_by_watchpoint" target_ops method.  */
diff --git a/gdb/nat/aarch64-linux-hw-point.c b/gdb/nat/aarch64-linux-hw-point.c
index af2cc4254e2..32ed76157ac 100644
--- a/gdb/nat/aarch64-linux-hw-point.c
+++ b/gdb/nat/aarch64-linux-hw-point.c
@@ -865,3 +865,123 @@ aarch64_linux_region_ok_for_watchpoint (CORE_ADDR addr, int len)
      the checking is costly.  */
   return 1;
 }
+
+/* Return true if INSN is a ldp/stp that accesses 16 bytes of memory.
+   Return false otherwise.  */
+
+bool
+hw_watch_is_16b_ldst (CORE_ADDR insn)
+{
+  return ((insn & LDP_STP_MASK) == STP_OPC_64
+	  || (insn & LDP_STP_MASK) == LDP_OPC_64);
+}
+
+/* Return true if the regions [mem_addr, mem_addr + mem_len] and
+   [watch_addr, watch_addr + watch_len] overlap.  False otherwise.  */
+
+bool
+hw_watch_regions_overlap (CORE_ADDR mem_addr, size_t mem_len,
+			  CORE_ADDR watch_addr, size_t watch_len)
+{
+  if (watch_addr > (mem_addr + mem_len)
+      || mem_addr > (watch_addr + watch_len))
+    return false;
+
+  CORE_ADDR start = std::max (mem_addr, watch_addr);
+  CORE_ADDR end = std::min (mem_addr + mem_len, watch_addr + watch_len);
+
+  return ((end - start) > 0);
+}
+
+/* Check if a hardware watchpoint has triggered.  If a trigger is detected,
+   return true and update ADDR_P with the stopped data address.
+   Otherwise return false.
+
+   STATE is the debug register's state, INSN is the instruction the inferior
+   stopped at and ADDR_TRAP is the reported stopped data address.  */
+
+bool
+hw_watch_detect_trigger (const struct aarch64_debug_reg_state *state,
+			 CORE_ADDR insn, CORE_ADDR addr_trap,
+			 CORE_ADDR *addr_p)
+{
+  /* There are 6 variations of watchpoint range and memory access
+     range positioning:
+
+     - W is the byte in the watchpoint range only.
+
+     - M is the byte in the memory access range ony.
+
+     - O is the byte in the overlapping region of the watchpoint range and
+       the memory access range.
+
+     1 - Non-overlapping, no triggers.
+
+     [WWWWWWWW]...[MMMMMMMM]
+
+     2 - Non-overlapping, no triggers.
+
+     [MMMMMMMM]...[WWWWWWWW]
+
+     3 - Overlapping partially, triggers.
+
+     [MMMMOOOOWWWW]
+
+     4 - Overlapping partially, triggers.
+
+     [WWWWOOOOMMMM]
+
+     5 - Memory access contained in watchpoint range, triggers.
+
+     [WWWWOOOOOOOOWWWW]
+
+     6 - Memory access containing watchpoint range, triggers.
+
+     [MMMMOOOOOOOOMMMM]
+  */
+
+  /* We assume the memory access size is 8 bytes.  */
+  unsigned int memory_access_size = 8;
+
+  /* Check if the memory access size is 16 bytes.  */
+  if (hw_watch_is_16b_ldst (insn))
+    memory_access_size = 16;
+
+  for (int i = aarch64_num_wp_regs - 1; i >= 0; --i)
+    {
+      const unsigned int offset
+	= aarch64_watchpoint_offset (state->dr_ctrl_wp[i]);
+      const unsigned int len = aarch64_watchpoint_length (state->dr_ctrl_wp[i]);
+      const CORE_ADDR addr_watch = state->dr_addr_wp[i] + offset;
+      const CORE_ADDR addr_orig = state->dr_addr_orig_wp[i];
+
+      if ((state->dr_ref_count_wp[i]
+	  && DR_CONTROL_ENABLED (state->dr_ctrl_wp[i]))
+	  && hw_watch_regions_overlap (addr_trap, memory_access_size,
+				       addr_watch, len))
+	{
+	  /* ADDR_TRAP reports the first address of the memory range
+	     accessed by the CPU, regardless of what was the memory
+	     range watched.  Thus, a large CPU access that straddles
+	     the ADDR_WATCH..ADDR_WATCH+LEN range may result in an
+	     ADDR_TRAP that is lower than the
+	     ADDR_WATCH..ADDR_WATCH+LEN range.  E.g.:
+
+	     addr: |   4   |   5   |   6   |   7   |   8   |
+				   |---- range watched ----|
+		   |----------- range accessed ------------|
+
+	     In this case, ADDR_TRAP will be 4.
+
+	     To match a watchpoint known to GDB core, we must never
+	     report *ADDR_P outside of any ADDR_WATCH..ADDR_WATCH+LEN
+	     range.  ADDR_WATCH <= ADDR_TRAP < ADDR_ORIG is a false
+	     positive on kernels older than 4.10.  See PR
+	     external/20207.  */
+	  *addr_p = addr_orig;
+	  return true;
+	}
+    }
+
+  return false;
+}
diff --git a/gdb/nat/aarch64-linux-hw-point.h b/gdb/nat/aarch64-linux-hw-point.h
index 2fc4b400ece..62e755dd1e9 100644
--- a/gdb/nat/aarch64-linux-hw-point.h
+++ b/gdb/nat/aarch64-linux-hw-point.h
@@ -132,6 +132,13 @@ typedef ULONGEST dr_changed_t;
 #define DR_HAS_CHANGED(x) ((x) != 0)
 #define DR_N_HAS_CHANGED(x, n) ((x) & ((dr_changed_t)1 << (n)))
 
+/* Mask for matching LDP and STP instruction variants.  */
+#define LDP_STP_MASK  0xFE400000
+/* stp and stnp with 64-bit registers.  */
+#define STP_OPC_64    0xA8000000
+/* ldp and ldnp with 64-bit registers.  */
+#define LDP_OPC_64    0xA8400000
+
 /* Structure for managing the hardware breakpoint/watchpoint resources.
    DR_ADDR_* stores the address, DR_CTRL_* stores the control register
    content, and DR_REF_COUNT_* counts the numbers of references to the
@@ -197,4 +204,13 @@ struct aarch64_debug_reg_state *aarch64_get_debug_reg_state (pid_t pid);
 
 int aarch64_linux_region_ok_for_watchpoint (CORE_ADDR addr, int len);
 
+bool hw_watch_is_16b_ldst (CORE_ADDR insn);
+
+bool hw_watch_regions_overlap (CORE_ADDR mem_addr, size_t mem_len,
+			       CORE_ADDR watch_addr, size_t watch_len);
+
+bool hw_watch_detect_trigger (const struct aarch64_debug_reg_state *state,
+			      CORE_ADDR insn, CORE_ADDR stopped_data_address,
+			      CORE_ADDR *addr_p);
+
 #endif /* NAT_AARCH64_LINUX_HW_POINT_H */
diff --git a/gdbserver/linux-aarch64-low.cc b/gdbserver/linux-aarch64-low.cc
index daccfef746e..5df632fe724 100644
--- a/gdbserver/linux-aarch64-low.cc
+++ b/gdbserver/linux-aarch64-low.cc
@@ -518,7 +518,7 @@ CORE_ADDR
 aarch64_target::low_stopped_data_address ()
 {
   siginfo_t siginfo;
-  int pid, i;
+  int pid;
   struct aarch64_debug_reg_state *state;
 
   pid = lwpid_of (current_thread);
@@ -538,45 +538,20 @@ aarch64_target::low_stopped_data_address ()
   const CORE_ADDR addr_trap
     = address_significant ((CORE_ADDR) siginfo.si_addr);
 
+  struct regcache *regs = get_thread_regcache (current_thread, 1);
+  CORE_ADDR trigger_pc = regcache_read_pc (regs);
+  uint32_t insn;
+  read_memory (trigger_pc, (gdb_byte *) &insn, 4);
+
   /* Check if the address matches any watched address.  */
   state = aarch64_get_debug_reg_state (pid_of (current_thread));
-  for (i = aarch64_num_wp_regs - 1; i >= 0; --i)
-    {
-      const unsigned int offset
-	= aarch64_watchpoint_offset (state->dr_ctrl_wp[i]);
-      const unsigned int len = aarch64_watchpoint_length (state->dr_ctrl_wp[i]);
-      const CORE_ADDR addr_watch = state->dr_addr_wp[i] + offset;
-      const CORE_ADDR addr_watch_aligned = align_down (state->dr_addr_wp[i], 8);
-      const CORE_ADDR addr_orig = state->dr_addr_orig_wp[i];
-
-      if (state->dr_ref_count_wp[i]
-	  && DR_CONTROL_ENABLED (state->dr_ctrl_wp[i])
-	  && addr_trap >= addr_watch_aligned
-	  && addr_trap < addr_watch + len)
-	{
-	  /* ADDR_TRAP reports the first address of the memory range
-	     accessed by the CPU, regardless of what was the memory
-	     range watched.  Thus, a large CPU access that straddles
-	     the ADDR_WATCH..ADDR_WATCH+LEN range may result in an
-	     ADDR_TRAP that is lower than the
-	     ADDR_WATCH..ADDR_WATCH+LEN range.  E.g.:
-
-	     addr: |   4   |   5   |   6   |   7   |   8   |
-				   |---- range watched ----|
-		   |----------- range accessed ------------|
-
-	     In this case, ADDR_TRAP will be 4.
-
-	     To match a watchpoint known to GDB core, we must never
-	     report *ADDR_P outside of any ADDR_WATCH..ADDR_WATCH+LEN
-	     range.  ADDR_WATCH <= ADDR_TRAP < ADDR_ORIG is a false
-	     positive on kernels older than 4.10.  See PR
-	     external/20207.  */
-	  return addr_orig;
-	}
-    }
 
-  return (CORE_ADDR) 0;
+  CORE_ADDR trigger_addr;
+
+  if (hw_watch_detect_trigger (state, insn, addr_trap, &trigger_addr))
+    return trigger_addr;
+
+  return 0;
 }
 
 /* Implementation of linux target ops method "low_stopped_by_watchpoint".  */
-- 
2.25.1


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH][AArch64] Fix missing watchpoint hits/endless loop
  2021-06-02 14:27 [PATCH][AArch64] Fix missing watchpoint hits/endless loop Luis Machado
@ 2021-06-02 16:57 ` Alan Hayward
  2021-06-02 17:22   ` Luis Machado
  2021-07-23 13:24 ` [PATCH, v2][AArch64] " Luis Machado
  1 sibling, 1 reply; 6+ messages in thread
From: Alan Hayward @ 2021-06-02 16:57 UTC (permalink / raw)
  To: Luis Machado; +Cc: gdb-patches\@sourceware.org, david.spickett, nd



> On 2 Jun 2021, at 15:27, Luis Machado <luis.machado@linaro.org> wrote:
> 
> I ran into a situation where a hardware watchpoint hit is not detected
> correctly, misleading GDB into thinking it was a delayed breakpoint hit.
> 
> The problem is that hardware watchpoints are not skippable on AArch64, so
> that makes GDB loop endlessly trying to run past the instruction.
> 
> The most obvious case where this happens is when the load/store pair
> instructions access 16 bytes of memory.
> 
> Suppose we have a stp instruction that will write a couple 64-bit registers
> to address 0x10 (stp x3,x4 [x2]). It will write data from 0x10 up to 0x20.
> 
> Now suppose a write watchpoint is created to monitor memory address 0x18,
> which is the start of the second register write. It can have whatever length,
> but let's assume it has length 8.
> 
> When we execute that stp instruction, it will trap and the reported stopped
> data address from the kernel will be 0x10 (the beginning of the memory range
> accessed by the instruction).
> 
> The current code won't be able to detect a valid trigger because it assumes an
> alignment of 8 bytes for the watchpoint address. Forcing that kind of alignment
> won't be enough to detect a 16-byte access if the trap address falls outside of
> the 8-byte alignment window. We need to know how many bytes the instruction
> will access, but we won't have that data unless we go parsing instructions.
> 
> Another issue with the current code seems to be that it assumes the accesses
> will always be 8 bytes in size, since it wants to align the watchpoint address
> to that particular boundary. This leads to problems when we have unaligned
> addresses and unaligned watchpoints.
> 
> For example, suppose we have a str instruction storing 8 bytes to memory
> address 0xf. Now suppose we have a write watchpoint at address 0x16,
> monitoring 8 bytes.
> 
> The trap address will be 0xf, but forcing 0x16 to 8-byte alignment yields
> 0x10, and so GDB doesn't think this is a watchpoint hit.

Ouch.
I’m starting to wonder now if I have encountered this bug myself, and just
shrugged it off thinking the variable wasn’t being hit.

> 
> I believe you can trigger the same problem with smaller memory accesses,
> except one that accesses a single byte.
> 
> As I said earlier, ideally we'd go parsing instructions to figure out how many
> bytes they access. That is a bit complex though, so meanwhile I think we should
> go with a simpler solution.
> 
> We keep the assumption that the instructions access 8 bytes, except when we
> detect one of the instructions that access 16 bytes.

What about NEON and SVE instructions? They can access a lot of data at once.

> 
> For the trigger detection, instead of forcing an alignment and then checking
> if the trap address falls within the range of the watchpoint, we just check
> if the memory access and the watchpoint range overlap. If they do, then we
> have a watchpoint hit.
> 
> This still has potential to give false positives if we choose the right
> combination of memory address, access size and watchpoint address. But it
> should match the old code in this regard, while giving better results for
> accesses of 16 bytes and unaligned accesses.

Is there a plan for fixing those?

> 
> It also fixes these two failures in the testsuite:
> 
> FAIL: gdb.base/watchpoint-unaligned.exp: continue (timeout)
> FAIL: gdb.base/watchpoint-unaligned.exp: size8twice write
> 
> Regression tested on aarch64-linux Ubuntu/20.04.
> 
> gdb/ChangeLog:
> 
> YYYY-MM-DD  Luis Machado  <luis.machado@linaro.org>
> 
> 	* aarch64-linux-nat.c
> 	(aarch64_linux_nat_target::stopped_data_address): Refactor.
> 	* nat/aarch64-linux-hw-point.c (hw_watch_is_16b_ldst)
> 	(hw_watch_regions_overlap, hw_watch_detect_trigger): New functions.
> 	* nat/aarch64-linux-hw-point.h (LDP_STP_MASK, STP_OPC_64)
> 	(LDP_OPC_64): New constants.
> 	(hw_watch_is_16b_ldst, hw_watch_regions_overlap)
> 	(hw_watch_detect_trigger): New prototypes.
> 
> gdbserver/ChangeLog:
> 
> YYYY-MM-DD  Luis Machado  <luis.machado@linaro.org>
> 
> 	* linux-aarch64-low.cc
> 	(aarch64_target::low_stopped_data_address): Refactor.
> ---
> gdb/aarch64-linux-nat.c          |  45 ++----------
> gdb/nat/aarch64-linux-hw-point.c | 120 +++++++++++++++++++++++++++++++
> gdb/nat/aarch64-linux-hw-point.h |  16 +++++
> gdbserver/linux-aarch64-low.cc   |  49 ++++---------
> 4 files changed, 154 insertions(+), 76 deletions(-)
> 
> diff --git a/gdb/aarch64-linux-nat.c b/gdb/aarch64-linux-nat.c
> index 61224022f6a..de405a5e351 100644
> --- a/gdb/aarch64-linux-nat.c
> +++ b/gdb/aarch64-linux-nat.c
> @@ -951,7 +951,6 @@ bool
> aarch64_linux_nat_target::stopped_data_address (CORE_ADDR *addr_p)
> {
>   siginfo_t siginfo;
> -  int i;
>   struct aarch64_debug_reg_state *state;
> 
>   if (!linux_nat_get_siginfo (inferior_ptid, &siginfo))
> @@ -969,46 +968,14 @@ aarch64_linux_nat_target::stopped_data_address (CORE_ADDR *addr_p)
>   const CORE_ADDR addr_trap
>     = address_significant (gdbarch, (CORE_ADDR) siginfo.si_addr);
> 
> +  struct regcache *regs = get_thread_regcache (this, inferior_ptid);
> +  CORE_ADDR trigger_pc = regcache_read_pc (regs);
> +  uint32_t insn;
> +  read_memory (trigger_pc, (gdb_byte *) &insn, 4);
> +
>   /* Check if the address matches any watched address.  */
>   state = aarch64_get_debug_reg_state (inferior_ptid.pid ());
> -  for (i = aarch64_num_wp_regs - 1; i >= 0; --i)
> -    {
> -      const unsigned int offset
> -	= aarch64_watchpoint_offset (state->dr_ctrl_wp[i]);
> -      const unsigned int len = aarch64_watchpoint_length (state->dr_ctrl_wp[i]);
> -      const CORE_ADDR addr_watch = state->dr_addr_wp[i] + offset;
> -      const CORE_ADDR addr_watch_aligned = align_down (state->dr_addr_wp[i], 8);
> -      const CORE_ADDR addr_orig = state->dr_addr_orig_wp[i];
> -
> -      if (state->dr_ref_count_wp[i]
> -	  && DR_CONTROL_ENABLED (state->dr_ctrl_wp[i])
> -	  && addr_trap >= addr_watch_aligned
> -	  && addr_trap < addr_watch + len)
> -	{
> -	  /* ADDR_TRAP reports the first address of the memory range
> -	     accessed by the CPU, regardless of what was the memory
> -	     range watched.  Thus, a large CPU access that straddles
> -	     the ADDR_WATCH..ADDR_WATCH+LEN range may result in an
> -	     ADDR_TRAP that is lower than the
> -	     ADDR_WATCH..ADDR_WATCH+LEN range.  E.g.:
> -
> -	     addr: |   4   |   5   |   6   |   7   |   8   |
> -				   |---- range watched ----|
> -		   |----------- range accessed ------------|
> -
> -	     In this case, ADDR_TRAP will be 4.
> -
> -	     To match a watchpoint known to GDB core, we must never
> -	     report *ADDR_P outside of any ADDR_WATCH..ADDR_WATCH+LEN
> -	     range.  ADDR_WATCH <= ADDR_TRAP < ADDR_ORIG is a false
> -	     positive on kernels older than 4.10.  See PR
> -	     external/20207.  */
> -	  *addr_p = addr_orig;
> -	  return true;
> -	}
> -    }
> -
> -  return false;
> +  return hw_watch_detect_trigger (state, insn, addr_trap, addr_p);
> }
> 
> /* Implement the "stopped_by_watchpoint" target_ops method.  */
> diff --git a/gdb/nat/aarch64-linux-hw-point.c b/gdb/nat/aarch64-linux-hw-point.c
> index af2cc4254e2..32ed76157ac 100644
> --- a/gdb/nat/aarch64-linux-hw-point.c
> +++ b/gdb/nat/aarch64-linux-hw-point.c
> @@ -865,3 +865,123 @@ aarch64_linux_region_ok_for_watchpoint (CORE_ADDR addr, int len)
>      the checking is costly.  */
>   return 1;
> }
> +
> +/* Return true if INSN is a ldp/stp that accesses 16 bytes of memory.
> +   Return false otherwise.  */
> +
> +bool
> +hw_watch_is_16b_ldst (CORE_ADDR insn)
> +{
> +  return ((insn & LDP_STP_MASK) == STP_OPC_64
> +	  || (insn & LDP_STP_MASK) == LDP_OPC_64);
> +}
> +
> +/* Return true if the regions [mem_addr, mem_addr + mem_len] and
> +   [watch_addr, watch_addr + watch_len] overlap.  False otherwise.  */
> +
> +bool
> +hw_watch_regions_overlap (CORE_ADDR mem_addr, size_t mem_len,
> +			  CORE_ADDR watch_addr, size_t watch_len)
> +{
> +  if (watch_addr > (mem_addr + mem_len)
> +      || mem_addr > (watch_addr + watch_len))
> +    return false;
> +
> +  CORE_ADDR start = std::max (mem_addr, watch_addr);
> +  CORE_ADDR end = std::min (mem_addr + mem_len, watch_addr + watch_len);
> +
> +  return ((end - start) > 0);
> +}
> +
> +/* Check if a hardware watchpoint has triggered.  If a trigger is detected,
> +   return true and update ADDR_P with the stopped data address.
> +   Otherwise return false.
> +
> +   STATE is the debug register's state, INSN is the instruction the inferior
> +   stopped at and ADDR_TRAP is the reported stopped data address.  */
> +
> +bool
> +hw_watch_detect_trigger (const struct aarch64_debug_reg_state *state,
> +			 CORE_ADDR insn, CORE_ADDR addr_trap,
> +			 CORE_ADDR *addr_p)
> +{
> +  /* There are 6 variations of watchpoint range and memory access
> +     range positioning:
> +
> +     - W is the byte in the watchpoint range only.
> +
> +     - M is the byte in the memory access range ony.
> +
> +     - O is the byte in the overlapping region of the watchpoint range and
> +       the memory access range.
> +
> +     1 - Non-overlapping, no triggers.
> +
> +     [WWWWWWWW]...[MMMMMMMM]
> +
> +     2 - Non-overlapping, no triggers.
> +
> +     [MMMMMMMM]...[WWWWWWWW]
> +
> +     3 - Overlapping partially, triggers.
> +
> +     [MMMMOOOOWWWW]
> +
> +     4 - Overlapping partially, triggers.
> +
> +     [WWWWOOOOMMMM]
> +
> +     5 - Memory access contained in watchpoint range, triggers.
> +
> +     [WWWWOOOOOOOOWWWW]
> +
> +     6 - Memory access containing watchpoint range, triggers.
> +
> +     [MMMMOOOOOOOOMMMM]

Very minor nit: M looks like an upside down W, so there was an initial slight confusion.
Not worth changing, because M and W are the right letter choices.

> +  */
> +
> +  /* We assume the memory access size is 8 bytes.  */
> +  unsigned int memory_access_size = 8;
> +
> +  /* Check if the memory access size is 16 bytes.  */
> +  if (hw_watch_is_16b_ldst (insn))
> +    memory_access_size = 16;
> +
> +  for (int i = aarch64_num_wp_regs - 1; i >= 0; --i)
> +    {
> +      const unsigned int offset
> +	= aarch64_watchpoint_offset (state->dr_ctrl_wp[i]);
> +      const unsigned int len = aarch64_watchpoint_length (state->dr_ctrl_wp[i]);
> +      const CORE_ADDR addr_watch = state->dr_addr_wp[i] + offset;
> +      const CORE_ADDR addr_orig = state->dr_addr_orig_wp[i];
> +
> +      if ((state->dr_ref_count_wp[i]
> +	  && DR_CONTROL_ENABLED (state->dr_ctrl_wp[i]))
> +	  && hw_watch_regions_overlap (addr_trap, memory_access_size,
> +				       addr_watch, len))
> +	{
> +	  /* ADDR_TRAP reports the first address of the memory range
> +	     accessed by the CPU, regardless of what was the memory
> +	     range watched.  Thus, a large CPU access that straddles
> +	     the ADDR_WATCH..ADDR_WATCH+LEN range may result in an
> +	     ADDR_TRAP that is lower than the
> +	     ADDR_WATCH..ADDR_WATCH+LEN range.  E.g.:
> +
> +	     addr: |   4   |   5   |   6   |   7   |   8   |
> +				   |---- range watched ----|
> +		   |----------- range accessed ------------|
> +
> +	     In this case, ADDR_TRAP will be 4.
> +
> +	     To match a watchpoint known to GDB core, we must never
> +	     report *ADDR_P outside of any ADDR_WATCH..ADDR_WATCH+LEN
> +	     range.  ADDR_WATCH <= ADDR_TRAP < ADDR_ORIG is a false
> +	     positive on kernels older than 4.10.  See PR
> +	     external/20207.  */
> +	  *addr_p = addr_orig;
> +	  return true;
> +	}
> +    }
> +
> +  return false;
> +}
> diff --git a/gdb/nat/aarch64-linux-hw-point.h b/gdb/nat/aarch64-linux-hw-point.h
> index 2fc4b400ece..62e755dd1e9 100644
> --- a/gdb/nat/aarch64-linux-hw-point.h
> +++ b/gdb/nat/aarch64-linux-hw-point.h
> @@ -132,6 +132,13 @@ typedef ULONGEST dr_changed_t;
> #define DR_HAS_CHANGED(x) ((x) != 0)
> #define DR_N_HAS_CHANGED(x, n) ((x) & ((dr_changed_t)1 << (n)))
> 
> +/* Mask for matching LDP and STP instruction variants.  */
> +#define LDP_STP_MASK  0xFE400000
> +/* stp and stnp with 64-bit registers.  */
> +#define STP_OPC_64    0xA8000000
> +/* ldp and ldnp with 64-bit registers.  */
> +#define LDP_OPC_64    0xA8400000
> +
> /* Structure for managing the hardware breakpoint/watchpoint resources.
>    DR_ADDR_* stores the address, DR_CTRL_* stores the control register
>    content, and DR_REF_COUNT_* counts the numbers of references to the
> @@ -197,4 +204,13 @@ struct aarch64_debug_reg_state *aarch64_get_debug_reg_state (pid_t pid);
> 
> int aarch64_linux_region_ok_for_watchpoint (CORE_ADDR addr, int len);
> 
> +bool hw_watch_is_16b_ldst (CORE_ADDR insn);
> +
> +bool hw_watch_regions_overlap (CORE_ADDR mem_addr, size_t mem_len,
> +			       CORE_ADDR watch_addr, size_t watch_len);
> +
> +bool hw_watch_detect_trigger (const struct aarch64_debug_reg_state *state,
> +			      CORE_ADDR insn, CORE_ADDR stopped_data_address,
> +			      CORE_ADDR *addr_p);
> +
> #endif /* NAT_AARCH64_LINUX_HW_POINT_H */
> diff --git a/gdbserver/linux-aarch64-low.cc b/gdbserver/linux-aarch64-low.cc
> index daccfef746e..5df632fe724 100644
> --- a/gdbserver/linux-aarch64-low.cc
> +++ b/gdbserver/linux-aarch64-low.cc
> @@ -518,7 +518,7 @@ CORE_ADDR
> aarch64_target::low_stopped_data_address ()
> {
>   siginfo_t siginfo;
> -  int pid, i;
> +  int pid;
>   struct aarch64_debug_reg_state *state;
> 
>   pid = lwpid_of (current_thread);
> @@ -538,45 +538,20 @@ aarch64_target::low_stopped_data_address ()
>   const CORE_ADDR addr_trap
>     = address_significant ((CORE_ADDR) siginfo.si_addr);
> 
> +  struct regcache *regs = get_thread_regcache (current_thread, 1);
> +  CORE_ADDR trigger_pc = regcache_read_pc (regs);
> +  uint32_t insn;
> +  read_memory (trigger_pc, (gdb_byte *) &insn, 4);
> +
>   /* Check if the address matches any watched address.  */
>   state = aarch64_get_debug_reg_state (pid_of (current_thread));
> -  for (i = aarch64_num_wp_regs - 1; i >= 0; --i)
> -    {
> -      const unsigned int offset
> -	= aarch64_watchpoint_offset (state->dr_ctrl_wp[i]);
> -      const unsigned int len = aarch64_watchpoint_length (state->dr_ctrl_wp[i]);
> -      const CORE_ADDR addr_watch = state->dr_addr_wp[i] + offset;
> -      const CORE_ADDR addr_watch_aligned = align_down (state->dr_addr_wp[i], 8);
> -      const CORE_ADDR addr_orig = state->dr_addr_orig_wp[i];
> -
> -      if (state->dr_ref_count_wp[i]
> -	  && DR_CONTROL_ENABLED (state->dr_ctrl_wp[i])
> -	  && addr_trap >= addr_watch_aligned
> -	  && addr_trap < addr_watch + len)
> -	{
> -	  /* ADDR_TRAP reports the first address of the memory range
> -	     accessed by the CPU, regardless of what was the memory
> -	     range watched.  Thus, a large CPU access that straddles
> -	     the ADDR_WATCH..ADDR_WATCH+LEN range may result in an
> -	     ADDR_TRAP that is lower than the
> -	     ADDR_WATCH..ADDR_WATCH+LEN range.  E.g.:
> -
> -	     addr: |   4   |   5   |   6   |   7   |   8   |
> -				   |---- range watched ----|
> -		   |----------- range accessed ------------|
> -
> -	     In this case, ADDR_TRAP will be 4.
> -
> -	     To match a watchpoint known to GDB core, we must never
> -	     report *ADDR_P outside of any ADDR_WATCH..ADDR_WATCH+LEN
> -	     range.  ADDR_WATCH <= ADDR_TRAP < ADDR_ORIG is a false
> -	     positive on kernels older than 4.10.  See PR
> -	     external/20207.  */
> -	  return addr_orig;
> -	}
> -    }
> 
> -  return (CORE_ADDR) 0;
> +  CORE_ADDR trigger_addr;
> +
> +  if (hw_watch_detect_trigger (state, insn, addr_trap, &trigger_addr))
> +    return trigger_addr;
> +
> +  return 0;
> }
> 
> /* Implementation of linux target ops method "low_stopped_by_watchpoint".  */
> -- 
> 2.25.1
> 


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH][AArch64] Fix missing watchpoint hits/endless loop
  2021-06-02 16:57 ` Alan Hayward
@ 2021-06-02 17:22   ` Luis Machado
  2021-06-03 10:22     ` Alan Hayward
  0 siblings, 1 reply; 6+ messages in thread
From: Luis Machado @ 2021-06-02 17:22 UTC (permalink / raw)
  To: Alan Hayward; +Cc: gdb-patches\@sourceware.org, david.spickett, nd

On 6/2/21 1:57 PM, Alan Hayward wrote:
> 
> 
>> On 2 Jun 2021, at 15:27, Luis Machado <luis.machado@linaro.org> wrote:
>>
>> I ran into a situation where a hardware watchpoint hit is not detected
>> correctly, misleading GDB into thinking it was a delayed breakpoint hit.
>>
>> The problem is that hardware watchpoints are not skippable on AArch64, so
>> that makes GDB loop endlessly trying to run past the instruction.
>>
>> The most obvious case where this happens is when the load/store pair
>> instructions access 16 bytes of memory.
>>
>> Suppose we have a stp instruction that will write a couple 64-bit registers
>> to address 0x10 (stp x3,x4 [x2]). It will write data from 0x10 up to 0x20.
>>
>> Now suppose a write watchpoint is created to monitor memory address 0x18,
>> which is the start of the second register write. It can have whatever length,
>> but let's assume it has length 8.
>>
>> When we execute that stp instruction, it will trap and the reported stopped
>> data address from the kernel will be 0x10 (the beginning of the memory range
>> accessed by the instruction).
>>
>> The current code won't be able to detect a valid trigger because it assumes an
>> alignment of 8 bytes for the watchpoint address. Forcing that kind of alignment
>> won't be enough to detect a 16-byte access if the trap address falls outside of
>> the 8-byte alignment window. We need to know how many bytes the instruction
>> will access, but we won't have that data unless we go parsing instructions.
>>
>> Another issue with the current code seems to be that it assumes the accesses
>> will always be 8 bytes in size, since it wants to align the watchpoint address
>> to that particular boundary. This leads to problems when we have unaligned
>> addresses and unaligned watchpoints.
>>
>> For example, suppose we have a str instruction storing 8 bytes to memory
>> address 0xf. Now suppose we have a write watchpoint at address 0x16,
>> monitoring 8 bytes.
>>
>> The trap address will be 0xf, but forcing 0x16 to 8-byte alignment yields
>> 0x10, and so GDB doesn't think this is a watchpoint hit.
> 
> Ouch.
> I’m starting to wonder now if I have encountered this bug myself, and just
> shrugged it off thinking the variable wasn’t being hit.
> 

One more detail worth mentioning is that different CPU's may report trap 
addresses that are more or less precise.

D05's Cortex A72's are precise, while Ampere's Altra and Cavium's 
ThunderX2 CPU's report the trap address from the beginning of the memory 
range, so that's less precise, but still valid according to the spec.

>>
>> I believe you can trigger the same problem with smaller memory accesses,
>> except one that accesses a single byte.
>>
>> As I said earlier, ideally we'd go parsing instructions to figure out how many
>> bytes they access. That is a bit complex though, so meanwhile I think we should
>> go with a simpler solution.
>>
>> We keep the assumption that the instructions access 8 bytes, except when we
>> detect one of the instructions that access 16 bytes.
> 
> What about NEON and SVE instructions? They can access a lot of data at once.
> 

I haven't written a test for those yet, but my guess is that our 
handling of watchpoint triggers caused by those instructions is not 
entirely correct either. If the trap address falls outside of the 
expected alignment window, we won't notice the hit.

>>
>> For the trigger detection, instead of forcing an alignment and then checking
>> if the trap address falls within the range of the watchpoint, we just check
>> if the memory access and the watchpoint range overlap. If they do, then we
>> have a watchpoint hit.
>>
>> This still has potential to give false positives if we choose the right
>> combination of memory address, access size and watchpoint address. But it
>> should match the old code in this regard, while giving better results for
>> accesses of 16 bytes and unaligned accesses.
> 
> Is there a plan for fixing those?
> 

There is, but I wanted to at least get this patch out before GDB 11 
branches. That ought to improve things for CPU's that don't report a 
precise trap address.

I think the best course of action is to find a fast way to determine how 
many bytes a particular instruction accesses. Given the differences in 
encoding between regular ld/st instructions, NEON and SVE variants, I'll 
need to come up with a more thorough testcase to make sure we're dealing 
with those correctly.

>>
>> It also fixes these two failures in the testsuite:
>>
>> FAIL: gdb.base/watchpoint-unaligned.exp: continue (timeout)
>> FAIL: gdb.base/watchpoint-unaligned.exp: size8twice write
>>
>> Regression tested on aarch64-linux Ubuntu/20.04.
>>
>> gdb/ChangeLog:
>>
>> YYYY-MM-DD  Luis Machado  <luis.machado@linaro.org>
>>
>> 	* aarch64-linux-nat.c
>> 	(aarch64_linux_nat_target::stopped_data_address): Refactor.
>> 	* nat/aarch64-linux-hw-point.c (hw_watch_is_16b_ldst)
>> 	(hw_watch_regions_overlap, hw_watch_detect_trigger): New functions.
>> 	* nat/aarch64-linux-hw-point.h (LDP_STP_MASK, STP_OPC_64)
>> 	(LDP_OPC_64): New constants.
>> 	(hw_watch_is_16b_ldst, hw_watch_regions_overlap)
>> 	(hw_watch_detect_trigger): New prototypes.
>>
>> gdbserver/ChangeLog:
>>
>> YYYY-MM-DD  Luis Machado  <luis.machado@linaro.org>
>>
>> 	* linux-aarch64-low.cc
>> 	(aarch64_target::low_stopped_data_address): Refactor.
>> ---
>> gdb/aarch64-linux-nat.c          |  45 ++----------
>> gdb/nat/aarch64-linux-hw-point.c | 120 +++++++++++++++++++++++++++++++
>> gdb/nat/aarch64-linux-hw-point.h |  16 +++++
>> gdbserver/linux-aarch64-low.cc   |  49 ++++---------
>> 4 files changed, 154 insertions(+), 76 deletions(-)
>>
>> diff --git a/gdb/aarch64-linux-nat.c b/gdb/aarch64-linux-nat.c
>> index 61224022f6a..de405a5e351 100644
>> --- a/gdb/aarch64-linux-nat.c
>> +++ b/gdb/aarch64-linux-nat.c
>> @@ -951,7 +951,6 @@ bool
>> aarch64_linux_nat_target::stopped_data_address (CORE_ADDR *addr_p)
>> {
>>    siginfo_t siginfo;
>> -  int i;
>>    struct aarch64_debug_reg_state *state;
>>
>>    if (!linux_nat_get_siginfo (inferior_ptid, &siginfo))
>> @@ -969,46 +968,14 @@ aarch64_linux_nat_target::stopped_data_address (CORE_ADDR *addr_p)
>>    const CORE_ADDR addr_trap
>>      = address_significant (gdbarch, (CORE_ADDR) siginfo.si_addr);
>>
>> +  struct regcache *regs = get_thread_regcache (this, inferior_ptid);
>> +  CORE_ADDR trigger_pc = regcache_read_pc (regs);
>> +  uint32_t insn;
>> +  read_memory (trigger_pc, (gdb_byte *) &insn, 4);
>> +
>>    /* Check if the address matches any watched address.  */
>>    state = aarch64_get_debug_reg_state (inferior_ptid.pid ());
>> -  for (i = aarch64_num_wp_regs - 1; i >= 0; --i)
>> -    {
>> -      const unsigned int offset
>> -	= aarch64_watchpoint_offset (state->dr_ctrl_wp[i]);
>> -      const unsigned int len = aarch64_watchpoint_length (state->dr_ctrl_wp[i]);
>> -      const CORE_ADDR addr_watch = state->dr_addr_wp[i] + offset;
>> -      const CORE_ADDR addr_watch_aligned = align_down (state->dr_addr_wp[i], 8);
>> -      const CORE_ADDR addr_orig = state->dr_addr_orig_wp[i];
>> -
>> -      if (state->dr_ref_count_wp[i]
>> -	  && DR_CONTROL_ENABLED (state->dr_ctrl_wp[i])
>> -	  && addr_trap >= addr_watch_aligned
>> -	  && addr_trap < addr_watch + len)
>> -	{
>> -	  /* ADDR_TRAP reports the first address of the memory range
>> -	     accessed by the CPU, regardless of what was the memory
>> -	     range watched.  Thus, a large CPU access that straddles
>> -	     the ADDR_WATCH..ADDR_WATCH+LEN range may result in an
>> -	     ADDR_TRAP that is lower than the
>> -	     ADDR_WATCH..ADDR_WATCH+LEN range.  E.g.:
>> -
>> -	     addr: |   4   |   5   |   6   |   7   |   8   |
>> -				   |---- range watched ----|
>> -		   |----------- range accessed ------------|
>> -
>> -	     In this case, ADDR_TRAP will be 4.
>> -
>> -	     To match a watchpoint known to GDB core, we must never
>> -	     report *ADDR_P outside of any ADDR_WATCH..ADDR_WATCH+LEN
>> -	     range.  ADDR_WATCH <= ADDR_TRAP < ADDR_ORIG is a false
>> -	     positive on kernels older than 4.10.  See PR
>> -	     external/20207.  */
>> -	  *addr_p = addr_orig;
>> -	  return true;
>> -	}
>> -    }
>> -
>> -  return false;
>> +  return hw_watch_detect_trigger (state, insn, addr_trap, addr_p);
>> }
>>
>> /* Implement the "stopped_by_watchpoint" target_ops method.  */
>> diff --git a/gdb/nat/aarch64-linux-hw-point.c b/gdb/nat/aarch64-linux-hw-point.c
>> index af2cc4254e2..32ed76157ac 100644
>> --- a/gdb/nat/aarch64-linux-hw-point.c
>> +++ b/gdb/nat/aarch64-linux-hw-point.c
>> @@ -865,3 +865,123 @@ aarch64_linux_region_ok_for_watchpoint (CORE_ADDR addr, int len)
>>       the checking is costly.  */
>>    return 1;
>> }
>> +
>> +/* Return true if INSN is a ldp/stp that accesses 16 bytes of memory.
>> +   Return false otherwise.  */
>> +
>> +bool
>> +hw_watch_is_16b_ldst (CORE_ADDR insn)
>> +{
>> +  return ((insn & LDP_STP_MASK) == STP_OPC_64
>> +	  || (insn & LDP_STP_MASK) == LDP_OPC_64);
>> +}
>> +
>> +/* Return true if the regions [mem_addr, mem_addr + mem_len] and
>> +   [watch_addr, watch_addr + watch_len] overlap.  False otherwise.  */
>> +
>> +bool
>> +hw_watch_regions_overlap (CORE_ADDR mem_addr, size_t mem_len,
>> +			  CORE_ADDR watch_addr, size_t watch_len)
>> +{
>> +  if (watch_addr > (mem_addr + mem_len)
>> +      || mem_addr > (watch_addr + watch_len))
>> +    return false;
>> +
>> +  CORE_ADDR start = std::max (mem_addr, watch_addr);
>> +  CORE_ADDR end = std::min (mem_addr + mem_len, watch_addr + watch_len);
>> +
>> +  return ((end - start) > 0);
>> +}
>> +
>> +/* Check if a hardware watchpoint has triggered.  If a trigger is detected,
>> +   return true and update ADDR_P with the stopped data address.
>> +   Otherwise return false.
>> +
>> +   STATE is the debug register's state, INSN is the instruction the inferior
>> +   stopped at and ADDR_TRAP is the reported stopped data address.  */
>> +
>> +bool
>> +hw_watch_detect_trigger (const struct aarch64_debug_reg_state *state,
>> +			 CORE_ADDR insn, CORE_ADDR addr_trap,
>> +			 CORE_ADDR *addr_p)
>> +{
>> +  /* There are 6 variations of watchpoint range and memory access
>> +     range positioning:
>> +
>> +     - W is the byte in the watchpoint range only.
>> +
>> +     - M is the byte in the memory access range ony.
>> +
>> +     - O is the byte in the overlapping region of the watchpoint range and
>> +       the memory access range.
>> +
>> +     1 - Non-overlapping, no triggers.
>> +
>> +     [WWWWWWWW]...[MMMMMMMM]
>> +
>> +     2 - Non-overlapping, no triggers.
>> +
>> +     [MMMMMMMM]...[WWWWWWWW]
>> +
>> +     3 - Overlapping partially, triggers.
>> +
>> +     [MMMMOOOOWWWW]
>> +
>> +     4 - Overlapping partially, triggers.
>> +
>> +     [WWWWOOOOMMMM]
>> +
>> +     5 - Memory access contained in watchpoint range, triggers.
>> +
>> +     [WWWWOOOOOOOOWWWW]
>> +
>> +     6 - Memory access containing watchpoint range, triggers.
>> +
>> +     [MMMMOOOOOOOOMMMM]
> 
> Very minor nit: M looks like an upside down W, so there was an initial slight confusion.
> Not worth changing, because M and W are the right letter choices.
> 

I wonder if we can improve that in some way. Maybe name it B for "Memory 
Byte"?

>> +  */
>> +
>> +  /* We assume the memory access size is 8 bytes.  */
>> +  unsigned int memory_access_size = 8;
>> +
>> +  /* Check if the memory access size is 16 bytes.  */
>> +  if (hw_watch_is_16b_ldst (insn))
>> +    memory_access_size = 16;
>> +
>> +  for (int i = aarch64_num_wp_regs - 1; i >= 0; --i)
>> +    {
>> +      const unsigned int offset
>> +	= aarch64_watchpoint_offset (state->dr_ctrl_wp[i]);
>> +      const unsigned int len = aarch64_watchpoint_length (state->dr_ctrl_wp[i]);
>> +      const CORE_ADDR addr_watch = state->dr_addr_wp[i] + offset;
>> +      const CORE_ADDR addr_orig = state->dr_addr_orig_wp[i];
>> +
>> +      if ((state->dr_ref_count_wp[i]
>> +	  && DR_CONTROL_ENABLED (state->dr_ctrl_wp[i]))
>> +	  && hw_watch_regions_overlap (addr_trap, memory_access_size,
>> +				       addr_watch, len))
>> +	{
>> +	  /* ADDR_TRAP reports the first address of the memory range
>> +	     accessed by the CPU, regardless of what was the memory
>> +	     range watched.  Thus, a large CPU access that straddles
>> +	     the ADDR_WATCH..ADDR_WATCH+LEN range may result in an
>> +	     ADDR_TRAP that is lower than the
>> +	     ADDR_WATCH..ADDR_WATCH+LEN range.  E.g.:
>> +
>> +	     addr: |   4   |   5   |   6   |   7   |   8   |
>> +				   |---- range watched ----|
>> +		   |----------- range accessed ------------|
>> +
>> +	     In this case, ADDR_TRAP will be 4.
>> +
>> +	     To match a watchpoint known to GDB core, we must never
>> +	     report *ADDR_P outside of any ADDR_WATCH..ADDR_WATCH+LEN
>> +	     range.  ADDR_WATCH <= ADDR_TRAP < ADDR_ORIG is a false
>> +	     positive on kernels older than 4.10.  See PR
>> +	     external/20207.  */
>> +	  *addr_p = addr_orig;
>> +	  return true;
>> +	}
>> +    }
>> +
>> +  return false;
>> +}
>> diff --git a/gdb/nat/aarch64-linux-hw-point.h b/gdb/nat/aarch64-linux-hw-point.h
>> index 2fc4b400ece..62e755dd1e9 100644
>> --- a/gdb/nat/aarch64-linux-hw-point.h
>> +++ b/gdb/nat/aarch64-linux-hw-point.h
>> @@ -132,6 +132,13 @@ typedef ULONGEST dr_changed_t;
>> #define DR_HAS_CHANGED(x) ((x) != 0)
>> #define DR_N_HAS_CHANGED(x, n) ((x) & ((dr_changed_t)1 << (n)))
>>
>> +/* Mask for matching LDP and STP instruction variants.  */
>> +#define LDP_STP_MASK  0xFE400000
>> +/* stp and stnp with 64-bit registers.  */
>> +#define STP_OPC_64    0xA8000000
>> +/* ldp and ldnp with 64-bit registers.  */
>> +#define LDP_OPC_64    0xA8400000
>> +
>> /* Structure for managing the hardware breakpoint/watchpoint resources.
>>     DR_ADDR_* stores the address, DR_CTRL_* stores the control register
>>     content, and DR_REF_COUNT_* counts the numbers of references to the
>> @@ -197,4 +204,13 @@ struct aarch64_debug_reg_state *aarch64_get_debug_reg_state (pid_t pid);
>>
>> int aarch64_linux_region_ok_for_watchpoint (CORE_ADDR addr, int len);
>>
>> +bool hw_watch_is_16b_ldst (CORE_ADDR insn);
>> +
>> +bool hw_watch_regions_overlap (CORE_ADDR mem_addr, size_t mem_len,
>> +			       CORE_ADDR watch_addr, size_t watch_len);
>> +
>> +bool hw_watch_detect_trigger (const struct aarch64_debug_reg_state *state,
>> +			      CORE_ADDR insn, CORE_ADDR stopped_data_address,
>> +			      CORE_ADDR *addr_p);
>> +
>> #endif /* NAT_AARCH64_LINUX_HW_POINT_H */
>> diff --git a/gdbserver/linux-aarch64-low.cc b/gdbserver/linux-aarch64-low.cc
>> index daccfef746e..5df632fe724 100644
>> --- a/gdbserver/linux-aarch64-low.cc
>> +++ b/gdbserver/linux-aarch64-low.cc
>> @@ -518,7 +518,7 @@ CORE_ADDR
>> aarch64_target::low_stopped_data_address ()
>> {
>>    siginfo_t siginfo;
>> -  int pid, i;
>> +  int pid;
>>    struct aarch64_debug_reg_state *state;
>>
>>    pid = lwpid_of (current_thread);
>> @@ -538,45 +538,20 @@ aarch64_target::low_stopped_data_address ()
>>    const CORE_ADDR addr_trap
>>      = address_significant ((CORE_ADDR) siginfo.si_addr);
>>
>> +  struct regcache *regs = get_thread_regcache (current_thread, 1);
>> +  CORE_ADDR trigger_pc = regcache_read_pc (regs);
>> +  uint32_t insn;
>> +  read_memory (trigger_pc, (gdb_byte *) &insn, 4);
>> +
>>    /* Check if the address matches any watched address.  */
>>    state = aarch64_get_debug_reg_state (pid_of (current_thread));
>> -  for (i = aarch64_num_wp_regs - 1; i >= 0; --i)
>> -    {
>> -      const unsigned int offset
>> -	= aarch64_watchpoint_offset (state->dr_ctrl_wp[i]);
>> -      const unsigned int len = aarch64_watchpoint_length (state->dr_ctrl_wp[i]);
>> -      const CORE_ADDR addr_watch = state->dr_addr_wp[i] + offset;
>> -      const CORE_ADDR addr_watch_aligned = align_down (state->dr_addr_wp[i], 8);
>> -      const CORE_ADDR addr_orig = state->dr_addr_orig_wp[i];
>> -
>> -      if (state->dr_ref_count_wp[i]
>> -	  && DR_CONTROL_ENABLED (state->dr_ctrl_wp[i])
>> -	  && addr_trap >= addr_watch_aligned
>> -	  && addr_trap < addr_watch + len)
>> -	{
>> -	  /* ADDR_TRAP reports the first address of the memory range
>> -	     accessed by the CPU, regardless of what was the memory
>> -	     range watched.  Thus, a large CPU access that straddles
>> -	     the ADDR_WATCH..ADDR_WATCH+LEN range may result in an
>> -	     ADDR_TRAP that is lower than the
>> -	     ADDR_WATCH..ADDR_WATCH+LEN range.  E.g.:
>> -
>> -	     addr: |   4   |   5   |   6   |   7   |   8   |
>> -				   |---- range watched ----|
>> -		   |----------- range accessed ------------|
>> -
>> -	     In this case, ADDR_TRAP will be 4.
>> -
>> -	     To match a watchpoint known to GDB core, we must never
>> -	     report *ADDR_P outside of any ADDR_WATCH..ADDR_WATCH+LEN
>> -	     range.  ADDR_WATCH <= ADDR_TRAP < ADDR_ORIG is a false
>> -	     positive on kernels older than 4.10.  See PR
>> -	     external/20207.  */
>> -	  return addr_orig;
>> -	}
>> -    }
>>
>> -  return (CORE_ADDR) 0;
>> +  CORE_ADDR trigger_addr;
>> +
>> +  if (hw_watch_detect_trigger (state, insn, addr_trap, &trigger_addr))
>> +    return trigger_addr;
>> +
>> +  return 0;
>> }
>>
>> /* Implementation of linux target ops method "low_stopped_by_watchpoint".  */
>> -- 
>> 2.25.1
>>
> 

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH][AArch64] Fix missing watchpoint hits/endless loop
  2021-06-02 17:22   ` Luis Machado
@ 2021-06-03 10:22     ` Alan Hayward
  0 siblings, 0 replies; 6+ messages in thread
From: Alan Hayward @ 2021-06-03 10:22 UTC (permalink / raw)
  To: Luis Machado; +Cc: gdb-patches\@sourceware.org, david.spickett, nd



On 2 Jun 2021, at 18:22, Luis Machado <luis.machado@linaro.org<mailto:luis.machado@linaro.org>> wrote:

On 6/2/21 1:57 PM, Alan Hayward wrote:
On 2 Jun 2021, at 15:27, Luis Machado <luis.machado@linaro.org<mailto:luis.machado@linaro.org>> wrote:

I ran into a situation where a hardware watchpoint hit is not detected
correctly, misleading GDB into thinking it was a delayed breakpoint hit.

The problem is that hardware watchpoints are not skippable on AArch64, so
that makes GDB loop endlessly trying to run past the instruction.

The most obvious case where this happens is when the load/store pair
instructions access 16 bytes of memory.

Suppose we have a stp instruction that will write a couple 64-bit registers
to address 0x10 (stp x3,x4 [x2]). It will write data from 0x10 up to 0x20.

Now suppose a write watchpoint is created to monitor memory address 0x18,
which is the start of the second register write. It can have whatever length,
but let's assume it has length 8.

When we execute that stp instruction, it will trap and the reported stopped
data address from the kernel will be 0x10 (the beginning of the memory range
accessed by the instruction).

The current code won't be able to detect a valid trigger because it assumes an
alignment of 8 bytes for the watchpoint address. Forcing that kind of alignment
won't be enough to detect a 16-byte access if the trap address falls outside of
the 8-byte alignment window. We need to know how many bytes the instruction
will access, but we won't have that data unless we go parsing instructions.

Another issue with the current code seems to be that it assumes the accesses
will always be 8 bytes in size, since it wants to align the watchpoint address
to that particular boundary. This leads to problems when we have unaligned
addresses and unaligned watchpoints.

For example, suppose we have a str instruction storing 8 bytes to memory
address 0xf. Now suppose we have a write watchpoint at address 0x16,
monitoring 8 bytes.

The trap address will be 0xf, but forcing 0x16 to 8-byte alignment yields
0x10, and so GDB doesn't think this is a watchpoint hit.
Ouch.
I’m starting to wonder now if I have encountered this bug myself, and just
shrugged it off thinking the variable wasn’t being hit.

One more detail worth mentioning is that different CPU's may report trap addresses that are more or less precise.

D05's Cortex A72's are precise, while Ampere's Altra and Cavium's ThunderX2 CPU's report the trap address from the beginning of the memory range, so that's less precise, but still valid according to the spec.

Altra’s use Neoverse cores, which is based on Cortex A76.
That leads me to think “report the trap address from the beginning of memory range” is going to become much more common going forwards.



I believe you can trigger the same problem with smaller memory accesses,
except one that accesses a single byte.

As I said earlier, ideally we'd go parsing instructions to figure out how many
bytes they access. That is a bit complex though, so meanwhile I think we should
go with a simpler solution.

We keep the assumption that the instructions access 8 bytes, except when we
detect one of the instructions that access 16 bytes.
What about NEON and SVE instructions? They can access a lot of data at once.

I haven't written a test for those yet, but my guess is that our handling of watchpoint triggers caused by those instructions is not entirely correct either. If the trap address falls outside of the expected alignment window, we won't notice the hit.

Happy with that as a later patch. But… I’m still a little worried because with each additional patch, we’re in a situation where watchpoints now seem to work on, say, A72, but then they might suddenly miss changes for no obvious reason to the user. Especially given that, say, neon is used in general compiled code, but is fairly uncommon.




For the trigger detection, instead of forcing an alignment and then checking
if the trap address falls within the range of the watchpoint, we just check
if the memory access and the watchpoint range overlap. If they do, then we
have a watchpoint hit.

This still has potential to give false positives if we choose the right
combination of memory address, access size and watchpoint address. But it
should match the old code in this regard, while giving better results for
accesses of 16 bytes and unaligned accesses.
Is there a plan for fixing those?

There is, but I wanted to at least get this patch out before GDB 11 branches. That ought to improve things for CPU's that don't report a precise trap address.

I think the best course of action is to find a fast way to determine how many bytes a particular instruction accesses. Given the differences in encoding between regular ld/st instructions, NEON and SVE variants, I'll need to come up with a more thorough testcase to make sure we're dealing with those correctly.

Ok, given all the above comments, this sounds like a reasonable approach.
The code is clearly very broken right now.



It also fixes these two failures in the testsuite:

FAIL: gdb.base/watchpoint-unaligned.exp: continue (timeout)
FAIL: gdb.base/watchpoint-unaligned.exp: size8twice write

Regression tested on aarch64-linux Ubuntu/20.04.

gdb/ChangeLog:

YYYY-MM-DD  Luis Machado  <luis.machado@linaro.org<mailto:luis.machado@linaro.org>>

* aarch64-linux-nat.c
(aarch64_linux_nat_target::stopped_data_address): Refactor.
* nat/aarch64-linux-hw-point.c (hw_watch_is_16b_ldst)
(hw_watch_regions_overlap, hw_watch_detect_trigger): New functions.
* nat/aarch64-linux-hw-point.h (LDP_STP_MASK, STP_OPC_64)
(LDP_OPC_64): New constants.
(hw_watch_is_16b_ldst, hw_watch_regions_overlap)
(hw_watch_detect_trigger): New prototypes.

gdbserver/ChangeLog:

YYYY-MM-DD  Luis Machado  <luis.machado@linaro.org<mailto:luis.machado@linaro.org>>

* linux-aarch64-low.cc<http://linux-aarch64-low.cc>
(aarch64_target::low_stopped_data_address): Refactor.
---
gdb/aarch64-linux-nat.c          |  45 ++----------
gdb/nat/aarch64-linux-hw-point.c | 120 +++++++++++++++++++++++++++++++
gdb/nat/aarch64-linux-hw-point.h |  16 +++++
gdbserver/linux-aarch64-low.cc<http://linux-aarch64-low.cc>   |  49 ++++---------
4 files changed, 154 insertions(+), 76 deletions(-)

diff --git a/gdb/aarch64-linux-nat.c b/gdb/aarch64-linux-nat.c
index 61224022f6a..de405a5e351 100644
--- a/gdb/aarch64-linux-nat.c
+++ b/gdb/aarch64-linux-nat.c
@@ -951,7 +951,6 @@ bool
aarch64_linux_nat_target::stopped_data_address (CORE_ADDR *addr_p)
{
  siginfo_t siginfo;
-  int i;
  struct aarch64_debug_reg_state *state;

  if (!linux_nat_get_siginfo (inferior_ptid, &siginfo))
@@ -969,46 +968,14 @@ aarch64_linux_nat_target::stopped_data_address (CORE_ADDR *addr_p)
  const CORE_ADDR addr_trap
    = address_significant (gdbarch, (CORE_ADDR) siginfo.si_addr);

+  struct regcache *regs = get_thread_regcache (this, inferior_ptid);
+  CORE_ADDR trigger_pc = regcache_read_pc (regs);
+  uint32_t insn;
+  read_memory (trigger_pc, (gdb_byte *) &insn, 4);
+
  /* Check if the address matches any watched address.  */
  state = aarch64_get_debug_reg_state (inferior_ptid.pid ());
-  for (i = aarch64_num_wp_regs - 1; i >= 0; --i)
-    {
-      const unsigned int offset
- = aarch64_watchpoint_offset (state->dr_ctrl_wp[i]);
-      const unsigned int len = aarch64_watchpoint_length (state->dr_ctrl_wp[i]);
-      const CORE_ADDR addr_watch = state->dr_addr_wp[i] + offset;
-      const CORE_ADDR addr_watch_aligned = align_down (state->dr_addr_wp[i], 8);
-      const CORE_ADDR addr_orig = state->dr_addr_orig_wp[i];
-
-      if (state->dr_ref_count_wp[i]
-   && DR_CONTROL_ENABLED (state->dr_ctrl_wp[i])
-   && addr_trap >= addr_watch_aligned
-   && addr_trap < addr_watch + len)
- {
-   /* ADDR_TRAP reports the first address of the memory range
-      accessed by the CPU, regardless of what was the memory
-      range watched.  Thus, a large CPU access that straddles
-      the ADDR_WATCH..ADDR_WATCH+LEN range may result in an
-      ADDR_TRAP that is lower than the
-      ADDR_WATCH..ADDR_WATCH+LEN range.  E.g.:
-
-      addr: |   4   |   5   |   6   |   7   |   8   |
-    |---- range watched ----|
-    |----------- range accessed ------------|
-
-      In this case, ADDR_TRAP will be 4.
-
-      To match a watchpoint known to GDB core, we must never
-      report *ADDR_P outside of any ADDR_WATCH..ADDR_WATCH+LEN
-      range.  ADDR_WATCH <= ADDR_TRAP < ADDR_ORIG is a false
-      positive on kernels older than 4.10.  See PR
-      external/20207.  */
-   *addr_p = addr_orig;
-   return true;
- }
-    }
-
-  return false;
+  return hw_watch_detect_trigger (state, insn, addr_trap, addr_p);
}

/* Implement the "stopped_by_watchpoint" target_ops method.  */
diff --git a/gdb/nat/aarch64-linux-hw-point.c b/gdb/nat/aarch64-linux-hw-point.c
index af2cc4254e2..32ed76157ac 100644
--- a/gdb/nat/aarch64-linux-hw-point.c
+++ b/gdb/nat/aarch64-linux-hw-point.c
@@ -865,3 +865,123 @@ aarch64_linux_region_ok_for_watchpoint (CORE_ADDR addr, int len)
     the checking is costly.  */
  return 1;
}
+
+/* Return true if INSN is a ldp/stp that accesses 16 bytes of memory.
+   Return false otherwise.  */
+
+bool
+hw_watch_is_16b_ldst (CORE_ADDR insn)
+{
+  return ((insn & LDP_STP_MASK) == STP_OPC_64
+   || (insn & LDP_STP_MASK) == LDP_OPC_64);
+}
+
+/* Return true if the regions [mem_addr, mem_addr + mem_len] and
+   [watch_addr, watch_addr + watch_len] overlap.  False otherwise.  */
+
+bool
+hw_watch_regions_overlap (CORE_ADDR mem_addr, size_t mem_len,
+   CORE_ADDR watch_addr, size_t watch_len)
+{
+  if (watch_addr > (mem_addr + mem_len)
+      || mem_addr > (watch_addr + watch_len))
+    return false;
+
+  CORE_ADDR start = std::max (mem_addr, watch_addr);
+  CORE_ADDR end = std::min (mem_addr + mem_len, watch_addr + watch_len);
+
+  return ((end - start) > 0);
+}
+
+/* Check if a hardware watchpoint has triggered.  If a trigger is detected,
+   return true and update ADDR_P with the stopped data address.
+   Otherwise return false.
+
+   STATE is the debug register's state, INSN is the instruction the inferior
+   stopped at and ADDR_TRAP is the reported stopped data address.  */
+
+bool
+hw_watch_detect_trigger (const struct aarch64_debug_reg_state *state,
+  CORE_ADDR insn, CORE_ADDR addr_trap,
+  CORE_ADDR *addr_p)
+{
+  /* There are 6 variations of watchpoint range and memory access
+     range positioning:
+
+     - W is the byte in the watchpoint range only.
+
+     - M is the byte in the memory access range ony.
+
+     - O is the byte in the overlapping region of the watchpoint range and
+       the memory access range.
+
+     1 - Non-overlapping, no triggers.
+
+     [WWWWWWWW]...[MMMMMMMM]
+
+     2 - Non-overlapping, no triggers.
+
+     [MMMMMMMM]...[WWWWWWWW]
+
+     3 - Overlapping partially, triggers.
+
+     [MMMMOOOOWWWW]
+
+     4 - Overlapping partially, triggers.
+
+     [WWWWOOOOMMMM]
+
+     5 - Memory access contained in watchpoint range, triggers.
+
+     [WWWWOOOOOOOOWWWW]
+
+     6 - Memory access containing watchpoint range, triggers.
+
+     [MMMMOOOOOOOOMMMM]
Very minor nit: M looks like an upside down W, so there was an initial slight confusion.
Not worth changing, because M and W are the right letter choices.

I wonder if we can improve that in some way. Maybe name it B for "Memory Byte"?

B might work.


+  */
+
+  /* We assume the memory access size is 8 bytes.  */
+  unsigned int memory_access_size = 8;
+
+  /* Check if the memory access size is 16 bytes.  */
+  if (hw_watch_is_16b_ldst (insn))
+    memory_access_size = 16;
+
+  for (int i = aarch64_num_wp_regs - 1; i >= 0; --i)

If there are multiple matching watchpoints, then this will just return the newest one.
That’s the behaviour we want?

+    {
+      const unsigned int offset
+ = aarch64_watchpoint_offset (state->dr_ctrl_wp[i]);
+      const unsigned int len = aarch64_watchpoint_length (state->dr_ctrl_wp[i]);
+      const CORE_ADDR addr_watch = state->dr_addr_wp[i] + offset;
+      const CORE_ADDR addr_orig = state->dr_addr_orig_wp[i];
+
+      if ((state->dr_ref_count_wp[i]
+   && DR_CONTROL_ENABLED (state->dr_ctrl_wp[i]))
+   && hw_watch_regions_overlap (addr_trap, memory_access_size,
+        addr_watch, len))
+ {
+   /* ADDR_TRAP reports the first address of the memory range
+      accessed by the CPU, regardless of what was the memory
+      range watched.  Thus, a large CPU access that straddles
+      the ADDR_WATCH..ADDR_WATCH+LEN range may result in an
+      ADDR_TRAP that is lower than the
+      ADDR_WATCH..ADDR_WATCH+LEN range.  E.g.:
+
+      addr: |   4   |   5   |   6   |   7   |   8   |
+    |---- range watched ----|
+    |----------- range accessed ------------|
+
+      In this case, ADDR_TRAP will be 4.
+
+      To match a watchpoint known to GDB core, we must never
+      report *ADDR_P outside of any ADDR_WATCH..ADDR_WATCH+LEN
+      range.  ADDR_WATCH <= ADDR_TRAP < ADDR_ORIG is a false
+      positive on kernels older than 4.10.  See PR
+      external/20207.  */
+   *addr_p = addr_orig;
+   return true;
+ }
+    }
+
+  return false;
+}
diff --git a/gdb/nat/aarch64-linux-hw-point.h b/gdb/nat/aarch64-linux-hw-point.h
index 2fc4b400ece..62e755dd1e9 100644
--- a/gdb/nat/aarch64-linux-hw-point.h
+++ b/gdb/nat/aarch64-linux-hw-point.h
@@ -132,6 +132,13 @@ typedef ULONGEST dr_changed_t;
#define DR_HAS_CHANGED(x) ((x) != 0)
#define DR_N_HAS_CHANGED(x, n) ((x) & ((dr_changed_t)1 << (n)))

+/* Mask for matching LDP and STP instruction variants.  */
+#define LDP_STP_MASK  0xFE400000
+/* stp and stnp with 64-bit registers.  */
+#define STP_OPC_64    0xA8000000
+/* ldp and ldnp with 64-bit registers.  */
+#define LDP_OPC_64    0xA8400000
+
/* Structure for managing the hardware breakpoint/watchpoint resources.
   DR_ADDR_* stores the address, DR_CTRL_* stores the control register
   content, and DR_REF_COUNT_* counts the numbers of references to the
@@ -197,4 +204,13 @@ struct aarch64_debug_reg_state *aarch64_get_debug_reg_state (pid_t pid);

int aarch64_linux_region_ok_for_watchpoint (CORE_ADDR addr, int len);

+bool hw_watch_is_16b_ldst (CORE_ADDR insn);
+
+bool hw_watch_regions_overlap (CORE_ADDR mem_addr, size_t mem_len,
+        CORE_ADDR watch_addr, size_t watch_len);
+
+bool hw_watch_detect_trigger (const struct aarch64_debug_reg_state *state,
+       CORE_ADDR insn, CORE_ADDR stopped_data_address,
+       CORE_ADDR *addr_p);
+
#endif /* NAT_AARCH64_LINUX_HW_POINT_H */
diff --git a/gdbserver/linux-aarch64-low.cc<http://linux-aarch64-low.cc> b/gdbserver/linux-aarch64-low.cc<http://linux-aarch64-low.cc>
index daccfef746e..5df632fe724 100644
--- a/gdbserver/linux-aarch64-low.cc<http://linux-aarch64-low.cc>
+++ b/gdbserver/linux-aarch64-low.cc<http://linux-aarch64-low.cc>
@@ -518,7 +518,7 @@ CORE_ADDR
aarch64_target::low_stopped_data_address ()
{
  siginfo_t siginfo;
-  int pid, i;
+  int pid;
  struct aarch64_debug_reg_state *state;

  pid = lwpid_of (current_thread);
@@ -538,45 +538,20 @@ aarch64_target::low_stopped_data_address ()
  const CORE_ADDR addr_trap
    = address_significant ((CORE_ADDR) siginfo.si_addr);

+  struct regcache *regs = get_thread_regcache (current_thread, 1);
+  CORE_ADDR trigger_pc = regcache_read_pc (regs);
+  uint32_t insn;
+  read_memory (trigger_pc, (gdb_byte *) &insn, 4);
+
  /* Check if the address matches any watched address.  */
  state = aarch64_get_debug_reg_state (pid_of (current_thread));
-  for (i = aarch64_num_wp_regs - 1; i >= 0; --i)
-    {
-      const unsigned int offset
- = aarch64_watchpoint_offset (state->dr_ctrl_wp[i]);
-      const unsigned int len = aarch64_watchpoint_length (state->dr_ctrl_wp[i]);
-      const CORE_ADDR addr_watch = state->dr_addr_wp[i] + offset;
-      const CORE_ADDR addr_watch_aligned = align_down (state->dr_addr_wp[i], 8);
-      const CORE_ADDR addr_orig = state->dr_addr_orig_wp[i];
-
-      if (state->dr_ref_count_wp[i]
-   && DR_CONTROL_ENABLED (state->dr_ctrl_wp[i])
-   && addr_trap >= addr_watch_aligned
-   && addr_trap < addr_watch + len)
- {
-   /* ADDR_TRAP reports the first address of the memory range
-      accessed by the CPU, regardless of what was the memory
-      range watched.  Thus, a large CPU access that straddles
-      the ADDR_WATCH..ADDR_WATCH+LEN range may result in an
-      ADDR_TRAP that is lower than the
-      ADDR_WATCH..ADDR_WATCH+LEN range.  E.g.:
-
-      addr: |   4   |   5   |   6   |   7   |   8   |
-    |---- range watched ----|
-    |----------- range accessed ------------|
-
-      In this case, ADDR_TRAP will be 4.
-
-      To match a watchpoint known to GDB core, we must never
-      report *ADDR_P outside of any ADDR_WATCH..ADDR_WATCH+LEN
-      range.  ADDR_WATCH <= ADDR_TRAP < ADDR_ORIG is a false
-      positive on kernels older than 4.10.  See PR
-      external/20207.  */
-   return addr_orig;
- }
-    }

-  return (CORE_ADDR) 0;
+  CORE_ADDR trigger_addr;
+
+  if (hw_watch_detect_trigger (state, insn, addr_trap, &trigger_addr))
+    return trigger_addr;
+
+  return 0;
}

/* Implementation of linux target ops method "low_stopped_by_watchpoint".  */
--
2.25.1


^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH, v2][AArch64] Fix missing watchpoint hits/endless loop
  2021-06-02 14:27 [PATCH][AArch64] Fix missing watchpoint hits/endless loop Luis Machado
  2021-06-02 16:57 ` Alan Hayward
@ 2021-07-23 13:24 ` Luis Machado
  2021-08-17 15:30   ` Alan Hayward
  1 sibling, 1 reply; 6+ messages in thread
From: Luis Machado @ 2021-07-23 13:24 UTC (permalink / raw)
  To: gdb-patches

Updates on v2:

- Handle more types of load/store instructions, including a catch all
  for SVE load/store instructions.

--

I ran into a situation where a hardware watchpoint hit is not detected
correctly, misleading GDB into thinking it was a delayed breakpoint hit.

The problem is that hardware watchpoints are not skippable on AArch64, so
that makes GDB loop endlessly trying to run past the instruction.

The most obvious case where this happens is when the load/store pair
instructions access 16 bytes of memory.

Suppose we have a stp instruction that will write a couple 64-bit registers
to address 0x10 (stp x3,x4 [x2]). It will write data from 0x10 up to 0x20.

Now suppose a write watchpoint is created to monitor memory address 0x18,
which is the start of the second register write. It can have whatever length,
but let's assume it has length 8.

When we execute that stp instruction, it will trap and the reported stopped
data address from the kernel will be 0x10 (the beginning of the memory range
accessed by the instruction).

The current code won't be able to detect a valid trigger because it assumes an
alignment of 8 bytes for the watchpoint address. Forcing that kind of alignment
won't be enough to detect a 16-byte access if the trap address falls outside of
the 8-byte alignment window. We need to know how many bytes the instruction
will access, but we won't have that data unless we go parsing instructions.

Another issue with the current code seems to be that it assumes the accesses
will always be 8 bytes in size, since it wants to align the watchpoint address
to that particular boundary. This leads to problems when we have unaligned
addresses and unaligned watchpoints.

For example, suppose we have a str instruction storing 8 bytes to memory
address 0xf. Now suppose we have a write watchpoint at address 0x16,
monitoring 8 bytes.

The trap address will be 0xf, but forcing 0x16 to 8-byte alignment yields
0x10, and so GDB doesn't think this is a watchpoint hit.

I believe you can trigger the same problem with smaller memory accesses,
except one that accesses a single byte.

In order to cover most of the cases correctly, we parse the load/store
instructions and detect how many bytes they're accessing. That will give us
enough information to tell if a hardware watchpoint triggered or not.

The SVE load/store support is only a placeholder for now, as we need to
parse the instructions and use the variable length to determine the memory
access size (planned for the future).

The patch also fixes these two failures in the testsuite:

FAIL: gdb.base/watchpoint-unaligned.exp: continue (timeout)
FAIL: gdb.base/watchpoint-unaligned.exp: size8twice write

Regression tested on aarch64-linux Ubuntu/20.04 and Ubuntu/18.04.

I also used a very slow aarch64 hardware watchpoint stress test to validate
this patch, but I don't think that particular test should be included in the
testsuite. It takes quite a while to run (20+ minutes), and shouldn't be
required unless we know there are problems with hardware watchpoint handling.

gdb/ChangeLog:

YYYY-MM-DD  Luis Machado  <luis.machado@linaro.org>

	* aarch64-linux-nat.c
	(aarch64_linux_nat_target::stopped_data_address): Refactor.
	* nat/aarch64-linux-hw-point.c (get_load_store_access_size)
	(hw_watch_regions_overlap, hw_watch_detect_trigger): New functions.
	* nat/aarch64-linux-hw-point.h (GENERAL_LOAD_STORE_P)
	(SVE_LOAD_STORE_P, LOAD_STORE_P, LOAD_STORE_PAIR_P)
	(COMPARE_SWAP_PAIR_P, LOAD_STORE_EXCLUSIVE_PAIR_P)
	(LOAD_STORE_LITERAL_P, LOAD_STORE_MS, LOAD_STORE_SS): New constants.
	(hw_watch_regions_overlap, hw_watch_detect_trigger): New prototypes.
	* nat/aarch64-linux-hw-point.h: Include arch/aarch64-insn.h.

gdbserver/ChangeLog:

YYYY-MM-DD  Luis Machado  <luis.machado@linaro.org>

	* linux-aarch64-low.cc
	(aarch64_target::low_stopped_data_address): Refactor.
---
 gdb/aarch64-linux-nat.c          |  45 +-----
 gdb/nat/aarch64-linux-hw-point.c | 239 +++++++++++++++++++++++++++++++
 gdb/nat/aarch64-linux-hw-point.h |  68 +++++++++
 gdbserver/linux-aarch64-low.cc   |  49 ++-----
 4 files changed, 325 insertions(+), 76 deletions(-)

diff --git a/gdb/aarch64-linux-nat.c b/gdb/aarch64-linux-nat.c
index c7cbebbc351..2c4b7c5d10d 100644
--- a/gdb/aarch64-linux-nat.c
+++ b/gdb/aarch64-linux-nat.c
@@ -951,7 +951,6 @@ bool
 aarch64_linux_nat_target::stopped_data_address (CORE_ADDR *addr_p)
 {
   siginfo_t siginfo;
-  int i;
   struct aarch64_debug_reg_state *state;
 
   if (!linux_nat_get_siginfo (inferior_ptid, &siginfo))
@@ -969,46 +968,14 @@ aarch64_linux_nat_target::stopped_data_address (CORE_ADDR *addr_p)
   const CORE_ADDR addr_trap
     = address_significant (gdbarch, (CORE_ADDR) siginfo.si_addr);
 
+  struct regcache *regs = get_thread_regcache (this, inferior_ptid);
+  CORE_ADDR trigger_pc = regcache_read_pc (regs);
+  uint32_t insn;
+  read_memory (trigger_pc, (gdb_byte *) &insn, 4);
+
   /* Check if the address matches any watched address.  */
   state = aarch64_get_debug_reg_state (inferior_ptid.pid ());
-  for (i = aarch64_num_wp_regs - 1; i >= 0; --i)
-    {
-      const unsigned int offset
-	= aarch64_watchpoint_offset (state->dr_ctrl_wp[i]);
-      const unsigned int len = aarch64_watchpoint_length (state->dr_ctrl_wp[i]);
-      const CORE_ADDR addr_watch = state->dr_addr_wp[i] + offset;
-      const CORE_ADDR addr_watch_aligned = align_down (state->dr_addr_wp[i], 8);
-      const CORE_ADDR addr_orig = state->dr_addr_orig_wp[i];
-
-      if (state->dr_ref_count_wp[i]
-	  && DR_CONTROL_ENABLED (state->dr_ctrl_wp[i])
-	  && addr_trap >= addr_watch_aligned
-	  && addr_trap < addr_watch + len)
-	{
-	  /* ADDR_TRAP reports the first address of the memory range
-	     accessed by the CPU, regardless of what was the memory
-	     range watched.  Thus, a large CPU access that straddles
-	     the ADDR_WATCH..ADDR_WATCH+LEN range may result in an
-	     ADDR_TRAP that is lower than the
-	     ADDR_WATCH..ADDR_WATCH+LEN range.  E.g.:
-
-	     addr: |   4   |   5   |   6   |   7   |   8   |
-				   |---- range watched ----|
-		   |----------- range accessed ------------|
-
-	     In this case, ADDR_TRAP will be 4.
-
-	     To match a watchpoint known to GDB core, we must never
-	     report *ADDR_P outside of any ADDR_WATCH..ADDR_WATCH+LEN
-	     range.  ADDR_WATCH <= ADDR_TRAP < ADDR_ORIG is a false
-	     positive on kernels older than 4.10.  See PR
-	     external/20207.  */
-	  *addr_p = addr_orig;
-	  return true;
-	}
-    }
-
-  return false;
+  return hw_watch_detect_trigger (state, insn, addr_trap, addr_p);
 }
 
 /* Implement the "stopped_by_watchpoint" target_ops method.  */
diff --git a/gdb/nat/aarch64-linux-hw-point.c b/gdb/nat/aarch64-linux-hw-point.c
index af2cc4254e2..d94209d0081 100644
--- a/gdb/nat/aarch64-linux-hw-point.c
+++ b/gdb/nat/aarch64-linux-hw-point.c
@@ -865,3 +865,242 @@ aarch64_linux_region_ok_for_watchpoint (CORE_ADDR addr, int len)
      the checking is costly.  */
   return 1;
 }
+
+/* Assuming INSN is a load/store instruction, return the size of the
+   memory access.  The patterns are documented in the ARM Architecture
+   Reference Manual - Index by encoding.  */
+
+unsigned int
+get_load_store_access_size (CORE_ADDR insn)
+{
+  if (SVE_LOAD_STORE_P (insn))
+    {
+      /* SVE load/store instructions.  */
+
+      /* This is not always correct for SVE instructions, but it is a reasonable
+	 default for now.  Calculating the exact memory access size for SVE
+	 load/store instructions requires parsing instructions and evaluating
+	 the vector length.  */
+      return 16;
+    }
+
+  /* Non-SVE instructions.  */
+
+  bool vector = (bit (insn, 26) == 1);
+  bool ldst_pair = LOAD_STORE_PAIR_P (insn);
+
+  /* Is this an Advanced SIMD load/store instruction?  */
+  if (vector)
+    {
+      unsigned int size = bits (insn, 30, 31);
+
+      if (LOAD_STORE_LITERAL_P (insn))
+	{
+	  /* Advanced SIMD load/store literal */
+	  /* Sizes 4, 8 and 16 bytes.  */
+	  return 4 << size;
+	}
+      else if (LOAD_STORE_MS (insn))
+	{
+	  size = 8 << bit (insn, 30);
+
+	  unsigned char opcode = bits (insn, 12, 15);
+
+	  if (opcode == 0x0 || opcode == 0x2)
+	    size *= 4;
+	  else if (opcode == 0x4 || opcode == 0x6)
+	    size *= 3;
+	  else if (opcode == 0x8 || opcode == 0xa)
+	    size *= 2;
+
+	  return size;
+	}
+      else if (LOAD_STORE_SS (insn))
+	{
+	  size = 8 << bit (insn, 30);
+	  return size;
+	}
+      /* Advanced SIMD load/store instructions.  */
+      else if (ldst_pair)
+	{
+	  /* Advanced SIMD load/store pair.  */
+	  /* Sizes 8, 16 and 32 bytes.  */
+	  return (8 << size);
+	}
+      else
+	{
+	  /* Regular Advanced SIMD load/store instructions.  */
+	  size = size | (bit (insn, 23) << 2);
+	  return 1 << size;
+	}
+    }
+
+  /* This must be a regular GPR load/store instruction.  */
+
+  unsigned int size = bits (insn, 30, 31);
+  bool cs_pair = COMPARE_SWAP_PAIR_P (insn);
+  bool ldstx_pair = LOAD_STORE_EXCLUSIVE_PAIR_P (insn);
+
+  if (ldst_pair)
+    {
+      /* Load/Store pair.  */
+      if (size == 0x1)
+	{
+	  if (bit (insn, 22) == 0)
+	    /* STGP - 16 bytes.  */
+	    return 16;
+	  else
+	    /* LDPSW - 8 bytes.  */
+	    return 8;
+	}
+      /* Other Load/Store pair.  */
+      return (size == 0)? 8 : 16;
+    }
+  else if (cs_pair || ldstx_pair)
+    {
+      /* Compare Swap Pair or Load Store Exclusive Pair.  */
+      /* Sizes 8 and 16 bytes.  */
+      size = bit (insn, 30);
+      return (8 << size);
+    }
+  else if (LOAD_STORE_LITERAL_P (insn))
+    {
+      /* Load/Store literal.  */
+      /* Sizes between 4 and 8 bytes.  */
+      if (size == 0x2)
+	return 4;
+
+      return 4 << size;
+    }
+  else
+    {
+      /* General load/store instructions, with memory access sizes between
+	 1 and 8 bytes.  */
+      return (1 << size);
+    }
+}
+
+/* Return true if the regions [mem_addr, mem_addr + mem_len) and
+   [watch_addr, watch_addr + watch_len) overlap.  False otherwise.  */
+
+bool
+hw_watch_regions_overlap (CORE_ADDR mem_addr, size_t mem_len,
+			  CORE_ADDR watch_addr, size_t watch_len)
+{
+  /* Quick check for non-overlapping case.  */
+  if (watch_addr >= (mem_addr + mem_len)
+      || mem_addr >= (watch_addr + watch_len))
+    return false;
+
+  CORE_ADDR start = std::max (mem_addr, watch_addr);
+  CORE_ADDR end = std::min (mem_addr + mem_len, watch_addr + watch_len);
+
+  return ((end - start) > 0);
+}
+
+/* Check if a hardware watchpoint has triggered.  If a trigger is detected,
+   return true and update ADDR_P with the stopped data address.
+   Otherwise return false.
+
+   STATE is the debug register's state, INSN is the instruction the inferior
+   stopped at and ADDR_TRAP is the reported stopped data address.  */
+
+bool
+hw_watch_detect_trigger (const struct aarch64_debug_reg_state *state,
+			 CORE_ADDR insn, CORE_ADDR addr_trap,
+			 CORE_ADDR *addr_p)
+{
+  /* There are 6 variations of watchpoint range and memory access
+     range positioning:
+
+     - W is the byte in the watchpoint range only.
+
+     - B is the byte in the memory access range ony.
+
+     - O is the byte in the overlapping region of the watchpoint range and
+       the memory access range.
+
+     1 - Non-overlapping, no triggers.
+
+     [WWWWWWWW]...[BBBBBBBB]
+
+     2 - Non-overlapping, no triggers.
+
+     [BBBBBBBB]...[WWWWWWWW]
+
+     3 - Overlapping partially, triggers.
+
+     [BBBBOOOOWWWW]
+
+     4 - Overlapping partially, triggers.
+
+     [WWWWOOOOBBBB]
+
+     5 - Memory access contained in watchpoint range, triggers.
+
+     [WWWWOOOOOOOOWWWW]
+
+     6 - Memory access containing watchpoint range, triggers.
+
+     [BBBBOOOOOOOOBBBB]
+  */
+
+  /* If there are no load/store instructions at PC, this must be a hardware
+     breakpoint hit.  Return early.
+
+     If a hardware breakpoint is placed at a PC containing a load/store
+     instruction, we will go through the memory access size check anyway, but
+     will eventually figure out there are no watchpoints matching such an
+     address.
+
+     There is one corner case though, which is having a hardware breakpoint and
+     a hardware watchpoint at PC, when PC contains a load/store
+     instruction.  That is an ambiguous case that is hard to differentiate.
+
+     There's not much we can do about that unless the kernel sends us enough
+     information to tell them apart.  */
+  if (!LOAD_STORE_P (insn))
+    return false;
+
+  /* Get the memory access size for the instruction at PC.  */
+  unsigned int memory_access_size = get_load_store_access_size (insn);
+
+  for (int i = aarch64_num_wp_regs - 1; i >= 0; --i)
+    {
+      const unsigned int offset
+	= aarch64_watchpoint_offset (state->dr_ctrl_wp[i]);
+      const unsigned int len = aarch64_watchpoint_length (state->dr_ctrl_wp[i]);
+      const CORE_ADDR addr_watch = state->dr_addr_wp[i] + offset;
+      const CORE_ADDR addr_orig = state->dr_addr_orig_wp[i];
+
+      if ((state->dr_ref_count_wp[i]
+	  && DR_CONTROL_ENABLED (state->dr_ctrl_wp[i]))
+	  && hw_watch_regions_overlap (addr_trap, memory_access_size,
+				       addr_watch, len))
+	{
+	  /* ADDR_TRAP reports the first address of the memory range
+	     accessed by the CPU, regardless of what was the memory
+	     range watched.  Thus, a large CPU access that straddles
+	     the ADDR_WATCH..ADDR_WATCH+LEN range may result in an
+	     ADDR_TRAP that is lower than the
+	     ADDR_WATCH..ADDR_WATCH+LEN range.  E.g.:
+
+	     addr: |   4   |   5   |   6   |   7   |   8   |
+				   |---- range watched ----|
+		   |----------- range accessed ------------|
+
+	     In this case, ADDR_TRAP will be 4.
+
+	     To match a watchpoint known to GDB core, we must never
+	     report *ADDR_P outside of any ADDR_WATCH..ADDR_WATCH+LEN
+	     range.  ADDR_WATCH <= ADDR_TRAP < ADDR_ORIG is a false
+	     positive on kernels older than 4.10.  See PR
+	     external/20207.  */
+	  *addr_p = addr_orig;
+	  return true;
+	}
+    }
+
+  /* No hardware watchpoint hits detected.  */
+  return false;
+}
diff --git a/gdb/nat/aarch64-linux-hw-point.h b/gdb/nat/aarch64-linux-hw-point.h
index 2fc4b400ece..2a08f6ad122 100644
--- a/gdb/nat/aarch64-linux-hw-point.h
+++ b/gdb/nat/aarch64-linux-hw-point.h
@@ -20,6 +20,7 @@
 #define NAT_AARCH64_LINUX_HW_POINT_H
 
 #include "gdbsupport/break-common.h" /* For enum target_hw_bp_type.  */
+#include "arch/aarch64-insn.h" /* For bit/bits.  */
 
 /* Macro definitions, data structures, and code for the hardware
    breakpoint and hardware watchpoint support follow.  We use the
@@ -132,6 +133,64 @@ typedef ULONGEST dr_changed_t;
 #define DR_HAS_CHANGED(x) ((x) != 0)
 #define DR_N_HAS_CHANGED(x, n) ((x) & ((dr_changed_t)1 << (n)))
 
+/* Macros to distinguish between various types of load/store instructions.  */
+
+/* Regular and Advanced SIMD load/store instructions.  */
+#define GENERAL_LOAD_STORE_P(insn) (bit (insn, 25) == 0 && bit (insn, 27) == 1)
+
+/* SVE load/store instruction.  */
+#define SVE_LOAD_STORE_P(insn) (bits (insn, 25, 28) == 0x2		      \
+				&& (bits (insn, 29, 31) == 0x4		      \
+				    || bits (insn, 29, 31) == 0x5	      \
+				    || bits (insn, 29, 31) == 0x6	      \
+				    || (bits (insn, 29, 31) == 0x7	      \
+					&& ((bit (insn, 15) == 0x0	      \
+					     && (bit (insn, 13) == 0x0	      \
+						 || bit (insn, 13) == 0x1))   \
+					     || (bit (insn, 15) == 0x1	      \
+						 && bit (insn, 13) == 0x0)    \
+					     || bits (insn, 13, 15) == 0x6    \
+					     || bits (insn, 13, 15) == 0x7))))
+
+/* Any load/store instruction (regular, Advanced SIMD or SVE).  */
+#define LOAD_STORE_P(insn) (GENERAL_LOAD_STORE_P (insn)			      \
+			    || SVE_LOAD_STORE_P (insn))
+
+/* Load/Store pair (regular or vector).  */
+#define LOAD_STORE_PAIR_P(insn) (bit (insn, 28) == 0 && bit (insn, 29) == 1)
+
+#define COMPARE_SWAP_PAIR_P(insn) (bits (insn, 30, 31) == 0		      \
+				   && bits (insn, 28, 29) == 0		      \
+				   && bit (insn, 26) == 0		      \
+				   && bits (insn, 23, 24) == 0		      \
+				   && bit (insn, 11) == 1)
+#define LOAD_STORE_EXCLUSIVE_PAIR_P(insn) (bit (insn, 31) == 1		      \
+					   && bits (insn, 28, 29) == 0	      \
+					   && bit (insn, 26) == 0	      \
+					   && bits (insn, 23, 24) == 0	      \
+					   && bit (insn, 11) == 1)
+
+#define LOAD_STORE_LITERAL_P(insn) (bit (insn, 28) == 1			      \
+				    && bit (insn, 29) == 0		      \
+				    && bit (insn, 24) == 0)
+
+/* Vector Load/Store multiple structures.  */
+#define LOAD_STORE_MS(insn) (bits (insn, 28, 29) == 0x0			      \
+			     && bit (insn, 31) == 0x0			      \
+			     && bit (insn, 26) == 0x1			      \
+			     && ((bits (insn, 23, 24) == 0x0		      \
+				  && bits (insn, 16, 21) == 0x0)	      \
+				 || (bits (insn, 23, 24) == 0x1		      \
+				     && bit (insn, 21) == 0x0)))
+
+/* Vector Load/Store single structure.  */
+#define LOAD_STORE_SS(insn) (bits (insn, 28, 29) == 0x0			      \
+			     && bit (insn, 31) == 0x0			      \
+			     && bit (insn, 26) == 0x1			      \
+			     && ((bits (insn, 23, 24) == 0x2		      \
+				  && bits (insn, 16, 20) == 0x0)	      \
+				 || (bits (insn, 23, 24) == 0x3)))
+
 /* Structure for managing the hardware breakpoint/watchpoint resources.
    DR_ADDR_* stores the address, DR_CTRL_* stores the control register
    content, and DR_REF_COUNT_* counts the numbers of references to the
@@ -197,4 +256,13 @@ struct aarch64_debug_reg_state *aarch64_get_debug_reg_state (pid_t pid);
 
 int aarch64_linux_region_ok_for_watchpoint (CORE_ADDR addr, int len);
 
+unsigned int get_load_store_access_size (CORE_ADDR insn);
+
+bool hw_watch_regions_overlap (CORE_ADDR mem_addr, size_t mem_len,
+			       CORE_ADDR watch_addr, size_t watch_len);
+
+bool hw_watch_detect_trigger (const struct aarch64_debug_reg_state *state,
+			      CORE_ADDR insn, CORE_ADDR stopped_data_address,
+			      CORE_ADDR *addr_p);
+
 #endif /* NAT_AARCH64_LINUX_HW_POINT_H */
diff --git a/gdbserver/linux-aarch64-low.cc b/gdbserver/linux-aarch64-low.cc
index daccfef746e..5df632fe724 100644
--- a/gdbserver/linux-aarch64-low.cc
+++ b/gdbserver/linux-aarch64-low.cc
@@ -518,7 +518,7 @@ CORE_ADDR
 aarch64_target::low_stopped_data_address ()
 {
   siginfo_t siginfo;
-  int pid, i;
+  int pid;
   struct aarch64_debug_reg_state *state;
 
   pid = lwpid_of (current_thread);
@@ -538,45 +538,20 @@ aarch64_target::low_stopped_data_address ()
   const CORE_ADDR addr_trap
     = address_significant ((CORE_ADDR) siginfo.si_addr);
 
+  struct regcache *regs = get_thread_regcache (current_thread, 1);
+  CORE_ADDR trigger_pc = regcache_read_pc (regs);
+  uint32_t insn;
+  read_memory (trigger_pc, (gdb_byte *) &insn, 4);
+
   /* Check if the address matches any watched address.  */
   state = aarch64_get_debug_reg_state (pid_of (current_thread));
-  for (i = aarch64_num_wp_regs - 1; i >= 0; --i)
-    {
-      const unsigned int offset
-	= aarch64_watchpoint_offset (state->dr_ctrl_wp[i]);
-      const unsigned int len = aarch64_watchpoint_length (state->dr_ctrl_wp[i]);
-      const CORE_ADDR addr_watch = state->dr_addr_wp[i] + offset;
-      const CORE_ADDR addr_watch_aligned = align_down (state->dr_addr_wp[i], 8);
-      const CORE_ADDR addr_orig = state->dr_addr_orig_wp[i];
-
-      if (state->dr_ref_count_wp[i]
-	  && DR_CONTROL_ENABLED (state->dr_ctrl_wp[i])
-	  && addr_trap >= addr_watch_aligned
-	  && addr_trap < addr_watch + len)
-	{
-	  /* ADDR_TRAP reports the first address of the memory range
-	     accessed by the CPU, regardless of what was the memory
-	     range watched.  Thus, a large CPU access that straddles
-	     the ADDR_WATCH..ADDR_WATCH+LEN range may result in an
-	     ADDR_TRAP that is lower than the
-	     ADDR_WATCH..ADDR_WATCH+LEN range.  E.g.:
-
-	     addr: |   4   |   5   |   6   |   7   |   8   |
-				   |---- range watched ----|
-		   |----------- range accessed ------------|
-
-	     In this case, ADDR_TRAP will be 4.
-
-	     To match a watchpoint known to GDB core, we must never
-	     report *ADDR_P outside of any ADDR_WATCH..ADDR_WATCH+LEN
-	     range.  ADDR_WATCH <= ADDR_TRAP < ADDR_ORIG is a false
-	     positive on kernels older than 4.10.  See PR
-	     external/20207.  */
-	  return addr_orig;
-	}
-    }
 
-  return (CORE_ADDR) 0;
+  CORE_ADDR trigger_addr;
+
+  if (hw_watch_detect_trigger (state, insn, addr_trap, &trigger_addr))
+    return trigger_addr;
+
+  return 0;
 }
 
 /* Implementation of linux target ops method "low_stopped_by_watchpoint".  */
-- 
2.25.1


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH, v2][AArch64] Fix missing watchpoint hits/endless loop
  2021-07-23 13:24 ` [PATCH, v2][AArch64] " Luis Machado
@ 2021-08-17 15:30   ` Alan Hayward
  0 siblings, 0 replies; 6+ messages in thread
From: Alan Hayward @ 2021-08-17 15:30 UTC (permalink / raw)
  To: Luis Machado; +Cc: gdb-patches\@sourceware.org, nd



> On 23 Jul 2021, at 14:24, Luis Machado <luis.machado@linaro.org> wrote:
> 
> Updates on v2:
> 
> - Handle more types of load/store instructions, including a catch all
>  for SVE load/store instructions.
> 
> --
> 
> I ran into a situation where a hardware watchpoint hit is not detected
> correctly, misleading GDB into thinking it was a delayed breakpoint hit.
> 
> The problem is that hardware watchpoints are not skippable on AArch64, so
> that makes GDB loop endlessly trying to run past the instruction.
> 
> The most obvious case where this happens is when the load/store pair
> instructions access 16 bytes of memory.
> 
> Suppose we have a stp instruction that will write a couple 64-bit registers
> to address 0x10 (stp x3,x4 [x2]). It will write data from 0x10 up to 0x20.
> 
> Now suppose a write watchpoint is created to monitor memory address 0x18,
> which is the start of the second register write. It can have whatever length,
> but let's assume it has length 8.
> 
> When we execute that stp instruction, it will trap and the reported stopped
> data address from the kernel will be 0x10 (the beginning of the memory range
> accessed by the instruction).
> 
> The current code won't be able to detect a valid trigger because it assumes an
> alignment of 8 bytes for the watchpoint address. Forcing that kind of alignment
> won't be enough to detect a 16-byte access if the trap address falls outside of
> the 8-byte alignment window. We need to know how many bytes the instruction
> will access, but we won't have that data unless we go parsing instructions.
> 
> Another issue with the current code seems to be that it assumes the accesses
> will always be 8 bytes in size, since it wants to align the watchpoint address
> to that particular boundary. This leads to problems when we have unaligned
> addresses and unaligned watchpoints.
> 
> For example, suppose we have a str instruction storing 8 bytes to memory
> address 0xf. Now suppose we have a write watchpoint at address 0x16,
> monitoring 8 bytes.
> 
> The trap address will be 0xf, but forcing 0x16 to 8-byte alignment yields
> 0x10, and so GDB doesn't think this is a watchpoint hit.
> 
> I believe you can trigger the same problem with smaller memory accesses,
> except one that accesses a single byte.
> 
> In order to cover most of the cases correctly, we parse the load/store
> instructions and detect how many bytes they're accessing. That will give us
> enough information to tell if a hardware watchpoint triggered or not.
> 
> The SVE load/store support is only a placeholder for now, as we need to
> parse the instructions and use the variable length to determine the memory
> access size (planned for the future).
> 
> The patch also fixes these two failures in the testsuite:
> 
> FAIL: gdb.base/watchpoint-unaligned.exp: continue (timeout)
> FAIL: gdb.base/watchpoint-unaligned.exp: size8twice write
> 
> Regression tested on aarch64-linux Ubuntu/20.04 and Ubuntu/18.04.
> 
> I also used a very slow aarch64 hardware watchpoint stress test to validate
> this patch, but I don't think that particular test should be included in the
> testsuite. It takes quite a while to run (20+ minutes), and shouldn't be
> required unless we know there are problems with hardware watchpoint handling.

It’s a shame there isn’t anywhere it can go, but ok.

> 
> gdb/ChangeLog:
> 
> YYYY-MM-DD  Luis Machado  <luis.machado@linaro.org>
> 
> 	* aarch64-linux-nat.c
> 	(aarch64_linux_nat_target::stopped_data_address): Refactor.
> 	* nat/aarch64-linux-hw-point.c (get_load_store_access_size)
> 	(hw_watch_regions_overlap, hw_watch_detect_trigger): New functions.
> 	* nat/aarch64-linux-hw-point.h (GENERAL_LOAD_STORE_P)
> 	(SVE_LOAD_STORE_P, LOAD_STORE_P, LOAD_STORE_PAIR_P)
> 	(COMPARE_SWAP_PAIR_P, LOAD_STORE_EXCLUSIVE_PAIR_P)
> 	(LOAD_STORE_LITERAL_P, LOAD_STORE_MS, LOAD_STORE_SS): New constants.
> 	(hw_watch_regions_overlap, hw_watch_detect_trigger): New prototypes.
> 	* nat/aarch64-linux-hw-point.h: Include arch/aarch64-insn.h.
> 
> gdbserver/ChangeLog:
> 
> YYYY-MM-DD  Luis Machado  <luis.machado@linaro.org>
> 
> 	* linux-aarch64-low.cc
> 	(aarch64_target::low_stopped_data_address): Refactor.
> ---
> gdb/aarch64-linux-nat.c          |  45 +-----
> gdb/nat/aarch64-linux-hw-point.c | 239 +++++++++++++++++++++++++++++++
> gdb/nat/aarch64-linux-hw-point.h |  68 +++++++++
> gdbserver/linux-aarch64-low.cc   |  49 ++-----
> 4 files changed, 325 insertions(+), 76 deletions(-)
> 
> diff --git a/gdb/aarch64-linux-nat.c b/gdb/aarch64-linux-nat.c
> index c7cbebbc351..2c4b7c5d10d 100644
> --- a/gdb/aarch64-linux-nat.c
> +++ b/gdb/aarch64-linux-nat.c
> @@ -951,7 +951,6 @@ bool
> aarch64_linux_nat_target::stopped_data_address (CORE_ADDR *addr_p)
> {
>   siginfo_t siginfo;
> -  int i;
>   struct aarch64_debug_reg_state *state;
> 
>   if (!linux_nat_get_siginfo (inferior_ptid, &siginfo))
> @@ -969,46 +968,14 @@ aarch64_linux_nat_target::stopped_data_address (CORE_ADDR *addr_p)
>   const CORE_ADDR addr_trap
>     = address_significant (gdbarch, (CORE_ADDR) siginfo.si_addr);
> 
> +  struct regcache *regs = get_thread_regcache (this, inferior_ptid);
> +  CORE_ADDR trigger_pc = regcache_read_pc (regs);
> +  uint32_t insn;
> +  read_memory (trigger_pc, (gdb_byte *) &insn, 4);
> +
>   /* Check if the address matches any watched address.  */
>   state = aarch64_get_debug_reg_state (inferior_ptid.pid ());
> -  for (i = aarch64_num_wp_regs - 1; i >= 0; --i)
> -    {
> -      const unsigned int offset
> -	= aarch64_watchpoint_offset (state->dr_ctrl_wp[i]);
> -      const unsigned int len = aarch64_watchpoint_length (state->dr_ctrl_wp[i]);
> -      const CORE_ADDR addr_watch = state->dr_addr_wp[i] + offset;
> -      const CORE_ADDR addr_watch_aligned = align_down (state->dr_addr_wp[i], 8);
> -      const CORE_ADDR addr_orig = state->dr_addr_orig_wp[i];
> -
> -      if (state->dr_ref_count_wp[i]
> -	  && DR_CONTROL_ENABLED (state->dr_ctrl_wp[i])
> -	  && addr_trap >= addr_watch_aligned
> -	  && addr_trap < addr_watch + len)
> -	{
> -	  /* ADDR_TRAP reports the first address of the memory range
> -	     accessed by the CPU, regardless of what was the memory
> -	     range watched.  Thus, a large CPU access that straddles
> -	     the ADDR_WATCH..ADDR_WATCH+LEN range may result in an
> -	     ADDR_TRAP that is lower than the
> -	     ADDR_WATCH..ADDR_WATCH+LEN range.  E.g.:
> -
> -	     addr: |   4   |   5   |   6   |   7   |   8   |
> -				   |---- range watched ----|
> -		   |----------- range accessed ------------|
> -
> -	     In this case, ADDR_TRAP will be 4.
> -
> -	     To match a watchpoint known to GDB core, we must never
> -	     report *ADDR_P outside of any ADDR_WATCH..ADDR_WATCH+LEN
> -	     range.  ADDR_WATCH <= ADDR_TRAP < ADDR_ORIG is a false
> -	     positive on kernels older than 4.10.  See PR
> -	     external/20207.  */
> -	  *addr_p = addr_orig;
> -	  return true;
> -	}
> -    }
> -
> -  return false;
> +  return hw_watch_detect_trigger (state, insn, addr_trap, addr_p);
> }
> 
> /* Implement the "stopped_by_watchpoint" target_ops method.  */
> diff --git a/gdb/nat/aarch64-linux-hw-point.c b/gdb/nat/aarch64-linux-hw-point.c
> index af2cc4254e2..d94209d0081 100644
> --- a/gdb/nat/aarch64-linux-hw-point.c
> +++ b/gdb/nat/aarch64-linux-hw-point.c
> @@ -865,3 +865,242 @@ aarch64_linux_region_ok_for_watchpoint (CORE_ADDR addr, int len)
>      the checking is costly.  */
>   return 1;
> }
> +
> +/* Assuming INSN is a load/store instruction, return the size of the
> +   memory access.  The patterns are documented in the ARM Architecture
> +   Reference Manual - Index by encoding.  */
> +
> +unsigned int
> +get_load_store_access_size (CORE_ADDR insn)

Maybe this function belongs in arch-insn?

> +{
> +  if (SVE_LOAD_STORE_P (insn))
> +    {
> +      /* SVE load/store instructions.  */
> +
> +      /* This is not always correct for SVE instructions, but it is a reasonable
> +	 default for now.  Calculating the exact memory access size for SVE
> +	 load/store instructions requires parsing instructions and evaluating
> +	 the vector length.  */
> +      return 16;

… ok. Do we still use TODO in comments?

> +    }
> +
> +  /* Non-SVE instructions.  */
> +
> +  bool vector = (bit (insn, 26) == 1);
> +  bool ldst_pair = LOAD_STORE_PAIR_P (insn);
> +
> +  /* Is this an Advanced SIMD load/store instruction?  */
> +  if (vector)
> +    {
> +      unsigned int size = bits (insn, 30, 31);
> +
> +      if (LOAD_STORE_LITERAL_P (insn))
> +	{
> +	  /* Advanced SIMD load/store literal */
> +	  /* Sizes 4, 8 and 16 bytes.  */
> +	  return 4 << size;
> +	}
> +      else if (LOAD_STORE_MS (insn))
> +	{
> +	  size = 8 << bit (insn, 30);
> +
> +	  unsigned char opcode = bits (insn, 12, 15);
> +
> +	  if (opcode == 0x0 || opcode == 0x2)
> +	    size *= 4;
> +	  else if (opcode == 0x4 || opcode == 0x6)
> +	    size *= 3;
> +	  else if (opcode == 0x8 || opcode == 0xa)
> +	    size *= 2;
> +
> +	  return size;
> +	}
> +      else if (LOAD_STORE_SS (insn))
> +	{
> +	  size = 8 << bit (insn, 30);
> +	  return size;
> +	}
> +      /* Advanced SIMD load/store instructions.  */
> +      else if (ldst_pair)
> +	{
> +	  /* Advanced SIMD load/store pair.  */
> +	  /* Sizes 8, 16 and 32 bytes.  */
> +	  return (8 << size);
> +	}
> +      else
> +	{
> +	  /* Regular Advanced SIMD load/store instructions.  */
> +	  size = size | (bit (insn, 23) << 2);
> +	  return 1 << size;
> +	}
> +    }
> +
> +  /* This must be a regular GPR load/store instruction.  */
> +
> +  unsigned int size = bits (insn, 30, 31);
> +  bool cs_pair = COMPARE_SWAP_PAIR_P (insn);
> +  bool ldstx_pair = LOAD_STORE_EXCLUSIVE_PAIR_P (insn);
> +
> +  if (ldst_pair)
> +    {
> +      /* Load/Store pair.  */
> +      if (size == 0x1)
> +	{
> +	  if (bit (insn, 22) == 0)
> +	    /* STGP - 16 bytes.  */
> +	    return 16;
> +	  else
> +	    /* LDPSW - 8 bytes.  */
> +	    return 8;
> +	}
> +      /* Other Load/Store pair.  */
> +      return (size == 0)? 8 : 16;
> +    }
> +  else if (cs_pair || ldstx_pair)
> +    {
> +      /* Compare Swap Pair or Load Store Exclusive Pair.  */
> +      /* Sizes 8 and 16 bytes.  */
> +      size = bit (insn, 30);
> +      return (8 << size);
> +    }
> +  else if (LOAD_STORE_LITERAL_P (insn))
> +    {
> +      /* Load/Store literal.  */
> +      /* Sizes between 4 and 8 bytes.  */
> +      if (size == 0x2)
> +	return 4;
> +
> +      return 4 << size;
> +    }
> +  else
> +    {
> +      /* General load/store instructions, with memory access sizes between
> +	 1 and 8 bytes.  */
> +      return (1 << size);
> +    }
> +}
> +
> +/* Return true if the regions [mem_addr, mem_addr + mem_len) and
> +   [watch_addr, watch_addr + watch_len) overlap.  False otherwise.  */
> +
> +bool
> +hw_watch_regions_overlap (CORE_ADDR mem_addr, size_t mem_len,
> +			  CORE_ADDR watch_addr, size_t watch_len)

mem_ranges_overlap() in memrange.c already does the same as this.

> +{
> +  /* Quick check for non-overlapping case.  */
> +  if (watch_addr >= (mem_addr + mem_len)
> +      || mem_addr >= (watch_addr + watch_len))
> +    return false;
> +
> +  CORE_ADDR start = std::max (mem_addr, watch_addr);
> +  CORE_ADDR end = std::min (mem_addr + mem_len, watch_addr + watch_len);
> +
> +  return ((end - start) > 0);
> +}
> +
> +/* Check if a hardware watchpoint has triggered.  If a trigger is detected,
> +   return true and update ADDR_P with the stopped data address.
> +   Otherwise return false.
> +
> +   STATE is the debug register's state, INSN is the instruction the inferior
> +   stopped at and ADDR_TRAP is the reported stopped data address.  */
> +
> +bool
> +hw_watch_detect_trigger (const struct aarch64_debug_reg_state *state,
> +			 CORE_ADDR insn, CORE_ADDR addr_trap,
> +			 CORE_ADDR *addr_p)
> +{
> +  /* There are 6 variations of watchpoint range and memory access
> +     range positioning:
> +
> +     - W is the byte in the watchpoint range only.
> +
> +     - B is the byte in the memory access range ony.
> +
> +     - O is the byte in the overlapping region of the watchpoint range and
> +       the memory access range.
> +
> +     1 - Non-overlapping, no triggers.
> +
> +     [WWWWWWWW]...[BBBBBBBB]
> +
> +     2 - Non-overlapping, no triggers.
> +
> +     [BBBBBBBB]...[WWWWWWWW]
> +
> +     3 - Overlapping partially, triggers.
> +
> +     [BBBBOOOOWWWW]
> +
> +     4 - Overlapping partially, triggers.
> +
> +     [WWWWOOOOBBBB]
> +
> +     5 - Memory access contained in watchpoint range, triggers.
> +
> +     [WWWWOOOOOOOOWWWW]
> +
> +     6 - Memory access containing watchpoint range, triggers.
> +
> +     [BBBBOOOOOOOOBBBB]
> +  */
> +
> +  /* If there are no load/store instructions at PC, this must be a hardware
> +     breakpoint hit.  Return early.
> +
> +     If a hardware breakpoint is placed at a PC containing a load/store
> +     instruction, we will go through the memory access size check anyway, but
> +     will eventually figure out there are no watchpoints matching such an
> +     address.
> +
> +     There is one corner case though, which is having a hardware breakpoint and
> +     a hardware watchpoint at PC, when PC contains a load/store
> +     instruction.  That is an ambiguous case that is hard to differentiate.
> +
> +     There's not much we can do about that unless the kernel sends us enough
> +     information to tell them apart.  */
> +  if (!LOAD_STORE_P (insn))
> +    return false;
> +
> +  /* Get the memory access size for the instruction at PC.  */
> +  unsigned int memory_access_size = get_load_store_access_size (insn);
> +
> +  for (int i = aarch64_num_wp_regs - 1; i >= 0; --i)
> +    {
> +      const unsigned int offset
> +	= aarch64_watchpoint_offset (state->dr_ctrl_wp[i]);
> +      const unsigned int len = aarch64_watchpoint_length (state->dr_ctrl_wp[i]);
> +      const CORE_ADDR addr_watch = state->dr_addr_wp[i] + offset;
> +      const CORE_ADDR addr_orig = state->dr_addr_orig_wp[i];
> +
> +      if ((state->dr_ref_count_wp[i]
> +	  && DR_CONTROL_ENABLED (state->dr_ctrl_wp[i]))
> +	  && hw_watch_regions_overlap (addr_trap, memory_access_size,
> +				       addr_watch, len))
> +	{
> +	  /* ADDR_TRAP reports the first address of the memory range
> +	     accessed by the CPU, regardless of what was the memory
> +	     range watched.  Thus, a large CPU access that straddles
> +	     the ADDR_WATCH..ADDR_WATCH+LEN range may result in an
> +	     ADDR_TRAP that is lower than the
> +	     ADDR_WATCH..ADDR_WATCH+LEN range.  E.g.:
> +
> +	     addr: |   4   |   5   |   6   |   7   |   8   |
> +				   |---- range watched ----|
> +		   |----------- range accessed ------------|
> +
> +	     In this case, ADDR_TRAP will be 4.
> +
> +	     To match a watchpoint known to GDB core, we must never
> +	     report *ADDR_P outside of any ADDR_WATCH..ADDR_WATCH+LEN
> +	     range.  ADDR_WATCH <= ADDR_TRAP < ADDR_ORIG is a false
> +	     positive on kernels older than 4.10.  See PR
> +	     external/20207.  */
> +	  *addr_p = addr_orig;
> +	  return true;
> +	}
> +    }
> +
> +  /* No hardware watchpoint hits detected.  */
> +  return false;
> +}
> diff --git a/gdb/nat/aarch64-linux-hw-point.h b/gdb/nat/aarch64-linux-hw-point.h
> index 2fc4b400ece..2a08f6ad122 100644
> --- a/gdb/nat/aarch64-linux-hw-point.h
> +++ b/gdb/nat/aarch64-linux-hw-point.h
> @@ -20,6 +20,7 @@
> #define NAT_AARCH64_LINUX_HW_POINT_H
> 
> #include "gdbsupport/break-common.h" /* For enum target_hw_bp_type.  */
> +#include "arch/aarch64-insn.h" /* For bit/bits.  */
> 
> /* Macro definitions, data structures, and code for the hardware
>    breakpoint and hardware watchpoint support follow.  We use the
> @@ -132,6 +133,64 @@ typedef ULONGEST dr_changed_t;
> #define DR_HAS_CHANGED(x) ((x) != 0)
> #define DR_N_HAS_CHANGED(x, n) ((x) & ((dr_changed_t)1 << (n)))
> 
> +/* Macros to distinguish between various types of load/store instructions.  */
> +
> +/* Regular and Advanced SIMD load/store instructions.  */
> +#define GENERAL_LOAD_STORE_P(insn) (bit (insn, 25) == 0 && bit (insn, 27) == 1)
> +
> +/* SVE load/store instruction.  */
> +#define SVE_LOAD_STORE_P(insn) (bits (insn, 25, 28) == 0x2		      \
> +				&& (bits (insn, 29, 31) == 0x4		      \
> +				    || bits (insn, 29, 31) == 0x5	      \
> +				    || bits (insn, 29, 31) == 0x6	      \
> +				    || (bits (insn, 29, 31) == 0x7	      \
> +					&& ((bit (insn, 15) == 0x0	      \
> +					     && (bit (insn, 13) == 0x0	      \
> +						 || bit (insn, 13) == 0x1))   \
> +					     || (bit (insn, 15) == 0x1	      \
> +						 && bit (insn, 13) == 0x0)    \
> +					     || bits (insn, 13, 15) == 0x6    \
> +					     || bits (insn, 13, 15) == 0x7))))
> +
> +/* Any load/store instruction (regular, Advanced SIMD or SVE).  */
> +#define LOAD_STORE_P(insn) (GENERAL_LOAD_STORE_P (insn)			      \
> +			    || SVE_LOAD_STORE_P (insn))
> +
> +/* Load/Store pair (regular or vector).  */
> +#define LOAD_STORE_PAIR_P(insn) (bit (insn, 28) == 0 && bit (insn, 29) == 1)
> +
> +#define COMPARE_SWAP_PAIR_P(insn) (bits (insn, 30, 31) == 0		      \
> +				   && bits (insn, 28, 29) == 0		      \
> +				   && bit (insn, 26) == 0		      \
> +				   && bits (insn, 23, 24) == 0		      \
> +				   && bit (insn, 11) == 1)
> +#define LOAD_STORE_EXCLUSIVE_PAIR_P(insn) (bit (insn, 31) == 1		      \
> +					   && bits (insn, 28, 29) == 0	      \
> +					   && bit (insn, 26) == 0	      \
> +					   && bits (insn, 23, 24) == 0	      \
> +					   && bit (insn, 11) == 1)
> +
> +#define LOAD_STORE_LITERAL_P(insn) (bit (insn, 28) == 1			      \
> +				    && bit (insn, 29) == 0		      \
> +				    && bit (insn, 24) == 0)
> +
> +/* Vector Load/Store multiple structures.  */
> +#define LOAD_STORE_MS(insn) (bits (insn, 28, 29) == 0x0			      \
> +			     && bit (insn, 31) == 0x0			      \
> +			     && bit (insn, 26) == 0x1			      \
> +			     && ((bits (insn, 23, 24) == 0x0		      \
> +				  && bits (insn, 16, 21) == 0x0)	      \
> +				 || (bits (insn, 23, 24) == 0x1		      \
> +				     && bit (insn, 21) == 0x0)))
> +
> +/* Vector Load/Store single structure.  */
> +#define LOAD_STORE_SS(insn) (bits (insn, 28, 29) == 0x0			      \
> +			     && bit (insn, 31) == 0x0			      \
> +			     && bit (insn, 26) == 0x1			      \
> +			     && ((bits (insn, 23, 24) == 0x2		      \
> +				  && bits (insn, 16, 20) == 0x0)	      \
> +				 || (bits (insn, 23, 24) == 0x3)))
> +

I don’t like these at all. :)
I was looking in opcodes/aarch64* and gdb/arch/aarch64-insn.* see if there is something to
resuse, but there was nothing obvious. And you can’t just check against, say, the opcodes in
aarch64_opcodes because you need to check some bits for 0. 
If there isn’t anything suitable, then maybe these defines belong in aarch64-insn.h ?


> /* Structure for managing the hardware breakpoint/watchpoint resources.
>    DR_ADDR_* stores the address, DR_CTRL_* stores the control register
>    content, and DR_REF_COUNT_* counts the numbers of references to the
> @@ -197,4 +256,13 @@ struct aarch64_debug_reg_state *aarch64_get_debug_reg_state (pid_t pid);
> 
> int aarch64_linux_region_ok_for_watchpoint (CORE_ADDR addr, int len);
> 
> +unsigned int get_load_store_access_size (CORE_ADDR insn);
> +
> +bool hw_watch_regions_overlap (CORE_ADDR mem_addr, size_t mem_len,
> +			       CORE_ADDR watch_addr, size_t watch_len);
> +
> +bool hw_watch_detect_trigger (const struct aarch64_debug_reg_state *state,
> +			      CORE_ADDR insn, CORE_ADDR stopped_data_address,
> +			      CORE_ADDR *addr_p);
> +
> #endif /* NAT_AARCH64_LINUX_HW_POINT_H */
> diff --git a/gdbserver/linux-aarch64-low.cc b/gdbserver/linux-aarch64-low.cc
> index daccfef746e..5df632fe724 100644
> --- a/gdbserver/linux-aarch64-low.cc
> +++ b/gdbserver/linux-aarch64-low.cc
> @@ -518,7 +518,7 @@ CORE_ADDR
> aarch64_target::low_stopped_data_address ()
> {
>   siginfo_t siginfo;
> -  int pid, i;
> +  int pid;
>   struct aarch64_debug_reg_state *state;
> 
>   pid = lwpid_of (current_thread);
> @@ -538,45 +538,20 @@ aarch64_target::low_stopped_data_address ()
>   const CORE_ADDR addr_trap
>     = address_significant ((CORE_ADDR) siginfo.si_addr);
> 
> +  struct regcache *regs = get_thread_regcache (current_thread, 1);
> +  CORE_ADDR trigger_pc = regcache_read_pc (regs);
> +  uint32_t insn;
> +  read_memory (trigger_pc, (gdb_byte *) &insn, 4);
> +
>   /* Check if the address matches any watched address.  */
>   state = aarch64_get_debug_reg_state (pid_of (current_thread));
> -  for (i = aarch64_num_wp_regs - 1; i >= 0; --i)
> -    {
> -      const unsigned int offset
> -	= aarch64_watchpoint_offset (state->dr_ctrl_wp[i]);
> -      const unsigned int len = aarch64_watchpoint_length (state->dr_ctrl_wp[i]);
> -      const CORE_ADDR addr_watch = state->dr_addr_wp[i] + offset;
> -      const CORE_ADDR addr_watch_aligned = align_down (state->dr_addr_wp[i], 8);
> -      const CORE_ADDR addr_orig = state->dr_addr_orig_wp[i];
> -
> -      if (state->dr_ref_count_wp[i]
> -	  && DR_CONTROL_ENABLED (state->dr_ctrl_wp[i])
> -	  && addr_trap >= addr_watch_aligned
> -	  && addr_trap < addr_watch + len)
> -	{
> -	  /* ADDR_TRAP reports the first address of the memory range
> -	     accessed by the CPU, regardless of what was the memory
> -	     range watched.  Thus, a large CPU access that straddles
> -	     the ADDR_WATCH..ADDR_WATCH+LEN range may result in an
> -	     ADDR_TRAP that is lower than the
> -	     ADDR_WATCH..ADDR_WATCH+LEN range.  E.g.:
> -
> -	     addr: |   4   |   5   |   6   |   7   |   8   |
> -				   |---- range watched ----|
> -		   |----------- range accessed ------------|
> -
> -	     In this case, ADDR_TRAP will be 4.
> -
> -	     To match a watchpoint known to GDB core, we must never
> -	     report *ADDR_P outside of any ADDR_WATCH..ADDR_WATCH+LEN
> -	     range.  ADDR_WATCH <= ADDR_TRAP < ADDR_ORIG is a false
> -	     positive on kernels older than 4.10.  See PR
> -	     external/20207.  */
> -	  return addr_orig;
> -	}
> -    }
> 
> -  return (CORE_ADDR) 0;
> +  CORE_ADDR trigger_addr;
> +
> +  if (hw_watch_detect_trigger (state, insn, addr_trap, &trigger_addr))
> +    return trigger_addr;
> +
> +  return 0;
> }
> 
> /* Implementation of linux target ops method "low_stopped_by_watchpoint".  */
> -- 
> 2.25.1
> 


^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2021-08-17 15:30 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-06-02 14:27 [PATCH][AArch64] Fix missing watchpoint hits/endless loop Luis Machado
2021-06-02 16:57 ` Alan Hayward
2021-06-02 17:22   ` Luis Machado
2021-06-03 10:22     ` Alan Hayward
2021-07-23 13:24 ` [PATCH, v2][AArch64] " Luis Machado
2021-08-17 15:30   ` Alan Hayward

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).