From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <SRS0=OkJc=J5=suse.de=tdevries@sourceware.org>
Received: from smtp-out2.suse.de (smtp-out2.suse.de [195.135.223.131])
	by sourceware.org (Postfix) with ESMTPS id EC9003858D20
	for <gdb-patches@sourceware.org>; Tue, 20 Feb 2024 21:07:19 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org EC9003858D20
Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=suse.de
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=suse.de
ARC-Filter: OpenARC Filter v1.0.0 sourceware.org EC9003858D20
Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=195.135.223.131
ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1708463243; cv=none;
	b=qMcgSIUSGh4k1Qi9kYjtl5A9lS5GN2cT3+NE8NVuomc+dFORij9COVouxC+EXdhaOW6q/qSaw32GqNmUBSvsUfQTZgWyD3NcsaOBk6hHE4bpTy0AB2ErzJcHRPVrJqvvyE9jblulX+PJmowxWolH13ymNzTUWzbqex5hrjgJg7k=
ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key;
	t=1708463243; c=relaxed/simple;
	bh=qEAEekSCvEYPtxZMIsHFHXP4C4N71o6DcK1hP/mU9pc=;
	h=DKIM-Signature:DKIM-Signature:DKIM-Signature:DKIM-Signature:
	 Message-ID:Date:MIME-Version:Subject:To:From; b=dFHAtd+YLXxKvguFB66G0sXXT7SZYiSHsdBMHBuyBp8CfVFGWmDD66g9uuHpLCDIfZWVoqapBNuj1sCWxoHOQaErNdkp7U+4WEQVw3zgG4F3YgyyRA4XmndlHKxF+JdNt8G+Zl9AgmpKisVzV2rgPmtubndc7UbC3pYJyw3ZFIc=
ARC-Authentication-Results: i=1; server2.sourceware.org
Received: from imap1.dmz-prg2.suse.org (imap1.dmz-prg2.suse.org [10.150.64.97])
	(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
	 key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256)
	(No client certificate requested)
	by smtp-out2.suse.de (Postfix) with ESMTPS id E4A701FD89;
	Tue, 20 Feb 2024 21:07:18 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_rsa;
	t=1708463239; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:
	 mime-version:mime-version:content-type:content-type:
	 content-transfer-encoding:content-transfer-encoding:
	 in-reply-to:in-reply-to:references:references;
	bh=LjQrvKiSeZlsKu/743fL0pNZCxuAx/MlHbus0AiQxTk=;
	b=qY170gZVR2AeWTz5lAY+NtFNFEEsu+fc8Std2ctJuIHIN/0cZpgMewvY3Q0dvbsMWmXzau
	yP72cH6YyK0NHY0UujMP289Ecsxn5WcPoXDz0e4l51P2Wnfsq519hyxgNMytgUWWodfMsa
	HnvCgAor4b3qXUnJp6RsQgxkCxZpogg=
DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de;
	s=susede2_ed25519; t=1708463239;
	h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:
	 mime-version:mime-version:content-type:content-type:
	 content-transfer-encoding:content-transfer-encoding:
	 in-reply-to:in-reply-to:references:references;
	bh=LjQrvKiSeZlsKu/743fL0pNZCxuAx/MlHbus0AiQxTk=;
	b=nSlLVUjJ8ILpNed3AuecAE1bPADMOfWKyQphY86k8uOEiRIzpwBPhl8hzHRK6dyp3uIfaS
	BnjD5MDzyh8/krAw==
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_rsa;
	t=1708463238; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:
	 mime-version:mime-version:content-type:content-type:
	 content-transfer-encoding:content-transfer-encoding:
	 in-reply-to:in-reply-to:references:references;
	bh=LjQrvKiSeZlsKu/743fL0pNZCxuAx/MlHbus0AiQxTk=;
	b=m3woed4TAVJKlTWSB79tbIpfVEIP3E+isGeYmd2z3omIBYcwgEvd/nap05PwOb/Qnm2mTO
	+73qlDfGoxTb4XgSG17zWC0OWTwV6f0JV5gjMWcMwOtCbEQM1vqkxewNeRUoZltRvtOCer
	pyMp8lBsBceUIFD9NBSIVpTqcA/ZPFQ=
DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de;
	s=susede2_ed25519; t=1708463238;
	h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:
	 mime-version:mime-version:content-type:content-type:
	 content-transfer-encoding:content-transfer-encoding:
	 in-reply-to:in-reply-to:references:references;
	bh=LjQrvKiSeZlsKu/743fL0pNZCxuAx/MlHbus0AiQxTk=;
	b=WBz5qtsIDRetXWncowmg4ZTSUXlptp+qcfpEdipSQk7FJ3YZn17y/hHzvpVwpmm39ZAjbY
	BOIdlqm9NOZt2wBw==
Received: from imap1.dmz-prg2.suse.org (localhost [127.0.0.1])
	(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
	 key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256)
	(No client certificate requested)
	by imap1.dmz-prg2.suse.org (Postfix) with ESMTPS id C8379139D0;
	Tue, 20 Feb 2024 21:07:18 +0000 (UTC)
Received: from dovecot-director2.suse.de ([2a07:de40:b281:106:10:150:64:167])
	by imap1.dmz-prg2.suse.org with ESMTPSA
	id c/CCL4YU1WUUFwAAD6G6ig
	(envelope-from <tdevries@suse.de>); Tue, 20 Feb 2024 21:07:18 +0000
Message-ID: <b02d76f6-74ea-48e5-9d0d-6282ee188f88@suse.de>
Date: Tue, 20 Feb 2024 22:07:12 +0100
MIME-Version: 1.0
User-Agent: Mozilla Thunderbird
Subject: Re: [PATCH v3] Fix missing watchpoint hits/endless loop
Content-Language: en-US
To: Luis Machado <luis.machado@arm.com>, gdb-patches@sourceware.org
References: <20240105150734.5287-1-tdevries@suse.de>
 <36f17aa1-72a5-4b57-a15d-efce90cf8b2d@arm.com>
From: Tom de Vries <tdevries@suse.de>
In-Reply-To: <36f17aa1-72a5-4b57-a15d-efce90cf8b2d@arm.com>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Authentication-Results: smtp-out2.suse.de;
	none
X-Spamd-Result: default: False [-3.09 / 50.00];
	 ARC_NA(0.00)[];
	 RCVD_VIA_SMTP_AUTH(0.00)[];
	 XM_UA_NO_VERSION(0.01)[];
	 FROM_HAS_DN(0.00)[];
	 TO_DN_SOME(0.00)[];
	 TO_MATCH_ENVRCPT_ALL(0.00)[];
	 BAYES_HAM(-3.00)[100.00%];
	 MIME_GOOD(-0.10)[text/plain];
	 RCVD_COUNT_THREE(0.00)[3];
	 DKIM_SIGNED(0.00)[suse.de:s=susede2_rsa,suse.de:s=susede2_ed25519];
	 RCPT_COUNT_TWO(0.00)[2];
	 DBL_BLOCKED_OPENRESOLVER(0.00)[linaro.org:email];
	 FUZZY_BLOCKED(0.00)[rspamd.com];
	 FROM_EQ_ENVFROM(0.00)[];
	 MIME_TRACE(0.00)[0:+];
	 RCVD_TLS_ALL(0.00)[];
	 MID_RHS_MATCH_FROM(0.00)[]
X-Spam-Level: 
X-Spam-Score: -3.09
X-Spam-Status: No, score=-12.3 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,GIT_PATCH_0,SPF_HELO_NONE,SPF_PASS,TXREP,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org
List-Id: <gdb-patches.sourceware.org>

On 1/8/24 13:27, Luis Machado wrote:
> Hi Tom,
> 
> Thanks for refreshing this patch.
> 
> As I hinted on IRC, back then I ended up deciding not to pursue this approach further. Though it does
> address some of the potential failures, it is a bit convoluted and poses a fairly significant
> maintenance burden (we need to explicitly identify instructions, so we will always be behind), including
> the risk of producing false positives.
> 
> For SVE/SME, there are also predicated accesses, which means an intruction may read/write a discontiguous range
> of bytes depending on the predicate mask. That is trickier to handle.
> 
> The nature of hardware watchpoint detection on aarch64 means the triggering behavior is dependent on a particular
> CPU spec, in a way that is hard for userspace to predict reliably. Hopefully in the future this situation will
> improve for userspace. The kernel folks are aware of it.
> 

Hi Luis,

ack, that makes sense.

> Meanwhile, given the above restrictions (and potentially restrictions of other architectures), we could teach gdb
> about imprecise hardware watchpoint triggers. Something that would allow gdb to tell userspace that we know
> a hardware watchpoint has triggered, but we don't know exactly for what range and what happened.
> 
> At least this would prevent certain situations where gdb is simply stuck trying to skip over unskippable
> hardware watchpoints, because it can't tell what happened and ended up with a generic SIGTRAP.
> 
> Thoughts?
> 

I've submitted a two-patch series here ( 
https://sourceware.org/pipermail/gdb-patches/2024-February/206706.html 
), where the second patch does something a bit like this: the aarch64 
tdep tells gdb that we know that a hardware watchpoint has triggered, 
but doesn't know exactly for what range, and since it's a regular write 
watchpoint, gdb handles this fine.

With this patch series, I'm getting rid of the last consistent FAILs I 
see on m1 (not counting libcc1 stuff).

Thanks,
- Tom

> On 1/5/24 15:07, Tom de Vries wrote:
>> From: Luis Machado <luis.machado@linaro.org>
>>
>> Updates on v3:
>>
>> - Rebased to current trunk (v2 applied cleanly on gdb-12-branch).
>> - Should also work on fbsd, though I haven't tested it.
>> - Dropped gdbserver part (should be possible, I'm just not sure yet where to
>>    move aarch64_stopped_data_address such that it's accessible from gdbserver).
>> - No attempt made to address any review remarks on v2.
>>    https://inbox.sourceware.org/gdb-patches/D2AC3C31-5EEE-4BCA-8D75-7CEDFA75F540@arm.com/
>>
>> Updates on v2:
>>
>> - Handle more types of load/store instructions, including a catch all
>>    for SVE load/store instructions.
>>
>> --
>>
>> I ran into a situation where a hardware watchpoint hit is not detected
>> correctly, misleading GDB into thinking it was a delayed breakpoint hit.
>>
>> The problem is that hardware watchpoints are not skippable on AArch64, so
>> that makes GDB loop endlessly trying to run past the instruction.
>>
>> The most obvious case where this happens is when the load/store pair
>> instructions access 16 bytes of memory.
>>
>> Suppose we have a stp instruction that will write a couple 64-bit registers
>> to address 0x10 (stp x3,x4 [x2]). It will write data from 0x10 up to 0x20.
>>
>> Now suppose a write watchpoint is created to monitor memory address 0x18,
>> which is the start of the second register write. It can have whatever length,
>> but let's assume it has length 8.
>>
>> When we execute that stp instruction, it will trap and the reported stopped
>> data address from the kernel will be 0x10 (the beginning of the memory range
>> accessed by the instruction).
>>
>> The current code won't be able to detect a valid trigger because it assumes an
>> alignment of 8 bytes for the watchpoint address. Forcing that kind of alignment
>> won't be enough to detect a 16-byte access if the trap address falls outside of
>> the 8-byte alignment window. We need to know how many bytes the instruction
>> will access, but we won't have that data unless we go parsing instructions.
>>
>> Another issue with the current code seems to be that it assumes the accesses
>> will always be 8 bytes in size, since it wants to align the watchpoint address
>> to that particular boundary. This leads to problems when we have unaligned
>> addresses and unaligned watchpoints.
>>
>> For example, suppose we have a str instruction storing 8 bytes to memory
>> address 0xf. Now suppose we have a write watchpoint at address 0x16,
>> monitoring 8 bytes.
>>
>> The trap address will be 0xf, but forcing 0x16 to 8-byte alignment yields
>> 0x10, and so GDB doesn't think this is a watchpoint hit.
>>
>> I believe you can trigger the same problem with smaller memory accesses,
>> except one that accesses a single byte.
>>
>> In order to cover most of the cases correctly, we parse the load/store
>> instructions and detect how many bytes they're accessing. That will give us
>> enough information to tell if a hardware watchpoint triggered or not.
>>
>> The SVE load/store support is only a placeholder for now, as we need to
>> parse the instructions and use the variable length to determine the memory
>> access size (planned for the future).
>>
>> The patch also fixes these two failures in the testsuite:
>>
>> FAIL: gdb.base/watchpoint-unaligned.exp: continue (timeout)
>> FAIL: gdb.base/watchpoint-unaligned.exp: size8twice write
>>
>> Regression tested (v2) on aarch64-linux Ubuntu/20.04 and Ubuntu/18.04.
>> Regression tested (v3) on aarch64-linux fedora 39.
>>
>> I also used a very slow aarch64 hardware watchpoint stress test to validate
>> the v2 patch, but I don't think that particular test should be included in the
>> testsuite. It takes quite a while to run (20+ minutes), and shouldn't be
>> required unless we know there are problems with hardware watchpoint handling.
>>
>> PR tdep/29423
>> Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=29423
>> ---
>>   gdb/aarch64-fbsd-nat.c  |   7 +-
>>   gdb/aarch64-linux-nat.c |   7 +-
>>   gdb/aarch64-nat.c       | 268 ++++++++++++++++++++++++++++++++++++++--
>>   gdb/aarch64-nat.h       |   3 +-
>>   4 files changed, 273 insertions(+), 12 deletions(-)
>>
>> diff --git a/gdb/aarch64-fbsd-nat.c b/gdb/aarch64-fbsd-nat.c
>> index 38fb093f139..39162fe3bb0 100644
>> --- a/gdb/aarch64-fbsd-nat.c
>> +++ b/gdb/aarch64-fbsd-nat.c
>> @@ -154,9 +154,14 @@ aarch64_fbsd_nat_target::stopped_data_address (CORE_ADDR *addr_p)
>>   
>>     const CORE_ADDR addr_trap = (CORE_ADDR) siginfo.si_addr;
>>   
>> +  struct regcache *regs = get_thread_regcache (this, inferior_ptid);
>> +  CORE_ADDR trigger_pc = regcache_read_pc (regs);
>> +  uint32_t insn;
>> +  read_memory (trigger_pc, (gdb_byte *) &insn, 4);
>> +
>>     /* Check if the address matches any watched address.  */
>>     state = aarch64_get_debug_reg_state (inferior_ptid.pid ());
>> -  return aarch64_stopped_data_address (state, addr_trap, addr_p);
>> +  return aarch64_stopped_data_address (state, insn, addr_trap, addr_p);
>>   }
>>   
>>   /* Implement the "stopped_by_watchpoint" target_ops method.  */
>> diff --git a/gdb/aarch64-linux-nat.c b/gdb/aarch64-linux-nat.c
>> index 5b4e3c2bde1..5494bb07517 100644
>> --- a/gdb/aarch64-linux-nat.c
>> +++ b/gdb/aarch64-linux-nat.c
>> @@ -962,9 +962,14 @@ aarch64_linux_nat_target::stopped_data_address (CORE_ADDR *addr_p)
>>     const CORE_ADDR addr_trap
>>       = gdbarch_remove_non_address_bits (gdbarch, (CORE_ADDR) siginfo.si_addr);
>>   
>> +  struct regcache *regs = get_thread_regcache (this, inferior_ptid);
>> +  CORE_ADDR trigger_pc = regcache_read_pc (regs);
>> +  uint32_t insn;
>> +  read_memory (trigger_pc, (gdb_byte *) &insn, 4);
>> +
>>     /* Check if the address matches any watched address.  */
>>     state = aarch64_get_debug_reg_state (inferior_ptid.pid ());
>> -  return aarch64_stopped_data_address (state, addr_trap, addr_p);
>> +  return aarch64_stopped_data_address (state, insn, addr_trap, addr_p);
>>   }
>>   
>>   /* Implement the "stopped_by_watchpoint" target_ops method.  */
>> diff --git a/gdb/aarch64-nat.c b/gdb/aarch64-nat.c
>> index ee8c5a1e21d..928bb70d644 100644
>> --- a/gdb/aarch64-nat.c
>> +++ b/gdb/aarch64-nat.c
>> @@ -22,6 +22,7 @@
>>   #include "inferior.h"
>>   #include "cli/cli-cmds.h"
>>   #include "aarch64-nat.h"
>> +#include "arch/aarch64-insn.h"
>>   
>>   #include <unordered_map>
>>   
>> @@ -225,27 +226,275 @@ aarch64_remove_watchpoint (CORE_ADDR addr, int len, enum target_hw_bp_type type,
>>     return ret;
>>   }
>>   
>> -/* See aarch64-nat.h.  */
>> +/* Macros to distinguish between various types of load/store instructions.  */
>> +
>> +/* Regular and Advanced SIMD load/store instructions.  */
>> +#define GENERAL_LOAD_STORE_P(insn) (bit (insn, 25) == 0 && bit (insn, 27) == 1)
>> +
>> +/* SVE load/store instruction.  */
>> +#define SVE_LOAD_STORE_P(insn) (bits (insn, 25, 28) == 0x2		      \
>> +				&& (bits (insn, 29, 31) == 0x4		      \
>> +				    || bits (insn, 29, 31) == 0x5	      \
>> +				    || bits (insn, 29, 31) == 0x6	      \
>> +				    || (bits (insn, 29, 31) == 0x7	      \
>> +					&& ((bit (insn, 15) == 0x0	      \
>> +					     && (bit (insn, 13) == 0x0	      \
>> +						 || bit (insn, 13) == 0x1))   \
>> +					     || (bit (insn, 15) == 0x1	      \
>> +						 && bit (insn, 13) == 0x0)    \
>> +					     || bits (insn, 13, 15) == 0x6    \
>> +					     || bits (insn, 13, 15) == 0x7))))
>> +
>> +/* Any load/store instruction (regular, Advanced SIMD or SVE).  */
>> +#define LOAD_STORE_P(insn) (GENERAL_LOAD_STORE_P (insn)			      \
>> +			    || SVE_LOAD_STORE_P (insn))
>> +
>> +/* Load/Store pair (regular or vector).  */
>> +#define LOAD_STORE_PAIR_P(insn) (bit (insn, 28) == 0 && bit (insn, 29) == 1)
>> +
>> +#define COMPARE_SWAP_PAIR_P(insn) (bits (insn, 30, 31) == 0		      \
>> +				   && bits (insn, 28, 29) == 0		      \
>> +				   && bit (insn, 26) == 0		      \
>> +				   && bits (insn, 23, 24) == 0		      \
>> +				   && bit (insn, 11) == 1)
>> +#define LOAD_STORE_EXCLUSIVE_PAIR_P(insn) (bit (insn, 31) == 1		      \
>> +					   && bits (insn, 28, 29) == 0	      \
>> +					   && bit (insn, 26) == 0	      \
>> +					   && bits (insn, 23, 24) == 0	      \
>> +					   && bit (insn, 11) == 1)
>> +
>> +#define LOAD_STORE_LITERAL_P(insn) (bit (insn, 28) == 1			      \
>> +				    && bit (insn, 29) == 0		      \
>> +				    && bit (insn, 24) == 0)
>> +
>> +/* Vector Load/Store multiple structures.  */
>> +#define LOAD_STORE_MS(insn) (bits (insn, 28, 29) == 0x0			      \
>> +			     && bit (insn, 31) == 0x0			      \
>> +			     && bit (insn, 26) == 0x1			      \
>> +			     && ((bits (insn, 23, 24) == 0x0		      \
>> +				  && bits (insn, 16, 21) == 0x0)	      \
>> +				 || (bits (insn, 23, 24) == 0x1		      \
>> +				     && bit (insn, 21) == 0x0)))
>> +
>> +/* Vector Load/Store single structure.  */
>> +#define LOAD_STORE_SS(insn) (bits (insn, 28, 29) == 0x0			      \
>> +			     && bit (insn, 31) == 0x0			      \
>> +			     && bit (insn, 26) == 0x1			      \
>> +			     && ((bits (insn, 23, 24) == 0x2		      \
>> +				  && bits (insn, 16, 20) == 0x0)	      \
>> +				 || (bits (insn, 23, 24) == 0x3)))
>> +
>> +/* Assuming INSN is a load/store instruction, return the size of the
>> +   memory access.  The patterns are documented in the ARM Architecture
>> +   Reference Manual - Index by encoding.  */
>> +
>> +static unsigned int
>> +get_load_store_access_size (CORE_ADDR insn)
>> +{
>> +  if (SVE_LOAD_STORE_P (insn))
>> +    {
>> +      /* SVE load/store instructions.  */
>> +
>> +      /* This is not always correct for SVE instructions, but it is a reasonable
>> +	 default for now.  Calculating the exact memory access size for SVE
>> +	 load/store instructions requires parsing instructions and evaluating
>> +	 the vector length.  */
>> +      return 16;
>> +    }
>> +
>> +  /* Non-SVE instructions.  */
>> +
>> +  bool vector = (bit (insn, 26) == 1);
>> +  bool ldst_pair = LOAD_STORE_PAIR_P (insn);
>> +
>> +  /* Is this an Advanced SIMD load/store instruction?  */
>> +  if (vector)
>> +    {
>> +      unsigned int size = bits (insn, 30, 31);
>> +
>> +      if (LOAD_STORE_LITERAL_P (insn))
>> +	{
>> +	  /* Advanced SIMD load/store literal */
>> +	  /* Sizes 4, 8 and 16 bytes.  */
>> +	  return 4 << size;
>> +	}
>> +      else if (LOAD_STORE_MS (insn))
>> +	{
>> +	  size = 8 << bit (insn, 30);
>> +
>> +	  unsigned char opcode = bits (insn, 12, 15);
>> +
>> +	  if (opcode == 0x0 || opcode == 0x2)
>> +	    size *= 4;
>> +	  else if (opcode == 0x4 || opcode == 0x6)
>> +	    size *= 3;
>> +	  else if (opcode == 0x8 || opcode == 0xa)
>> +	    size *= 2;
>> +
>> +	  return size;
>> +	}
>> +      else if (LOAD_STORE_SS (insn))
>> +	{
>> +	  size = 8 << bit (insn, 30);
>> +	  return size;
>> +	}
>> +      /* Advanced SIMD load/store instructions.  */
>> +      else if (ldst_pair)
>> +	{
>> +	  /* Advanced SIMD load/store pair.  */
>> +	  /* Sizes 8, 16 and 32 bytes.  */
>> +	  return (8 << size);
>> +	}
>> +      else
>> +	{
>> +	  /* Regular Advanced SIMD load/store instructions.  */
>> +	  size = size | (bit (insn, 23) << 2);
>> +	  return 1 << size;
>> +	}
>> +    }
>> +
>> +  /* This must be a regular GPR load/store instruction.  */
>> +
>> +  unsigned int size = bits (insn, 30, 31);
>> +  bool cs_pair = COMPARE_SWAP_PAIR_P (insn);
>> +  bool ldstx_pair = LOAD_STORE_EXCLUSIVE_PAIR_P (insn);
>> +
>> +  if (ldst_pair)
>> +    {
>> +      /* Load/Store pair.  */
>> +      if (size == 0x1)
>> +	{
>> +	  if (bit (insn, 22) == 0)
>> +	    /* STGP - 16 bytes.  */
>> +	    return 16;
>> +	  else
>> +	    /* LDPSW - 8 bytes.  */
>> +	    return 8;
>> +	}
>> +      /* Other Load/Store pair.  */
>> +      return (size == 0)? 8 : 16;
>> +    }
>> +  else if (cs_pair || ldstx_pair)
>> +    {
>> +      /* Compare Swap Pair or Load Store Exclusive Pair.  */
>> +      /* Sizes 8 and 16 bytes.  */
>> +      size = bit (insn, 30);
>> +      return (8 << size);
>> +    }
>> +  else if (LOAD_STORE_LITERAL_P (insn))
>> +    {
>> +      /* Load/Store literal.  */
>> +      /* Sizes between 4 and 8 bytes.  */
>> +      if (size == 0x2)
>> +	return 4;
>> +
>> +      return 4 << size;
>> +    }
>> +  else
>> +    {
>> +      /* General load/store instructions, with memory access sizes between
>> +	 1 and 8 bytes.  */
>> +      return (1 << size);
>> +    }
>> +}
>> +
>> +/* Return true if the regions [mem_addr, mem_addr + mem_len) and
>> +   [watch_addr, watch_addr + watch_len) overlap.  False otherwise.  */
>> +
>> +static bool
>> +hw_watch_regions_overlap (CORE_ADDR mem_addr, size_t mem_len,
>> +			  CORE_ADDR watch_addr, size_t watch_len)
>> +{
>> +  /* Quick check for non-overlapping case.  */
>> +  if (watch_addr >= (mem_addr + mem_len)
>> +      || mem_addr >= (watch_addr + watch_len))
>> +    return false;
>> +
>> +  CORE_ADDR start = std::max (mem_addr, watch_addr);
>> +  CORE_ADDR end = std::min (mem_addr + mem_len, watch_addr + watch_len);
>> +
>> +  return ((end - start) > 0);
>> +}
>> +
>> +/* Check if a hardware watchpoint has triggered.  If a trigger is detected,
>> +   return true and update ADDR_P with the stopped data address.
>> +   Otherwise return false.
>> +
>> +   STATE is the debug register's state, INSN is the instruction the inferior
>> +   stopped at and ADDR_TRAP is the reported stopped data address.  */
>>   
>>   bool
>>   aarch64_stopped_data_address (const struct aarch64_debug_reg_state *state,
>> -			      CORE_ADDR addr_trap, CORE_ADDR *addr_p)
>> +			      CORE_ADDR insn, CORE_ADDR addr_trap,
>> +			      CORE_ADDR *addr_p)
>>   {
>> -  int i;
>> +  /* There are 6 variations of watchpoint range and memory access
>> +     range positioning:
>> +
>> +     - W is the byte in the watchpoint range only.
>> +
>> +     - B is the byte in the memory access range ony.
>> +
>> +     - O is the byte in the overlapping region of the watchpoint range and
>> +       the memory access range.
>> +
>> +     1 - Non-overlapping, no triggers.
>> +
>> +     [WWWWWWWW]...[BBBBBBBB]
>> +
>> +     2 - Non-overlapping, no triggers.
>> +
>> +     [BBBBBBBB]...[WWWWWWWW]
>> +
>> +     3 - Overlapping partially, triggers.
>> +
>> +     [BBBBOOOOWWWW]
>> +
>> +     4 - Overlapping partially, triggers.
>> +
>> +     [WWWWOOOOBBBB]
>> +
>> +     5 - Memory access contained in watchpoint range, triggers.
>> +
>> +     [WWWWOOOOOOOOWWWW]
>> +
>> +     6 - Memory access containing watchpoint range, triggers.
>> +
>> +     [BBBBOOOOOOOOBBBB]
>> +  */
>> +
>> +  /* If there are no load/store instructions at PC, this must be a hardware
>> +     breakpoint hit.  Return early.
>> +
>> +     If a hardware breakpoint is placed at a PC containing a load/store
>> +     instruction, we will go through the memory access size check anyway, but
>> +     will eventually figure out there are no watchpoints matching such an
>> +     address.
>> +
>> +     There is one corner case though, which is having a hardware breakpoint and
>> +     a hardware watchpoint at PC, when PC contains a load/store
>> +     instruction.  That is an ambiguous case that is hard to differentiate.
>> +
>> +     There's not much we can do about that unless the kernel sends us enough
>> +     information to tell them apart.  */
>> +  if (!LOAD_STORE_P (insn))
>> +    return false;
>> +
>> +  /* Get the memory access size for the instruction at PC.  */
>> +  unsigned int memory_access_size = get_load_store_access_size (insn);
>>   
>> -  for (i = aarch64_num_wp_regs - 1; i >= 0; --i)
>> +  for (int i = aarch64_num_wp_regs - 1; i >= 0; --i)
>>       {
>>         const unsigned int offset
>>   	= aarch64_watchpoint_offset (state->dr_ctrl_wp[i]);
>>         const unsigned int len = aarch64_watchpoint_length (state->dr_ctrl_wp[i]);
>>         const CORE_ADDR addr_watch = state->dr_addr_wp[i] + offset;
>> -      const CORE_ADDR addr_watch_aligned = align_down (state->dr_addr_wp[i], 8);
>>         const CORE_ADDR addr_orig = state->dr_addr_orig_wp[i];
>>   
>> -      if (state->dr_ref_count_wp[i]
>> -	  && DR_CONTROL_ENABLED (state->dr_ctrl_wp[i])
>> -	  && addr_trap >= addr_watch_aligned
>> -	  && addr_trap < addr_watch + len)
>> +      if ((state->dr_ref_count_wp[i]
>> +	  && DR_CONTROL_ENABLED (state->dr_ctrl_wp[i]))
>> +	  && hw_watch_regions_overlap (addr_trap, memory_access_size,
>> +				       addr_watch, len))
>>   	{
>>   	  /* ADDR_TRAP reports the first address of the memory range
>>   	     accessed by the CPU, regardless of what was the memory
>> @@ -270,6 +519,7 @@ aarch64_stopped_data_address (const struct aarch64_debug_reg_state *state,
>>   	}
>>       }
>>   
>> +  /* No hardware watchpoint hits detected.  */
>>     return false;
>>   }
>>   
>> diff --git a/gdb/aarch64-nat.h b/gdb/aarch64-nat.h
>> index fee6bda2577..bc0e6297f40 100644
>> --- a/gdb/aarch64-nat.h
>> +++ b/gdb/aarch64-nat.h
>> @@ -51,7 +51,8 @@ void aarch64_remove_debug_reg_state (pid_t pid);
>>      *ADDR_P.  */
>>   
>>   bool aarch64_stopped_data_address (const struct aarch64_debug_reg_state *state,
>> -				   CORE_ADDR addr_trap, CORE_ADDR *addr_p);
>> +				   CORE_ADDR insn, CORE_ADDR addr_trap,
>> +				   CORE_ADDR *addr_p);
>>   
>>   /* Helper functions used by aarch64_nat_target below.  See their
>>      definitions.  */
>>
>> base-commit: e78337d5780c9d837e186c22c11eb8f9ed5bac62
>