From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <SRS0=gXjU=GT=redhat.com=aburgess@sourceware.org>
Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124])
	by sourceware.org (Postfix) with ESMTPS id 695593858414
	for <gdb-patches@sourceware.org>; Mon,  6 Nov 2023 10:49:44 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 695593858414
Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=redhat.com
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=redhat.com
ARC-Filter: OpenARC Filter v1.0.0 sourceware.org 695593858414
Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=170.10.133.124
ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1699267787; cv=none;
	b=c7Sqvg6Ko9XrigAWu+mHRI7RQACwtrvf4g3UBzrwG/fouCC1U9Cr/tmLeXVHPmv1Ua+CL1n3XnUtTZQ9awk2caKT9Cp8y0JmjLFYygoAArnHoO1sF5bxz6tEqspoAhStRK9Cu1cKvgHETStfaGLbkKJsBK8IZnjs1mjn1nHgzY8=
ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key;
	t=1699267787; c=relaxed/simple;
	bh=RzwqXRXKSLZYmkzsZhXukEs42gFICy0pGqEEsZb0VmE=;
	h=DKIM-Signature:From:To:Subject:Date:Message-ID:MIME-Version; b=O1AYGlCWLwYk0qbVmH/r49FzekKYAhXFlj2WxAUGGbb0W3VTLA/olQX77LS/pSt9UUv09uRt+w63bo2sDW39pzuif83UmselLSXD1k7GwjKutG9W94Y31TZMca0IFjt6HWEkuPv+oyYs60hoa8OfxNx6vD0ctYELacNal3edgvw=
ARC-Authentication-Results: i=1; server2.sourceware.org
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com;
	s=mimecast20190719; t=1699267784;
	h=from:from:reply-to:subject:subject:date:date:message-id:message-id:
	 to:to:cc:cc:mime-version:mime-version:content-type:content-type:
	 content-transfer-encoding:content-transfer-encoding:
	 in-reply-to:in-reply-to:references:references;
	bh=xRsbcwiY0PNSllT/RWWambIbrTIdJO976NeNIxBVPqo=;
	b=iCI/PU0v9DTfz50dht1MnEHdA4hAjAcRv9fR/TV4j2rfXonKHsqRAXxm5Kz1zeGCtWO6XE
	NCL00X149BUWZUb+HJzFFdp3ZEC8X4nXIulNY5kY4iKv6SsUAlXZhrTdQn0en2G4yUrLbt
	Ml5uaH1yeNTCMSpCFnUiDu3TGsmM3q4=
Received: from mail-qv1-f69.google.com (mail-qv1-f69.google.com
 [209.85.219.69]) by relay.mimecast.com with ESMTP with STARTTLS
 (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id
 us-mta-170-aqT22ssVNX-qg-oLz7feKw-1; Mon, 06 Nov 2023 05:49:25 -0500
X-MC-Unique: aqT22ssVNX-qg-oLz7feKw-1
Received: by mail-qv1-f69.google.com with SMTP id 6a1803df08f44-670b675b2c5so54972716d6.0
        for <gdb-patches@sourceware.org>; Mon, 06 Nov 2023 02:49:25 -0800 (PST)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20230601; t=1699267765; x=1699872565;
        h=content-transfer-encoding:mime-version:message-id:date:references
         :in-reply-to:subject:cc:to:from:x-gm-message-state:from:to:cc
         :subject:date:message-id:reply-to;
        bh=ZqfZp30Z7mW8Svbc8tsDfl3jnxr0XG5nRVzuwiYP1ow=;
        b=nG53KpHtOO0qb7uXadKkxS8tGNOjmWHnlOityvgCajIXQZApizvP78bUt3WUI2kDgb
         Quj8SV1Bvzu+hPIK7Hj5B6loqgRDNogfKaKIFQYYp8Zd3wFQHWIg3sQtNj9vhmrZFW8G
         NPL4+/zXsieZ4a29hrW/V3sXjH87zOHAPPhZA1Sm11WBZojfJGwTz2cd6cyuJhQgQi4D
         jPG8TSJCzFi5BbBzpt8RB/ccPE8VAjfm5P2dUW7gaFG2QpNRJLMLVVi9hLOuEW+rQgXT
         zeuGTaloU8ejXG2sGqrSUohSfm9Qg4ABLpS2vpHuPG9v0TcTkPVOZ0+HhmfR+Uu/LxHk
         BDPQ==
X-Gm-Message-State: AOJu0Yz9zj3Y7z6RQLDsVKX3cHjFKCZA8VFBnVp4H/Ej8fOmsmmKUPC/
	yPayjbt3LeTQPf7dGUqsY9akgODrANZJ4M99NW+Z7ag3IQvsDinHbWq2yTPqRZ4NVLHkpNBpzBL
	aOugwMbO+ejveW933LLooP3zuxgZu4w==
X-Received: by 2002:a05:6214:e61:b0:66f:ba6e:73e8 with SMTP id jz1-20020a0562140e6100b0066fba6e73e8mr5869756qvb.41.1699267764988;
        Mon, 06 Nov 2023 02:49:24 -0800 (PST)
X-Google-Smtp-Source: AGHT+IGsM/cYftPyEy+MukkNuwZIzF6OYDByOgBoPmL1k1YIaKjSxwVWhTv45UBxF1pXFeEoj2UxMg==
X-Received: by 2002:a05:6214:e61:b0:66f:ba6e:73e8 with SMTP id jz1-20020a0562140e6100b0066fba6e73e8mr5869744qvb.41.1699267764514;
        Mon, 06 Nov 2023 02:49:24 -0800 (PST)
Received: from localhost (105.226.159.143.dyn.plus.net. [143.159.226.105])
        by smtp.gmail.com with ESMTPSA id x1-20020ad44581000000b00647386a3234sm3313068qvu.85.2023.11.06.02.49.23
        (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256);
        Mon, 06 Nov 2023 02:49:24 -0800 (PST)
From: Andrew Burgess <aburgess@redhat.com>
To: Yang Liu <liuyang22@iscas.ac.cn>, gdb-patches@sourceware.org
Cc: palmer@dabbelt.com, simon.marchi@polymtl.ca, Yang Liu
 <liuyang22@iscas.ac.cn>
Subject: Re: [PATCH v2] gdb: RISC-V: Refine lr/sc sequence support
In-Reply-To: <20231102174418.46080-1-liuyang22@iscas.ac.cn>
References: <20231102174418.46080-1-liuyang22@iscas.ac.cn>
Date: Mon, 06 Nov 2023 10:49:22 +0000
Message-ID: <87il6fupfx.fsf@redhat.com>
MIME-Version: 1.0
X-Mimecast-Spam-Score: 0
X-Mimecast-Originator: redhat.com
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
X-Spam-Status: No, score=-11.7 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,GIT_PATCH_0,RCVD_IN_DNSWL_NONE,RCVD_IN_MSPIKE_H3,RCVD_IN_MSPIKE_WL,SPF_HELO_NONE,SPF_NONE,TXREP,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org
List-Id: <gdb-patches.sourceware.org>

Yang Liu <liuyang22@iscas.ac.cn> writes:

> Per RISC-V spec, the lr/sc sequence can consist of up to 16 instructions,=
 and we
> cannot insert breakpoints in the middle of this sequence. Before this, we=
 only
> detected a specific pattern (the most common one). This patch improves th=
is part
> and now supports more complex pattern detection.

Thanks for doing this.  I can't comment too much on whether this code
complies with the ISA requirements as I don't follow the ISA at this
level these days.  However, the code seems mostly clear, except for one
part of the logic that confused me for a while, I've commented on this
inline below.  I also spotted a few GDB/GNU style nits that I'd like to
see fixed, I've detailed them inline below too.

Looking at Palmer's comment from your V1 thread:

  On 2023/11/2 23:58, Palmer Dabbelt wrote:
 =20
  > I also think the branch detection logic here isn't quite right: the=20
  > LR/SC sequence only becomes unconstrained if the backwards branch is=20
  > taken at runtime.=C2=A0 I can't find any examples of people doing that,=
 but=20
  > one could imagine some sort of spin-type patterns with an early retry=
=20
  > that do this sort of thing.
  >
  > The same logic applies to disallowed instructions, but I don't think=20
  > it's super likely those end up conditionally executed in LR/SC=20
  > sequences so maybe we can ignore that?

I think if we wanted to implement this then we'd have to actually
emulate all of the instructions within the sequence, which seems like a
much bigger task.

As you suggested in your reply, I'd be happy to see this change merged
as it is an improvement over what we currently have, but maybe it's
worth adding a comment somewhere to indicate that we understand that
there are still limitations with this code?  I think at a minimum, we
should probably discuss this issue in the commit message.

>
> gdb/ChangeLog:
>
>         * gdb/riscv-tdep.c (class riscv_insn): Add more needed opcode enu=
ms.
>         (riscv_insn::decode): Decode newly added opcodes.
>         (riscv_insn_is_non_cti_and_allowed_in_atomic_sequence): New.
>         (riscv_insn_is_direct_branch): New.
>         (riscv_next_pc_atomic_sequence): Removed.
>         (riscv_deal_with_atomic_sequence): Rename from riscv_next_pc_atom=
ic_sequence.
>         (riscv_software_single_step): Adjust to use the renamed one.

There's no problem including a ChangeLog style entry in the commit
message if you like this, but this is no longer required for commits to
GDB.  The change is instead expected to be fully discussed in the
initial part of the commit message.

>
> Signed-off-by: Yang Liu <liuyang22@iscas.ac.cn>
> ---
>  gdb/riscv-tdep.c | 275 ++++++++++++++++++++++++++++++++++++++++-------
>  1 file changed, 236 insertions(+), 39 deletions(-)
>
> diff --git a/gdb/riscv-tdep.c b/gdb/riscv-tdep.c
> index 3a2891c2c92..3c5afea99ca 100644
> --- a/gdb/riscv-tdep.c
> +++ b/gdb/riscv-tdep.c
> @@ -1578,8 +1578,34 @@ class riscv_insn
>        BLTU,
>        BGEU,
>        /* These are needed for stepping over atomic sequences.  */
> -      LR,
> -      SC,
> +      SLTI,
> +      SLTIU,
> +      XORI,
> +      ORI,
> +      ANDI,
> +      SLLI,
> +      SLLIW,
> +      SRLI,
> +      SRLIW,
> +      SRAI,
> +      SRAIW,
> +      SUB,
> +      SUBW,
> +      SLL,
> +      SLLW,
> +      SLT,
> +      SLTU,
> +      XOR,
> +      SRL,
> +      SRLW,
> +      SRA,
> +      SRAW,
> +      OR,
> +      AND,
> +      LR_W,
> +      LR_D,
> +      SC_W,
> +      SC_D,
>        /* This instruction is used to do a syscall.  */
>        ECALL,
> =20
> @@ -1768,6 +1794,13 @@ class riscv_insn
>      m_imm.s =3D EXTRACT_CBTYPE_IMM (ival);
>    }
> =20
> +  void decode_ca_type_insn (enum opcode opcode, ULONGEST ival)
> +  {
> +    m_opcode =3D opcode;
> +    m_rs1 =3D decode_register_index_short (ival, OP_SH_CRS1S);
> +    m_rs2 =3D decode_register_index_short (ival, OP_SH_CRS2S);
> +  }
> +
>    /* Fetch instruction from target memory at ADDR, return the content of
>       the instruction, and update LEN with the instruction length.  */
>    static ULONGEST fetch_instruction (struct gdbarch *gdbarch,
> @@ -1882,14 +1915,62 @@ riscv_insn::decode (struct gdbarch *gdbarch, CORE=
_ADDR pc)
>  =09decode_b_type_insn (BLTU, ival);
>        else if (is_bgeu_insn (ival))
>  =09decode_b_type_insn (BGEU, ival);
> +      else if (is_slti_insn(ival))
> +=09decode_i_type_insn (SLTI, ival);
> +      else if (is_sltiu_insn(ival))
> +=09decode_i_type_insn (SLTIU, ival);
> +      else if (is_xori_insn(ival))
> +=09decode_i_type_insn (XORI, ival);
> +      else if (is_ori_insn(ival))
> +=09decode_i_type_insn (ORI, ival);
> +      else if (is_andi_insn(ival))
> +=09decode_i_type_insn (ANDI, ival);
> +      else if (is_slli_insn(ival))
> +=09decode_i_type_insn (SLLI, ival);
> +      else if (is_slliw_insn(ival))
> +=09decode_i_type_insn (SLLIW, ival);
> +      else if (is_srli_insn(ival))
> +=09decode_i_type_insn (SRLI, ival);
> +      else if (is_srliw_insn(ival))
> +=09decode_i_type_insn (SRLIW, ival);
> +      else if (is_srai_insn(ival))
> +=09decode_i_type_insn (SRAI, ival);
> +      else if (is_sraiw_insn(ival))
> +=09decode_i_type_insn (SRAIW, ival);
> +      else if (is_sub_insn(ival))
> +=09decode_r_type_insn (SUB, ival);
> +      else if (is_subw_insn(ival))
> +=09decode_r_type_insn (SUBW, ival);
> +      else if (is_sll_insn(ival))
> +=09decode_r_type_insn (SLL, ival);
> +      else if (is_sllw_insn(ival))
> +=09decode_r_type_insn (SLLW, ival);
> +      else if (is_slt_insn(ival))
> +=09decode_r_type_insn (SLT, ival);
> +      else if (is_sltu_insn(ival))
> +=09decode_r_type_insn (SLTU, ival);
> +      else if (is_xor_insn(ival))
> +=09decode_r_type_insn (XOR, ival);
> +      else if (is_srl_insn(ival))
> +=09decode_r_type_insn (SRL, ival);
> +      else if (is_srlw_insn(ival))
> +=09decode_r_type_insn (SRLW, ival);
> +      else if (is_sra_insn(ival))
> +=09decode_r_type_insn (SRA, ival);
> +      else if (is_sraw_insn(ival))
> +=09decode_r_type_insn (SRAW, ival);
> +      else if (is_or_insn(ival))
> +=09decode_r_type_insn (OR, ival);
> +      else if (is_and_insn(ival))
> +=09decode_r_type_insn (AND, ival);
>        else if (is_lr_w_insn (ival))
> -=09decode_r_type_insn (LR, ival);
> +=09decode_r_type_insn (LR_W, ival);
>        else if (is_lr_d_insn (ival))
> -=09decode_r_type_insn (LR, ival);
> +=09decode_r_type_insn (LR_D, ival);
>        else if (is_sc_w_insn (ival))
> -=09decode_r_type_insn (SC, ival);
> +=09decode_r_type_insn (SC_W, ival);
>        else if (is_sc_d_insn (ival))
> -=09decode_r_type_insn (SC, ival);
> +=09decode_r_type_insn (SC_D, ival);
>        else if (is_ecall_insn (ival))
>  =09decode_i_type_insn (ECALL, ival);
>        else if (is_ld_insn (ival))
> @@ -1944,6 +2025,24 @@ riscv_insn::decode (struct gdbarch *gdbarch, CORE_=
ADDR pc)
>  =09  m_rd =3D decode_register_index (ival, OP_SH_CRS1S);
>  =09  m_imm.s =3D EXTRACT_CITYPE_LUI_IMM (ival);
>  =09}
> +      else if (is_c_srli_insn (ival))
> +=09decode_cb_type_insn (SRLI, ival);
> +      else if (is_c_srai_insn (ival))
> +=09decode_cb_type_insn (SRAI, ival);
> +      else if (is_c_andi_insn (ival))
> +=09decode_cb_type_insn (ANDI, ival);
> +      else if (is_c_sub_insn (ival))
> +=09decode_ca_type_insn (SUB, ival);
> +      else if (is_c_xor_insn (ival))
> +=09decode_ca_type_insn (XOR, ival);
> +      else if (is_c_or_insn (ival))
> +=09decode_ca_type_insn (OR, ival);
> +      else if (is_c_and_insn (ival))
> +=09decode_ca_type_insn (AND, ival);
> +      else if (is_c_subw_insn (ival))
> +=09decode_ca_type_insn (SUBW, ival);
> +      else if (is_c_addw_insn (ival))
> +=09decode_ca_type_insn (ADDW, ival);
>        else if (is_c_li_insn (ival))
>  =09decode_ci_type_insn (LI, ival);
>        /* C_SD and C_FSW have the same opcode.  C_SD is RV64 and RV128 on=
ly,
> @@ -4404,51 +4503,149 @@ riscv_next_pc (struct regcache *regcache, CORE_A=
DDR pc)
>    return next_pc;
>  }
> =20
> +/* Return true if INSN is not a control transfer instruction and is allo=
wed to
> +   appear in the middle of the lr/sc sequence.  */
> +
> +static bool
> +riscv_insn_is_non_cti_and_allowed_in_atomic_sequence(const struct riscv_=
insn &insn)

You need to add some white-space after the function name and before the
opening '('.  In this case, the line is too long (> 80 characters), so
you should format it as:

  static bool
  riscv_insn_is_non_cti_and_allowed_in_atomic_sequence
    (const struct riscv_insn &insn)
  {
    ... etc ...

> +{
> +  switch (insn.opcode ())
> +    {
> +    case riscv_insn::LUI:
> +    case riscv_insn::AUIPC:
> +    case riscv_insn::ADDI:
> +    case riscv_insn::ADDIW:
> +    case riscv_insn::SLTI:
> +    case riscv_insn::SLTIU:
> +    case riscv_insn::XORI:
> +    case riscv_insn::ORI:
> +    case riscv_insn::ANDI:
> +    case riscv_insn::SLLI:
> +    case riscv_insn::SLLIW:
> +    case riscv_insn::SRLI:
> +    case riscv_insn::SRLIW:
> +    case riscv_insn::SRAI:
> +    case riscv_insn::ADD:
> +    case riscv_insn::ADDW:
> +    case riscv_insn::SRAIW:
> +    case riscv_insn::SUB:
> +    case riscv_insn::SUBW:
> +    case riscv_insn::SLL:
> +    case riscv_insn::SLLW:
> +    case riscv_insn::SLT:
> +    case riscv_insn::SLTU:
> +    case riscv_insn::XOR:
> +    case riscv_insn::SRL:
> +    case riscv_insn::SRLW:
> +    case riscv_insn::SRA:
> +    case riscv_insn::SRAW:
> +    case riscv_insn::OR:
> +    case riscv_insn::AND:
> +      return true;
> +    }
> +
> +    return false;
> +}
> +
> +/* Return true if INSN is a direct branch insctruction.  */

Typo: insctruction -> instruction.

> +
> +static bool
> +riscv_insn_is_direct_branch(const struct riscv_insn &insn)

And here the formatting should be:

  static bool
  riscv_insn_is_direct_branch (const struct riscv_insn &insn)
  {
    ... etc ...

> +{
> +  switch (insn.opcode ())
> +    {
> +    case riscv_insn::BEQ:
> +    case riscv_insn::BNE:
> +    case riscv_insn::BLT:
> +    case riscv_insn::BGE:
> +    case riscv_insn::BLTU:
> +    case riscv_insn::BGEU:
> +    case riscv_insn::JAL:
> +      return true;
> +    }
> +
> +    return false;
> +}
> +
>  /* We can't put a breakpoint in the middle of a lr/sc atomic sequence, s=
o look
>     for the end of the sequence and put the breakpoint there.  */
> =20
> -static bool
> -riscv_next_pc_atomic_sequence (struct regcache *regcache, CORE_ADDR pc,
> -=09=09=09       CORE_ADDR *next_pc)
> +static std::vector<CORE_ADDR>
> +riscv_deal_with_atomic_sequence (struct regcache *regcache, CORE_ADDR pc=
)
>  {
>    struct gdbarch *gdbarch =3D regcache->arch ();
>    struct riscv_insn insn;
> -  CORE_ADDR cur_step_pc =3D pc;
> -  CORE_ADDR last_addr =3D 0;
> +  CORE_ADDR cur_step_pc =3D pc, next_pc;
> +  std::vector<CORE_ADDR> next_pcs;
> +  const int atomic_sequence_length =3D 16;

The comment explaining the "magic" constant (16) should appear at the
point that the constant is introduced.  However, I wonder in this case
if we should just move the constant declaration to later in the function
at the point where it's used (see below)...

> +  bool found_valid_atomic_sequence =3D false;
> +  enum riscv_insn::opcode lr_opcode;
> =20
>    /* First instruction has to be a load reserved.  */
>    insn.decode (gdbarch, cur_step_pc);
> -  if (insn.opcode () !=3D riscv_insn::LR)
> -    return false;
> -  cur_step_pc =3D cur_step_pc + insn.length ();
> +  lr_opcode =3D insn.opcode ();
> +  if (lr_opcode !=3D riscv_insn::LR_D && lr_opcode !=3D riscv_insn::LR_W=
)
> +    return {};
> =20
> -  /* Next instruction should be branch to exit.  */
> -  insn.decode (gdbarch, cur_step_pc);
> -  if (insn.opcode () !=3D riscv_insn::BNE)
> -    return false;
> -  last_addr =3D cur_step_pc + insn.imm_signed ();
> -  cur_step_pc =3D cur_step_pc + insn.length ();
> +  /* A lr/sc sequence comprise at most 16 instructions placed sequential=
ly in memory. */
> +  for (int insn_count =3D 0; insn_count < atomic_sequence_length; ++insn=
_count)

... I'd be tempted to move the constant declaration to here, like:

  /* A lr/sc sequence comprise at most 16 instructions placed sequentially
     in memory.  */
  static const int max_atomic_sequence_length =3D 16;
  for (int insn_count =3D 0;
       insn_count < max_atomic_sequence_length;
       ++insn_count)
    {
      ... etc ...

though given the constant is only used in this one place, maybe I'd just
fold the constant in-line and rely on the comment to explain it?

Notice I also renamed the constant to max_atomic_sequence_length as this
seems like a better description of what the constant is used for, but
then I had to line wrap the for (...) in order to keep the line under 80
characters.

The GNU style also requires two spaces after a '.' in comments, which
you'd missing here, and in a couple of other places below.

I'm not 100% sure that the comment here is detailed enough.  This caused
some of the confusion I had when reading this patch.  I'll describe this
more later in this email.

> +    {
> +      cur_step_pc +=3D insn.length ();
> +      insn.decode (gdbarch, cur_step_pc);
> =20
> -  /* Next instruction should be store conditional.  */
> -  insn.decode (gdbarch, cur_step_pc);
> -  if (insn.opcode () !=3D riscv_insn::SC)
> -    return false;
> -  cur_step_pc =3D cur_step_pc + insn.length ();
> +      /* The dynamic code executed between lr/sc can only contain instru=
ctions
> +=09 from the base I instruction set, excluding loads, stores, backward j=
umps,
> +=09 taken backward branches, JALR, FENCE, FENCE.I, and SYSTEM instructio=
ns.
> +=09 If the C extension is supported, then compressed forms of the aforem=
entioned
> +=09 I instructions are also permitted.  */

Some of these lines are too long.
> +
> +      if (riscv_insn_is_non_cti_and_allowed_in_atomic_sequence (insn))
> +=09{
> +=09  continue;
> +=09}

Single statement blocks should not have the { ... } around them, so
just:

      if (riscv_insn_is_non_cti_and_allowed_in_atomic_sequence (insn))
=09continue;


> +      /* Look for a conditional branch instruction, check if it's taken =
forward or not. */

Add an extra space after final '.' in comment.

> +      else if (riscv_insn_is_direct_branch (insn))
> +=09{
> +=09  if (insn.imm_signed () > 0)
> +=09    {
> +=09      next_pc =3D cur_step_pc + insn.imm_signed ();
> +=09      next_pcs.push_back (next_pc);
> +=09    }
> +=09  else
> +=09    break;
> +=09}
> +      /* Look for a paired SC instruction which closes the atomic sequen=
ce. */
Add an extra space after final '.' in comment.
> +      else if ((insn.opcode () =3D=3D riscv_insn::SC_D && lr_opcode =3D=
=3D riscv_insn::LR_D)
> +=09       || (insn.opcode () =3D=3D riscv_insn::SC_W && lr_opcode =3D=3D=
 riscv_insn::LR_W))

These lines are too long.

> +=09{
> +=09  found_valid_atomic_sequence =3D true;
> +=09}

And this should have the { ... } remove as it is a single statement
block.

This is where I got confused the first time I read this patch.

The comment earlier says:

  /* A lr/sc sequence comprise at most 16 instructions placed
     sequentially in memory.  */

My expectation after reading this was that upon finding the SC
instruction we'd be done, and break out of the loop ... which isn't what
happens.

It was only when I went and read the ISA manual that I realised that the
16 instruction limit includes both the lr/sc loop _and_ the code to
repeat the loop.

I think the comment above the for loop should be extended to include
this information, otherwise, I think this is a little confusing.

I think this comment above the for loop would also be the right place to
comment about the limitations of what we're checking for here.  I think
there's two broad things we're not doing:

  (a) As Palmer pointed out, the atomic limitations only apply to the
      code that is actually executed, so really we should be emulating
      each instruction and checking what is actually going to be
      executed, and

  (b) The lr/sc are required to be for the same target address.
      However, to check this in the general case we would need to
      emulate all the instructions up to the sc instruction in order to
      check that the address it uses has not been modified.


> +      else
> +=09break;
> +    }
> +
> +  if (!found_valid_atomic_sequence)
> +    return {};
> =20
>    /* Next instruction should be branch to start.  */
>    insn.decode (gdbarch, cur_step_pc);
>    if (insn.opcode () !=3D riscv_insn::BNE)
> -    return false;
> +    return {};
>    if (pc !=3D (cur_step_pc + insn.imm_signed ()))
> -    return false;
> -  cur_step_pc =3D cur_step_pc + insn.length ();
> +    return {};
> +  cur_step_pc +=3D insn.length ();
> =20
> -  /* We should now be at the end of the sequence.  */
> -  if (cur_step_pc !=3D last_addr)
> -    return false;
> +  /* Remove all PCs that jump within the sequence. */
> +  auto matcher =3D [cur_step_pc] (const CORE_ADDR addr)

My preference would be to write:

  auto matcher =3D [cur_step_pc] (const CORE_ADDR addr) -> bool

we seem to use a mix in GDB, so if you are really opposed to this change
then that's fine.  I like being able to see the type of MATCHER without
having to read the function body.

Thanks,
Andrew

> +    {
> +      return addr < cur_step_pc;
> +    };
> +  auto it =3D std::remove_if (next_pcs.begin (), next_pcs.end (), matche=
r);
> +  next_pcs.erase (it, next_pcs.end ());
> =20
> -  *next_pc =3D cur_step_pc;
> -  return true;
> +  next_pc =3D cur_step_pc;
> +  next_pcs.push_back (next_pc);
> +  return next_pcs;
>  }
> =20
>  /* This is called just before we want to resume the inferior, if we want=
 to
> @@ -4458,14 +4655,14 @@ riscv_next_pc_atomic_sequence (struct regcache *r=
egcache, CORE_ADDR pc,
>  std::vector<CORE_ADDR>
>  riscv_software_single_step (struct regcache *regcache)
>  {
> -  CORE_ADDR pc, next_pc;
> -
> -  pc =3D regcache_read_pc (regcache);
> +  CORE_ADDR cur_pc =3D regcache_read_pc (regcache), next_pc;
> +  std::vector<CORE_ADDR> next_pcs
> +    =3D riscv_deal_with_atomic_sequence (regcache, cur_pc);
> =20
> -  if (riscv_next_pc_atomic_sequence (regcache, pc, &next_pc))
> -    return {next_pc};
> +  if (!next_pcs.empty ())
> +    return next_pcs;
> =20
> -  next_pc =3D riscv_next_pc (regcache, pc);
> +  next_pc =3D riscv_next_pc (regcache, cur_pc);
> =20
>    return {next_pc};
>  }
> --=20
> 2.34.1