public inbox for gdb-patches@sourceware.org
 help / color / mirror / Atom feed
* [PATCH] gdb/linespec: Improve sliding breakpoints
@ 2022-06-17 16:07 Lancelot SIX
  2022-08-26  9:42 ` Bruno Larsen
  0 siblings, 1 reply; 3+ messages in thread
From: Lancelot SIX @ 2022-06-17 16:07 UTC (permalink / raw)
  To: gdb-patches; +Cc: lsix, Lancelot SIX

Hi,

Here is proposition to slightly update the breakpoint sliding algorithm
in GDB.

This patch must be applied on top of
https://sourceware.org/pipermail/gdb-patches/2022-April/187686.html
which was submitted by Simon Marchi a couple of months.  This needs
small updates as the BLOCK_ENTRY_PC and SYMBOL_BLOCK_VALUE are no
longer used.  As a consequence

  BLOCK_ENTRY_PC (SYMBOL_BLOCK_VALUE (sym))

should be replaced with

  sym->value_block ()->entry_pc ()

This patch have been regression tested on x86_64-Linux.  All comments
and feedbacks are welcome.

Best,
Lancelot.

---

The algorithm used to place a breakpoint by line approximately goes as
follow:

- Look for all the PCs which maps directly to the desired line
- If any PC is found, place a breakpoint at each found PC.
- If no PC maps to the desired line:
  - Search for the lower line number in the line table which is higher
    than the desired line, and has some object code associated to it.
  - List all PCs which maps to this line number.
  - Place a breakpoint at each PC in the list above.

There are situations where this heuristic is not sufficient.  If we
consider the following c++ code:

    12 template<typename T>
    13 T func (T a, T b)
    14 {
    15   /* Break here.  */
    16   if constexpr (std::is_integral_t<T>)
    17     return a;
    18   else
    19     return b;
    20 }

Given the current algorithm, if the user tries to set a breakpoint at
line 15, here is what happens:

- Search PCs mapping to L15, there is none ;
- Search for the first line number above L16 with code, this is L17 ;
- List all PCs mapping to L17 and a breakpoints at insert a breakpoint at
  each ;

Now lets suppose that this templated function has been instantiated both
for an integral type (int) and non-integral type (float).  If we follow
the algorithm, no breakpoint will ever be inserted in func<float>,
while there will be a breakpoint in func<int>.

I came across a less obvious variation of this case where an inlined
function had different optimization opportunities for different instances.
As a consequence, some instances had the first actual line of code
different than the other, making the breakpoint be missed and the user
confused.  To add to the confusion, the instances which have code mapped
to the lowest line above the user specified linespec only appeared in a
shared library conditionally loaded, making the breakpoint stop depend on
whether or not the said shared library was loaded or not.

This patch is a proposition to adapt the algorithm in order to cover the
scenario exposed above.  To do so, this patch ends up scanning
independently for the best candidate in the various blocks composing
the line table structures (a block being a succession of
linetable_entries terminated by an entry with line == 0).

The line table for the example program above looks something like:

    (gdb) maintenance info line-table missed
    symtab: /[...]/missed_pb.cc ((struct symtab *) 0x6210000f0500)
    linetable: ((struct linetable *) 0x6210001a1220):
    INDEX  LINE   ADDRESS            IS-STMT PROLOGUE-END
    ...
    17     END    0x00000000000011d3 Y
    18     13     0x00000000000011d3 Y                    <-- func<float>
    19     19     0x00000000000011e5 Y
    20     20     0x00000000000011ea Y
    21     END    0x00000000000011ec Y
    22     13     0x00000000000011ec Y                    <-- func<int>
    23     17     0x00000000000011fa Y
    24     20     0x00000000000011fd Y
    25     END    0x00000000000011ff Y

The first block listed (starting at 0x11d3) is func<float>, and the
second starting at 0x11ec is func<int>.

The proposed algorithm analyzes each block independently.  The first step
is to decide if a block contains the line the user is interested in (a.k.a
the target).  In our example, this is L15.  A block is considered to match
if it contains at least one instruction mapped to a line before the target
and at least one instruction mapped to a line after the target.  In other
words this means that will select all blocks where:

- there is at least one instruction mapped to a line L such as L <= target
- there is at least one instruction mapped to a line M such as M >= target

For each block which satisfies this condition, we select the lowest line
number above our target, and select all instructions mapping to this line
number (there might be multiple).  In our example the func<float> block
satisfies the condition (we have line 13 and 19), so we insert breakpoints
at the PCs mapped to L19.  Similarly, the func<int> block is selected
(thanks to L13 and L17), and we insert breakpoints at all PCs mapped to
L17.

The core of this heuristic is implemented in find_best_le_for_symtab_line.

There are however some cases which are still problematic, where the
first line after the target is the first line of the block we are
looking for.  In such situation the algorithm would find nothing where
we would want to select 0x11d3 and 0x11ec (both mapping to L13).  In
order to solve this, the algorithm first looks for the lowest line
number following the user specified target as this is done before this
patch (here the first line with associated code following L12 which is
L13).  Once this is figured out, this new target (L13) is passed to the
algorithm above (find_best_le_for_symtab_line).

This is safe to do because we know that there is no line between the user
target (included) and this new target.  Therefore, whatever
find_best_le_for_symtab_line returns, it must map to lines numbers above
(inclusive) the new target.

Finally, the testsuite showed an interesting regression.  Let's suppose
we have:

    1 struct Foo { Foo () {} };
    2 Foo f1;
    3 void bar ()
    4 {
    5   /* break here.  */
    6   do_something ();
    7 }
    8 Foo f2;

Inspecting the linetable, we have a code block containing L2 and L8 (so
over line 5).  If a user was to insert a breakpoint somewhere in L5, the
algorithm shown above would also insert a breakpoint at L8.  To solve
this, this patch uses the fact that L2 and L8 are, from dwarf's
perspective in a artificial block which corresponds to the static
initialization.  So if among multiple candidates, some are part of a
virtual function and some are not, we only keep the ones in the non
virtual function.

This proposed new algorithm will not solve all possible cases.  For
example, if the func function is inlined multiple times in the same block
for different template parameters, the lookup will process all of those as
part of the same scope.  Solving such situation would require heavier
works.  I think that this change, while not covering every scenarios is
still a step in the right direction.

All feedbacks and thoughts are welcome.

Regression tested on x86_64 GNU/Linux.

Change-Id: I8291f9e1a3522934a0f418dfc565cebb86cff730
---
 gdb/linespec.c                                | 93 ++++++++++++++++++-
 gdb/symtab.c                                  | 45 +++++++++
 gdb/symtab.h                                  | 14 +++
 .../gdb.linespec/breakpoint-sliding.cc        | 70 ++++++++++++++
 .../gdb.linespec/breakpoint-sliding.exp       | 82 ++++++++++++++++
 5 files changed, 301 insertions(+), 3 deletions(-)
 create mode 100644 gdb/testsuite/gdb.linespec/breakpoint-sliding.cc
 create mode 100644 gdb/testsuite/gdb.linespec/breakpoint-sliding.exp

diff --git a/gdb/linespec.c b/gdb/linespec.c
index da9e35c5035..1e8679cd027 100644
--- a/gdb/linespec.c
+++ b/gdb/linespec.c
@@ -2092,9 +2092,42 @@ create_sals_line_offset (struct linespec_state *self,
       if (intermediate_results.empty () && best_entry != NULL)
 	{
 	  was_exact = false;
-	  intermediate_results = decode_digits_ordinary (self, ls,
-							 best_entry->line,
-							 &best_entry);
+	  /* We did not find an exact match for the line we are looking for.
+	     As a consequence, we are now looking for the next best line.
+	     However, the next best line does not have to be the same across
+	     the entire program.  If we consider the following code:
+
+	       12 template<typename T>
+	       13 T select (T a, T b)
+	       14 {
+	       15   // Break here.
+	       16   if constexpr (std::is_integral_t<T>)
+	       17     return a;
+	       18   else
+	       19     return b;
+	       20 }
+
+	     For a function like the above, the next best line is L17 if T is
+	     an integral type, but should be L19 if T is not.  As a
+	     consequence we try to search for the next best line independently
+	     in each code block (a code block being a contiguous section of a
+	     line table terminated by an entry with line == 0).  */
+	  for (const auto &elt : ls->file_symtabs)
+	    {
+	      program_space *pspace = elt->compunit ()->objfile ()->pspace;
+	      for (const linetable_entry *le
+		   : find_best_le_for_symtab_line (elt->linetable (),
+						   best_entry->line))
+		{
+		  symtab_and_line sal;
+		  sal.pspace = pspace;
+		  sal.symtab = elt;
+		  sal.line = le->line;
+		  sal.explicit_line = true;
+		  sal.pc = le->pc;
+		  intermediate_results.push_back (std::move (sal));
+		}
+	    }
 	}
 
       /* For optimized code, the compiler can scatter one source line
@@ -2110,6 +2143,25 @@ create_sals_line_offset (struct linespec_state *self,
       gdb::def_vector<int> filter (intermediate_results.size ());
       gdb::def_vector<const block *> blocks (intermediate_results.size ());
 
+      /* Artificial code blocks can produce false positives.  Consider
+
+	   1 struct Foo { Foo () {} };
+	   2 Foo f1;
+	   3 void bar ()
+	   4 {
+	   5   // break here.
+	   6   do_something ();
+	   7 }
+	   8 Foo f2;
+
+	 In this example, if the user asked for a breakpoint in L5, the above
+	 algorithm will select L6 and L8.  L6 is selected because the we have
+	 an artificial block (marked by DW_AT_artificial) generated to cover
+	 the static initializations and this block covers code from line 2 to
+	 line 8.  To avoid this false positive, reject candidates in artificial
+	 blocks if we also have candidates in non artificial blocks.  */
+      bool any_nonartificial = false;
+
       for (i = 0; i < intermediate_results.size (); ++i)
 	{
 	  set_current_program_space (intermediate_results[i].pspace);
@@ -2117,6 +2169,13 @@ create_sals_line_offset (struct linespec_state *self,
 	  filter[i] = 1;
 	  blocks[i] = block_for_pc_sect (intermediate_results[i].pc,
 					 intermediate_results[i].section);
+
+	  if (!any_nonartificial && blocks[i] != nullptr)
+	    {
+	      struct symbol *func = block_containing_function (blocks[i]);
+	      any_nonartificial
+		= (func != nullptr && !func->is_artificial ());
+	    }
 	}
 
       for (i = 0; i < intermediate_results.size (); ++i)
@@ -2171,6 +2230,34 @@ create_sals_line_offset (struct linespec_state *self,
 		&& val.line < sym->line ())
 	      continue;
 
+	    /* If we did not find an exact match, the above heuristic might
+	       find some false positives.
+
+	       Lets consider (with Foo having a non trivial ctor):
+
+		    9 struct Foo { Foo () {} };
+		   10 Foo f1;
+		   11 int bar ()
+		   12 {
+		   13   // Break here.
+		   14   return 42;
+		   15 }
+		   16 Foo f2;
+
+		When the user tries to insert a breakpoint it L13, the
+		algorithm sees a code path from L10 to L16 (in static
+		initialization), and will consider L16 as a candidate.
+		However the user clearly intended to break in bar.  Therefore,
+		if we see that there is at least 1 breakpoint in a non
+		artificial function (bar), ignore the hit in the static
+		initialization.
+	       */
+	    if (!was_exact
+		&& any_nonartificial
+		&& sym != nullptr
+		&& sym->is_artificial ())
+	      continue;
+
 	    if (self->funfirstline)
 	      skip_prologue_sal (sal);
 
diff --git a/gdb/symtab.c b/gdb/symtab.c
index 03aa2a96b87..1b03e6fce8e 100644
--- a/gdb/symtab.c
+++ b/gdb/symtab.c
@@ -3541,6 +3541,51 @@ find_pcs_for_symtab_line (struct symtab *symtab, int line,
   return result;
 }
 
+/* See symtab.h.  */
+
+std::vector<linetable_entry *>
+find_best_le_for_symtab_line (struct linetable *lt, int line)
+{
+  std::vector<linetable_entry *> res;
+
+  if (lt == nullptr || line < 0)
+    return res;
+
+  /* Tells if in the current block we found at least one PC mapping to a line
+     before LINE.  */
+  bool found_preceding = false;
+  const int nitems = lt->nitems;
+  std::vector<linetable_entry *> best_candidates;
+
+  for (int i = 0; i < nitems; ++i)
+    {
+      struct linetable_entry &item = lt->item[i];
+      found_preceding
+	= found_preceding || (item.line > 0 && item.line <= line);
+
+      if (item.is_stmt && item.line >= line
+	  && (best_candidates.empty ()
+	      || item.line <= best_candidates.front ()->line))
+	{
+	  if (!best_candidates.empty ()
+	      && item.line < best_candidates.front ()->line)
+	    best_candidates.clear ();
+	  best_candidates.push_back (&item);
+	}
+      else if (item.line == 0)
+	{
+	  if (found_preceding)
+	    res.insert (res.end (), best_candidates.begin (),
+			best_candidates.end ());
+
+	  found_preceding = false;
+	  best_candidates.clear ();
+	}
+    }
+
+  return res;
+}
+
 \f
 /* Set the PC value for a given source file and line number and return true.
    Returns false for invalid line number (and sets the PC to 0).
diff --git a/gdb/symtab.h b/gdb/symtab.h
index ac902a4cbbe..94bea02cb43 100644
--- a/gdb/symtab.h
+++ b/gdb/symtab.h
@@ -2624,6 +2624,20 @@ void iterate_over_symtabs (const char *name,
 std::vector<CORE_ADDR> find_pcs_for_symtab_line
     (struct symtab *symtab, int line, struct linetable_entry **best_entry);
 
+/* Given a linetable, look for entries which can map to line LINE.
+
+   The result is the set of entries which:
+   1) Are part of a block (i.e. a sequence of entries terminated by an entry
+      with line == 0) containing at least one statement entry with line above
+      or equal to LINE, and at least one statement entry with line below or
+      equal to LINE.
+   2) Are statement entries (is_stmt == 1)
+   3) Have line equal to the lowest statement line number greater or equal to
+      LINE within the block they are part of.  */
+
+std::vector<linetable_entry *> find_best_le_for_symtab_line
+    (struct linetable *lt, int line);
+
 /* Prototype for callbacks for LA_ITERATE_OVER_SYMBOLS.  The callback
    is called once per matching symbol SYM.  The callback should return
    true to indicate that LA_ITERATE_OVER_SYMBOLS should continue
diff --git a/gdb/testsuite/gdb.linespec/breakpoint-sliding.cc b/gdb/testsuite/gdb.linespec/breakpoint-sliding.cc
new file mode 100644
index 00000000000..3ae6b563cdf
--- /dev/null
+++ b/gdb/testsuite/gdb.linespec/breakpoint-sliding.cc
@@ -0,0 +1,70 @@
+/* This testcase is part of GDB, the GNU debugger.
+
+   Copyright (C) 2022 Free Software Foundation, Inc.
+
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
+
+#if __cplusplus >= 201703L
+#include <type_traits>
+
+template<typename T>
+T
+foo (T a, T b)
+{
+  T ret;
+  /* In foo break here.  */
+  if constexpr (std::is_integral_v<T>)
+    ret = a; /* break in foo<int>.  */
+  else
+    ret = b; /* break in foo<NonIntegral>.  */
+
+  return ret;
+}
+
+struct NonIntegral
+{
+  int m;
+};
+
+#endif
+
+struct St
+{
+  St ()
+  { /* Empty.  */ }
+};
+
+St st1;
+
+static int
+between_static_init ()
+{
+  /* between_static_init break here.  */
+  return 42; /* break in between_static_init.  */
+}
+
+St st2;
+
+int main ()
+{
+
+#if __cplusplus >= 201703L
+  foo<int> (1, 2);
+  foo<NonIntegral> ({ 1 }, { 2 });
+#endif
+
+  between_static_init ();
+
+  return 0;
+}
diff --git a/gdb/testsuite/gdb.linespec/breakpoint-sliding.exp b/gdb/testsuite/gdb.linespec/breakpoint-sliding.exp
new file mode 100644
index 00000000000..9b79c419e9b
--- /dev/null
+++ b/gdb/testsuite/gdb.linespec/breakpoint-sliding.exp
@@ -0,0 +1,82 @@
+# Copyright (C) 2022 Free Software Foundation, Inc.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+# This testcase exercises breakpoint sliding when the user tries to set
+# a breakpoint at a line number which has no code associated to it.  GDB
+# should then place breakpoint at the first line following the one given
+# by the user in each code block composing the program space, given that
+# the code block "covers" the line asked by the user.
+
+if { [skip_cplus_tests] } {
+  return -1
+}
+
+standard_testfile .cc
+
+# Part of the test require c++17.  If the used compiler does not support
+# it, fallback only doing part of the tests.
+set skip_cpp17 0
+
+if {[prepare_for_testing "failed to prepare with c++17" $testfile \
+			 $srcfile {debug c++ additional_flags=-std=c++17}]} {
+    set skip_cpp17 1
+    if {[prepare_for_testing "failed to prepare" $testfile $srcfile \
+			     {debug c++}]} {
+	return -1
+    }
+}
+
+proc_with_prefix test_sliding_per_block {} {
+    if {$::skip_cpp17} {
+	return
+    }
+
+    set loc [gdb_get_line_number "In foo break here."]
+    gdb_test "break $loc" \
+	"Breakpoint $::decimal.* \\(2 locations\\)" \
+	"slide to if constexpr"
+
+    set loc_in_int_instance [gdb_get_line_number "break in foo<int>"]
+    set loc_in_struct_instance [gdb_get_line_number "break in foo<NonIntegral>"]
+    set seen_int_pb 0
+    set seen_struct_bp 0
+    gdb_test_multiple "info breakpoint" "info breakpoint" {
+	-re "in foo<int>\\(int, int\\) at \[^\n\r\]*$::srcfile:$loc_in_int_instance" {
+	    set seen_int_pb 1
+	    verbose -log "found int"
+	    exp_continue
+	}
+	-re "in foo<NonIntegral>\\(NonIntegral, NonIntegral\\) at \[^\n\r\]*$::srcfile:$loc_in_struct_instance" {
+	    set seen_struct_bp 1
+	    verbose -log "found struct"
+	    exp_continue
+	}
+	-re -wrap "" {
+	    gdb_assert {$seen_int_pb && $seen_struct_bp} $gdb_test_name
+	}
+    }
+}
+
+proc_with_prefix test_ignore_artificial_block {} {
+    set loc  [gdb_get_line_number "between_static_init break here"]
+    set expected_loc [gdb_get_line_number "break in between_static_init."]
+    gdb_test "break $loc" \
+	"Breakpoint $::decimal at $::hex: file .*/$::srcfile, line $expected_loc\\." \
+	"ignore static block"
+}
+
+clean_restart $::binfile
+test_sliding_per_block
+test_ignore_artificial_block

base-commit: 5fb28d2607a8325559b44a5dc0c8760236c81218
prerequisite-patch-id: 695c74160156ff2c66b3a88462be22a36c32f141
-- 
2.25.1


^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH] gdb/linespec: Improve sliding breakpoints
  2022-06-17 16:07 [PATCH] gdb/linespec: Improve sliding breakpoints Lancelot SIX
@ 2022-08-26  9:42 ` Bruno Larsen
  2022-09-01  9:04   ` Lancelot SIX
  0 siblings, 1 reply; 3+ messages in thread
From: Bruno Larsen @ 2022-08-26  9:42 UTC (permalink / raw)
  To: Lancelot SIX, gdb-patches; +Cc: lsix


On 17/06/2022 18:07, Lancelot SIX via Gdb-patches wrote:
> Hi,
>
> Here is proposition to slightly update the breakpoint sliding algorithm
> in GDB.
>
> This patch must be applied on top of
> https://sourceware.org/pipermail/gdb-patches/2022-April/187686.html
> which was submitted by Simon Marchi a couple of months.  This needs
> small updates as the BLOCK_ENTRY_PC and SYMBOL_BLOCK_VALUE are no
> longer used.  As a consequence
>
>    BLOCK_ENTRY_PC (SYMBOL_BLOCK_VALUE (sym))
>
> should be replaced with
>
>    sym->value_block ()->entry_pc ()
>
> This patch have been regression tested on x86_64-Linux.  All comments
> and feedbacks are welcome.
>
> Best,
> Lancelot.
>
> ---
>
> The algorithm used to place a breakpoint by line approximately goes as
> follow:
>
> - Look for all the PCs which maps directly to the desired line
> - If any PC is found, place a breakpoint at each found PC.
> - If no PC maps to the desired line:
>    - Search for the lower line number in the line table which is higher
>      than the desired line, and has some object code associated to it.
>    - List all PCs which maps to this line number.
>    - Place a breakpoint at each PC in the list above.
>
> There are situations where this heuristic is not sufficient.  If we
> consider the following c++ code:
>
>      12 template<typename T>
>      13 T func (T a, T b)
>      14 {
>      15   /* Break here.  */
>      16   if constexpr (std::is_integral_t<T>)
>      17     return a;
>      18   else
>      19     return b;
>      20 }
>
> Given the current algorithm, if the user tries to set a breakpoint at
> line 15, here is what happens:
>
> - Search PCs mapping to L15, there is none ;
> - Search for the first line number above L16 with code, this is L17 ;
> - List all PCs mapping to L17 and a breakpoints at insert a breakpoint at
>    each ;
>
> Now lets suppose that this templated function has been instantiated both
> for an integral type (int) and non-integral type (float).  If we follow
> the algorithm, no breakpoint will ever be inserted in func<float>,
> while there will be a breakpoint in func<int>.
>
> I came across a less obvious variation of this case where an inlined
> function had different optimization opportunities for different instances.
> As a consequence, some instances had the first actual line of code
> different than the other, making the breakpoint be missed and the user
> confused.  To add to the confusion, the instances which have code mapped
> to the lowest line above the user specified linespec only appeared in a
> shared library conditionally loaded, making the breakpoint stop depend on
> whether or not the said shared library was loaded or not.
>
> This patch is a proposition to adapt the algorithm in order to cover the
> scenario exposed above.  To do so, this patch ends up scanning
> independently for the best candidate in the various blocks composing
> the line table structures (a block being a succession of
> linetable_entries terminated by an entry with line == 0).
>
> The line table for the example program above looks something like:
>
>      (gdb) maintenance info line-table missed
>      symtab: /[...]/missed_pb.cc ((struct symtab *) 0x6210000f0500)
>      linetable: ((struct linetable *) 0x6210001a1220):
>      INDEX  LINE   ADDRESS            IS-STMT PROLOGUE-END
>      ...
>      17     END    0x00000000000011d3 Y
>      18     13     0x00000000000011d3 Y                    <-- func<float>
>      19     19     0x00000000000011e5 Y
>      20     20     0x00000000000011ea Y
>      21     END    0x00000000000011ec Y
>      22     13     0x00000000000011ec Y                    <-- func<int>
>      23     17     0x00000000000011fa Y
>      24     20     0x00000000000011fd Y
>      25     END    0x00000000000011ff Y
>
> The first block listed (starting at 0x11d3) is func<float>, and the
> second starting at 0x11ec is func<int>.
>
> The proposed algorithm analyzes each block independently.  The first step
> is to decide if a block contains the line the user is interested in (a.k.a
> the target).  In our example, this is L15.  A block is considered to match
> if it contains at least one instruction mapped to a line before the target
> and at least one instruction mapped to a line after the target.  In other
> words this means that will select all blocks where:
>
> - there is at least one instruction mapped to a line L such as L <= target
> - there is at least one instruction mapped to a line M such as M >= target
>
> For each block which satisfies this condition, we select the lowest line
> number above our target, and select all instructions mapping to this line
> number (there might be multiple).  In our example the func<float> block
> satisfies the condition (we have line 13 and 19), so we insert breakpoints
> at the PCs mapped to L19.  Similarly, the func<int> block is selected
> (thanks to L13 and L17), and we insert breakpoints at all PCs mapped to
> L17.
>
> The core of this heuristic is implemented in find_best_le_for_symtab_line.
>
> There are however some cases which are still problematic, where the
> first line after the target is the first line of the block we are
> looking for.  In such situation the algorithm would find nothing where
> we would want to select 0x11d3 and 0x11ec (both mapping to L13).  In
> order to solve this, the algorithm first looks for the lowest line
> number following the user specified target as this is done before this
> patch (here the first line with associated code following L12 which is
> L13).  Once this is figured out, this new target (L13) is passed to the
> algorithm above (find_best_le_for_symtab_line).
>
> This is safe to do because we know that there is no line between the user
> target (included) and this new target.  Therefore, whatever
> find_best_le_for_symtab_line returns, it must map to lines numbers above
> (inclusive) the new target.
>
> Finally, the testsuite showed an interesting regression.  Let's suppose
> we have:
>
>      1 struct Foo { Foo () {} };
>      2 Foo f1;
>      3 void bar ()
>      4 {
>      5   /* break here.  */
>      6   do_something ();
>      7 }
>      8 Foo f2;
>
> Inspecting the linetable, we have a code block containing L2 and L8 (so
> over line 5).  If a user was to insert a breakpoint somewhere in L5, the
> algorithm shown above would also insert a breakpoint at L8.  To solve
> this, this patch uses the fact that L2 and L8 are, from dwarf's
> perspective in a artificial block which corresponds to the static
> initialization.  So if among multiple candidates, some are part of a
> virtual function and some are not, we only keep the ones in the non
> virtual function.
>
> This proposed new algorithm will not solve all possible cases.  For
> example, if the func function is inlined multiple times in the same block
> for different template parameters, the lookup will process all of those as
> part of the same scope.  Solving such situation would require heavier
> works.  I think that this change, while not covering every scenarios is
> still a step in the right direction.
>
> All feedbacks and thoughts are welcome.
>
> Regression tested on x86_64 GNU/Linux.
>
> Change-Id: I8291f9e1a3522934a0f418dfc565cebb86cff730

Hi Lancelot,

Thanks for working on this! I like the idea of the algorithm, but I 
think some explanations could use some work.

> ---
>   gdb/linespec.c                                | 93 ++++++++++++++++++-
>   gdb/symtab.c                                  | 45 +++++++++
>   gdb/symtab.h                                  | 14 +++
>   .../gdb.linespec/breakpoint-sliding.cc        | 70 ++++++++++++++
>   .../gdb.linespec/breakpoint-sliding.exp       | 82 ++++++++++++++++
>   5 files changed, 301 insertions(+), 3 deletions(-)
>   create mode 100644 gdb/testsuite/gdb.linespec/breakpoint-sliding.cc
>   create mode 100644 gdb/testsuite/gdb.linespec/breakpoint-sliding.exp
>
> diff --git a/gdb/linespec.c b/gdb/linespec.c
> index da9e35c5035..1e8679cd027 100644
> --- a/gdb/linespec.c
> +++ b/gdb/linespec.c
> @@ -2092,9 +2092,42 @@ create_sals_line_offset (struct linespec_state *self,
>         if (intermediate_results.empty () && best_entry != NULL)
>   	{
>   	  was_exact = false;
> -	  intermediate_results = decode_digits_ordinary (self, ls,
> -							 best_entry->line,
> -							 &best_entry);
> +	  /* We did not find an exact match for the line we are looking for.
> +	     As a consequence, we are now looking for the next best line.
> +	     However, the next best line does not have to be the same across
> +	     the entire program.  If we consider the following code:
> +
> +	       12 template<typename T>
> +	       13 T select (T a, T b)
> +	       14 {
> +	       15   // Break here.
> +	       16   if constexpr (std::is_integral_t<T>)
> +	       17     return a;
> +	       18   else
> +	       19     return b;
> +	       20 }
> +
> +	     For a function like the above, the next best line is L17 if T is
> +	     an integral type, but should be L19 if T is not.  As a
> +	     consequence we try to search for the next best line independently
> +	     in each code block (a code block being a contiguous section of a
> +	     line table terminated by an entry with line == 0).  */
> +	  for (const auto &elt : ls->file_symtabs)
This might be a bit nitpicky, but 'elt' was not an intuitive name at all 
for me., especially since you used an "auto" type. Maybe calling it 
"symtab" or something related to it would make reading the code that bit 
more intuitive.
> +	    {
> +	      program_space *pspace = elt->compunit ()->objfile ()->pspace;
> +	      for (const linetable_entry *le
> +		   : find_best_le_for_symtab_line (elt->linetable (),
> +						   best_entry->line))
> +		{
> +		  symtab_and_line sal;
> +		  sal.pspace = pspace;
> +		  sal.symtab = elt;
> +		  sal.line = le->line;
> +		  sal.explicit_line = true;
> +		  sal.pc = le->pc;
> +		  intermediate_results.push_back (std::move (sal));
> +		}
> +	    }
>   	}
>   
>         /* For optimized code, the compiler can scatter one source line
> @@ -2110,6 +2143,25 @@ create_sals_line_offset (struct linespec_state *self,
>         gdb::def_vector<int> filter (intermediate_results.size ());
>         gdb::def_vector<const block *> blocks (intermediate_results.size ());
>   
> +      /* Artificial code blocks can produce false positives.  Consider
> +
> +	   1 struct Foo { Foo () {} };
> +	   2 Foo f1;
> +	   3 void bar ()
> +	   4 {
> +	   5   // break here.
> +	   6   do_something ();
> +	   7 }
> +	   8 Foo f2;
> +
> +	 In this example, if the user asked for a breakpoint in L5, the above
> +	 algorithm will select L6 and L8.  L6 is selected because the we have
> +	 an artificial block (marked by DW_AT_artificial) generated to cover
> +	 the static initializations and this block covers code from line 2 to
> +	 line 8.  To avoid this false positive, reject candidates in artificial
> +	 blocks if we also have candidates in non artificial blocks.  */
> +      bool any_nonartificial = false;
> +

I found the placement of this comment very confusing. It happens right 
after the explanation of a completely different algorithm, but before 
the execution of that algorithm, so I spent a while trying to understand 
if you replaced that or how they would interact with each other.

I would suggest that you declare this variable before the declarations 
of the filter and block vectors, and have the comments happen inside the 
loop below, so that you don't have 2 explanations before both things 
happen at the same time.

>         for (i = 0; i < intermediate_results.size (); ++i)
>   	{
>   	  set_current_program_space (intermediate_results[i].pspace);
> @@ -2117,6 +2169,13 @@ create_sals_line_offset (struct linespec_state *self,
>   	  filter[i] = 1;
>   	  blocks[i] = block_for_pc_sect (intermediate_results[i].pc,
>   					 intermediate_results[i].section);
> +
> +	  if (!any_nonartificial && blocks[i] != nullptr)
> +	    {
> +	      struct symbol *func = block_containing_function (blocks[i]);
> +	      any_nonartificial
> +		= (func != nullptr && !func->is_artificial ());
> +	    }
>   	}
>   
>         for (i = 0; i < intermediate_results.size (); ++i)
> @@ -2171,6 +2230,34 @@ create_sals_line_offset (struct linespec_state *self,
>   		&& val.line < sym->line ())
>   	      continue;
>   
> +	    /* If we did not find an exact match, the above heuristic might
> +	       find some false positives.
> +
> +	       Lets consider (with Foo having a non trivial ctor):
> +
> +		    9 struct Foo { Foo () {} };
> +		   10 Foo f1;
> +		   11 int bar ()
> +		   12 {
> +		   13   // Break here.
> +		   14   return 42;
> +		   15 }
> +		   16 Foo f2;
> +
> +		When the user tries to insert a breakpoint it L13, the
> +		algorithm sees a code path from L10 to L16 (in static
> +		initialization), and will consider L16 as a candidate.
> +		However the user clearly intended to break in bar.  Therefore,
> +		if we see that there is at least 1 breakpoint in a non
> +		artificial function (bar), ignore the hit in the static
> +		initialization.
> +	       */

This comment confuses me a little. It is very close to the other one, 
and is the final step of the algorithm you outlined there, but you 
decide to re-explain everything again, using a different code sample and 
different line numbers. I think having it here causes more confusion, as 
the reader would expect some new insight  or twist in the algorithm, and 
spend a lot of time picking both comments apart to find their 
differences. Or at least that is what I did.

If there are differences, using the same code block and highlighting 
only what is new might be a better way to go, but if I understood it 
correctly, I don't see an actual need for this comment.

> +	    if (!was_exact
> +		&& any_nonartificial
> +		&& sym != nullptr
> +		&& sym->is_artificial ())
> +	      continue;
> +
>   	    if (self->funfirstline)
>   	      skip_prologue_sal (sal);
>   
> diff --git a/gdb/symtab.c b/gdb/symtab.c
> index 03aa2a96b87..1b03e6fce8e 100644
> --- a/gdb/symtab.c
> +++ b/gdb/symtab.c
> @@ -3541,6 +3541,51 @@ find_pcs_for_symtab_line (struct symtab *symtab, int line,
>     return result;
>   }
>   
> +/* See symtab.h.  */
> +
> +std::vector<linetable_entry *>
> +find_best_le_for_symtab_line (struct linetable *lt, int line)
> +{
> +  std::vector<linetable_entry *> res;
> +
> +  if (lt == nullptr || line < 0)
> +    return res;
Are we expecting the linetable to possibly be null, or the line possibly 
be negative? I would prefer asserts here, instead, if possible.
> +
> +  /* Tells if in the current block we found at least one PC mapping to a line
> +     before LINE.  */
> +  bool found_preceding = false;
> +  const int nitems = lt->nitems;
> +  std::vector<linetable_entry *> best_candidates;
> +
> +  for (int i = 0; i < nitems; ++i)
> +    {
> +      struct linetable_entry &item = lt->item[i];
> +      found_preceding
> +	= found_preceding || (item.line > 0 && item.line <= line);
> +
> +      if (item.is_stmt && item.line >= line
> +	  && (best_candidates.empty ()
> +	      || item.line <= best_candidates.front ()->line))
> +	{
> +	  if (!best_candidates.empty ()
> +	      && item.line < best_candidates.front ()->line)
> +	    best_candidates.clear ();
> +	  best_candidates.push_back (&item);
> +	}
> +      else if (item.line == 0)
> +	{
> +	  if (found_preceding)
> +	    res.insert (res.end (), best_candidates.begin (),
> +			best_candidates.end ());
> +
> +	  found_preceding = false;
> +	  best_candidates.clear ();
> +	}
> +    }
> +
> +  return res;
> +}
> +
>   \f
>   /* Set the PC value for a given source file and line number and return true.
>      Returns false for invalid line number (and sets the PC to 0).
> diff --git a/gdb/symtab.h b/gdb/symtab.h
> index ac902a4cbbe..94bea02cb43 100644
> --- a/gdb/symtab.h
> +++ b/gdb/symtab.h
> @@ -2624,6 +2624,20 @@ void iterate_over_symtabs (const char *name,
>   std::vector<CORE_ADDR> find_pcs_for_symtab_line
>       (struct symtab *symtab, int line, struct linetable_entry **best_entry);
>   
> +/* Given a linetable, look for entries which can map to line LINE.
> +
> +   The result is the set of entries which:
> +   1) Are part of a block (i.e. a sequence of entries terminated by an entry
> +      with line == 0) containing at least one statement entry with line above
> +      or equal to LINE, and at least one statement entry with line below or
> +      equal to LINE.
> +   2) Are statement entries (is_stmt == 1)
> +   3) Have line equal to the lowest statement line number greater or equal to
> +      LINE within the block they are part of.  */

I found this explanation a little confusing. I think the following would 
make the comment easier to understand:

/* Given a linetable LT, return the best entries to map to line LINE.
     An entry must follow these conditions to be valid:
     1) It must be a part of a block (i.e. a sequence of entries 
terminated by an entry with line == 0) containing at least one statement 
above LINE and one equal to or below LINE.
     2) The entry must be a statement.
     3) The line associated with this statement must be equal to or 
higher than LINE
     The best entry for any given block will then be the one with the 
lowest line. */

This comment is obviously not perfect,  but I think it lays out better 
what are conditions for entries to be valid, and then how to evaluate 
these entries.

> +
> +std::vector<linetable_entry *> find_best_le_for_symtab_line
> +    (struct linetable *lt, int line);
> +
>   /* Prototype for callbacks for LA_ITERATE_OVER_SYMBOLS.  The callback
>      is called once per matching symbol SYM.  The callback should return
>      true to indicate that LA_ITERATE_OVER_SYMBOLS should continue
> diff --git a/gdb/testsuite/gdb.linespec/breakpoint-sliding.cc b/gdb/testsuite/gdb.linespec/breakpoint-sliding.cc
> new file mode 100644
> index 00000000000..3ae6b563cdf
> --- /dev/null
> +++ b/gdb/testsuite/gdb.linespec/breakpoint-sliding.cc
> @@ -0,0 +1,70 @@
> +/* This testcase is part of GDB, the GNU debugger.
> +
> +   Copyright (C) 2022 Free Software Foundation, Inc.
> +
> +   This program is free software; you can redistribute it and/or modify
> +   it under the terms of the GNU General Public License as published by
> +   the Free Software Foundation; either version 3 of the License, or
> +   (at your option) any later version.
> +
> +   This program is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +   GNU General Public License for more details.
> +
> +   You should have received a copy of the GNU General Public License
> +   along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
> +
> +#if __cplusplus >= 201703L
> +#include <type_traits>
> +
> +template<typename T>
> +T
> +foo (T a, T b)
> +{
> +  T ret;
> +  /* In foo break here.  */
> +  if constexpr (std::is_integral_v<T>)
> +    ret = a; /* break in foo<int>.  */
> +  else
> +    ret = b; /* break in foo<NonIntegral>.  */
> +
> +  return ret;
> +}
> +
> +struct NonIntegral
> +{
> +  int m;
> +};
> +
> +#endif
> +
> +struct St
> +{
> +  St ()
> +  { /* Empty.  */ }
> +};
> +
> +St st1;
> +
> +static int
> +between_static_init ()
> +{
> +  /* between_static_init break here.  */
> +  return 42; /* break in between_static_init.  */
> +}
> +
> +St st2;
> +
> +int main ()
> +{
> +
> +#if __cplusplus >= 201703L
> +  foo<int> (1, 2);
> +  foo<NonIntegral> ({ 1 }, { 2 });
> +#endif
> +
> +  between_static_init ();
> +
> +  return 0;
> +}
> diff --git a/gdb/testsuite/gdb.linespec/breakpoint-sliding.exp b/gdb/testsuite/gdb.linespec/breakpoint-sliding.exp
> new file mode 100644
> index 00000000000..9b79c419e9b
> --- /dev/null
> +++ b/gdb/testsuite/gdb.linespec/breakpoint-sliding.exp
> @@ -0,0 +1,82 @@
> +# Copyright (C) 2022 Free Software Foundation, Inc.
> +
> +# This program is free software; you can redistribute it and/or modify
> +# it under the terms of the GNU General Public License as published by
> +# the Free Software Foundation; either version 3 of the License, or
> +# (at your option) any later version.
> +#
> +# This program is distributed in the hope that it will be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +# GNU General Public License for more details.
> +#
> +# You should have received a copy of the GNU General Public License
> +# along with this program.  If not, see <http://www.gnu.org/licenses/>.
> +
> +# This testcase exercises breakpoint sliding when the user tries to set
> +# a breakpoint at a line number which has no code associated to it.  GDB
> +# should then place breakpoint at the first line following the one given
> +# by the user in each code block composing the program space, given that
> +# the code block "covers" the line asked by the user.
> +
> +if { [skip_cplus_tests] } {
> +  return -1
> +}
> +
> +standard_testfile .cc
> +
> +# Part of the test require c++17.  If the used compiler does not support
> +# it, fallback only doing part of the tests.
> +set skip_cpp17 0
> +
> +if {[prepare_for_testing "failed to prepare with c++17" $testfile \
> +			 $srcfile {debug c++ additional_flags=-std=c++17}]} {
> +    set skip_cpp17 1
> +    if {[prepare_for_testing "failed to prepare" $testfile $srcfile \
> +			     {debug c++}]} {
> +	return -1
> +    }
> +}
> +
> +proc_with_prefix test_sliding_per_block {} {
> +    if {$::skip_cpp17} {
> +	return
> +    }
> +
> +    set loc [gdb_get_line_number "In foo break here."]
> +    gdb_test "break $loc" \
> +	"Breakpoint $::decimal.* \\(2 locations\\)" \
> +	"slide to if constexpr"
> +
> +    set loc_in_int_instance [gdb_get_line_number "break in foo<int>"]
> +    set loc_in_struct_instance [gdb_get_line_number "break in foo<NonIntegral>"]
> +    set seen_int_pb 0
> +    set seen_struct_bp 0
> +    gdb_test_multiple "info breakpoint" "info breakpoint" {
> +	-re "in foo<int>\\(int, int\\) at \[^\n\r\]*$::srcfile:$loc_in_int_instance" {
> +	    set seen_int_pb 1
> +	    verbose -log "found int"
This seems to be leftover from testing, I guess. Same for "found struct" 
below.
> +	    exp_continue
> +	}
> +	-re "in foo<NonIntegral>\\(NonIntegral, NonIntegral\\) at \[^\n\r\]*$::srcfile:$loc_in_struct_instance" {
> +	    set seen_struct_bp 1
> +	    verbose -log "found struct"
> +	    exp_continue
> +	}
> +	-re -wrap "" {
> +	    gdb_assert {$seen_int_pb && $seen_struct_bp} $gdb_test_name
> +	}
> +    }
> +}
> +
> +proc_with_prefix test_ignore_artificial_block {} {
> +    set loc  [gdb_get_line_number "between_static_init break here"]
> +    set expected_loc [gdb_get_line_number "break in between_static_init."]
> +    gdb_test "break $loc" \
> +	"Breakpoint $::decimal at $::hex: file .*/$::srcfile, line $expected_loc\\." \
> +	"ignore static block"
> +}
> +
> +clean_restart $::binfile
> +test_sliding_per_block
> +test_ignore_artificial_block
>
> base-commit: 5fb28d2607a8325559b44a5dc0c8760236c81218
> prerequisite-patch-id: 695c74160156ff2c66b3a88462be22a36c32f141

-- 
Cheers,
Bruno


^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH] gdb/linespec: Improve sliding breakpoints
  2022-08-26  9:42 ` Bruno Larsen
@ 2022-09-01  9:04   ` Lancelot SIX
  0 siblings, 0 replies; 3+ messages in thread
From: Lancelot SIX @ 2022-09-01  9:04 UTC (permalink / raw)
  To: Bruno Larsen, gdb-patches; +Cc: lsix


> Hi Lancelot,
> 
> Thanks for working on this! I like the idea of the algorithm, but I
> think some explanations could use some work.
> 

Hi,

Thanks a lot for the feedback! I'll prepare a V2 to address your comments.

Best,
Lancelot.

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2022-09-01  9:04 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-06-17 16:07 [PATCH] gdb/linespec: Improve sliding breakpoints Lancelot SIX
2022-08-26  9:42 ` Bruno Larsen
2022-09-01  9:04   ` Lancelot SIX

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).