[PATCH] Replace the symbol needs evaluator with a parser

public inbox for gdb-patches@sourceware.org
 help / color / mirror / Atom feed

* [PATCH] Replace the symbol needs evaluator with a parser
@ 2020-10-06 14:58 Zoran Zaric
  2020-10-07  3:04 ` Simon Marchi
  0 siblings, 1 reply; 3+ messages in thread
From: Zoran Zaric @ 2020-10-06 14:58 UTC (permalink / raw)
  To: gdb-patches; +Cc: Zoran Zaric

This patch addresses a design problem with the symbol_needs_eval_context
class. It exposes the problem by introducing two new testsuite test
cases.

To explain the issue, I first need to explain the dwarf_expr_context
class that the symbol_needs_eval_context class derives from.

The intention behind the dwarf_expr_context class is to commonize the
DWARF expression evaluation mechanism for different evaluation
contexts. Currently in gdb, the evaluation context can contain some or
all of the following information: architecture, object file, frame and
compilation unit.

Depending on the information needed to evaluate a given expression,
there are currently three distinct DWARF expression evaluators:

 - Frame: designed to evaluate an expression in the context of a call
   frame information (dwarf_expr_executor class). This evaluator doesn’t
   need a compilation unit information.

 - Location description: designed to evaluate an expression in the
   context of a source level information (dwarf_evaluate_loc_desc
   class). This evaluator expects all information needed for the
   evaluation of the given expression to be present.

 - Symbol needs: designed to answer a question about the parts of the
   context information required to evaluate a DWARF expression behind a
   given symbol (symbol_needs_eval_context class). This evaluator
   doesn’t need a frame information.

The functional difference between the symbol needs evaluator and the
others is that this evaluator is not meant to interact with the actual
target. Instead, it is supposed to check which parts of the context
information are needed for the given DWARF expression to be evaluated by
the location description evaluator.

The idea is to take advantage of the existing dwarf_expr_context
evaluation mechanism and to fake all required interactions with the
actual target, by returning back dummy values. The evaluation result is
returned as one of three possible values, based on operations found in a
given expression:

- SYMBOL_NEEDS_NONE,
- SYMBOL_NEEDS_REGISTERS and
- SYMBOL_NEEDS_FRAME.

The problem here is that faking results of target interactions can yield
an incorrect evaluation result.

For example, if we have a conditional DWARF expression, where the
condition depends on a value read from an actual target, and the true
branch of the condition requires a frame information to be evaluated,
while the false branch doesn’t, fake target reads could conclude that a
frame information is not needed, where in fact it is. This wrong
information would then cause the expression to be actually evaluated (by
the location description evaluator) with a missing frame information.
This would then crash the debugger.

The gdb.dwarf2/symbol_needs_eval_fail.exp test introduces this
scenario, with the following DWARF expression:

                   DW_OP_addr $some_variable
                   DW_OP_deref
                   DW_OP_bra 4  # conditional jump to DW_OP_bregx
                   DW_OP_lit0
                   DW_OP_skip 3  # jump to DW_OP_stack_value
                   DW_OP_bregx $dwarf_regnum 0
                   DW_OP_stack_value

This expression describes a case where some variable dictates the
location of another variable. Depending on a value of some_variable, the
variable whose location is described by this expression is either read
from a register or it is defined as a constant value 0. In both cases,
the value will be returned as an implicit location description on the
DWARF stack.

Currently, when the symbol needs evaluator fakes a memory read from the
address behind the some_variable variable, the constant value 0 is used
as the value of the variable A, and the check returns the
SYMBOL_NEEDS_NONE result.

This is clearly a wrong result and it causes the debugger to crash.

The scenario might sound strange to some people, but it comes from a
SIMD/SIMT architecture where $some_variable is an execution mask.  In
any case, it is a valid DWARF expression, and GDB shouldn't crash while
evaluating it. Also, a similar example could be made based on a
condition of the frame base value, where if that value is concluded to
be 0, the variable location could be defaulted to a TLS based memory
address.

The gdb.dwarf2/symbol_needs_eval_timeout.exp test introduces a second
scenario. This scenario is a bit more abstract due to the DWARF
assembler lacking the CFI support, but it exposes a different
manifestation of the same problem. Like in the previous scenario, the
DWARF expression used in the test is valid:

                       DW_OP_lit1
                       DW_OP_addr $some_variable
                       DW_OP_deref
                       DW_OP_skip 4  # jump to DW_OP_fbreg
                       DW_OP_drop
                       DW_OP_fbreg 0
                       DW_OP_dup
                       DW_OP_lit0
                       DW_OP_eq
                       DW_OP_bra -9  # conditional jump to DW_OP_drop
                       DW_OP_stack_value

Similarly to the previous scenario, the location of a variable A is an
implicit location description with a constant value that depends on a
value held by a global variable. The difference from the previous case
is that DWARF expression contains a loop instead of just one branch. The
end condition of that loop depends on the expectation that a frame base
value is never zero. Currently, the act of faking the target reads will
cause the symbol needs evaluator to get stuck in an infinite loop.

Somebody could argue that we could change the fake reads to return
something else, but that would only hide the real problem.

The general impression seems to be that the desired design is to have
one class that deals with parsing of the DWARF expression, while there
are virtual methods that deal with specifics of some operations.

Using an evaluator mechanism here doesn’t seem to be correct, because
the act of evaluation relies on accessing the data from the actual
target with the possibility of skipping the evaluation of some parts of
the expression.

To better explain the proposed solution for the issue, I first need to
explain a couple more details behind the current design:

There are multiple places in gdb that handle DWARF expression parsing
for different purposes. Some are in charge of converting the expression
to some other internal representation (decode_location_expression,
disassemble_dwarf_expression and dwarf2_compile_expr_to_ax), some are
analysing the expression for specific information
(compute_stack_depth_worker) and some are in charge of evaluating the
expression in a given context (dwarf_expr_context::execute_stack_op
and decode_locdesc).

The problem is that all those functions have a similar (large) switch
statement that handles each DWARF expression operation. The result of
this is a code duplication and harder maintenance.

As a step into the right direction to solve this problem (at least for
the purpose of a DWARF expression evaluation) the expression parsing was
commonized inside of an evaluator base class (dwarf_expr_context). This
makes sense for all derived classes, except for the symbol needs
evaluator (symbol_needs_eval_context) class.

As described previously the problem with this evaluator is that if the
evaluator is not allowed to access the actual target, it is not really
evaluating.

Instead, the desired function of a symbol needs evaluator seems to fall
more into expression analysis category. This means that a more natural
fit for this evaluator is to be a symbol needs analysis, similar to the
existing compute_stack_depth_worker analysis.

Another problem is that using a heavyweight mechanism of an evaluator
to do an expression analysis seems to be an unneeded overhead. It also
requires a more complicated design of the parent class to support fake
target reads.

The reality is that the whole symbol_needs_eval_context class can be
replaced with a lightweight recursive analysis function, that will give
more correct result without compromising the design of the
dwarf_expr_context class.

The downside of this approach is adding of one more similar switch
statement, but at least this way the new symbol needs analysis will be
a lightweight mechnism and it will provide a correct result for any
given DWARF expression.

A more desired long term design would be to have one class that deals
with parsing of the DWARF expression, while there would be a virtual
methods that deal with specifics of some DWARF operations. Then that
class would be used as a base for all DWARF expression parsing mentioned
at the beginning.

This however, requires a far bigger changes that are out of the scope
of this patch series.

To support the new implementation, I have also added two new self tests,
found in a file dwarf2/loc.c (named symbol_needs_cond_nonzero and
symbol_needs_cond_zero), which expose a similar problem to the one
described in the first testsuite test case. The difference is that the
global some_variable is replaced with a use of a
DW_OP_push_object_address operation. The idea is for the symbol needs
evaluator to return the SYMBOL_NEEDS_FRAME value, regardless of the
evaluation of that operation.

The new analysis requires for the py-framefilter-invalidarg.exp test to
be changed as well because the test expects a "dwarf expression stack
underflow" error to be reported for the invalid DWARF expression. The
new analysis function does not simulate a DWARF stack anymore, so the
"unhandled DWARF expression opcode" error is going to be reported
instead.

gdb/ChangeLog:

        * dwarf2/loc.c (class symbol_needs_eval_context): Remove.
        (dwarf2_get_symbol_read_needs): New function.
        (dwarf2_loc_desc_get_symbol_read_needs): Remove.
        (locexpr_get_symbol_read_needs): Use
        dwarf2_get_symbol_read_needs.
        (symbol_needs_cond_zero): New function.
        (symbol_needs_cond_nonzero): New function.
        (_initialize_dwarf2loc): Register selftests.

gdb/testsuite/ChangeLog:

        * gdb.python/py-framefilter-invalidarg.exp: Update expected
        backtrace pattern.
        * lib/dwarf.exp (_location): Handle DW_OP_fbreg.
        * gdb.dwarf2/symbol_needs_eval.c: New file.
        * gdb.dwarf2/symbol_needs_eval_fail.exp: New file.
        * gdb.dwarf2/symbol_needs_eval_timeout.exp: New file.
---
 gdb/dwarf2/loc.c                              | 568 ++++++++++++++----
 gdb/testsuite/gdb.dwarf2/symbol_needs_eval.c  |  25 +
 .../gdb.dwarf2/symbol_needs_eval_fail.exp     | 108 ++++
 .../gdb.dwarf2/symbol_needs_eval_timeout.exp  | 127 ++++
 .../gdb.python/py-framefilter-invalidarg.exp  |   2 +-
 gdb/testsuite/lib/dwarf.exp                   |   4 +
 6 files changed, 718 insertions(+), 116 deletions(-)
 create mode 100644 gdb/testsuite/gdb.dwarf2/symbol_needs_eval.c
 create mode 100644 gdb/testsuite/gdb.dwarf2/symbol_needs_eval_fail.exp
 create mode 100644 gdb/testsuite/gdb.dwarf2/symbol_needs_eval_timeout.exp

diff --git a/gdb/dwarf2/loc.c b/gdb/dwarf2/loc.c
index c18ac7087a..4aa1fc7c4a 100644
--- a/gdb/dwarf2/loc.c
+++ b/gdb/dwarf2/loc.c
@@ -2728,151 +2728,405 @@ dwarf2_compile_property_to_c (string_file *stream,
 			     data, data + size, per_cu, per_objfile);
 }
 
-\f
-/* Helper functions and baton for dwarf2_loc_desc_get_symbol_read_needs.  */
+/* Compute the correct symbol_needs_kind value for the location
+   expression in EXPR.  */
 
-class symbol_needs_eval_context : public dwarf_expr_context
+static enum symbol_needs_kind
+dwarf2_get_symbol_read_needs (gdb::array_view<const gdb_byte> expr,
+			     dwarf2_per_cu_data *per_cu,
+			     dwarf2_per_objfile *per_objfile,
+			     bfd_endian byte_order,
+			     int addr_size,
+			     int ref_addr_size,
+			     int depth = 0)
 {
-public:
-  symbol_needs_eval_context (dwarf2_per_objfile *per_objfile)
-    : dwarf_expr_context (per_objfile)
-  {}
+  const gdb_byte *expr_end = expr.data () + expr.size ();
+  const gdb_byte *op_ptr = expr.data ();
+  enum symbol_needs_kind symbol_needs = SYMBOL_NEEDS_NONE;
+  const int max_depth = 256;
 
-  enum symbol_needs_kind needs;
-  struct dwarf2_per_cu_data *per_cu;
+  if (depth > max_depth)
+    error (_("DWARF-2 expression error: Loop detected (%d)."), depth);
 
-  /* Reads from registers do require a frame.  */
-  CORE_ADDR read_addr_from_reg (int regnum) override
-  {
-    needs = SYMBOL_NEEDS_FRAME;
-    return 1;
-  }
+  depth++;
 
-  /* "get_reg_value" callback: Reads from registers do require a
-     frame.  */
+  while (op_ptr < expr_end)
+    {
+      enum dwarf_location_atom op
+	= (enum dwarf_location_atom) *op_ptr++;
+      uint64_t uoffset;
+      int64_t offset;
 
-  struct value *get_reg_value (struct type *type, int regnum) override
-  {
-    needs = SYMBOL_NEEDS_FRAME;
-    return value_zero (type, not_lval);
-  }
+      /* The DWARF expression might have a bug causing an infinite
+	 loop.  In that case, quitting is the only way out.  */
+      QUIT;
 
-  /* Reads from memory do not require a frame.  */
-  void read_mem (gdb_byte *buf, CORE_ADDR addr, size_t len) override
-  {
-    memset (buf, 0, len);
-  }
+      switch (op)
+	{
+	case DW_OP_lit0:
+	case DW_OP_lit1:
+	case DW_OP_lit2:
+	case DW_OP_lit3:
+	case DW_OP_lit4:
+	case DW_OP_lit5:
+	case DW_OP_lit6:
+	case DW_OP_lit7:
+	case DW_OP_lit8:
+	case DW_OP_lit9:
+	case DW_OP_lit10:
+	case DW_OP_lit11:
+	case DW_OP_lit12:
+	case DW_OP_lit13:
+	case DW_OP_lit14:
+	case DW_OP_lit15:
+	case DW_OP_lit16:
+	case DW_OP_lit17:
+	case DW_OP_lit18:
+	case DW_OP_lit19:
+	case DW_OP_lit20:
+	case DW_OP_lit21:
+	case DW_OP_lit22:
+	case DW_OP_lit23:
+	case DW_OP_lit24:
+	case DW_OP_lit25:
+	case DW_OP_lit26:
+	case DW_OP_lit27:
+	case DW_OP_lit28:
+	case DW_OP_lit29:
+	case DW_OP_lit30:
+	case DW_OP_lit31:
+	case DW_OP_stack_value:
+	case DW_OP_dup:
+	case DW_OP_drop:
+	case DW_OP_swap:
+	case DW_OP_over:
+	case DW_OP_rot:
+	case DW_OP_deref:
+	case DW_OP_abs:
+	case DW_OP_neg:
+	case DW_OP_not:
+	case DW_OP_and:
+	case DW_OP_div:
+	case DW_OP_minus:
+	case DW_OP_mod:
+	case DW_OP_mul:
+	case DW_OP_or:
+	case DW_OP_plus:
+	case DW_OP_shl:
+	case DW_OP_shr:
+	case DW_OP_shra:
+	case DW_OP_xor:
+	case DW_OP_le:
+	case DW_OP_ge:
+	case DW_OP_eq:
+	case DW_OP_lt:
+	case DW_OP_gt:
+	case DW_OP_ne:
+	case DW_OP_GNU_push_tls_address:
+	case DW_OP_nop:
+	case DW_OP_GNU_uninit:
+	case DW_OP_push_object_address:
+	  break;
 
-  /* Frame-relative accesses do require a frame.  */
-  void get_frame_base (const gdb_byte **start, size_t *length) override
-  {
-    static gdb_byte lit0 = DW_OP_lit0;
+	case DW_OP_form_tls_address:
+	  if (symbol_needs <= SYMBOL_NEEDS_REGISTERS)
+	    symbol_needs = SYMBOL_NEEDS_REGISTERS;
+	  break;
 
-    *start = &lit0;
-    *length = 1;
+	case DW_OP_convert:
+	case DW_OP_GNU_convert:
+	case DW_OP_reinterpret:
+	case DW_OP_GNU_reinterpret:
+	case DW_OP_addrx:
+	case DW_OP_GNU_addr_index:
+	case DW_OP_GNU_const_index:
+	case DW_OP_constu:
+	case DW_OP_plus_uconst:
+	case DW_OP_piece:
+	  op_ptr = safe_read_uleb128 (op_ptr, expr_end, &uoffset);
+	  break;
 
-    needs = SYMBOL_NEEDS_FRAME;
-  }
+	case DW_OP_consts:
+	  op_ptr = safe_read_sleb128 (op_ptr, expr_end, &offset);
+	  break;
 
-  /* CFA accesses require a frame.  */
-  CORE_ADDR get_frame_cfa () override
-  {
-    needs = SYMBOL_NEEDS_FRAME;
-    return 1;
-  }
+	case DW_OP_bit_piece:
+	  op_ptr = safe_read_uleb128 (op_ptr, expr_end, &uoffset);
+	  op_ptr = safe_read_uleb128 (op_ptr, expr_end, &uoffset);
+	  break;
 
-  CORE_ADDR get_frame_pc () override
-  {
-    needs = SYMBOL_NEEDS_FRAME;
-    return 1;
-  }
+	case DW_OP_deref_type:
+	case DW_OP_GNU_deref_type:
+	  op_ptr++;
+	  op_ptr = safe_read_uleb128 (op_ptr, expr_end, &uoffset);
+	  break;
 
-  /* Thread-local accesses require registers, but not a frame.  */
-  CORE_ADDR get_tls_address (CORE_ADDR offset) override
-  {
-    if (needs <= SYMBOL_NEEDS_REGISTERS)
-      needs = SYMBOL_NEEDS_REGISTERS;
-    return 1;
-  }
+	case DW_OP_addr:
+	  op_ptr += addr_size;
+	  break;
 
-  /* Helper interface of per_cu_dwarf_call for
-     dwarf2_loc_desc_get_symbol_read_needs.  */
+	case DW_OP_const1u:
+	case DW_OP_const1s:
+	  op_ptr += 1;
+	  break;
 
-  void dwarf_call (cu_offset die_offset) override
-  {
-    per_cu_dwarf_call (this, die_offset, per_cu, per_objfile);
-  }
+	case DW_OP_const2u:
+	case DW_OP_const2s:
+	  op_ptr += 2;
+	  break;
 
-  /* Helper interface of sect_variable_value for
-     dwarf2_loc_desc_get_symbol_read_needs.  */
+	case DW_OP_const4s:
+	case DW_OP_const4u:
+	  op_ptr += 4;
+	  break;
 
-  struct value *dwarf_variable_value (sect_offset sect_off) override
-  {
-    return sect_variable_value (this, sect_off, per_cu, per_objfile);
-  }
+	case DW_OP_const8s:
+	case DW_OP_const8u:
+	  op_ptr += 8;
+	  break;
+
+	case DW_OP_reg0:
+	case DW_OP_reg1:
+	case DW_OP_reg2:
+	case DW_OP_reg3:
+	case DW_OP_reg4:
+	case DW_OP_reg5:
+	case DW_OP_reg6:
+	case DW_OP_reg7:
+	case DW_OP_reg8:
+	case DW_OP_reg9:
+	case DW_OP_reg10:
+	case DW_OP_reg11:
+	case DW_OP_reg12:
+	case DW_OP_reg13:
+	case DW_OP_reg14:
+	case DW_OP_reg15:
+	case DW_OP_reg16:
+	case DW_OP_reg17:
+	case DW_OP_reg18:
+	case DW_OP_reg19:
+	case DW_OP_reg20:
+	case DW_OP_reg21:
+	case DW_OP_reg22:
+	case DW_OP_reg23:
+	case DW_OP_reg24:
+	case DW_OP_reg25:
+	case DW_OP_reg26:
+	case DW_OP_reg27:
+	case DW_OP_reg28:
+	case DW_OP_reg29:
+	case DW_OP_reg30:
+	case DW_OP_reg31:
+	case DW_OP_regx:
+	case DW_OP_breg0:
+	case DW_OP_breg1:
+	case DW_OP_breg2:
+	case DW_OP_breg3:
+	case DW_OP_breg4:
+	case DW_OP_breg5:
+	case DW_OP_breg6:
+	case DW_OP_breg7:
+	case DW_OP_breg8:
+	case DW_OP_breg9:
+	case DW_OP_breg10:
+	case DW_OP_breg11:
+	case DW_OP_breg12:
+	case DW_OP_breg13:
+	case DW_OP_breg14:
+	case DW_OP_breg15:
+	case DW_OP_breg16:
+	case DW_OP_breg17:
+	case DW_OP_breg18:
+	case DW_OP_breg19:
+	case DW_OP_breg20:
+	case DW_OP_breg21:
+	case DW_OP_breg22:
+	case DW_OP_breg23:
+	case DW_OP_breg24:
+	case DW_OP_breg25:
+	case DW_OP_breg26:
+	case DW_OP_breg27:
+	case DW_OP_breg28:
+	case DW_OP_breg29:
+	case DW_OP_breg30:
+	case DW_OP_breg31:
+	case DW_OP_bregx:
+	case DW_OP_fbreg:
+	case DW_OP_call_frame_cfa:
+	case DW_OP_entry_value:
+	case DW_OP_GNU_entry_value:
+	case DW_OP_GNU_parameter_ref:
+	case DW_OP_regval_type:
+	case DW_OP_GNU_regval_type:
+	  symbol_needs = SYMBOL_NEEDS_FRAME;
+	  break;
 
-  /* DW_OP_entry_value accesses require a caller, therefore a
-     frame.  */
+	case DW_OP_implicit_value:
+	  op_ptr = safe_read_uleb128 (op_ptr, expr_end, &uoffset);
+	  op_ptr += uoffset;
+	  break;
 
-  void push_dwarf_reg_entry_value (enum call_site_parameter_kind kind,
-				   union call_site_parameter_u kind_u,
-				   int deref_size) override
-  {
-    needs = SYMBOL_NEEDS_FRAME;
+	case DW_OP_implicit_pointer:
+	case DW_OP_GNU_implicit_pointer:
+	  op_ptr += ref_addr_size;
+	  op_ptr = safe_read_sleb128 (op_ptr, expr_end, &offset);
+	  break;
 
-    /* The expression may require some stub values on DWARF stack.  */
-    push_address (0, 0);
-  }
+	case DW_OP_deref_size:
+	case DW_OP_pick:
+	  op_ptr++;
+	  break;
 
-  /* DW_OP_addrx and DW_OP_GNU_addr_index doesn't require a frame.  */
+	case DW_OP_skip:
+	  op_ptr += 2;
+	  break;
 
-  CORE_ADDR get_addr_index (unsigned int index) override
-  {
-    /* Nothing to do.  */
-    return 1;
-  }
+	case DW_OP_bra:
+	  op_ptr += 2;
+	  break;
 
-  /* DW_OP_push_object_address has a frame already passed through.  */
+	case DW_OP_call2:
+	  {
+	    cu_offset cu_off
+	      = (cu_offset) extract_unsigned_integer (op_ptr, 2, byte_order);
+	    op_ptr += 2;
 
-  CORE_ADDR get_object_address () override
-  {
-    /* Nothing to do.  */
-    return 1;
-  }
-};
+	    auto get_frame_pc = [&symbol_needs] ()
+	      {
+		symbol_needs = SYMBOL_NEEDS_FRAME;
+		return 0;
+	      };
 
-/* Compute the correct symbol_needs_kind value for the location
-   expression at DATA (length SIZE).  */
+	    struct dwarf2_locexpr_baton baton
+	      = dwarf2_fetch_die_loc_cu_off (cu_off, per_cu,
+					     per_objfile,
+					     get_frame_pc);
 
-static enum symbol_needs_kind
-dwarf2_loc_desc_get_symbol_read_needs (const gdb_byte *data, size_t size,
-				       dwarf2_per_cu_data *per_cu,
-				       dwarf2_per_objfile *per_objfile)
-{
-  scoped_value_mark free_values;
+	    /* If SYMBOL_NEEDS_FRAME is returned from the previous call,
+	       we dont have to check the baton content.  */
+	    if (symbol_needs != SYMBOL_NEEDS_FRAME)
+	      {
+		gdbarch *arch = baton.per_objfile->objfile->arch ();
+		gdb::array_view<const gdb_byte> sub_expr (baton.data,
+							  baton.size);
+		symbol_needs
+		  = dwarf2_get_symbol_read_needs (sub_expr,
+						  baton.per_cu,
+						  baton.per_objfile,
+						  gdbarch_byte_order (arch),
+						  baton.per_cu->addr_size (),
+						  baton.per_cu->ref_addr_size (),
+						  depth);
+	      }
+	    break;
+	  }
 
-  symbol_needs_eval_context ctx (per_objfile);
+	case DW_OP_call4:
+	  {
+	    cu_offset cu_off
+	      = (cu_offset) extract_unsigned_integer (op_ptr, 4, byte_order);
+	    op_ptr += 4;
 
-  ctx.needs = SYMBOL_NEEDS_NONE;
-  ctx.per_cu = per_cu;
-  ctx.gdbarch = per_objfile->objfile->arch ();
-  ctx.addr_size = per_cu->addr_size ();
-  ctx.ref_addr_size = per_cu->ref_addr_size ();
+	    auto get_frame_pc = [&symbol_needs] ()
+	      {
+		symbol_needs = SYMBOL_NEEDS_FRAME;
+		return 0;
+	      };
+
+	    struct dwarf2_locexpr_baton baton
+	      = dwarf2_fetch_die_loc_cu_off (cu_off, per_cu,
+					     per_objfile,
+					     get_frame_pc);
+
+	    /* If SYMBOL_NEEDS_FRAME is returned from the previous call,
+	       we dont have to check the baton content.  */
+	    if (symbol_needs != SYMBOL_NEEDS_FRAME)
+	      {
+		gdbarch *arch = baton.per_objfile->objfile->arch ();
+		gdb::array_view<const gdb_byte> sub_expr (baton.data,
+							  baton.size);
+		symbol_needs
+		  = dwarf2_get_symbol_read_needs (sub_expr,
+						  baton.per_cu,
+						  baton.per_objfile,
+						  gdbarch_byte_order (arch),
+						  baton.per_cu->addr_size (),
+						  baton.per_cu->ref_addr_size (),
+						  depth);
+	      }
+	    break;
+	  }
 
-  ctx.eval (data, size);
+	case DW_OP_GNU_variable_value:
+	  {
+	    sect_offset sect_off
+	      = (sect_offset) extract_unsigned_integer (op_ptr,
+							ref_addr_size,
+							byte_order);
+	    op_ptr += ref_addr_size;
+
+	    struct type *die_type
+	      = dwarf2_fetch_die_type_sect_off (sect_off, per_cu,
+						per_objfile);
+
+	    if (die_type == NULL)
+	      error (_("Bad DW_OP_GNU_variable_value DIE."));
+
+	    /* Note: Things still work when the following test is
+	       removed.  This test and error is here to conform to the
+	       proposed specification.  */
+	    if (die_type->code () != TYPE_CODE_INT
+	       && die_type->code () != TYPE_CODE_PTR)
+	      error (_("Type of DW_OP_GNU_variable_value DIE must be "
+		       "an integer or pointer."));
+
+	    auto get_frame_pc = [&symbol_needs] ()
+	      {
+		symbol_needs = SYMBOL_NEEDS_FRAME;
+		return 0;
+	      };
 
-  bool in_reg = ctx.location == DWARF_VALUE_REGISTER;
+	    struct dwarf2_locexpr_baton baton
+	      = dwarf2_fetch_die_loc_sect_off (sect_off, per_cu,
+					       per_objfile,
+					       get_frame_pc, true);
 
-  /* If the location has several pieces, and any of them are in
-     registers, then we will need a frame to fetch them from.  */
-  for (dwarf_expr_piece &p : ctx.pieces)
-    if (p.location == DWARF_VALUE_REGISTER)
-      in_reg = true;
+	    /* If SYMBOL_NEEDS_FRAME is returned from the previous call,
+	       we dont have to check the baton content.  */
+	    if (symbol_needs != SYMBOL_NEEDS_FRAME)
+	      {
+		gdbarch *arch = baton.per_objfile->objfile->arch ();
+		gdb::array_view<const gdb_byte> sub_expr (baton.data,
+							  baton.size);
+		symbol_needs
+		  = dwarf2_get_symbol_read_needs (sub_expr,
+						  baton.per_cu,
+						  baton.per_objfile,
+						  gdbarch_byte_order (arch),
+						  baton.per_cu->addr_size (),
+						  baton.per_cu->ref_addr_size (),
+						  depth);
+	      }
+	    break;
+	  }
 
-  if (in_reg)
-    ctx.needs = SYMBOL_NEEDS_FRAME;
+	case DW_OP_const_type:
+	case DW_OP_GNU_const_type:
+	  op_ptr = safe_read_uleb128 (op_ptr, expr_end, &uoffset);
+	  offset = *op_ptr++;
+	  op_ptr += offset;
+	  break;
+
+	default:
+	  error (_("Unhandled DWARF expression opcode 0x%x"), op);
+	}
+      /* If it is known that a frame information is
+	 needed we can stop parsing the expression.  */
+      if (symbol_needs == SYMBOL_NEEDS_FRAME)
+	break;
+    }
 
-  return ctx.needs;
+  return symbol_needs;
 }
 
 /* A helper function that throws an unimplemented error mentioning a
@@ -3714,9 +3968,15 @@ locexpr_get_symbol_read_needs (struct symbol *symbol)
   struct dwarf2_locexpr_baton *dlbaton
     = (struct dwarf2_locexpr_baton *) SYMBOL_LOCATION_BATON (symbol);
 
-  return dwarf2_loc_desc_get_symbol_read_needs (dlbaton->data, dlbaton->size,
-						dlbaton->per_cu,
-						dlbaton->per_objfile);
+  gdbarch *arch = dlbaton->per_objfile->objfile->arch ();
+  gdb::array_view<const gdb_byte> expr (dlbaton->data, dlbaton->size);
+
+  return dwarf2_get_symbol_read_needs (expr,
+				       dlbaton->per_cu,
+				       dlbaton->per_objfile,
+				       gdbarch_byte_order (arch),
+				       dlbaton->per_cu->addr_size (),
+				       dlbaton->per_cu->ref_addr_size ());
 }
 
 /* Return true if DATA points to the end of a piece.  END is one past
@@ -4734,6 +4994,77 @@ const struct symbol_computed_ops dwarf2_loclist_funcs = {
   loclist_generate_c_location
 };
 
+#if GDB_SELF_TEST
+
+namespace selftests {
+
+/* Unit test for the symbol needs check mechanism.
+
+   This mechanism is not expected to access the target. This
+   means that a conditional expressions that depends on a target
+   access needs to be evaluated in full.
+
+   This test is testing if the zero resulting branch is ignored.  */
+
+static void
+symbol_needs_cond_zero ()
+{
+  const gdb_byte dwarf_expr[] {
+    0x97,           /* DW_OP_push_object_address */
+    0x28, 0x5, 0x0, /* DW_OP_bra 5 */
+    0x70, 0x0,      /* DW_OP_breg0 0 */
+    0x2f, 0x1, 0x0, /* DW_OP_skip 1 */
+    0x30,           /* DW_OP_lit0 */
+    0x9f,           /* DW_OP_stack_value */
+  };
+
+  symbol_needs_kind symbol_needs
+    = dwarf2_get_symbol_read_needs (dwarf_expr,
+				    NULL /* per_objfile */,
+				    NULL /* per_cu */,
+				    BFD_ENDIAN_LITTLE,
+				    4 /* addr_size */,
+				    4 /* ref_addr_size */);
+
+  SELF_CHECK (symbol_needs == SYMBOL_NEEDS_FRAME);
+}
+
+/* Unit test for the symbol needs check mechanism.
+
+   This mechanism is not expected to access the target. This
+   means that a conditional expressions that depends on a target
+   access needs to be evaluated in full.
+
+   This test is testing if the nonzero resulting branch is ignored.  */
+
+static void
+symbol_needs_cond_nonzero ()
+{
+  const gdb_byte dwarf_expr[] {
+    0x97,           /* DW_OP_push_object_address */
+    0x30,           /* DW_OP_lit0 */
+    0x29,           /* DW_OP_eq */
+    0x28, 0x5, 0x0, /* DW_OP_bra 5 */
+    0x70, 0x0,      /* DW_OP_breg0 0 */
+    0x2f, 0x1, 0x0, /* DW_OP_skip 1 */
+    0x30,           /* DW_OP_lit0 */
+    0x9f,           /* DW_OP_stack_value */
+  };
+
+  symbol_needs_kind symbol_needs
+    = dwarf2_get_symbol_read_needs (dwarf_expr,
+				    NULL /* per_objfile */,
+				    NULL /* per_cu */,
+				    BFD_ENDIAN_LITTLE,
+				    4 /* addr_size */,
+				    4 /* ref_addr_size */);
+
+  SELF_CHECK (symbol_needs == SYMBOL_NEEDS_FRAME);
+}
+
+} // namespace selftests
+#endif /* GDB_SELF_TEST */
+
 void _initialize_dwarf2loc ();
 void
 _initialize_dwarf2loc ()
@@ -4762,4 +5093,11 @@ conversational style, when possible."),
 			   show_dwarf_always_disassemble,
 			   &set_dwarf_cmdlist,
 			   &show_dwarf_cmdlist);
+
+#if GDB_SELF_TEST
+  selftests::register_test ("symbol_needs_cond_zero",
+			    selftests::symbol_needs_cond_zero);
+  selftests::register_test ("symbol_needs_cond_nonzero",
+			    selftests::symbol_needs_cond_nonzero);
+#endif
 }
diff --git a/gdb/testsuite/gdb.dwarf2/symbol_needs_eval.c b/gdb/testsuite/gdb.dwarf2/symbol_needs_eval.c
new file mode 100644
index 0000000000..9740944a73
--- /dev/null
+++ b/gdb/testsuite/gdb.dwarf2/symbol_needs_eval.c
@@ -0,0 +1,25 @@
+/* This testcase is part of GDB, the GNU debugger.
+
+   Copyright 2017-2020 Free Software Foundation, Inc.
+
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
+
+int exec_mask = 1;
+
+int
+main (void)
+{
+  asm volatile ("main_label: .globl main_label");
+  return 0;
+}
diff --git a/gdb/testsuite/gdb.dwarf2/symbol_needs_eval_fail.exp b/gdb/testsuite/gdb.dwarf2/symbol_needs_eval_fail.exp
new file mode 100644
index 0000000000..00a57228fa
--- /dev/null
+++ b/gdb/testsuite/gdb.dwarf2/symbol_needs_eval_fail.exp
@@ -0,0 +1,108 @@
+# Copyright 2017-2020 Free Software Foundation, Inc.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+# Test the symbol needs check mechanism if it assumes that faking
+# reads from a target is a safe thing to do.
+#
+# In particular, the test uses a relative branch DWARF operation to
+# hide a register read. If the target reads are indeed faked, the
+# result returned will be wrong.
+
+load_lib dwarf.exp
+
+# This test can only be run on targets which support DWARF-2 and use gas.
+if {![dwarf2_support]} {
+    return 0
+}
+
+# Choose suitable integer registers for the test.
+
+set dwarf_regnum 0
+
+if { [is_aarch64_target] } {
+    set regname x0
+} elseif { [is_aarch32_target]
+	   || [istarget "s390*-*-*" ]
+	   || [istarget "powerpc*-*-*"]
+	   || [istarget "rs6000*-*-aix*"] } {
+    set regname r0
+} elseif { [is_x86_like_target] } {
+    set regname eax
+} elseif { [is_amd64_regs_target] } {
+    set regname rax
+} else {
+    verbose "Skipping ${gdb_test_file_name}."
+    return
+}
+
+standard_testfile symbol_needs_eval.c ${gdb_test_file_name}-dw.S
+
+# Make some DWARF for the test.
+
+set asm_file [standard_output_file $srcfile2]
+Dwarf::assemble $asm_file {
+    global dwarf_regnum regname
+
+    set exec_mask_var [gdb_target_symbol exec_mask]
+
+    cu {} {
+	DW_TAG_compile_unit {
+	    {DW_AT_name symbol_needs_eval.c}
+	    {DW_AT_comp_dir /tmp}
+	} {
+	    declare_labels int_type_label
+
+	    # define int type
+	    int_type_label: DW_TAG_base_type {
+		{DW_AT_name "int"}
+		{DW_AT_encoding @DW_ATE_signed}
+		{DW_AT_byte_size 4 DW_FORM_sdata}
+	    }
+
+	    # define artificial variable a
+	    DW_TAG_variable {
+		{DW_AT_name a}
+		{DW_AT_type :$int_type_label}
+		{DW_AT_location {
+		    DW_OP_addr $exec_mask_var
+		    DW_OP_deref
+		    DW_OP_bra 4 # conditional jump to DW_OP_bregx
+		    DW_OP_lit0
+		    DW_OP_skip 3 # jump to DW_OP_stack_value
+		    DW_OP_bregx $dwarf_regnum 0
+		    DW_OP_stack_value
+		} SPECIAL_expr}
+		{external 1 flag}
+	    }
+	}
+    }
+}
+
+if { [prepare_for_testing ${testfile}.exp ${testfile} \
+     [list $srcfile $asm_file] {nodebug}] } {
+    return -1
+}
+
+# The variable's location expression requires a frame,
+# so an error should be reported.
+gdb_test "print/d a" "No frame selected." "variable a can't be printed"
+
+if ![runto_main] {
+    return -1
+}
+
+gdb_test_no_output "set var \$$regname = 2" "init reg to 2"
+
+gdb_test "print/d a" " = 2" "a == 2"
diff --git a/gdb/testsuite/gdb.dwarf2/symbol_needs_eval_timeout.exp b/gdb/testsuite/gdb.dwarf2/symbol_needs_eval_timeout.exp
new file mode 100644
index 0000000000..52dfb136fb
--- /dev/null
+++ b/gdb/testsuite/gdb.dwarf2/symbol_needs_eval_timeout.exp
@@ -0,0 +1,127 @@
+# Copyright 2017-2020 Free Software Foundation, Inc.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+# Test the symbol needs check mechanism if it assumes that faking
+# reads from a target is a safe thing to do.
+#
+# In particular, the test uses a relative branch DWARF operation to
+# potentially cause an infinite loop, if the target reads are indeed
+# faked.
+
+load_lib dwarf.exp
+
+# This test can only be run on targets which support DWARF-2 and use gas.
+if {![dwarf2_support]} {
+    return 0
+}
+
+# Choose suitable integer registers for the test.
+
+set dwarf_regnum 0
+
+if { [is_aarch64_target] } {
+    set regname x0
+} elseif { [is_aarch32_target]
+	   || [istarget "s390*-*-*" ]
+	   || [istarget "powerpc*-*-*"]
+	   || [istarget "rs6000*-*-aix*"] } {
+    set regname r0
+} elseif { [is_x86_like_target] } {
+    set regname eax
+} elseif { [is_amd64_regs_target] } {
+    set regname rax
+} else {
+    verbose "Skipping ${gdb_test_file_name}."
+    return
+}
+
+standard_testfile symbol_needs_eval.c ${gdb_test_file_name}-dw.S
+
+# Make some DWARF for the test.
+
+set asm_file [standard_output_file $srcfile2]
+Dwarf::assemble $asm_file {
+    global dwarf_regnum regname
+
+    set exec_mask_var [gdb_target_symbol exec_mask]
+
+    cu {} {
+	DW_TAG_compile_unit {
+	    {DW_AT_name symbol_needs_eval.c}
+	    {DW_AT_comp_dir /tmp}
+	} {
+	    declare_labels int_type_label
+
+	    # define int type
+	    int_type_label: DW_TAG_base_type {
+		{DW_AT_name "int"}
+		{DW_AT_encoding @DW_ATE_signed}
+		{DW_AT_byte_size 4 DW_FORM_sdata}
+	    }
+
+	    # add info for variable exec_mask
+	    DW_TAG_variable {
+		{DW_AT_name exec_mask}
+		{DW_AT_type :$int_type_label}
+		{DW_AT_location {
+		    DW_OP_addr $exec_mask_var
+		} SPECIAL_expr}
+		{external 1 flag}
+	    }
+
+	    # add info for subprogram main
+	    DW_TAG_subprogram {
+		{MACRO_AT_func { main }}
+		{DW_AT_frame_base {
+		    DW_OP_regx $dwarf_regnum
+		} SPECIAL_expr}
+	    } {
+		# define artificial variable a
+		DW_TAG_variable {
+		    {DW_AT_name a}
+		    {DW_AT_type :$int_type_label}
+		    {DW_AT_location {
+			DW_OP_lit1
+			DW_OP_addr $exec_mask_var
+			DW_OP_deref
+			DW_OP_skip 4 # jump to DW_OP_fbreg
+			DW_OP_drop
+			DW_OP_fbreg 0
+			DW_OP_dup
+			DW_OP_lit0
+			DW_OP_eq
+			DW_OP_bra -9 # conditional jump to DW_OP_drop
+			DW_OP_stack_value
+		    } SPECIAL_expr}
+		    {external 1 flag}
+		}
+	    }
+	}
+    }
+}
+
+if { [prepare_for_testing ${testfile}.exp ${testfile} \
+	  [list $srcfile $asm_file] {nodebug}] } {
+    return -1
+}
+
+if ![runto_main] {
+    return -1
+}
+
+gdb_test_no_output "set var \$$regname = 2" "init reg to 2"
+gdb_test_no_output "set var exec_mask = 0" "init exec_mask to 0"
+
+gdb_test "print/d a" " = 2" "a == 2"
diff --git a/gdb/testsuite/gdb.python/py-framefilter-invalidarg.exp b/gdb/testsuite/gdb.python/py-framefilter-invalidarg.exp
index beb66ab82d..23f457ee1e 100644
--- a/gdb/testsuite/gdb.python/py-framefilter-invalidarg.exp
+++ b/gdb/testsuite/gdb.python/py-framefilter-invalidarg.exp
@@ -63,4 +63,4 @@ set remote_python_file [gdb_remote_download host \
 			    ${srcdir}/${subdir}/${testfile}.py]
 gdb_test_no_output "source ${remote_python_file}" "load python file"
 
-gdb_test "bt" "niam \\(argc=<error reading variable: dwarf expression stack underflow>, argv=0x\[0-9a-f\]+\\) at py-framefilter-invalidarg.c:\[0-9\]+" "bt full with filters"
+gdb_test "bt" "niam \\(argc=<error reading variable: Unhandled dwarf expression opcode 0x0>, argv=0x\[0-9a-f\]+\\) at py-framefilter-invalidarg.c:\[0-9\]+" "bt full with filters"
\ No newline at end of file
diff --git a/gdb/testsuite/lib/dwarf.exp b/gdb/testsuite/lib/dwarf.exp
index 5c84063105..a64a885e26 100644
--- a/gdb/testsuite/lib/dwarf.exp
+++ b/gdb/testsuite/lib/dwarf.exp
@@ -1001,6 +1001,10 @@ namespace eval Dwarf {
 		    _op .sleb128 [lindex $line 2]
 		}
 
+		DW_OP_fbreg {
+		    _op .sleb128 [lindex $line 1]
+		}
+
 		default {
 		    if {[llength $line] > 1} {
 			error "Unimplemented: operands in location for $opcode"
-- 
2.17.1


^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH] Replace the symbol needs evaluator with a parser
  2020-10-06 14:58 [PATCH] Replace the symbol needs evaluator with a parser Zoran Zaric
@ 2020-10-07  3:04 ` Simon Marchi
  2020-10-07 17:21   ` Zaric, Zoran (Zare)
  0 siblings, 1 reply; 3+ messages in thread
From: Simon Marchi @ 2020-10-07  3:04 UTC (permalink / raw)
  To: Zoran Zaric, gdb-patches

On 2020-10-06 10:58 a.m., Zoran Zaric wrote:
> This patch addresses a design problem with the symbol_needs_eval_context
> class. It exposes the problem by introducing two new testsuite test
> cases.
>
> To explain the issue, I first need to explain the dwarf_expr_context
> class that the symbol_needs_eval_context class derives from.
>
> The intention behind the dwarf_expr_context class is to commonize the
> DWARF expression evaluation mechanism for different evaluation
> contexts. Currently in gdb, the evaluation context can contain some or
> all of the following information: architecture, object file, frame and
> compilation unit.
>
> Depending on the information needed to evaluate a given expression,
> there are currently three distinct DWARF expression evaluators:
>
>  - Frame: designed to evaluate an expression in the context of a call
>    frame information (dwarf_expr_executor class). This evaluator doesn’t
>    need a compilation unit information.
>
>  - Location description: designed to evaluate an expression in the
>    context of a source level information (dwarf_evaluate_loc_desc
>    class). This evaluator expects all information needed for the
>    evaluation of the given expression to be present.
>
>  - Symbol needs: designed to answer a question about the parts of the
>    context information required to evaluate a DWARF expression behind a
>    given symbol (symbol_needs_eval_context class). This evaluator
>    doesn’t need a frame information.
>
> The functional difference between the symbol needs evaluator and the
> others is that this evaluator is not meant to interact with the actual
> target. Instead, it is supposed to check which parts of the context
> information are needed for the given DWARF expression to be evaluated by
> the location description evaluator.
>
> The idea is to take advantage of the existing dwarf_expr_context
> evaluation mechanism and to fake all required interactions with the
> actual target, by returning back dummy values. The evaluation result is
> returned as one of three possible values, based on operations found in a
> given expression:
>
> - SYMBOL_NEEDS_NONE,
> - SYMBOL_NEEDS_REGISTERS and
> - SYMBOL_NEEDS_FRAME.
>
> The problem here is that faking results of target interactions can yield
> an incorrect evaluation result.
>
> For example, if we have a conditional DWARF expression, where the
> condition depends on a value read from an actual target, and the true
> branch of the condition requires a frame information to be evaluated,
> while the false branch doesn’t, fake target reads could conclude that a
> frame information is not needed, where in fact it is. This wrong
> information would then cause the expression to be actually evaluated (by
> the location description evaluator) with a missing frame information.
> This would then crash the debugger.
>
> The gdb.dwarf2/symbol_needs_eval_fail.exp test introduces this
> scenario, with the following DWARF expression:
>
>                    DW_OP_addr $some_variable
>                    DW_OP_deref
>                    DW_OP_bra 4  # conditional jump to DW_OP_bregx
>                    DW_OP_lit0
>                    DW_OP_skip 3  # jump to DW_OP_stack_value
>                    DW_OP_bregx $dwarf_regnum 0
>                    DW_OP_stack_value
>
> This expression describes a case where some variable dictates the
> location of another variable. Depending on a value of some_variable, the
> variable whose location is described by this expression is either read
> from a register or it is defined as a constant value 0. In both cases,
> the value will be returned as an implicit location description on the
> DWARF stack.
>
> Currently, when the symbol needs evaluator fakes a memory read from the
> address behind the some_variable variable, the constant value 0 is used
> as the value of the variable A, and the check returns the
> SYMBOL_NEEDS_NONE result.
>
> This is clearly a wrong result and it causes the debugger to crash.
>
> The scenario might sound strange to some people, but it comes from a
> SIMD/SIMT architecture where $some_variable is an execution mask.  In
> any case, it is a valid DWARF expression, and GDB shouldn't crash while
> evaluating it. Also, a similar example could be made based on a
> condition of the frame base value, where if that value is concluded to
> be 0, the variable location could be defaulted to a TLS based memory
> address.
>
> The gdb.dwarf2/symbol_needs_eval_timeout.exp test introduces a second
> scenario. This scenario is a bit more abstract due to the DWARF
> assembler lacking the CFI support, but it exposes a different
> manifestation of the same problem. Like in the previous scenario, the
> DWARF expression used in the test is valid:
>
>                        DW_OP_lit1
>                        DW_OP_addr $some_variable
>                        DW_OP_deref
>                        DW_OP_skip 4  # jump to DW_OP_fbreg
>                        DW_OP_drop
>                        DW_OP_fbreg 0
>                        DW_OP_dup
>                        DW_OP_lit0
>                        DW_OP_eq
>                        DW_OP_bra -9  # conditional jump to DW_OP_drop
>                        DW_OP_stack_value
>
> Similarly to the previous scenario, the location of a variable A is an
> implicit location description with a constant value that depends on a
> value held by a global variable. The difference from the previous case
> is that DWARF expression contains a loop instead of just one branch. The
> end condition of that loop depends on the expectation that a frame base
> value is never zero. Currently, the act of faking the target reads will
> cause the symbol needs evaluator to get stuck in an infinite loop.
>
> Somebody could argue that we could change the fake reads to return
> something else, but that would only hide the real problem.
>
> The general impression seems to be that the desired design is to have
> one class that deals with parsing of the DWARF expression, while there
> are virtual methods that deal with specifics of some operations.
>
> Using an evaluator mechanism here doesn’t seem to be correct, because
> the act of evaluation relies on accessing the data from the actual
> target with the possibility of skipping the evaluation of some parts of
> the expression.
>
> To better explain the proposed solution for the issue, I first need to
> explain a couple more details behind the current design:
>
> There are multiple places in gdb that handle DWARF expression parsing
> for different purposes. Some are in charge of converting the expression
> to some other internal representation (decode_location_expression,
> disassemble_dwarf_expression and dwarf2_compile_expr_to_ax), some are
> analysing the expression for specific information
> (compute_stack_depth_worker) and some are in charge of evaluating the
> expression in a given context (dwarf_expr_context::execute_stack_op
> and decode_locdesc).
>
> The problem is that all those functions have a similar (large) switch
> statement that handles each DWARF expression operation. The result of
> this is a code duplication and harder maintenance.
>
> As a step into the right direction to solve this problem (at least for
> the purpose of a DWARF expression evaluation) the expression parsing was
> commonized inside of an evaluator base class (dwarf_expr_context). This
> makes sense for all derived classes, except for the symbol needs
> evaluator (symbol_needs_eval_context) class.
>
> As described previously the problem with this evaluator is that if the
> evaluator is not allowed to access the actual target, it is not really
> evaluating.
>
> Instead, the desired function of a symbol needs evaluator seems to fall
> more into expression analysis category. This means that a more natural
> fit for this evaluator is to be a symbol needs analysis, similar to the
> existing compute_stack_depth_worker analysis.
>
> Another problem is that using a heavyweight mechanism of an evaluator
> to do an expression analysis seems to be an unneeded overhead. It also
> requires a more complicated design of the parent class to support fake
> target reads.
>
> The reality is that the whole symbol_needs_eval_context class can be
> replaced with a lightweight recursive analysis function, that will give
> more correct result without compromising the design of the
> dwarf_expr_context class.
>
> The downside of this approach is adding of one more similar switch
> statement, but at least this way the new symbol needs analysis will be
> a lightweight mechnism and it will provide a correct result for any
> given DWARF expression.
>
> A more desired long term design would be to have one class that deals
> with parsing of the DWARF expression, while there would be a virtual
> methods that deal with specifics of some DWARF operations. Then that
> class would be used as a base for all DWARF expression parsing mentioned
> at the beginning.
>
> This however, requires a far bigger changes that are out of the scope
> of this patch series.
>
> To support the new implementation, I have also added two new self tests,
> found in a file dwarf2/loc.c (named symbol_needs_cond_nonzero and
> symbol_needs_cond_zero), which expose a similar problem to the one
> described in the first testsuite test case. The difference is that the
> global some_variable is replaced with a use of a
> DW_OP_push_object_address operation. The idea is for the symbol needs
> evaluator to return the SYMBOL_NEEDS_FRAME value, regardless of the
> evaluation of that operation.
>
> The new analysis requires for the py-framefilter-invalidarg.exp test to
> be changed as well because the test expects a "dwarf expression stack
> underflow" error to be reported for the invalid DWARF expression. The
> new analysis function does not simulate a DWARF stack anymore, so the
> "unhandled DWARF expression opcode" error is going to be reported
> instead.

For the record, I worked with Zoran before he sent this to the list, so
it already LGTM.  However, it would be nice to have another independent
pair of eyes on this if possible.

Simon

^ permalink raw reply	[flat|nested] 3+ messages in thread

* RE: [PATCH] Replace the symbol needs evaluator with a parser
  2020-10-07  3:04 ` Simon Marchi
@ 2020-10-07 17:21   ` Zaric, Zoran (Zare)
  0 siblings, 0 replies; 3+ messages in thread
From: Zaric, Zoran (Zare) @ 2020-10-07 17:21 UTC (permalink / raw)
  To: Simon Marchi, gdb-patches

[AMD Official Use Only - Internal Distribution Only]

There seems to be a problem with my patch where I've changed the
gdb.python/py-framefilter-invalidarg.exp test to expect a different
error to be reported, because the original error expected a DWARF stack
to underflow and the evaluation to stop. For some reason the expression
describing the argc argument of the niam function ends with a 0 value
byte which makes the expression invalid. 

The new approach does not simulate a DWARF stack anymore, so the 0
value byte needs to be removed to make the expression valid again and
let the dwarf_evaluate_loc_desc evaluator report the error instead.

I will submit a version two of the patch shortly.

Zoran

-----Original Message-----
From: Simon Marchi <simark@simark.ca> 
Sent: Wednesday, October 7, 2020 4:05 AM
To: Zaric, Zoran (Zare) <Zoran.Zaric@amd.com>; gdb-patches@sourceware.org
Subject: Re: [PATCH] Replace the symbol needs evaluator with a parser

[CAUTION: External Email]

On 2020-10-06 10:58 a.m., Zoran Zaric wrote:
> This patch addresses a design problem with the 
> symbol_needs_eval_context class. It exposes the problem by introducing 
> two new testsuite test cases.
>
> To explain the issue, I first need to explain the dwarf_expr_context 
> class that the symbol_needs_eval_context class derives from.
>
> The intention behind the dwarf_expr_context class is to commonize the 
> DWARF expression evaluation mechanism for different evaluation 
> contexts. Currently in gdb, the evaluation context can contain some or 
> all of the following information: architecture, object file, frame and 
> compilation unit.
>
> Depending on the information needed to evaluate a given expression, 
> there are currently three distinct DWARF expression evaluators:
>
>  - Frame: designed to evaluate an expression in the context of a call
>    frame information (dwarf_expr_executor class). This evaluator doesn’t
>    need a compilation unit information.
>
>  - Location description: designed to evaluate an expression in the
>    context of a source level information (dwarf_evaluate_loc_desc
>    class). This evaluator expects all information needed for the
>    evaluation of the given expression to be present.
>
>  - Symbol needs: designed to answer a question about the parts of the
>    context information required to evaluate a DWARF expression behind a
>    given symbol (symbol_needs_eval_context class). This evaluator
>    doesn’t need a frame information.
>
> The functional difference between the symbol needs evaluator and the 
> others is that this evaluator is not meant to interact with the actual 
> target. Instead, it is supposed to check which parts of the context 
> information are needed for the given DWARF expression to be evaluated 
> by the location description evaluator.
>
> The idea is to take advantage of the existing dwarf_expr_context 
> evaluation mechanism and to fake all required interactions with the 
> actual target, by returning back dummy values. The evaluation result 
> is returned as one of three possible values, based on operations found 
> in a given expression:
>
> - SYMBOL_NEEDS_NONE,
> - SYMBOL_NEEDS_REGISTERS and
> - SYMBOL_NEEDS_FRAME.
>
> The problem here is that faking results of target interactions can 
> yield an incorrect evaluation result.
>
> For example, if we have a conditional DWARF expression, where the 
> condition depends on a value read from an actual target, and the true 
> branch of the condition requires a frame information to be evaluated, 
> while the false branch doesn’t, fake target reads could conclude that 
> a frame information is not needed, where in fact it is. This wrong 
> information would then cause the expression to be actually evaluated 
> (by the location description evaluator) with a missing frame information.
> This would then crash the debugger.
>
> The gdb.dwarf2/symbol_needs_eval_fail.exp test introduces this 
> scenario, with the following DWARF expression:
>
>                    DW_OP_addr $some_variable
>                    DW_OP_deref
>                    DW_OP_bra 4  # conditional jump to DW_OP_bregx
>                    DW_OP_lit0
>                    DW_OP_skip 3  # jump to DW_OP_stack_value
>                    DW_OP_bregx $dwarf_regnum 0
>                    DW_OP_stack_value
>
> This expression describes a case where some variable dictates the 
> location of another variable. Depending on a value of some_variable, 
> the variable whose location is described by this expression is either 
> read from a register or it is defined as a constant value 0. In both 
> cases, the value will be returned as an implicit location description 
> on the DWARF stack.
>
> Currently, when the symbol needs evaluator fakes a memory read from 
> the address behind the some_variable variable, the constant value 0 is 
> used as the value of the variable A, and the check returns the 
> SYMBOL_NEEDS_NONE result.
>
> This is clearly a wrong result and it causes the debugger to crash.
>
> The scenario might sound strange to some people, but it comes from a 
> SIMD/SIMT architecture where $some_variable is an execution mask.  In 
> any case, it is a valid DWARF expression, and GDB shouldn't crash 
> while evaluating it. Also, a similar example could be made based on a 
> condition of the frame base value, where if that value is concluded to 
> be 0, the variable location could be defaulted to a TLS based memory 
> address.
>
> The gdb.dwarf2/symbol_needs_eval_timeout.exp test introduces a second 
> scenario. This scenario is a bit more abstract due to the DWARF 
> assembler lacking the CFI support, but it exposes a different 
> manifestation of the same problem. Like in the previous scenario, the 
> DWARF expression used in the test is valid:
>
>                        DW_OP_lit1
>                        DW_OP_addr $some_variable
>                        DW_OP_deref
>                        DW_OP_skip 4  # jump to DW_OP_fbreg
>                        DW_OP_drop
>                        DW_OP_fbreg 0
>                        DW_OP_dup
>                        DW_OP_lit0
>                        DW_OP_eq
>                        DW_OP_bra -9  # conditional jump to DW_OP_drop
>                        DW_OP_stack_value
>
> Similarly to the previous scenario, the location of a variable A is an 
> implicit location description with a constant value that depends on a 
> value held by a global variable. The difference from the previous case 
> is that DWARF expression contains a loop instead of just one branch. 
> The end condition of that loop depends on the expectation that a frame 
> base value is never zero. Currently, the act of faking the target 
> reads will cause the symbol needs evaluator to get stuck in an infinite loop.
>
> Somebody could argue that we could change the fake reads to return 
> something else, but that would only hide the real problem.
>
> The general impression seems to be that the desired design is to have 
> one class that deals with parsing of the DWARF expression, while there 
> are virtual methods that deal with specifics of some operations.
>
> Using an evaluator mechanism here doesn’t seem to be correct, because 
> the act of evaluation relies on accessing the data from the actual 
> target with the possibility of skipping the evaluation of some parts 
> of the expression.
>
> To better explain the proposed solution for the issue, I first need to 
> explain a couple more details behind the current design:
>
> There are multiple places in gdb that handle DWARF expression parsing 
> for different purposes. Some are in charge of converting the 
> expression to some other internal representation 
> (decode_location_expression, disassemble_dwarf_expression and 
> dwarf2_compile_expr_to_ax), some are analysing the expression for 
> specific information
> (compute_stack_depth_worker) and some are in charge of evaluating the 
> expression in a given context (dwarf_expr_context::execute_stack_op
> and decode_locdesc).
>
> The problem is that all those functions have a similar (large) switch 
> statement that handles each DWARF expression operation. The result of 
> this is a code duplication and harder maintenance.
>
> As a step into the right direction to solve this problem (at least for 
> the purpose of a DWARF expression evaluation) the expression parsing 
> was commonized inside of an evaluator base class (dwarf_expr_context). 
> This makes sense for all derived classes, except for the symbol needs 
> evaluator (symbol_needs_eval_context) class.
>
> As described previously the problem with this evaluator is that if the 
> evaluator is not allowed to access the actual target, it is not really 
> evaluating.
>
> Instead, the desired function of a symbol needs evaluator seems to 
> fall more into expression analysis category. This means that a more 
> natural fit for this evaluator is to be a symbol needs analysis, 
> similar to the existing compute_stack_depth_worker analysis.
>
> Another problem is that using a heavyweight mechanism of an evaluator 
> to do an expression analysis seems to be an unneeded overhead. It also 
> requires a more complicated design of the parent class to support fake 
> target reads.
>
> The reality is that the whole symbol_needs_eval_context class can be 
> replaced with a lightweight recursive analysis function, that will 
> give more correct result without compromising the design of the 
> dwarf_expr_context class.
>
> The downside of this approach is adding of one more similar switch 
> statement, but at least this way the new symbol needs analysis will be 
> a lightweight mechnism and it will provide a correct result for any 
> given DWARF expression.
>
> A more desired long term design would be to have one class that deals 
> with parsing of the DWARF expression, while there would be a virtual 
> methods that deal with specifics of some DWARF operations. Then that 
> class would be used as a base for all DWARF expression parsing 
> mentioned at the beginning.
>
> This however, requires a far bigger changes that are out of the scope 
> of this patch series.
>
> To support the new implementation, I have also added two new self 
> tests, found in a file dwarf2/loc.c (named symbol_needs_cond_nonzero 
> and symbol_needs_cond_zero), which expose a similar problem to the one 
> described in the first testsuite test case. The difference is that the 
> global some_variable is replaced with a use of a 
> DW_OP_push_object_address operation. The idea is for the symbol needs 
> evaluator to return the SYMBOL_NEEDS_FRAME value, regardless of the 
> evaluation of that operation.
>
> The new analysis requires for the py-framefilter-invalidarg.exp test 
> to be changed as well because the test expects a "dwarf expression 
> stack underflow" error to be reported for the invalid DWARF 
> expression. The new analysis function does not simulate a DWARF stack 
> anymore, so the "unhandled DWARF expression opcode" error is going to 
> be reported instead.

For the record, I worked with Zoran before he sent this to the list, so it already LGTM.  However, it would be nice to have another independent pair of eyes on this if possible.

Simon

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2020-10-07 17:21 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-10-06 14:58 [PATCH] Replace the symbol needs evaluator with a parser Zoran Zaric
2020-10-07  3:04 ` Simon Marchi
2020-10-07 17:21   ` Zaric, Zoran (Zare)

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).