[RFC] DW_OP_piece vs. DW_OP_bit

public inbox for gdb@sourceware.org

* [RFC] DW_OP_piece vs. DW_OP_bit_piece on a Register
@ 2016-01-14 16:34 Andreas Arnez
  2016-01-16 13:27 ` Joel Brobecker
  2016-01-25 22:01 ` Matthew Fortune
  0 siblings, 2 replies; 7+ messages in thread
From: Andreas Arnez @ 2016-01-14 16:34 UTC (permalink / raw)
  To: gcc, gdb; +Cc: Ulrich Weigand

The following is mainly targeted at readers with a strong DWARF
background.  Anybody else may read for interest/amusement or safely
ignore.

After analyzing some test case failures in GCC and GDB I realized that
there are various problems with the handling of DWARF pieces
(particularly from registers) in the current implementations of GCC and
GDB.  I'm working on a fix for the GDB part, but first I'd like to check
whether I'm heading in the right direction -- or what the right
direction is supposed to be.  The article below outlines these issues
and the suggested solution options.

Any kind of feedback is greatly appreciated!

-- 
Andreas

-- >8 --
	    _______________________________________________

	     DW_OP_PIECE VS. DW_OP_BIT_PIECE ON A REGISTER

			     Andreas Arnez
	    _______________________________________________

			    <2016-01-14 Thu>

Table of Contents
_________________

1 Overview
2 Example Scenarios
.. 2.1 z/Architecture Floating-Point- and Vector Registers
.. 2.2 SPU Preferred Slot
3 Current State And Issues
.. 3.1 DW_OP_bit_piece on a Register
..... 3.1.1 DWARF Definition of DW_OP_bit_piece
..... 3.1.2 Current GCC Usage of DW_OP_bit_piece
..... 3.1.3 Current GDB Handling of DW_OP_bit_piece
.. 3.2 DW_OP_piece on a Register
..... 3.2.1 DWARF Definition of DW_OP_piece
..... 3.2.2 Current GCC Usage of DW_OP_piece
..... 3.2.3 Current GDB Handling of DW_OP_piece
4 Options
.. 4.1 Literal Interpretation
..... 4.1.1 Discussion
.. 4.2 Loose Interpretation
..... 4.2.1 Discussion
.. 4.3 Same Scheme for Bit- and Byte Pieces
..... 4.3.1 Discussion
5 Padding
.. 5.1 Stack Values
.. 5.2 Registers
.. 5.3 Padding: Options
..... 5.3.1 No padding support
..... 5.3.2 Padding support for integer stack values
..... 5.3.3 Padding support for stack values and registers
6 Summary of Open Questions

1 Overview
==========

  While trying to fix various problems with the handling of DWARF pieces
  in GDB, it turned out that there are some inconsistencies between the
  DWARF standard on one hand and the handling in GCC and GDB on the
  other hand.

  Questions that came up include: Which of a register's bits are
  designated by the DW_OP_piece and DW_OP_bit_piece operations?  What
  does the DWARF standard say?  How does GCC currently use these
  operations?  What do we actually want?

2 Example Scenarios
===================

  Scenarios involving DWARF pieces are probably more interesting on
  big-endian platforms.  Consider the following examples.

2.1 z/Architecture Floating-Point- and Vector Registers
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

  While the general (integer) registers on z/Architecture have a fairly
  strong natural right-alignment, this is not true for floating point
  registers and vector registers.  Instead, the following holds:
  - Values are usually left-aligned in floating-point registers.
  - Each 64-bit floating-point register is embedded left-aligned in a
    128-bit vector register, and both have the same DWARF register
    number.

  ,----
  | :<--      FP Register        -->:
  | :                               :
  | :    float                      :
  | +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
  | |///|///|///|///|   |   |   |   |   |   |   |   |   |   |   |   |
  | +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
  | |///|///|///|///|///|///|///|///|   |   |   |   |   |   |   |   |
  | +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
  |             double
  `----

  Now consider a pieced object with one of its pieces residing in a
  floating-point register.  If represented as a piece of the respective
  vector register, which offset should be used for DW_OP_bit_piece?  Can
  DW_OP_piece be used?

  Or consider multiple variables which are scheduled into a single
  vector register, due to straight-line vectorization.  How should the
  locations of these variables be represented in DWARF?

2.2 SPU Preferred Slot
~~~~~~~~~~~~~~~~~~~~~~

  SPU registers have a "preferred slot" for storing scalar values of a
  particular size:
  ,----
  |              char
  | +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
  | | 0 |   |   |///|   |   |   |   | 8 |   |   |   |   |   |   | 15|
  | +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
  | 
  |           short
  | +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
  | |   |   |///|///|   |   |   |   |   |   |   |   |   |   |   |   |
  | +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
  | 
  |        int
  | +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
  | |///|///|///|///|   |   |   |   |   |   |   |   |   |   |   |   |
  | +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
  | 
  |             long long
  | +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
  | |///|///|///|///|///|///|///|///|   |   |   |   |   |   |   |   |
  | +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
  `----
  The DWARF location of such a scalar value would typically be described
  with DW_OP_reg<n> or DW_OP_regx, without a piece operation.  It is
  then up to the debugger to determine the placement of the value within
  the register.  In GDB this is implemented with the gdbarch method
  `value_from_register'.
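
  As a rough illustration of the diagram above, the following Python
  sketch computes which bytes of the 16-byte SPU register a scalar of a
  given size occupies.  The helper name and the closed-form rule
  (right-align within the first 4-byte word for sizes up to 4, otherwise
  start at byte 0) are assumptions read off the diagram; this is not
  GDB's actual `value_from_register' logic.

```python
SPU_REG_BYTES = 16  # SPU registers are 128 bits wide


def spu_preferred_slot(size):
    """Return (first_byte, last_byte) of the preferred slot for a scalar.

    Hypothetical helper: scalars of up to 4 bytes are right-aligned in
    the first 4-byte word; larger scalars start at byte 0.
    """
    if not 1 <= size <= SPU_REG_BYTES:
        raise ValueError("scalar does not fit into an SPU register")
    first = max(0, 4 - size)
    return first, first + size - 1


for name, size in [("char", 1), ("short", 2), ("int", 4), ("long long", 8)]:
    print(name, spu_preferred_slot(size))
```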

  Now, how do the DWARF piece operations relate to these preferred
  slots?

3 Current State And Issues
==========================

  GCC and GDB currently have various problems and inconsistencies in
  their usage and handling of DW_OP_bit_piece and DW_OP_piece.
  Additionally, this area may be a bit under-specified in the DWARF
  standard.  The following sections give an overview of the current
  state.

3.1 DW_OP_bit_piece on a Register
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

  DW_OP_bit_piece has been added to DWARF as a more general way of
  describing pieces than DW_OP_piece.  However, after its introduction
  in DWARF-3, it was supported by neither GCC nor GDB for quite some
  time, and its support is still incomplete and broken in many cases.

3.1.1 DWARF Definition of DW_OP_bit_piece
-----------------------------------------

  The standard says: "If the location is a register, the offset is from
  the least significant bit end of the register."

* 3.1.1.1 Which is a register's "least significant bit"?

  Some clarification may be needed here.  For instance, consider a
  register that can naturally hold various-sized integers, but with
  their least significant bits at different positions, such as an SPU
  register.  Or consider a vector register that can not naturally hold a
  single full-size integer.

  Maybe, assuming that each register has some defined byte order, the
  least significant bit refers to the appropriate bit in the register's
  first or last byte, for little- or big-endian targets, respectively.
  In other words, bit pieces with offset 0 would be taken from the left
  on little-endian targets and from the right on big-endian targets.
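
  The reading suggested in the previous paragraph can be sketched in
  Python as follows.  The register is modelled as raw bytes in target
  byte order and viewed as one integer, and the offset counts upwards
  from that integer's least significant bit.  The function name and the
  model itself are assumptions made for illustration, not taken from the
  DWARF standard or from GDB.

```python
def reg_bit_piece(reg_bytes, byteorder, size, offset=0):
    """Extract `size` bits starting `offset` bits above the register's LSB.

    reg_bytes: raw register contents in target byte order.
    byteorder: "little" or "big".
    Returns the piece as a non-negative integer.
    """
    nbits = 8 * len(reg_bytes)
    if size < 0 or offset < 0 or offset + size > nbits:
        raise ValueError("piece reaches outside the register")
    as_int = int.from_bytes(reg_bytes, byteorder)
    return (as_int >> offset) & ((1 << size) - 1)


reg = bytes([0x11, 0x22, 0x33, 0x44])
# Offset 0, size 8: the first byte on little-endian, the last on big-endian.
print(hex(reg_bit_piece(reg, "little", 8)))
print(hex(reg_bit_piece(reg, "big", 8)))
```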

  Note that this definition precludes "register growth" beyond the right
  end in future versions of a big-endian architecture, such as with the
  floating point registers on z, when they were extended to vector
  registers.  On the other hand, extending an integer register from 32
  to 64 bit (or more) would work nicely.

  Also note that this definition implies opposite bit ordering on
  typical big-endian targets for register- versus memory locations.  For
  the latter the DWARF standard requests "using the bit numbering and
  direction conventions that are appropriate to the current language on
  the target system".  And on big-endian targets this typically means
  MSB0 bit order.

* 3.1.1.2 What is the rationale for this definition?

  It is not obvious why DW_OP_bit_piece is defined differently for
  registers than for memory locations.  One possible reason may be to
  allow appropriate register growth, but that goal is not met either, as
  illustrated by the example in 2.1.  Is there some other rationale?

3.1.2 Current GCC Usage of DW_OP_bit_piece
------------------------------------------

  GCC seems to generate DW_OP_bit_piece only for variables that have
  been optimized by SRA, in `dw_sra_loc_expr'.  For register locations
  the piece offset is then always zero.

  However, if the offset is zero and the size is a multiple of 8,
  DW_OP_bit_piece is not actually used, but DW_OP_piece is emitted
  instead, obviously under the assumption that this is a shorter way of
  describing the same piece.  Such "simplification" logic can be found
  in LLVM as well.
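
  The "simplification" described above can be modelled roughly as in the
  following Python sketch.  It is a hypothetical model of the observed
  behaviour, not code taken from GCC or LLVM; the function name and the
  tuple representation are made up for illustration.

```python
def encode_piece(size_bits, offset_bits):
    """Return the piece operation as a (name, operands...) tuple.

    Hypothetical model of the shortcut: byte-sized pieces at offset 0
    are emitted as DW_OP_piece with the size given in bytes; everything
    else is emitted as DW_OP_bit_piece with size and offset in bits.
    """
    if offset_bits == 0 and size_bits % 8 == 0:
        return ("DW_OP_piece", size_bits // 8)
    return ("DW_OP_bit_piece", size_bits, offset_bits)


print(encode_piece(64, 0))   # byte-sized, offset 0: shortened
print(encode_piece(56, 0))   # byte-sized as well, so also shortened
print(encode_piece(12, 0))   # not a multiple of 8: stays a bit piece
```

  The shortcut is only safe if both operations select the same bits of
  the register, which is exactly the assumption questioned in 3.2.1.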

  GCC does not seem to emit DW_OP_bit_piece for multiple variables
  residing in a single vector register.  Instead, no DWARF location is
  emitted at all for these variables.

3.1.3 Current GDB Handling of DW_OP_bit_piece
---------------------------------------------

  GDB interprets the offset and size of DW_OP_bit_piece for registers as
  suggested above: Bit pieces are taken from the right on big-endian
  targets, and from the left on little-endian targets.

  However, the implementation in GDB has various issues that often lead
  to incorrect results when trying to access bit pieces.  For instance,
  when the bit size is not a multiple of 8, GDB usually corrupts the
  result.  Also, non-zero bit offsets are unsupported and silently
  replaced by zero.  These issues are mostly independent from the
  questions raised here, and work is under way to fix them.

3.2 DW_OP_piece on a Register
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

  As opposed to DW_OP_bit_piece, the older DW_OP_piece operation
  describes byte-aligned pieces only and does not allow specifying an
  offset.  Issues with this operation in GCC and GDB are mostly
  restricted to "corner cases" like naturally left-aligned registers on
  big-endian platforms.

3.2.1 DWARF Definition of DW_OP_piece
-------------------------------------

  The standard says: "If the piece is located in a register, but does
  not occupy the entire register, the placement of the piece within that
  register is defined by the ABI."

  This seems to imply the following:
  1. The placement rule of DW_OP_piece is not necessarily related to the
     placement rule of DW_OP_bit_piece.  In particular, it would be
     invalid to assume that DW_OP_piece is equivalent to DW_OP_bit_piece
     with offset 0.  (But note that this is exactly what GCC and GDB
     currently assume.)
  2. The placement can depend on the register as well as on the size of
     the piece.  For instance, DW_OP_piece could be defined in
     accordance with the SPU preferred slot.

* 3.2.1.1 What exactly may be defined by the ABI?

  The standard is not explicit about the flexibility granted to the
  ABI-specifics here.  In particular:
  - Do the pieces have to be contiguous?
  - Assuming that the placement can depend on the size, must a piece at
    least include all smaller pieces?
  - Does a single-byte piece have to be at either end of the register?

3.2.2 Current GCC Usage of DW_OP_piece
--------------------------------------

  GCC emits DW_OP_piece for a value that spans more than one register,
  or for a virtual concatenation such as a complex value, etc.

  One example is a `long double' value on z that has been scheduled into
  a floating-point register pair.  GCC represents this as a composition
  of two DW_OP_piece operations of size 8, each referring to the *left*
  end of the respective vector register.

  GCC also emits DW_OP_piece for a register when an SRA-optimized field
  with a multiple-of-8 bit size has been scheduled into it.  This seems
  to be under the assumption that DW_OP_piece is equivalent to
  DW_OP_bit_piece with offset 0.  One case where this is wrong is for an
  `__int128' bit field residing in a vector register on z.  Then the
  DW_OP_piece emitted by GCC refers to the *right* end of the vector
  register.

  Another case where this would be wrong is when, say, a 56-bit
  SRA-optimized bit field had been spilled into a floating-point
  register.  Then the actual bits reside at a non-zero offset from
  *either* end of the vector register.

3.2.3 Current GDB Handling of DW_OP_piece
-----------------------------------------

  GDB internally translates DW_OP_piece into DW_OP_bit_piece with offset
  0 and the appropriate bit size.  Then it takes the bitwise piece from
  the left end on little-endian targets and from the right on big-endian
  targets.  Otherwise no ABI-specific logic is applied.

  This is wrong for the floating-point registers on z.  For instance,
  when a `complex float' value is pieced together from two
  floating-point registers, the pieces ought to be taken from the left
  end, but GDB takes them from the right end, typically yielding zero.
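
  The effect can be reproduced with a small Python model.  Each register
  is assumed to be a 16-byte vector register whose first four bytes hold
  a big-endian `float' and whose remaining bytes are zero; the helper
  names and the register contents are made up for this sketch.

```python
import struct


def take_piece(reg_bytes, nbytes, from_left):
    """Return an nbytes-sized piece from either end of the raw register."""
    return reg_bytes[:nbytes] if from_left else reg_bytes[-nbytes:]


def fpr_with_float(value, reg_size=16):
    """Model a vector register whose embedded FPR holds a left-aligned float.

    Assumption: all bytes not occupied by the float are zero.
    """
    return struct.pack(">f", value) + bytes(reg_size - 4)


regs = [fpr_with_float(1.5), fpr_with_float(-2.0)]  # real, imaginary part

for from_left in (True, False):
    raw = b"".join(take_piece(r, 4, from_left) for r in regs)
    print("left " if from_left else "right", struct.unpack(">2f", raw))
```

  Taking the pieces from the left recovers both parts of the value,
  while taking them from the right only sees the unused bytes.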

4 Options
=========

  There are several different ways of fixing the issues described above.
  However, the choice depends on how the DWARF standard is interpreted.
  Here are some suggestions.

4.1 Literal Interpretation
~~~~~~~~~~~~~~~~~~~~~~~~~~

  DW_OP_bit_piece: Start from the least significant bit of the
                   register's first/last byte on little-/big-endian
                   targets, respectively.
  DW_OP_piece: Apply an ABI-specific placement rule which may depend on
               the register and the piece length.

  Also, whenever possible, define the ABI-specific placement rule for
  DW_OP_piece such that DW_OP_regx is equivalent to DW_OP_regx followed
  by `DW_OP_piece(len)', where `len' is the size of the object's data
  type.  In the case of z this means to take pieces from the left end
  for FPRs/vector registers, and from the right end for general
  (integer) registers.  And for SPU registers it means to arrange the
  pieces according to the "preferred slots".

  But DW_OP_bit_piece, on the other hand, then uses offsets >= 64 to
  designate any piece of an FPR on z.  This is true even on older
  systems without vector registers, because the DWARF register numbers
  are the same -- offset zero is invalid there!
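
  The arithmetic behind "offsets >= 64" is sketched below in Python,
  assuming a 128-bit vector register with the 64-bit FPR embedded at its
  left (most significant) end, as described in 2.1.  The constant and
  function names are illustrative only.

```python
VR_BITS = 128   # vector register width on z
FPR_BITS = 64   # width of the embedded floating-point register


def lsb_offset_of_left_aligned(size_bits, reg_bits=VR_BITS):
    """Bit offset, counted from the register's LSB end, of a piece that
    is left-aligned, i.e. that starts at the most significant bit."""
    if not 0 < size_bits <= reg_bits:
        raise ValueError("piece does not fit into the register")
    return reg_bits - size_bits


for name, size in [("float", 32), ("double", 64)]:
    print(name, "DW_OP_bit_piece", size, lsb_offset_of_left_aligned(size))
```

  Since every piece of an FPR has at most 64 bits, the resulting offset
  is never below VR_BITS - FPR_BITS = 64.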

4.1.1 Discussion
----------------

  Pros:
  - This interpretation seems as close as possible to the current
    standard's wording.
  - DW_OP_piece can be used for cases like the SPU preferred slots.

  Cons:
  - The fact that FPRs on z start with a non-zero offset is rather odd
    and could even be considered a violation of the standard.  An
    alternative might be to change the DWARF output of future compilers,
    for example by assigning new DWARF register numbers to vector registers.
    However, that would render existing compilers' DWARF locations
    invalid when vector registers are involved.
  - Registers cannot grow beyond their "least significant bit" end.
    The growth from 64-bit FPRs to 128-bit vector registers on z can be
    accommodated with some drawbacks as explained above, but in the
    future such cases would likely be handled by adding new DWARF
    register numbers.

  Resulting To-Dos:
  - In the DWARF standard, clear up the meaning of the least significant
    bit and highlight the difference between the placement rules of
    DW_OP_piece and DW_OP_bit_piece.  Add that DW_OP_piece shall be
    equivalent to *some* DW_OP_bit_piece operation, but not necessarily
    with offset 0.
  - In GCC, drop the translation from DW_OP_bit_piece to DW_OP_piece, or
    do it more carefully and involve ABI-specific logic.
  - Enable GCC to emit DW_OP_bit_piece with a non-zero offset, for
    instance in cases where an SRA-optimized bit field has been spilled
    into an FPR on z.
  - Equip GDB with ABI-specific logic for translating DW_OP_piece to the
    appropriate bit piece.

4.2 Loose Interpretation
~~~~~~~~~~~~~~~~~~~~~~~~

  DW_OP_bit_piece: Allow the ABI to define a per-register bit numbering
                   scheme.
  DW_OP_piece: Apply an ABI-specific placement rule.

  This is similar to the previous option but may avoid the issue with
  invalid offset 0 within FPRs on z, e.g., by starting from the
  register's *first* byte in this case.

  Bit numbering schemes for DW_OP_bit_piece may be:

   Architecture   Alignment  Bit order 
  -------------------------------------
   little-endian  left       LSB0      
   little-endian  right      MSB0      
   big-endian     right      LSB0      
   big-endian     left       MSB0      

  For instance, the right-aligned little-endian scheme starts with the
  most significant bit of the register's last byte.  The FPRs and vector
  registers on z might use the left-aligned big-endian scheme, which
  starts with the most significant bit of the first byte.
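
  To make the table more concrete, the following Python sketch locates
  bit number 0 of each scheme as a byte index plus a bit mask within
  that byte.  The dictionary mirrors the table above; the function name
  and the return format are assumptions made for illustration.

```python
# (architecture, alignment) -> bit order, as in the table above
SCHEMES = {
    ("little-endian", "left"): "LSB0",
    ("little-endian", "right"): "MSB0",
    ("big-endian", "right"): "LSB0",
    ("big-endian", "left"): "MSB0",
}


def bit0_position(arch, alignment, reg_size):
    """Return (byte_index, mask) locating bit number 0 of the scheme.

    Left-aligned schemes start in the register's first byte,
    right-aligned ones in its last byte.  LSB0 starts at that byte's
    least significant bit, MSB0 at its most significant bit.
    """
    bit_order = SCHEMES[(arch, alignment)]
    byte_index = 0 if alignment == "left" else reg_size - 1
    mask = 0x01 if bit_order == "LSB0" else 0x80
    return byte_index, mask


for key in SCHEMES:
    print(key, bit0_position(*key, reg_size=16))
```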

4.2.1 Discussion
----------------

  Pros:
  - Each register can grow in an ABI-defined direction.  This also
    avoids the issue with non-zero offsets for FPRs on z.
  - As with the "literal interpretation", DW_OP_piece can be used for
    the SPU preferred slots.

  Cons:
  - Not quite in line with the current standard's wording, unless
    assuming a fairly loose interpretation of "least significant bit".
  - Requires two different ABI-specific mappings for DWARF pieces
    instead of just one.  This affects DWARF producers and consumers
    alike.

  Resulting To-Dos:
  - Replace the DWARF standard's wording around the "least significant
    bit".  Enumerate the supported bit numbering schemes instead, and
    state that the ABI shall assign one of them to each register.
    Highlight the difference from DW_OP_piece.
  - Fix GCC's translation from DW_OP_bit_piece to DW_OP_piece.
  - In GCC, add an ABI-specific mapping from SRA-optimized bit fields to
    DWARF bit pieces.
  - Equip GDB with ABI-specific logic for translating DW_OP_piece to the
    appropriate bit piece, as well as for assigning a bit numbering
    scheme to each register.

4.3 Same Scheme for Bit- and Byte Pieces
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

  DW_OP_bit_piece: Allow the ABI to define a per-register bit numbering
                   scheme.
  DW_OP_piece: Same as the appropriately sized DW_OP_bit_piece at offset
               0.

  This definition is similar to the loose interpretation, but maintains
  the assumption built into GCC, GDB, LLVM, and possibly other
  DWARF-aware programs, that DW_OP_piece is just a short-hand notation
  for DW_OP_bit_piece with offset 0.

  However, this definition does not support the use of DW_OP_piece for
  designating the SPU preferred slots.  DW_OP_bit_piece must then be
  used instead.

4.3.1 Discussion
----------------

  Pros:
  - Each register can grow in an ABI-defined direction.
  - Maintains the assumption built into (all?) DWARF-aware software,
    that DW_OP_piece translates to a DW_OP_bit_piece with offset 0.

  Cons:
  - Does not comply with the current standard's wording.
  - DW_OP_piece can not be used for the SPU preferred slots.

  Resulting To-Dos:
  - Replace the DWARF standard's wording around the "least significant
    bit" by an enumeration of supported bit numbering schemes and the
    statement that the ABI shall assign one of them to each register.
    Also define DW_OP_piece as DW_OP_bit_piece with offset 0.
  - In GCC, add an ABI-specific mapping from SRA-optimized bit fields to
    DWARF bit pieces.

5 Padding
=========

  A related question is how to deal with a DWARF piece operation that
  reaches fully or partially outside its underlying object.  This is
  currently not specified by the DWARF standard.  It should probably be
  stated that such a DWARF expression is invalid because pieces must be
  fully contained in their underlying objects.  Or otherwise it should
  be explained where the extra bits are inserted, and how their values
  are defined.

5.1 Stack Values
~~~~~~~~~~~~~~~~

  For example, consider the following location description:

  ,----
  | DW_OP_lit0 DW_OP_stack_value DW_OP_piece 936
  `----

  It *could* be argued that this is an efficient way of describing the
  value of a 936-byte structure containing all zeroes.

  Then, consequently, in this example...

  ,----
  | DW_OP_const1s -1 DW_OP_stack_value DW_OP_piece 80
  `----

  ...sign extension would be performed.

  One reason to support the validity of such location descriptions is
  that operations like DW_OP_lit<n> do not specify an object size, but
  their resulting value could be viewed as having *infinite* size
  instead.  Of course, a different point of view is that the literal is
  sign-extended to the special DWARF5 address type, thereby defining the
  object size.

  The examples above have an obvious natural alignment: The padding is
  inserted beyond the most significant bit, such that pieces
  are right-aligned on big-endian targets and left-aligned on
  little-endian targets.  It is less obvious how this concept extends to
  non-integer stack values.
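
  For integer stack values, the padding described in this section could
  look like the following Python sketch.  It assumes that the literal is
  treated as a signed integer and sign-extended to the piece size; the
  function name is hypothetical, and this is one possible reading, not
  behaviour mandated by the DWARF standard.

```python
def padded_stack_value(value, piece_bytes, byteorder):
    """Sign-extend an integer stack value to fill a piece of piece_bytes.

    byteorder is "little" or "big"; the padding ends up beyond the most
    significant bit in either case.
    """
    return value.to_bytes(piece_bytes, byteorder, signed=True)


# DW_OP_lit0 DW_OP_stack_value DW_OP_piece 936
zeros = padded_stack_value(0, 936, "big")
# DW_OP_const1s -1 DW_OP_stack_value DW_OP_piece 80
ones = padded_stack_value(-1, 80, "little")
print(zeros == bytes(936), ones == b"\xff" * 80)

# The value itself is right-aligned on big-endian targets and
# left-aligned on little-endian targets.
print(padded_stack_value(1, 4, "big"), padded_stack_value(1, 4, "little"))
```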

5.2 Registers
~~~~~~~~~~~~~

  For a location description like this there may or may not be a natural
  alignment and padding:

  ,----
  | DW_OP_reg5 DW_OP_bit_piece 1000 23
  `----

  To define the alignment, one option may be to follow the bit numbering
  scheme used for DW_OP_bit_piece on that particular register.  Padding
  bits would then be inserted beyond the highest-numbered bit.

  The value of the padding bits should probably always be set to zero,
  because it is generally unknown whether the register currently holds a
  signed or unsigned value.

5.3 Padding: Options
~~~~~~~~~~~~~~~~~~~~

  Depending on the DWARF standard interpretation, one of the following
  options could be pursued.

5.3.1 No padding support
------------------------

  With this option, the DWARF standard might leave this area
  unspecified, or perhaps explicitly state that pieces shall be fully
  contained in their underlying objects.

  DWARF consumers should then emit errors upon encountering invalid
  pieces.  GDB, for instance, does not currently emit such errors.
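
  A consumer-side check along these lines might look like the Python
  sketch below.  The exception class, the function name, and the 64-bit
  width assumed for the register in the example are all hypothetical;
  this does not describe any existing GDB code.

```python
class InvalidPieceError(ValueError):
    """Raised when a piece is not fully contained in its underlying object."""


def check_bit_piece(object_bits, size_bits, offset_bits=0):
    """Raise InvalidPieceError unless the piece lies within the object."""
    if size_bits < 0 or offset_bits < 0:
        raise InvalidPieceError("negative piece size or offset")
    if offset_bits + size_bits > object_bits:
        raise InvalidPieceError(
            "piece of %d bits at offset %d exceeds %d-bit object"
            % (size_bits, offset_bits, object_bits))


# DW_OP_reg5 DW_OP_bit_piece 1000 23, assuming a 64-bit register
try:
    check_bit_piece(64, 1000, 23)
except InvalidPieceError as err:
    print("invalid location description:", err)
```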

5.3.2 Padding support for integer stack values
----------------------------------------------

  With this option, pieces must still be contained in their underlying
  objects, except for integer stack values.  Those are appropriately
  sign-extended.  In particular, this might be an efficient way of
  describing large areas initialized with zero.

5.3.3 Padding support for stack values and registers
----------------------------------------------------

  With this option, pieces reaching outside their underlying objects
  would generally be padded, for stack values as well as for registers.

  Padding and alignment for non-integer stack values and for registers
  would have to be defined.  It seems that, unless a very simple (yet
  meaningful) scheme is found, it is probably not worth pursuing this.

6 Summary of Open Questions
===========================

  1. Out of the standard interpretations discussed under "options"
     (section 4) above, which do we want to settle on?  Or is the
     "preferred" interpretation missing from that list?
  2. Should pieces fully or partially outside their underlying objects
     be considered valid or invalid?  If valid, how should they be
     aligned and padded?  In any case, what is the suggested treatment
     by a DWARF consumer?

* Re: [RFC] DW_OP_piece vs. DW_OP_bit_piece on a Register
  2016-01-14 16:34 [RFC] DW_OP_piece vs. DW_OP_bit_piece on a Register Andreas Arnez
@ 2016-01-16 13:27 ` Joel Brobecker
  2016-01-18 16:00   ` Andreas Arnez
  2016-01-25 22:01 ` Matthew Fortune
  1 sibling, 1 reply; 7+ messages in thread
From: Joel Brobecker @ 2016-01-16 13:27 UTC (permalink / raw)
  To: Andreas Arnez; +Cc: gcc, gdb, Ulrich Weigand

> After analyzing some test case failures in GCC and GDB I realized that
> there are various problems with the handling of DWARF pieces
> (particularly from registers) in the current implementations of GCC and
> GDB.  I'm working on a fix for the GDB part, but first I'd like to check
> whether I'm heading in the right direction -- or what the right
> direction is supposed to be.  The article below outlines these issues
> and the suggested solution options.

This is a very nice and detailed analysis of the current situation.
Thank you!

I admit that I read through the document fairly rapidly; it does
seem to me, at this point, that the first step might be to get
clarification from the DWARF committee? Or is input from the GCC/GDB
community going to help the discussion with the DWARF committee?

-- 
Joel

* Re: [RFC] DW_OP_piece vs. DW_OP_bit_piece on a Register
  2016-01-16 13:27 ` Joel Brobecker
@ 2016-01-18 16:00   ` Andreas Arnez
  0 siblings, 0 replies; 7+ messages in thread
From: Andreas Arnez @ 2016-01-18 16:00 UTC (permalink / raw)
  To: Joel Brobecker; +Cc: gcc, gdb, Ulrich Weigand

On Sat, Jan 16 2016, Joel Brobecker wrote:

>> After analyzing some test case failures in GCC and GDB I realized that
>> there are various problems with the handling of DWARF pieces
>> (particularly from registers) in the current implementations of GCC and
>> GDB.  I'm working on a fix for the GDB part, but first I'd like to check
>> whether I'm heading in the right direction -- or what the right
>> direction is supposed to be.  The article below outlines these issues
>> and the suggested solution options.
>
> This is a very nice and detailed analysis of the current situation.
> Thank you!
>
> I admit that I read through the document fairly rapidly; it does
> seem to me, at this point, that the first step might be to get
> clarification from the DWARF committee? Or is input from the GCC/GDB
> community going to help the discussion with the DWARF committee?

I think it would be helpful to form an opinion within the GCC/GDB
community first.  Then we can open a DWARF issue with a specific change
request, if necessary.

FWIW, here's my (current) opinion:

I don't like option 4.2 ("loose interpretation"), because it doesn't
seem to have any significant advantages over 4.3 and is more complex.
I'm less sure about 4.3 versus 4.1.  Option 4.3 seems more intuitive and
may require a bit less code than 4.1, but is not compliant with the
current standard's wording and does not support the SPU "preferred
slots".

And regarding the padding support, I prefer 5.3.1 ("no padding
support").

--
Andreas

* RE: [RFC] DW_OP_piece vs. DW_OP_bit_piece on a Register
  2016-01-14 16:34 [RFC] DW_OP_piece vs. DW_OP_bit_piece on a Register Andreas Arnez
  2016-01-16 13:27 ` Joel Brobecker
@ 2016-01-25 22:01 ` Matthew Fortune
  2016-01-26 11:57   ` Andreas Arnez
  1 sibling, 1 reply; 7+ messages in thread
From: Matthew Fortune @ 2016-01-25 22:01 UTC (permalink / raw)
  To: Andreas Arnez, gcc, gdb; +Cc: Ulrich Weigand, Maciej Rozycki

Andreas Arnez <arnez@linux.vnet.ibm.com> writes:
> 6 Summary of Open Questions
> ===========================
> 
>   1. Out of the standard interpretations discussed under "options"
>      (section 4) above, which do we want to settle on?  Or is the
>      "preferred" interpretation missing from that list?
>   2. Should pieces fully or partially outside their underlying objects
>      be considered valid or invalid?  If valid, how should they be
>      aligned and padded?  In any case, what is the suggested treatment
>      by a DWARF consumer?

My DWARF knowledge is not brilliant, but I recently had to consider
it for the MIPS floating-point ABI changes, aka FPXX and friends. I don't
know exactly where this fits into your whole description, but in case it
has a bearing on this, we now have the following uses of DW_OP_piece:

1) double precision data split over two 32-bit FPRs
Uses a pair of 32-bit DW_OP_piece (ordered depending on endianness).

2) double precision data in one 64-bit FPR
No need for DW_OP_piece.

3) double precision data that may be in two 32-bit FPRs or may be in
   one 64-bit FPR depending on hardware mode
Uses a single 64-bit DW_OP_piece on the even numbered register. 

I'm guilty of not actually finishing this off and doing the GDB side, but
the theory seemed OK when I did it! From your description, this behaviour
best matches DW_OP_piece having ABI-dependent behaviour, which would make
it acceptable. These three variations can potentially exist in the same
program, although (1) and (3) can't appear in a single shared library
or executable. It's all a little strange, but the whole floating-point
MIPS o32 ABI is pretty complex now.

Matthew

* Re: [RFC] DW_OP_piece vs. DW_OP_bit_piece on a Register
  2016-01-25 22:01 ` Matthew Fortune
@ 2016-01-26 11:57   ` Andreas Arnez
  2016-02-11 12:18     ` Matthew Fortune
  0 siblings, 1 reply; 7+ messages in thread
From: Andreas Arnez @ 2016-01-26 11:57 UTC (permalink / raw)
  To: Matthew Fortune; +Cc: gcc, gdb, Ulrich Weigand, Maciej Rozycki

On Mon, Jan 25 2016, Matthew Fortune wrote:

> My dwarf knowledge is not brilliant but I have had to recently consider
> it for MIPS floating point ABI changes aka FPXX and friends. I don't know
> exactly where this fits in to your whole description but in case it has
> a bearing on this we now have the following uses of DW_OP_piece:
>
> 1) double precision data split over two 32-bit FPRs
> Uses a pair of 32-bit DW_OP_piece (ordered depending on endianness).
>
> 2) double precision data in one 64-bit FPR
> No need for DW_OP_piece.
>
> 3) double precision data that may be in two 32-bit FPRs or may be in
>    one 64-bit FPR depending on hardware mode
> Uses a single 64-bit DW_OP_piece on the even numbered register. 

Hm, so in 32-bit hardware mode the DWARF consumer is expected to treat
the DW_OP_piece on the even numbered register as if it were two pieces
from two consecutive registers?  Or should we rather consider the even
numbered register to be 64 bit wide, where the right half shadows the
next odd-numbered register?  If so, I believe you generally want pieces
from FPRs to be taken from the left, correct?

> I'm guilty of not actually finishing this off and doing the GDB side but
> the theory seemed OK when I did it! From your description this behaviour
> best matches DW_OP_piece having ABI dependent behaviour which would make
> it acceptable. These three variations can potentially exist in the same
> program albeit that (1) and (3) can't appear in a single shared library
> or executable. It's all a little strange but the whole floating point
> MIPS o32 ABI is pretty complex now.

I don't quite understand why (1) and (3) can not coexist in the same
shared library or executable.  Can you elaborate a bit?

I'm curious about the interaction with vector registers.  AFAIK, vector
registers on MIPS also embed the FPRs (left or right?).  Are the same
DWARF register numbers used for both?  And when taking a 64-bit
DW_OP_piece from a vector register, would this piece correspond to the
embedded FPR?

Also, how do you think that the following should be represented in
DWARF:

* Odd-sized bit field in one of a vector register's elements;

* odd-sized bit field spilled into an FPR;

* single-precision `complex float' living in two consecutive 64-bit
  FPRs.

Thanks for your input!

--
Andreas

* RE: [RFC] DW_OP_piece vs. DW_OP_bit_piece on a Register
  2016-01-26 11:57   ` Andreas Arnez
@ 2016-02-11 12:18     ` Matthew Fortune
  2016-02-11 17:04       ` Andreas Arnez
  0 siblings, 1 reply; 7+ messages in thread
From: Matthew Fortune @ 2016-02-11 12:18 UTC (permalink / raw)
  To: Andreas Arnez; +Cc: gcc, gdb, Ulrich Weigand, Maciej Rozycki

Sorry for the slow response...

Andreas Arnez <arnez@linux.vnet.ibm.com> writes:
> On Mon, Jan 25 2016, Matthew Fortune wrote:
> 
> > My dwarf knowledge is not brilliant but I have had to recently
> > consider it for MIPS floating point ABI changes aka FPXX and friends.
> > I don't know exactly where this fits in to your whole description but
> > in case it has a bearing on this we now have the following uses of
> > DW_OP_piece:
> >
> > 1) double precision data split over two 32-bit FPRs
> > Uses a pair of 32-bit DW_OP_piece (ordered depending on endianness).
> >
> > 2) double precision data in one 64-bit FPR
> > No need for DW_OP_piece.
> >
> > 3) double precision data that may be in two 32-bit FPRs or may be in
> >    one 64-bit FPR depending on hardware mode
> > Uses a single 64-bit DW_OP_piece on the even numbered register.
> 
> Hm, so in 32-bit hardware mode the DWARF consumer is expected to treat
> the DW_OP_piece on the even numbered register as if it were two pieces
> from two consecutive registers?

Yes.

> Or should we rather consider the even
> numbered register to be 64 bit wide, where the right half shadows the
> next odd-numbered register?  If so, I believe you generally want pieces
> from FPRs to be taken from the left, correct?

No, I think this is backwards: it is the left half that shadows the next
register, and pieces are taken from the right. I've attempted a description
below to see if it helps.

I don't believe (in the MIPS case) we could unconditionally view the even
numbered register as 64-bit or 32-bit as the shadowing onto the next
register only exists in some hardware modes.

The size of a register has to be determined from the current hardware mode.
The logic would then be to read as much as possible from the referenced
register and use it as the low-order bits of the overall value, then continue
reading consecutive registers, filling the next most significant bits,
until the full size of the DW_OP_piece has been read. For MIPS FP
registers this is endian agnostic, as the higher numbered register always
has the most significant bits. For GPRs the even numbered register will
provide either the most or least significant bits depending on endianness,
but we have no reason to use this paradoxical DW_OP_piece for GPRs as
their size is known at compile time.
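
As a rough illustration of the reading logic described above, a DWARF
consumer could assemble the value along the following lines. This is only
a sketch and is not taken from GDB: the function name read_fp_piece, the
flat fpr[] array modelling the register file, and the reg_bits parameter
(the FPR width implied by the current hardware mode) are all assumptions
made for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: fpr[i] holds the raw contents of FPR i, of which
   only the low REG_BITS bits are meaningful in the current hardware
   mode.  Assemble a PIECE_BITS wide value starting at register REGNO,
   taking the low-order bits from REGNO and the next more significant
   bits from the following registers.  */
uint64_t
read_fp_piece (const uint64_t *fpr, unsigned regno,
               unsigned reg_bits, unsigned piece_bits)
{
  uint64_t value = 0;
  unsigned filled = 0;

  assert (reg_bits == 32 || reg_bits == 64);
  assert (piece_bits > 0 && piece_bits <= 64);

  while (filled < piece_bits)
    {
      /* Take as much as this register can provide.  */
      unsigned take = piece_bits - filled;
      uint64_t chunk = fpr[regno++];

      if (take > reg_bits)
        take = reg_bits;
      if (take < 64)
        chunk &= (UINT64_C (1) << take) - 1;

      /* Higher numbered registers supply the more significant bits.  */
      value |= chunk << filled;
      filled += take;
    }
  return value;
}
```

With reg_bits == 32, a 64-bit piece on an even numbered register takes its
upper half from the next odd numbered register; with reg_bits == 64 the
same piece is satisfied by the even numbered register alone.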

> > I'm guilty of not actually finishing this off and doing the GDB side
> > but the theory seemed OK when I did it! From your description this
> > behaviour best matches DW_OP_piece having ABI dependent behaviour
> > which would make it acceptable. These three variations can potentially
> > exist in the same program albeit that (1) and (3) can't appear in a
> > single shared library or executable. It's all a little strange but the
> > whole floating point MIPS o32 ABI is pretty complex now.
> 
> I don't quite understand why (1) and (3) can not coexist in the same
> shared library or executable.  Can you elaborate a bit?

Oops, sorry: it is (1) and (2) that can't coexist. I'm not sure you
really want to know the gory details but the explanation is below if
you're feeling brave.

The reason these can't co-exist is really just because there would need
to be yet another ABI variant/ELF marker to record the requirements of
such an executable. It would be a combination of FP64A and FP32 and that
would mandate a hardware mode of FR=1 FRE=1 which is the one mode that
we desperately do not want to be in as it uses kernel emulation to make
it all work. The combination of FP64A and FP32 ABIs is supported to
enable some level of transition from original o32 (FP32) through to FP64
without requiring moving everything to FPXX first. We allow this across
a shared library boundary to give just enough support for software to
transition. The aim is to encourage the full transition to FPXX rather
than going through a period of creating binaries that will always need
kernel emulation regardless of the host architecture.

> I'm curious about the interaction with vector registers.  AFAIK, vector
> registers on MIPS also embed the FPRs (left or right?).

Probably best explained by comparison with NEON. MSA works like NEON on
AArch64 rather than AArch32, i.e. it widens each register to 128 bits
and uses the same DWARF registers as the FPRs. A DW_OP_piece therefore
corresponds to part of the 128-bit register.

> Are the same DWARF register numbers used for both?

Yes. I think we can get away with using the same dwarf numbers, as the
FPRs sit in the LSB of the vector register regardless of endianness or
double/single data, but that is a moot point; see below.

> And when taking a 64-bit DW_OP_piece from a vector register, would
> this piece correspond to the embedded FPR?

Strict architecture definition says no as the register sets do not
necessarily have to be the same. In reality all the implementations I
know of do simply have the FPU and SIMD unit use the same physical
register set. However, we would/should never consider a DW_OP_piece
on a vector register to refer to the underlying FPR, as there are
situations where the least significant 32 bits of an odd numbered
FPR do NOT correspond to the single precision value in the same
numbered register. This is the key to the FRE mode of execution, and
it means we only treat an FP register as having the type/value that was
last written to it: if written as a vector then it should be accessed
as a vector, if written as a float/scalar then accessed as such. This
isn't really anything new, as MIPS has always been strict about the
dynamic 'type' of an FPR and accessing them consistently.

A concrete example ($w is just an alias for $f):

LDI.W $w1, 1
MFC1 $4, $f1
COPY_S.W $5, $w1[0]

$4 and $5 can have different values in one of the hardware modes.

> Also, how do you think that the following should be represented in
> DWARF:

I think I've formed an opinion on this but it may be naïve! It matches
4.3 I believe from your original description but I offer a potential
solution to the SPU issue.

Concepts like SPU preferred slots should be represented as different
dwarf registers depending on the notional size of the register that
the dwarf producer assumed. This means that LSB is deterministically
known to be within that preferred slot and 0 is simply the LSB. The
maximum extent of that register is also capped so a piece of the
'word' sized dwarf register can only contain 32-bits; beyond that it
spills to the next register in an ABI dependent manner.

What I am saying above is that I think we should step back from the
idea that physical register size matters when generating dwarf and
instead ensure that dwarf registers only refer to the visible/valid
portion of a register. I.e. The dwarf consumer's interpretation of
hardware registers needs to be able to find the right portion of a
physical register for any given dwarf register and the remaining
bits are ignored. This allows any architectural register to extend
in any direction as long as the dwarf consumer is updated; the
producer should be shielded from some detail (which is what I think
I learnt from my MIPS FPXX trauma).

> * Odd-sized bit field in one of a vector register's elements;

I'd go for the bit position to be determined by identifying the
element as <element-bitsize>*<element-number> + <bit position>
where bit position is 0 for the LSB irrespective of endian.
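
To make the arithmetic concrete, the offset could be computed as in the
following sketch. The helper name vector_bit_offset and its parameters
are made up for this illustration; the result is the bit offset within
the whole vector register, counted from the LSB.

```c
#include <assert.h>

/* Hypothetical helper: bit offset, counted from the LSB of the vector
   register, of bit BIT_POS (0 = LSB of the element) within element
   ELEMENT_NUMBER, where each element is ELEMENT_BITS wide.  */
unsigned
vector_bit_offset (unsigned element_bits, unsigned element_number,
                   unsigned bit_pos)
{
  assert (bit_pos < element_bits);
  return element_bits * element_number + bit_pos;
}
```

For example, a bit field starting at bit 5 of element 2 in a vector of
32-bit elements would start at offset 32 * 2 + 5 = 69.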

> * odd-sized bit field spilled into an FPR;

Given what we have for MIPS then I'd have to say either a
paradoxical DW_OP_bit_piece where the bits beyond the size of the
specified FPR are taken from the next register in an ABI dependent
manner... or 2+ DW_OP_bit_piece referring to the various portions,
again having 0 for the LSB regardless of endian.

> * single-precision `complex float' living in two consecutive 64-bit
>   FPRs.

Not sure here... two 32-bit DW_OP_piece but in the presence of SPU
preferred slots this would need to use the dwarf register numbers for
'32-bit' SPU registers or it wouldn't know the correct position for
LSB. You have made me wonder what happens for this case in MIPS FPXX
though!

The more I think about this, the more problems I see. I really
don't know if my comments are helpful. Sorry if they're not.

Thanks,
Matthew

> 
> Thanks for your input!
> 
> --
> Andreas

* Re: [RFC] DW_OP_piece vs. DW_OP_bit_piece on a Register
  2016-02-11 12:18     ` Matthew Fortune
@ 2016-02-11 17:04       ` Andreas Arnez
  0 siblings, 0 replies; 7+ messages in thread
From: Andreas Arnez @ 2016-02-11 17:04 UTC (permalink / raw)
  To: Matthew Fortune; +Cc: gcc, gdb, Ulrich Weigand, Maciej Rozycki

On Thu, Feb 11 2016, Matthew Fortune wrote:

> No I think this is backwards it is the left half that shadows the next
> register and pieces are taken from the right. I've attempted a description
> below to see if it helps.
>
> I don't believe (in the MIPS case) we could unconditionally view the even
> numbered register as 64-bit or 32-bit as the shadowing onto the next
> register only exists in some hardware modes.
>
> The size of a register has to be determined from the current hardware mode
> and then the logic would be to read as much as possible from the referenced
> register and use it as the lower bits of the overall value. Then continue
> reading consecutive registers filling the next most significant bits
> until the full size of the DW_OP_piece has been read. This for MIPS
> FP registers is endian agnostic as the higher numbered register always
> has the most significant bits. For GPRs the even numbered register will
> provide either the most or least significant bits depending on endian but
> we have no reason to use this paradoxical DW_OP_piece for GPRs as they
> have compile time deterministic size.

Hm, so in the shadowed case, assuming that the DWARF consumer has loaded
the register file into a byte array via ptrace, which bytes would the
DW_OP_bit_piece offsets for FPR n correspond to?  Is it like this for
little-endian?

        FPR n            FPR n+1
  +---+---+---+---+ +---+---+---+---+
  |   |   |   |   | |   |   |   |   |
  +---+---+---+---+ +---+---+---+---+
   0   8   16  24    32  40  48  56

(In which case pieces would be taken from the left.)  Or different?  And
for big-endian?

--
Andreas

end of thread, other threads:[~2016-02-11 17:04 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-01-14 16:34 [RFC] DW_OP_piece vs. DW_OP_bit_piece on a Register Andreas Arnez
2016-01-16 13:27 ` Joel Brobecker
2016-01-18 16:00   ` Andreas Arnez
2016-01-25 22:01 ` Matthew Fortune
2016-01-26 11:57   ` Andreas Arnez
2016-02-11 12:18     ` Matthew Fortune
2016-02-11 17:04       ` Andreas Arnez