public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [000/nnn] poly_int: representation of runtime offsets and sizes
@ 2017-10-23 16:57 Richard Sandiford
  2017-10-23 16:58 ` [001/nnn] poly_int: add poly-int.h Richard Sandiford
                   ` (107 more replies)
  0 siblings, 108 replies; 302+ messages in thread
From: Richard Sandiford @ 2017-10-23 16:57 UTC (permalink / raw)
  To: gcc-patches

This series adds support for offsets and sizes that are a runtime
invariant rather than a compile time constant.  It's based on the
patch posted here:

  https://gcc.gnu.org/ml/gcc-patches/2017-09/msg00406.html

The rest of the covering note is split into:

- Summary   (from the message linked above)
- Tree representation
- RTL representation
- Compile-time impact
- Typical changes
- Testing


Summary
=======

The size of an SVE register in bits can be any multiple of 128 between
128 and 2048 inclusive.  The way we chose to represent this was to have
a runtime indeterminate that counts the number of 128 bit blocks above
the minimum of 128.  If we call the indeterminate X then:

* an SVE register has 128 + 128 * X bits (16 + 16 * X bytes)
* the last int in an SVE vector is at byte offset 12 + 16 * X
* etc.

Although the maximum value of X is 15, we don't want to take advantage
of that, since there's nothing particularly magical about the value.

So we have two types of target: those for which there are no runtime
indeterminates, and those for which there is one runtime indeterminate.
We decided to generalise the interface slightly by allowing any number
of indeterminates, although some parts of the underlying implementation
are still limited to 0 and 1 for now.

The main class for working with these runtime offsets and sizes is
"poly_int".  It represents a value of the form:

  C0 + C1 * X1 + ... + Cn * Xn

where each coefficient Ci is a compile-time constant and where each
indeterminate Xi is a nonnegative runtime value.  The class takes two
template parameters, one giving the number of coefficients and one
giving the type of the coefficients.  There are then typedefs for the
common cases, with the number of coefficients being controlled by
the target.

poly_int is used for things like:

- the number of elements in a VECTOR_TYPE
- the size and number of units in a general machine_mode
- the offset of something in the stack frame
- SUBREG_BYTE
- MEM_SIZE and MEM_OFFSET
- mem_ref_offset

(only a selective list).

The patch that adds poly_int has detailed documentation, but the main
points are:

* there's no total ordering between poly_ints, so the best we can do
  when comparing them is to ask whether two values *might* or *must*
  be related in a particular way.  E.g. if mode A has size 2 + 2X
  and mode B has size 4, the condition:

    GET_MODE_SIZE (A) <= GET_MODE_SIZE (B)

  is true for X<=1 and false for X>=2.  This translates to:

    may_le (GET_MODE_SIZE (A), GET_MODE_SIZE (B)) == true
    must_le (GET_MODE_SIZE (A), GET_MODE_SIZE (B)) == false

  Of course, the may/must distinction already exists in things like
  alias analysis.

* some poly_int arithmetic operations (notably division) are only possible
  for certain values.  These operations therefore become conditional.

* target-independent code is exposed to these restrictions even if the
  current target has no indeterminates.  But:

  * we've tried to provide enough operations that poly_ints are easy
    to work with.

  * it means that developers working with non-SVE targets don't need
    to test SVE.  If the code compiles on a non-SVE target, and if it
    doesn't use any asserting operations, it's reasonable to assume
    that it will work on SVE too.

* for target-specific code, poly_int degenerates to a constant if there
  are no runtime invariants for that target.  Only very minor changes
  are needed to non-AArch64 targets.

* poly_int operations should be (and in practice seem to be) as
  efficient as single-coefficient operations on non-AArch64 targets.


Tree representation
===================

The series uses a new POLY_INT_CST node to represent a poly_int value
at the tree level.  It is only used on targets with runtime sizes and
offsets; the associated test macro POLY_INT_CST_P is always false for
other targets.

The node has one INTEGER_CST per coefficient, which makes it easier
to refer to the same tree as a poly_wide_int, a poly_offset_int and
a poly_widest_int without copying the representation.

Only low-level routines use the tree node directly.  Most code uses:

- poly_int_tree_p (x)
    Return true if X is an INTEGER_CST or a POLY_INT_CST.

- wi::to_poly_wide (x)
- wi::to_poly_offset (x)
- wi::to_poly_widest (x)
    poly_int versions of the normal wi::to_wide etc. routines.  These
    work on both INTEGER_CSTs and POLY_INT_CSTs.

- poly_int_tree_p (x, &y)
    Test whether X is an INTEGER_CST or POLY_INT_CST and store its value
    in Y if so.  This is defined for Y of type poly_int64 and poly_uint64;
    the wi::to_* routines are more efficient than return-by-pointer for
    wide_int-based types.

- tree_to_poly_int64 (x)
- tree_to_poly_uint64 (x)
    poly_int versions of tree_to_shwi and tree_to_uhwi.  Again they work
    on both INTEGER_CSTs and POLY_INT_CSTs.

Many tree routines now accept poly_int operands, such as:

- build_int_cst
- build_int_cstu
- wide_int_to_tree
- force_fit_type


RTL representation
==================

The corresponding RTL representation is CONST_POLY_INT.  Again,
this is only used on targets with runtime sizes and offsets, with
the test macro CONST_POLY_INT_P returning false for other targets.

Since RTL does not have the equivalent of the tree-level distinction
between wi::to_wide, wi::to_offset and wi::to_widest, CONST_POLY_INT
just stores the coefficients directly as wide_ints, using the
trailing_wide_ints class for efficiency.  The main routines are:

- poly_int_rtx_p (x)
    Return true if X is CONST_SCALAR_INT_P or CONST_POLY_INT_P.

- wi::to_poly_wide (x, mode)
    Return the value of CONST_SCALAR_INT_P or CONST_POLY_INT_P X
    as a poly_wide_int.

- poly_int_rtx_p (x, &y)
    Return true if X is a CONST_INT or a CONST_POLY_INT,
    storing its value in Y if so.  This is defined only for Y of
    type poly_int64.  (poly_uint64 isn't much use for RTL,
    since constants have no inherent sign and are stored in sign-
    extended rather than zero-extended form.  wi::to_wide is more
    efficient than return-by-pointer when accessing an rtx as a
    poly_wide_int.)

- rtx_to_poly_int64 (x)
    A poly_int version of INTVAL, which works on both CONST_INT
    and CONST_POLY_INT.

- split_offset (x, &y)
    If X is a PLUS of X' a poly_int, store the poly_int in Y
    and return the X'.  Otherwise store 0 in Y and return X.

- split_offset_and_add (x, &y)
    If X is a PLUS of X' a poly_int, add the poly_int to Y
    and return the X'.  Otherwise leave Y alone and return X.

Many RTL routines now accept poly_int operands, such as:

- gen_int_mode
- trunc_int_for_mode
- plus_constant
- immed_wide_int_const


Compile-time impact
===================

The series seems to be compile-time neutral for release builds on
targets without runtime indeterminates, within a margin of about
[-0.1%, 0.1%].  Also, the abstraction of poly_int<1, X> is usually
compiled away.  E.g.:

  poly_wide_int
  foo (poly_wide_int x, tree y)
  {
    return x + wi::to_poly_wide (x);
  }

compiles to the same code as:

  wide_int
  foo (wide_int x, tree y)
  {
    return x + wi::to_wide (x);
  }

in release builds.  (I've tried various other combinations too.)


Typical changes
===============

Here's a table of the most common changes in the series.

----------------------------------------------------------------------
Before                                 After
----------------------------------------------------------------------
wi::to_wide (x)                        wi::to_poly_wide (x)
wi::to_offset (x)                      wi::to_poly_offset (x)
wi::to_widest (x)                      wi::to_poly_widest (x)
----------------------------------------------------------------------
unsigned HOST_WIDE_INT y;              poly_uint64 y;
if (tree_fits_uhwi_p (x))              if (poly_int_tree_p (x, &y))
  {                                      {
    x = tree_to_uhwi (y);
----------------------------------------------------------------------
HOST_WIDE_INT y;                       poly_int64 y;
if (tree_fits_shwi_p (x))              if (poly_int_tree_p (x, &y))
  {                                      {
    x = tree_to_shwi (y);
----------------------------------------------------------------------
HOST_WIDE_INT y;                       poly_int64 y;
if (cst_and_fits_in_hwi (x))           if (ptrdiff_tree_p (x, &y))
  {                                      {
    x = int_cst_value (y);
----------------------------------------------------------------------
HOST_WIDE_INT y;                       poly_int64 y;
if (CONST_INT_P (x))                   if (poly_int_rtx_p (x, &y))
  {                                      {
    x = INTVAL (x);
----------------------------------------------------------------------
if (offset < limit)                    if (must_lt (offset, limit))
  ...optimise...;                        ...optimise...;
----------------------------------------------------------------------
if (offset >= limit)                   if (may_ge (offset, limit))
  ...abort optimisation...;              ...abort optimisation...;
----------------------------------------------------------------------
if (offset >= limit)                   if (must_ge (offset, limit))
  ...treat as undefined...;              ...treat as undefined...;
----------------------------------------------------------------------
if (nunits1 == nunits2)                if (must_eq (nunits1, nunits2))
  ...treat as compatible...;             ...treat as compatible...;
----------------------------------------------------------------------
if (nunits1 != nunits2)                if (may_ne (nunits1, nunits2))
  ...treat as incompatible...;           ...treat as incompatible...;
----------------------------------------------------------------------
// Fold (eq op0 op1)                   // Fold (eq op0 op1)
if (op0 == op1)                        if (must_eq (op0, op1))
  ...fold to true...;                    ...fold to true...;
----------------------------------------------------------------------
// Fold (eq op0 op1)                   // Fold (eq op0 op1)
if (op0 != op1)                        if (must_ne (op0, op1))
  ...fold to false...;                   ...fold to false...;
----------------------------------------------------------------------


Testing
=======

Tested by compiling the testsuite before and after the series on:

    aarch64-linux-gnu aarch64_be-linux-gnu alpha-linux-gnu arc-elf
    arm-linux-gnueabi arm-linux-gnueabihf avr-elf bfin-elf c6x-elf
    cr16-elf cris-elf epiphany-elf fr30-elf frv-linux-gnu ft32-elf
    h8300-elf hppa64-hp-hpux11.23 ia64-linux-gnu i686-pc-linux-gnu
    i686-apple-darwin iq2000-elf lm32-elf m32c-elf m32r-elf
    m68k-linux-gnu mcore-elf microblaze-elf mipsel-linux-gnu
    mipsisa64-linux-gnu mmix mn10300-elf moxie-rtems msp430-elf
    nds32le-elf nios2-linux-gnu nvptx-none pdp11 powerpc-linux-gnuspe
    powerpc-eabispe powerpc64-linux-gnu powerpc64le-linux-gnu
    powerpc-ibm-aix7.0 riscv32-elf riscv64-elf rl78-elf rx-elf
    s390-linux-gnu s390x-linux-gnu sh-linux-gnu sparc-linux-gnu
    sparc64-linux-gnu sparc-wrs-vxworks spu-elf tilegx-elf tilepro-elf
    xstormy16-elf v850-elf vax-netbsdelf visium-elf x86_64-darwin
    x86_64-linux-gnu xtensa-elf

There were no differences in assembly output (except on
powerpc-ibm-aix7.0, where symbol names aren't stable).

Also tested normally on aarch64-linux-gnu, x86_64-linux-gnu and
powerpc64le-linux-gnu.

Thanks,
Richard

^ permalink raw reply	[flat|nested] 302+ messages in thread